diff mbox series

[v6,07/11] x86/smpboot: Disable parallel boot for AMD CPUs

Message ID 20230202215625.3248306-8-usama.arif@bytedance.com (mailing list archive)
State New, archived
Headers show
Series Parallel CPU bringup for x86_64 | expand

Commit Message

Usama Arif Feb. 2, 2023, 9:56 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/amd.c          | 11 +++++++++++
 arch/x86/kernel/smpboot.c          | 13 +++++++++++--
 3 files changed, 23 insertions(+), 2 deletions(-)

Comments

Kim Phillips Feb. 3, 2023, 7:48 p.m. UTC | #1
+Mario

Hi,

On 2/2/23 3:56 PM, Usama Arif wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---

I'd like to nack this, but can't (and not because it doesn't have
commit text):

If I:

  - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
  - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c

Then:

  - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
  - Zen 2,3,4-based servers boot fine
  - a Zen1-based server doesn't boot.

This is what's left on its serial port:

[    3.199633] smp: Bringing up secondary CPUs ...
[    3.200732] x86: Booting SMP configuration:
[    3.204242] .... node  #0, CPUs:          #1
[    3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
[    3.204478] ------------[ cut here ]------------
[    3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.204490] Modules linked in:
[    3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[    3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.204512] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.204514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.204517] Call Trace:
[    3.204519] ---[ end trace 0000000000000000 ]---
[    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[    3.288686]    #2
[    3.288735] CPU 2 to 93/x86/cpu:kick in 210 42 -114355248756 0 . 0 0 0 0 . 0 114356192013
[    3.288798] ------------[ cut here ]------------
[    3.288804] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.288815] Modules linked in:
[    3.288819] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.288823] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.288826] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.288831] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.288835] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
[    3.288838] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
[    3.288841] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
[    3.288844] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
[    3.288845] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.288848] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.288850] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
[    3.288852] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.288855] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
[    3.288857] Call Trace:
[    3.288859] ---[ end trace 0000000000000000 ]---
[    3.288925] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
6.36[   [    3. 68 33]3 [ #3[  [ #
                               [    3.368623[    3
  [    3.368623]    #3
[    3.368662] ------------[ cut here ]------------
[    3.368673] CPU 3 to 93/x86/cpu:kick in 504 315 -114684508974 0 . 0 0 0 0 . 0 114685353594
[    3.368705] BUG: scheduling while atomic: swapper/0/1/0x00000003
[    3.368708] 7 locks held by swapper/0/1:
[    3.368710]  #0: ffffffffacbff920 (console_lock){....}-{0:0}, at: vprintk_emit+0x13a/0x2e0
[    3.368721]  #1: ffffffffacbffd48 (console_srcu){....}-{0:0}, at: console_flush_all+0x2d/0x250
[    3.368728]  #2: ffffffffac87f540 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.22+0x189/0x350
[    3.368735]  #3: ffffffffadaae838 (&port_lock_key){....}-{2:2}, at: serial8250_console_write+0x88/0x3c0
[    3.368745]  #4: ffffffffac86aa50 (cpu_add_remove_lock){....}-{3:3}, at: cpu_up+0x6a/0xd0
[    3.368753]  #5: ffffffffac86a9a0 (cpu_hotplug_lock){....}-{0:0}, at: _cpu_up+0x3d/0x2f0
[    3.368760]  #6: ffffffffac8763b0 (smpboot_threads_lock){....}-{3:3}, at: smpboot_create_threads+0x21/0x80
[    3.368769] Modules linked in:
[    3.368770] Preemption disabled at:
[    3.368771] [<ffffffffaae717a4>] do_cpu_up+0x3e4/0x780
[    3.368777] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.368781] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.368782] Call Trace:
[    3.368783]  <TASK>
[    3.368789]  dump_stack_lvl+0x49/0x63
[    3.368795]  ? do_cpu_up+0x3e4/0x780
[    3.368799]  dump_stack+0x10/0x16
[    3.368802]  __schedule_bug+0xad/0xd0
[    3.368808]  __schedule+0x76/0x8a0
[    3.368812]  ? sched_clock+0x9/0x10
[    3.368817]  ? sched_clock_local+0x17/0x90
[    3.368826]  ? sort_range+0x30/0x30
[    3.368830]  schedule+0x88/0xd0
[    3.368833]  schedule_timeout+0x40/0x320
[    3.368840]  ? __this_cpu_preempt_check+0x13/0x20
[    3.368844]  ? lock_release+0x353/0x3c0
[    3.368852]  ? sort_range+0x30/0x30
[    3.368856]  wait_for_completion_killable+0xe0/0x1c0
[    3.368864]  __kthread_create_on_node+0xfe/0x1e0
[    3.368876]  ? wait_for_completion_killable+0x38/0x1c0
[    3.368884]  kthread_create_on_node+0x46/0x70
[    3.368894]  kthread_create_on_cpu+0x2c/0x90
[    3.368899]  __smpboot_create_thread+0x87/0x140
[    3.368905]  smpboot_create_threads+0x3f/0x80
[    3.368909]  ? idle_thread_get+0x40/0x40
[    3.368913]  cpuhp_invoke_callback+0x13c/0x5d0
[    3.368921]  __cpuhp_invoke_callback_range+0x69/0xf0
[    3.368929]  _cpu_up+0x12a/0x2f0
[    3.368937]  cpu_up+0x8f/0xd0
[    3.368942]  bringup_nonboot_cpus+0x7c/0x160
[    3.368950]  smp_init+0x2a/0x83
[    3.368957]  kernel_init_freeable+0x1a1/0x309
[    3.368961]  ? lock_release+0x353/0x3c0
[    3.368972]  ? rest_init+0x140/0x140
[    3.368977]  kernel_init+0x1a/0x130
[    3.368980]  ret_from_fork+0x22/0x30
[    3.368996]  </TASK>
[    3.369419]
[    3.369420] .... node  #1, CPUs:     #4
[    3.369466] ------------[ cut here ]------------
[    3.369469] CPU 4 to 93/x86/cpu:kick in 378 42 -114685407543 0 . 0 0 0 0 . 0 114687022569
[    3.369474] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.369487] Modules linked in:
[    3.369491] ------------[ cut here ]------------
[    3.369494] DEBUG_LOCKS_WARN_ON(val > preempt_count())
[    3.369493] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.369499] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018


...which points to the WARN_ON here:

static void wait_for_master_cpu(int cpu)
{
#ifdef CONFIG_SMP
         /*
          * wait for ACK from master CPU before continuing
          * with AP initialization
          */
         WARN_ON(cpumask_test_and_set_cpu(cpu, cpu_initialized_mask));
         while (!cpumask_test_cpu(cpu, cpu_callout_mask))
                 cpu_relax();
#endif
}

Let me know if you'd like me to test any changes.

Thanks,

Kim
Kim Phillips Feb. 3, 2023, 9:45 p.m. UTC | #2
On 2/3/23 2:19 PM, Woodhouse, David wrote:
> Would be interesting to know if that goes away if you test only the
> first part of the tree. My first inclination is to suspect that's a bug
> in the later patches... although the APIC ID mismatch is interesting.
> That part (with each AP getting its own APIC ID from CPUID) is in the
> *first* part of the series....

dwmw2/parallel-6.2-rc6-part1 (commit 942b3faa258c) re-enabled for
AMD gets the same splat(s):

[    3.195498] smp: Bringing up secondary CPUs ...
[    3.196670] x86: Booting SMP configuration:
[    3.200189] .... node  #0, CPUs:          #1
[    3.200247] CPU 1 to 93/x86/cpu:kick in 315 42 -155206575216 0 . 0 0 0 0 . 0 155217324423
[    3.200418] ------------[ cut here ]------------
[    3.200420] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.200428] Modules linked in:
[    3.200430] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
[    3.200433] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.200435] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.200438] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.200440] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
[    3.200443] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
[    3.200444] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
[    3.200445] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
[    3.200446] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.200447] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.200448] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
[    3.200450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.200451] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
[    3.200453] Call Trace:
[    3.200454] ---[ end trace 0000000000000000 ]---
[    3.200509] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
[    3.284620]    #2
[    3.284669] CPU 2 to 93/x86/cpu:kick in 252 42 -155547668514 0 . 0 0 0 0 . 0 155548597197
[    3.284727] ------------[ cut here ]------------
[    3.284732] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
[    3.284741] Modules linked in:
[    3.284745] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
[    3.284749] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
[    3.284752] RIP: 0010:cpu_init+0x2d/0x1f0
[    3.284756] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
[    3.284760] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
[    3.284764] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
[    3.284766] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
[    3.284769] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
[    3.284771] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
[    3.284773] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    3.284775] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
[    3.284778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.284779] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
[    3.284781] Call Trace:
[    3.284783] ---[ end trace 0000000000000000 ]---
[    3.284841] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
[    3.364[[   [. 343 35     5 5] [  3 364575  [
  3 [    [   [3 3 [ [  3

Thanks,

Kim
David Woodhouse Feb. 3, 2023, 10:25 p.m. UTC | #3
On Fri, 2023-02-03 at 15:45 -0600, Kim Phillips wrote:
> On 2/3/23 2:19 PM, Woodhouse, David wrote:
> > Would be interesting to know if that goes away if you test only the
> > first part of the tree. My first inclination is to suspect that's a bug
> > in the later patches... although the APIC ID mismatch is interesting.
> > That part (with each AP getting its own APIC ID from CPUID) is in the
> > *first* part of the series....
> 
> dwmw2/parallel-6.2-rc6-part1 (commit 942b3faa258c) re-enabled for
> AMD gets the same splat(s):

Sure that's the right image? Looks like it still has the timing debug
patch from the last commit in the tree?


> [    3.195498] smp: Bringing up secondary CPUs ...
> [    3.196670] x86: Booting SMP configuration:
> [    3.200189] .... node  #0, CPUs:          #1
> [    3.200247] CPU 1 to 93/x86/cpu:kick in 315 42 -155206575216 0 . 0 0 0 0 . 0 155217324423
> [    3.200418] ------------[ cut here ]------------
> [    3.200420] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.200428] Modules linked in:
> [    3.200430] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
> [    3.200433] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.200435] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.200438] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.200440] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
> [    3.200443] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
> [    3.200444] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
> [    3.200445] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
> [    3.200446] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.200447] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.200448] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
> [    3.200450] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.200451] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
> [    3.200453] Call Trace:
> [    3.200454] ---[ end trace 0000000000000000 ]---
> [    3.200509] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2
> [    3.284620]    #2
> [    3.284669] CPU 2 to 93/x86/cpu:kick in 252 42 -155547668514 0 . 0 0 0 0 . 0 155548597197
> [    3.284727] ------------[ cut here ]------------
> [    3.284732] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.284741] Modules linked in:
> [    3.284745] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W          6.2.0-rc6+ #19
> [    3.284749] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.284752] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.284756] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 99 47 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.284760] RSP: 0000:ffffffffba203d70 EFLAGS: 00010083
> [    3.284764] RAX: ffff8e027eef6e40 RBX: ffff8dfe80064b80 RCX: 0000000000000008
> [    3.284766] RDX: 0000000000000000 RSI: ffff8df65c40b048 RDI: ffffffffb9f66418
> [    3.284769] RBP: ffffffffba203d90 R08: 00000000fffffe4d R09: ffff8df65c406078
> [    3.284771] R10: ffffffffba203dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.284773] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.284775] FS:  0000000000000000(0000) GS:ffff8df65c400000(0000) knlGS:0000000000000000
> [    3.284778] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.284779] CR2: 0000000000000000 CR3: 0000800307e12000 CR4: 00000000003100a0
> [    3.284781] Call Trace:
> [    3.284783] ---[ end trace 0000000000000000 ]---
> [    3.284841] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 8
> [    3.364[[   [. 343 35     5 5] [  3 364575  [
>   3 [    [   [3 3 [ [  3
> 
> Thanks,
> 
> Kim
David Woodhouse Feb. 4, 2023, 9:07 a.m. UTC | #4
On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> +Mario
> 
> Hi,
> 
> On 2/2/23 3:56 PM, Usama Arif wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> > ---
> 
> I'd like to nack this, but can't (and not because it doesn't have
> commit text):
> 
> If I:
> 
>   - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>   - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
> 
> Then:
> 
>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>   - Zen 2,3,4-based servers boot fine
>   - a Zen1-based server doesn't boot.
> 
> This is what's left on its serial port:
> 
> [    3.199633] smp: Bringing up secondary CPUs ...
> [    3.200732] x86: Booting SMP configuration:
> [    3.204242] .... node  #0, CPUs:          #1
> [    3.204301] CPU 1 to 93/x86/cpu:kick in 63 21 -114014307645 0 . 0 0 0 0 . 0 114025055970
> [    3.204478] ------------[ cut here ]------------
> [    3.204481] WARNING: CPU: 0 PID: 1 at arch/x86/kernel/cpu/common.c:2122 cpu_init+0x2d/0x1f0
> [    3.204490] Modules linked in:
> [    3.204493] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc6+ #19
> [    3.204496] Hardware name: AMD Corporation Speedway/Speedway, BIOS RSW1009C 07/27/2018
> [    3.204498] RIP: 0010:cpu_init+0x2d/0x1f0
> [    3.204502] Code: e5 41 56 41 55 41 54 53 65 48 8b 1c 25 80 2e 1f 00 65 44 8b 35 20 e4 39 55 48 8b 05 5d f7 51 02 44 89 f2 f0 48 0f ab 10 73 06 <0f> 0b eb 02 f3 90 48 8b 05 3e f7 51 02 48 0f a3 10 73 f1 45 85 f6
> [    3.204504] RSP: 0000:ffffffffac803d70 EFLAGS: 00010083
> [    3.204506] RAX: ffff8d293eef6e40 RBX: ffff8d1d40010000 RCX: 0000000000000008
> [    3.204508] RDX: 0000000000000000 RSI: ffff8d1d1c40b048 RDI: ffffffffac566418
> [    3.204509] RBP: ffffffffac803d90 R08: 00000000fffffe14 R09: ffff8d1d1c406078
> [    3.204510] R10: ffffffffac803dc0 R11: 0000000000000000 R12: 0000000000000000
> [    3.204511] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [    3.204512] FS:  0000000000000000(0000) GS:ffff8d1d1c400000(0000) knlGS:0000000000000000
> [    3.204514] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.204515] CR2: 0000000000000000 CR3: 0000800daec12000 CR4: 00000000003100a0
> [    3.204517] Call Trace:
> [    3.204519] ---[ end trace 0000000000000000 ]---
> [    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0 APIC: 2


So this is where it all starts to go wrong, and we didn't really change
this in the less-tested later part of the series. So even though I'd
like to be sure you've tested with just '-part1' to be sure, I suspect
that isn't the issue.

The warning you highlighted the end of the end of the log is just
complaining that a given AP is *already* marked as initialized when
it's trying to come up, and that's just fallout of the fact that they
don't know which CPU they are. They *all* come up thinking they're
CPU#0.

So that's weird. Now, what we do in this series is stop *telling* the
AP which CPU# it is in a global variable as we bring them up one at a
time, and instead we let them get their own APIC ID from CPUID leaf 0xb
and look up their per-cpu data that way.

The commit message for 'Support parallel startup of secondary CPUs'
(currently commit 0f52d4eaaf0c) explains that in a bit more detail.

Please could you try parallel-6.2-rc6-part1 with something like this?
If I boot this in qemu with a weird topology to stop it being a 1:1
mapping, I do see a series of BbCcDdDeEeFfIgJh... showing the APIC ID
and CPU# that each AP finds as it starts up.

qemu-system-x86_64 -kernel arch/x86/boot/bzImage -smp 24,sockets=4,cores=3,threads=2 -append "console=ttyS0,115200 lpj=16528321 earlyprintk=ttyS0" -serial mon:stdio -display none -m 1G -accel kvm,kernel-irqchip=split

...
[    0.570022] x86: Booting SMP configuration:
BbCc[    0.570923] .... node  #0, CPUs:        #1  #2
[    0.711468] Callback from call_rcu_tasks_rude() invoked.
DdEe[    0.713316]   #3  #4
[    0.854459] Callback from call_rcu_tasks() invoked.
FfIgJhKiLjMkNlQmRnSoTpUqVrYsZt[u\v]w^x[    0.856289]   #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23



diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d07f694691d2..c3219dc2a201 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -247,8 +247,8 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	 * Is this the boot CPU coming up? If so everything is available
 	 * in initial_gs, initial_stack and early_gdt_descr.
 	 */
-	movl	smpboot_control(%rip), %eax
-	testl	%eax, %eax
+	movl	smpboot_control(%rip), %edx
+	testl	%edx, %edx
 	jz	.Lsetup_cpu
 
 	/*
@@ -259,30 +259,45 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
 	 * Bit 30	STARTUP_SECONDARY flag
 	 * Bit 31	STARTUP_PARALLEL flag (use CPUID 0x0b for APIC ID)
 	 */
-	testl	$STARTUP_PARALLEL, %eax
+	testl	$STARTUP_PARALLEL, %edx
 	jnz	.Luse_cpuid_0b
-	andl	$0x0FFFFFFF, %eax
+	andl	$0x0FFFFFFF, %edx
 	jmp	.Lsetup_AP
 
 .Luse_cpuid_0b:
 	mov	$0x0B, %eax
 	xorl	%ecx, %ecx
 	cpuid
-	mov	%edx, %eax
+
+/* test hack: print 'a' + APICID */
 
 .Lsetup_AP:
-	/* EAX contains the APICID of the current CPU */
+	/* EDX contains the APICID of the current CPU */
+
+	/* Test hack: Print APIC ID and then CPU# when we find it. */
+	mov	%edx, %ecx
+	mov	%edx, %eax
+	addb	$'A', %al
+	mov	$0x3f8, %dx
+	outb    %al, %dx
+	mov	%ecx, %edx
+	mov	$'a', %al
+
 	xorl	%ecx, %ecx
 	leaq	cpuid_to_apicid(%rip), %rbx
 
 .Lfind_cpunr:
-	cmpl	(%rbx), %eax
+	cmpl	(%rbx), %edx
 	jz	.Linit_cpu_data
 	addq	$4, %rbx
 	addq	$8, %rcx
+	addb	$1, %al
 	jmp	.Lfind_cpunr
 
 .Linit_cpu_data:
+	mov	$0x3f8, %dx
+	outb    %al, %dx
+
 	/* Get the per cpu offset */
 	leaq	__per_cpu_offset(%rip), %rbx
 	addq	%rcx, %rbx
David Woodhouse Feb. 4, 2023, 10:09 a.m. UTC | #5
On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> [    3.204580] [Firmware Bug]: CPU0: APIC id mismatch. Firmware: 0
> APIC: 2


Could it just be that some processors have CPUID leaves above 0xb but
don't actually support 0xb?

What does this do?

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1578,6 +1578,18 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	    cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		do_parallel_bringup = false;
 
+	if (do_parallel_bringup) {
+		/* Check that CPUID 0x0B really does look sane. */
+		unsigned int eax, ebx, ecx, edx;
+
+		cpuid_count(0xb, 0, &eax, &ebx, &ecx, &edx);
+		printk("CPUID 0xB level 0: %x %x %x %x\n", eax, ebx, ecx, edx);
+		if (!eax && !ebx) {
+			pr_info("Disabling parallel bringup because CPUID 0xb looks untrustworthy\n");
+			do_parallel_bringup = false;
+		}
+	}
+
 	if (do_parallel_bringup &&
 	    boot_cpu_has_bug(X86_BUG_NO_PARALLEL_BRINGUP)) {
 		pr_info("Disabling parallel bringup due to CPU bugs\n");
David Woodhouse Feb. 4, 2023, 3:40 p.m. UTC | #6
On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> 
> I'd like to nack this, but can't (and not because it doesn't have
> commit text)
> 


So, I *think* that worked precisely as designed; thanks for letting it
grab your attention :)

> If I:
> 
>   - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>   - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
> 
> Then:
> 
>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>   - Zen 2,3,4-based servers boot fine
>   - a Zen1-based server doesn't boot.

I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
which Boris tells me won't be the case on Zen1 because that doesn't
support X2APIC.

When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
of APIC ID we find there are perfectly sufficient.

New tree in the same place as before, commit ce7e2d1e046a for the
parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.

However...

Even though we *can* support non-X2APIC processors, we *might* want to
play it safe and not go back that far; only enabling parallel bringup
on machines with X2APIC which roughly correlates with "lots of CPUs"
since that's where the benefit is.
Arjan van de Ven Feb. 4, 2023, 6:18 p.m. UTC | #7
> 
> However...
> 
> Even though we *can* support non-X2APIC processors, we *might* want to
> play it safe and not go back that far; only enabling parallel bringup
> on machines with X2APIC which roughly correlates with "lots of CPUs"
> since that's where the benefit is.

I think that this is the right approach, at least on the initial patch series.
KISS principle; do all the easy-but-important cases first, get it stable and working
and in later series/kernels the range can be expanded.... if it matters.


> 
> 
>
David Woodhouse Feb. 4, 2023, 10:31 p.m. UTC | #8
On 4 February 2023 18:18:55 GMT, Arjan van de Ven <arjan@linux.intel.com> wrote:
>> 
>> However...
>> 
>> Even though we *can* support non-X2APIC processors, we *might* want to
>> play it safe and not go back that far; only enabling parallel bringup
>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>> since that's where the benefit is.
>
>I think that this is the right approach, at least on the initial patch series.
>KISS principle; do all the easy-but-important cases first, get it stable and working
>and in later series/kernels the range can be expanded.... if it matters.

Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.
Usama Arif Feb. 5, 2023, 10:13 p.m. UTC | #9
On 04/02/2023 22:31, David Woodhouse wrote:
> 
> 
> On 4 February 2023 18:18:55 GMT, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>
>>> However...
>>>
>>> Even though we *can* support non-X2APIC processors, we *might* want to
>>> play it safe and not go back that far; only enabling parallel bringup
>>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>>> since that's where the benefit is.
>>
>> I think that this is the right approach, at least on the initial patch series.
>> KISS principle; do all the easy-but-important cases first, get it stable and working
>> and in later series/kernels the range can be expanded.... if it matters.
> 
> Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.

This was an interesting find! I tested the latest branch 
parallel-6.2-rc6 and it works well. The numbers from Russ makes the 
patch series look so much better! :)

If we do it with x2apic only and not support non-x2apic CPUID 0x1 case, 
maybe we apply the following diff to part 1?

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f53a060a899b..f6b89cf40076 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int 
max_cpus)
          * sufficient). Otherwise it's too hard. And not for SEV-ES
          * guests because they can't use CPUID that early.
          */
-       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
+       if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
             (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
             cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
                 do_parallel_bringup = false;



For reusing timer calibration, calibrate_delay ends up being used in 
start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I 
think reusing timer calibration would be ok in the first 2 uses? but not 
really sure about the other 2. cx686 seems to be quite old so not sure 
if anyone will have it to test or will ever run 6.2 kernel on it :).  I 
guess if unsure, better to leave out initially and try and get part1 merged?

Thanks,
Usama
David Woodhouse Feb. 6, 2023, 8:05 a.m. UTC | #10
On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
> 
> 
> On 04/02/2023 22:31, David Woodhouse wrote:
> > 
> > 
> > On 4 February 2023 18:18:55 GMT, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > > > 
> > > > However...
> > > > 
> > > > Even though we *can* support non-X2APIC processors, we *might* want to
> > > > play it safe and not go back that far; only enabling parallel bringup
> > > > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > > > since that's where the benefit is.
> > > 
> > > I think that this is the right approach, at least on the initial patch series.
> > > KISS principle; do all the easy-but-important cases first, get it stable and working
> > > and in later series/kernels the range can be expanded.... if it matters.
> > 
> > Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.
> 
> This was an interesting find! I tested the latest branch 
> parallel-6.2-rc6 and it works well. The numbers from Russ makes the 
> patch series look so much better! :)
> 
> If we do it with x2apic only and not support non-x2apic CPUID 0x1 case, 
> maybe we apply the following diff to part 1?

Using x2apic_mode would also disable parallel boot when the CPU *does*
have X2APIC support but the kernel just isn't using it at the moment. I
think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?

I was thinking I'd tweak the 'no_parallel_bringup' command line
argument into something that also allows us to *enable* it without
X2APIC being supported.

But I've also been pondering the fact that this is all only for 64-bit
anyway. It's not like we're doing it for the zoo of ancient i586 and
even i486 machines where the APICs were hooked up with blue wire and
duct tape.

Maybe "64-bit only" is good enough, with a command line opt-out. And
maybe a printk pointing out the existence of that command line option
before the bringup, just in case? 

> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index f53a060a899b..f6b89cf40076 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int 
> max_cpus)
>           * sufficient). Otherwise it's too hard. And not for SEV-ES
>           * guests because they can't use CPUID that early.
>           */
> -       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
> +       if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
>              (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
>              cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>                  do_parallel_bringup = false;
> 
> 
> 
> For reusing timer calibration, calibrate_delay ends up being used in 
> start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I
> think reusing timer calibration would be ok in the first 2 uses? 
> 

It already explicitly allows the calibration to be reused from another
CPU in the same *core*, but no further. I think we'd need to be sure
about the correctness of extending that further, and which of the mess
of constant/invariant/we-really-mean-it-this-time TSC flags need to be
set for that to be OK.

> but not really sure about the other 2. cx686 seems to be quite old so not sure 
> if anyone will have it to test or will ever run 6.2 kernel on it :).  I 
> guess if unsure, better to leave out initially and try and get part1 merged?

I doubt cx686 advertises constant TSC; the early ones didn't even *have* a TSC.
Does it even support SMP?
Usama Arif Feb. 6, 2023, 12:11 p.m. UTC | #11
On 06/02/2023 08:05, David Woodhouse wrote:
> On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
>>
>>
>> On 04/02/2023 22:31, David Woodhouse wrote:
>>>
>>>
>>> On 4 February 2023 18:18:55 GMT, Arjan van de Ven <arjan@linux.intel.com> wrote:
>>>>>
>>>>> However...
>>>>>
>>>>> Even though we *can* support non-X2APIC processors, we *might* want to
>>>>> play it safe and not go back that far; only enabling parallel bringup
>>>>> on machines with X2APIC which roughly correlates with "lots of CPUs"
>>>>> since that's where the benefit is.
>>>>
>>>> I think that this is the right approach, at least on the initial patch series.
>>>> KISS principle; do all the easy-but-important cases first, get it stable and working
>>>> and in later series/kernels the range can be expanded.... if it matters.
>>>
>>> Agreed. I'll split it to do it only with X2APIC for the initial series, and then hold the CPUID 0x1 part back for the next phase.
>>
>> This was an interesting find! I tested the latest branch
>> parallel-6.2-rc6 and it works well. The numbers from Russ makes the
>> patch series look so much better! :)
>>
>> If we do it with x2apic only and not support non-x2apic CPUID 0x1 case,
>> maybe we apply the following diff to part 1?
> 
> Using x2apic_mode would also disable parallel boot when the CPU *does*
> have X2APIC support but the kernel just isn't using it at the moment. I
> think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?
> 

x2apic_mode is set to 0 only in the case when nox2apic is specified in 
the kernel cmdline or if x2apic_setup fails. As 0xB leaf gets the 
"x2apic id" and not the "apic id", I thought it would be better to not 
use the x2apic id if the user doesnt want to use x2apic (cmdline), or 
the kernel fails to set it up.

Another thing I noticed from the Intel Architecture Manual CPUID—CPU 
Identification section:

"CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel recommends 
first checking for the existence of Leaf 1FH before using leaf 0BH."

So I think we should switch from 0BH to using the 1FH leaf EDX register.

> I was thinking I'd tweak the 'no_parallel_bringup' command line
> argument into something that also allows us to *enable* it without
> X2APIC being supported.
> 
> But I've also been pondering the fact that this is all only for 64-bit
> anyway. It's not like we're doing it for the zoo of ancient i586 and
> even i486 machines where the APICs were hooked up with blue wire and
> duct tape.
> 
> Maybe "64-bit only" is good enough, with a command line opt-out. And
> maybe a printk pointing out the existence of that command line option
> before the bringup, just in case?
> 

I think that makes sense, the patch only has a significant impact when 
the core count is high, and x2apic was made to support higher CPU count.


>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index f53a060a899b..f6b89cf40076 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -1564,7 +1564,7 @@ void __init native_smp_prepare_cpus(unsigned int
>> max_cpus)
>>            * sufficient). Otherwise it's too hard. And not for SEV-ES
>>            * guests because they can't use CPUID that early.
>>            */
>> -       if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 1 ||
>> +       if (IS_ENABLED(CONFIG_X86_32) || !x2apic_mode ||
>>               (x2apic_mode && boot_cpu_data.cpuid_level < 0xb) ||
>>               cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
>>                   do_parallel_bringup = false;
>>
>>
>>
>> For reusing timer calibration, calibrate_delay ends up being used in
>> start_kernel, smp_callin, debug_calc_bogomips and check_cx686_slop. I
>> think reusing timer calibration would be ok in the first 2 uses?
>>
> 
> It already explicitly allows the calibration to be reused from another
> CPU in the same *core*, but no further. I think we'd need to be sure
> about the correctness of extending that further, and which of the mess
> of constant/invariant/we-really-mean-it-this-time TSC flags need to be
> set for that to be OK.
> 
>> but not really sure about the other 2. cx686 seems to be quite old so not sure
>> if anyone will have it to test or will ever run 6.2 kernel on it :).  I
>> guess if unsure, better to leave out initially and try and get part1 merged?
> 
> I doubt cx686 advertises constant TSC; the early ones didn't even *have* a TSC.
> Does it even support SMP?

Just checked the wiki page, and it says core count is 1 :)
Kim Phillips Feb. 6, 2023, 5:58 p.m. UTC | #12
On 2/4/23 9:40 AM, David Woodhouse wrote:
> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>> If I:
>>
>>    - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>>    - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>>
>> Then:
>>
>>    - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>    - Zen 2,3,4-based servers boot fine
>>    - a Zen1-based server doesn't boot.
> 
> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> which Boris tells me won't be the case on Zen1 because that doesn't
> support X2APIC.
> 
> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> of APIC ID we find there are perfectly sufficient.
> 
> New tree in the same place as before, commit ce7e2d1e046a for the
> parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.

Thanks, Zen 1 through 4 based servers all boot both those two tree
commits successfully.

I'll try that Ryzen again later.

Kim
Sean Christopherson Feb. 6, 2023, 6:07 p.m. UTC | #13
On Mon, Feb 06, 2023, Usama Arif wrote:
> 
> 
> On 06/02/2023 08:05, David Woodhouse wrote:
> > On Sun, 2023-02-05 at 22:13 +0000, Usama Arif wrote:
> > > 
> > > 
> > > On 04/02/2023 22:31, David Woodhouse wrote:
> > > > 
> > > > 
> > > > On 4 February 2023 18:18:55 GMT, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > > > > > 
> > > > > > However...
> > > > > > 
> > > > > > Even though we *can* support non-X2APIC processors, we *might* want to
> > > > > > play it safe and not go back that far; only enabling parallel bringup
> > > > > > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > > > > > since that's where the benefit is.
> > > > > 
> > > > > I think that this is the right approach, at least on the initial patch series.
> > > > > KISS principle; do all the easy-but-important cases first, get it stable and working
> > > > > and in later series/kernels the range can be expanded.... if it matters.
> > > > 
> > > > Agreed. I'll split it to do it only with X2APIC for the initial series,

And sanity check CPUID.0xB output even when x2APIC is supported, e.g. require
CPUID.0xB.EBX to be non-zero.  Odds are very good that there are VMs in the wild
that support x2APIC but have an empty CPUID.0xB due to it being a topology leaf,
i.e. may be suppressed when vCPUs aren't pinned.  QEMU even has a knob to deliberately
disable CPUID.0xB, e.g. booting a VM with

 cpu host,host-phys-bits,-cpuid-0xb,+x2apic

works just fine.

> > > > and then hold the CPUID 0x1 part back for the next phase.
> > > This was an interesting find! I tested the latest branch
> > > parallel-6.2-rc6 and it works well. The numbers from Russ makes the
> > > patch series look so much better! :)
> > > 
> > > If we do it with x2apic only and not support non-x2apic CPUID 0x1 case,
> > > maybe we apply the following diff to part 1?
> > 
> > Using x2apic_mode would also disable parallel boot when the CPU *does*
> > have X2APIC support but the kernel just isn't using it at the moment. I
> > think boot_cpu_has(X86_FEATURE_X2APIC) is the better criterion?
> > 
> 
> x2apic_mode is set to 0 only in the case when nox2apic is specified in the
> kernel cmdline or if x2apic_setup fails. As 0xB leaf gets the "x2apic id"
> and not the "apic id", I thought it would be better to not use the x2apic id
> if the user doesnt want to use x2apic (cmdline), or the kernel fails to set
> it up.

I agree with David that checking boot_cpu_has(X86_FEATURE_X2APIC) is preferred,
x2APIC goes unused on a lot of platforms due to the kernel's interrupt remapping
requirement.  I would rather have a single explicit "no_parallel_bringup" option
than try to infer the user's intentions based on tangentially related params.

> Another thing I noticed from the Intel Architecture Manual CPUID—CPU
> Identification section:
> 
> "CPUID leaf 1FH is a preferred superset to leaf 0BH. Intel recommends first
> checking for the existence of Leaf 1FH before using leaf 0BH."
> 
> So I think we should switch from 0BH to using the 1FH leaf EDX register.

I don't think using 0x1F will buy us anything except complexity.  0x1F provides more
details on the topology, but its x2APIC ID enumeration isn't any more trustworthy
than 0xB.

> > I was thinking I'd tweak the 'no_parallel_bringup' command line
> > argument into something that also allows us to *enable* it without
> > X2APIC being supported.
> > 
> > But I've also been pondering the fact that this is all only for 64-bit
> > anyway. It's not like we're doing it for the zoo of ancient i586 and
> > even i486 machines where the APICs were hooked up with blue wire and
> > duct tape.

The only thing I can see benefiting from parallel bringup without x2APIC is large
VMs running on pre-x2AVIC hardware.  E.g. IIRC, we (Google) hide x2APIC on AMD-based
VMs so that VMs would take advantage of AVIC acceleration if AVIC were even to be
enabled.

But that's more of an argument for "try to use CPUID.0x1" than it is an argument
for trying to use CPUID.0xB without x2APIC.

> > Maybe "64-bit only" is good enough, with a command line opt-out. And
> > maybe a printk pointing out the existence of that command line option
> > before the bringup, just in case?
Thomas Gleixner Feb. 7, 2023, 12:09 a.m. UTC | #14
On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:

> From: David Woodhouse <dwmw@amazon.co.uk>

-ENOCONTENT

> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Thomas Gleixner Feb. 7, 2023, 12:23 a.m. UTC | #15
On Sat, Feb 04 2023 at 15:40, David Woodhouse wrote:
> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>> Then:
>> 
>>   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>   - Zen 2,3,4-based servers boot fine
>>   - a Zen1-based server doesn't boot.
>
> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> which Boris tells me won't be the case on Zen1 because that doesn't
> support X2APIC.

Correct.

> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> of APIC ID we find there are perfectly sufficient.

Is that worth the trouble?

> Even though we *can* support non-X2APIC processors, we *might* want to
> play it safe and not go back that far; only enabling parallel bringup
> on machines with X2APIC which roughly correlates with "lots of CPUs"
> since that's where the benefit is.

The parallel bringup code is complex enough already, so please don't
optimize for the non-interesting case in the first place. When this has
stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
it's interesting. KISS is still the best engineering principle.

Thanks,

        tglx
David Woodhouse Feb. 7, 2023, 10:04 a.m. UTC | #16
On Tue, 2023-02-07 at 01:23 +0100, Thomas Gleixner wrote:
> On Sat, Feb 04 2023 at 15:40, David Woodhouse wrote:
> > On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
> > > Then:
> > > 
> > >   - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
> > >   - Zen 2,3,4-based servers boot fine
> > >   - a Zen1-based server doesn't boot.
> > 
> > I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
> > which Boris tells me won't be the case on Zen1 because that doesn't
> > support X2APIC.
> 
> Correct.
> 
> > When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
> > of APIC ID we find there are perfectly sufficient.
> 
> Is that worth the trouble?

Well, that's what was being debated. I think the conclusion that was
bring reached was that it *is* worth the trouble, because there will be
a number of physical and especially virtual machines which have a high
CPU count but which don't actually use X2APIC mode. And which might not
even *support* CPUID 0xb.

So using CPUID 0x1 when there is no APIC ID greater than 254 does seem
to make sense.


> > Even though we *can* support non-X2APIC processors, we *might* want to
> > play it safe and not go back that far; only enabling parallel bringup
> > on machines with X2APIC which roughly correlates with "lots of CPUs"
> > since that's where the benefit is.
> 
> The parallel bringup code is complex enough already, so please don't
> optimize for the non-interesting case in the first place. When this has
> stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
> it's interesting. KISS is still the best engineering principle.

Actually it ends up being trivial. It probably makes sense to keep it
in there even if it can only be exercised by a deliberate opt-in on
older CPUs. I reworked the register usage from your original anyway,
which helps a little.

	testl	$STARTUP_APICID_CPUID_0B, %edx
	jnz	.Luse_cpuid_0b
	testl	$STARTUP_APICID_CPUID_01, %edx
	jnz	.Luse_cpuid_01
	andl	$0x0FFFFFFF, %edx
	jmp	.Lsetup_AP

.Luse_cpuid_01:
	mov	$0x01, %eax
	cpuid
	mov	%ebx, %edx
	shr	$24, %edx
	jmp	.Lsetup_AP

.Luse_cpuid_0b:
	mov	$0x0B, %eax
	xorl	%ecx, %ecx
	cpuid

.Lsetup_AP:
	/* EDX contains the APICID of the current CPU */
Thomas Gleixner Feb. 7, 2023, 2:44 p.m. UTC | #17
David!

On Tue, Feb 07 2023 at 10:04, David Woodhouse wrote:
> On Tue, 2023-02-07 at 01:23 +0100, Thomas Gleixner wrote:
>> > When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
>> > of APIC ID we find there are perfectly sufficient.
>> 
>> Is that worth the trouble?
>
> Well, that's what was being debated. I think the conclusion that was
> bring reached was that it *is* worth the trouble, because there will be
> a number of physical and especially virtual machines which have a high
> CPU count but which don't actually use X2APIC mode. And which might not
> even *support* CPUID 0xb.
>
> So using CPUID 0x1 when there is no APIC ID greater than 254 does seem
> to make sense.

Fair enough.

>> > Even though we *can* support non-X2APIC processors, we *might* want to
>> > play it safe and not go back that far; only enabling parallel bringup
>> > on machines with X2APIC which roughly correlates with "lots of CPUs"
>> > since that's where the benefit is.
>> 
>> The parallel bringup code is complex enough already, so please don't
>> optimize for the non-interesting case in the first place. When this has
>> stabilized then the CPUID 0x1 mechanism can be added if anyone thinks
>> it's interesting. KISS is still the best engineering principle.
>
> Actually it ends up being trivial. It probably makes sense to keep it
> in there even if it can only be exercised by a deliberate opt-in on
> older CPUs. I reworked the register usage from your original anyway,
> which helps a little.
>
> 	testl	$STARTUP_APICID_CPUID_0B, %edx
> 	jnz	.Luse_cpuid_0b
> 	testl	$STARTUP_APICID_CPUID_01, %edx
> 	jnz	.Luse_cpuid_01
> 	andl	$0x0FFFFFFF, %edx
> 	jmp	.Lsetup_AP
>
> .Luse_cpuid_01:
> 	mov	$0x01, %eax
> 	cpuid
> 	mov	%ebx, %edx
> 	shr	$24, %edx
> 	jmp	.Lsetup_AP
>
> .Luse_cpuid_0b:
> 	mov	$0x0B, %eax
> 	xorl	%ecx, %ecx
> 	cpuid
>
> .Lsetup_AP:
> 	/* EDX contains the APICID of the current CPU */

That looks trivial enough. So no objections from my side. Not sure
whether this needs a special opt-in though. We probably want an opt-out
for the parallel bringup mode for diagnosis purposes anyway and that
should be good enough for a start.

Thanks,

        tglx
Thomas Gleixner Feb. 7, 2023, 2:46 p.m. UTC | #18
On Tue, Feb 07 2023 at 09:52, David Woodhouse wrote:

> On Tue, 2023-02-07 at 01:09 +0100, Thomas Gleixner wrote:
>> On Thu, Feb 02 2023 at 21:56, Usama Arif wrote:
>>
>> -ENOCONTENT
>
>
> Yeah, I'm aware of the nonsense here too but when I'm actually writing
> a commit message, years of habit kick in and it doesn't occur to me to
> pointlessly repeat the words that are already there, one line up where
> it says "Disable parallel boot for AMD CPUs".
>
> I'm old and stupid, but eventually I'll start to remember that people
> nowadays like to gratuitously refuse to read those words, and they want
> them repeated in a different place.

I'm not asking for repeating information from the subject line but to
actually add a rationale. The WHY is the most important information of a
changelog, no?

> Bear with me...

I do, but that doesn't mean I'm giving you special treatment :)

Thanks,

        tglx
Kim Phillips Feb. 7, 2023, 4:27 p.m. UTC | #19
On 2/6/23 11:58 AM, Kim Phillips wrote:
> On 2/4/23 9:40 AM, David Woodhouse wrote:
>> On Fri, 2023-02-03 at 13:48 -0600, Kim Phillips wrote:
>>> If I:
>>>
>>>    - take dwmw2's parallel-6.2-rc6 branch (commit 459d1c46dbd1)
>>>    - remove the set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP) line from amd.c
>>>
>>> Then:
>>>
>>>    - a Ryzen 3000 (Picasso A1/Zen+) notebook I have access to fails to boot.
>>>    - Zen 2,3,4-based servers boot fine
>>>    - a Zen1-based server doesn't boot.
>>
>> I've changed it to use CPUID 0xb only if we're actually in x2apic mode,
>> which Boris tells me won't be the case on Zen1 because that doesn't
>> support X2APIC.
>>
>> When we're not in x2apic mode, we can use CPUID 0x1 because the 8 bits
>> of APIC ID we find there are perfectly sufficient.
>>
>> New tree in the same place as before, commit ce7e2d1e046a for the
>> parallel-6.2-rc6-part1 tag and 17bbd12ee03 for parallel-6.2-rc6.
> 
> Thanks, Zen 1 through 4 based servers all boot both those two tree
> commits successfully.
> 
> I'll try that Ryzen again later.

The Ryzen 3000 also successfully boots those two branches now.

Thanks,

Kim
diff mbox series

Patch

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 61012476d66e..ed7f32354edc 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -466,5 +466,6 @@ 
 #define X86_BUG_MMIO_UNKNOWN		X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
 #define X86_BUG_RETBLEED		X86_BUG(27) /* CPU is affected by RETBleed */
 #define X86_BUG_EIBRS_PBRSB		X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_NO_PARALLEL_BRINGUP	X86_BUG(29) /* CPU has hardware issues with parallel AP bringup */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index f769d6d08b43..19b5c8342d7e 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -941,6 +941,17 @@  static void init_amd(struct cpuinfo_x86 *c)
 	case 0x19: init_amd_zn(c); break;
 	}
 
+	/*
+	 * Various AMD CPUs appear to not to cope with APs being brought up
+	 * in parallel. In debugging, the AP doesn't even seem to reach an
+	 * outb to port 0x3f8 right at the top of the startup trampoline.
+	 * We don't *think* there are any remaining software issues which
+	 * may contribute to this, although it's possible. So far, attempts
+	 * to get AMD to investigate this have been to no avail. So just
+	 * disable parallel bring up for all AMD CPUs for now.
+	 */
+	set_cpu_bug(c, X86_BUG_NO_PARALLEL_BRINGUP);
+
 	/*
 	 * Enable workaround for FXSAVE leak on CPUs
 	 * without a XSaveErPtr feature
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 741da8d306a4..656897b055f5 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1534,13 +1534,22 @@  void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 * We can do 64-bit AP bringup in parallel if the CPU reports its
 	 * APIC ID in CPUID leaf 0x0B. Otherwise it's too hard. And not
 	 * for SEV-ES guests because they can't use CPUID that early.
-	 * Also, some AMD CPUs crash when doing parallel cpu bringup, disable
-	 * it for all AMD CPUs to be on the safe side.
 	 */
 	if (IS_ENABLED(CONFIG_X86_32) || boot_cpu_data.cpuid_level < 0x0B ||
 	    cc_platform_has(CC_ATTR_GUEST_STATE_ENCRYPT))
 		do_parallel_bringup = false;
 
+	/*
+	 * Various AMD CPUs appear not to cope with APs being brought up
+	 * in parallel, so just disable parallel bring up for all AMD CPUs
+	 * for now.
+	 */
+	if (do_parallel_bringup &&
+	    boot_cpu_has_bug(X86_BUG_NO_PARALLEL_BRINGUP)) {
+		pr_info("Disabling parallel bringup due to CPU bugs\n");
+		do_parallel_bringup = false;
+	}
+
 	snp_set_wakeup_secondary_cpu();
 }