[v12,00/11] Parallel CPU bringup for x86_64

Message ID 20230226110802.103134-1-usama.arif@bytedance.com (mailing list archive)

Message

Usama Arif Feb. 26, 2023, 11:07 a.m. UTC
The main code changes over v11 are the build error fix by Brian Gerst and
acquiring tr_lock in trampoline_64.S whenever the stack is set up.

The git history is also rewritten to move the commits that removed
initial_stack, early_gdt_descr and initial_gs earlier in the patchset.

Thanks,
Usama

Changes across versions:
v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
    in preparation for more parallelisation.
v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
    avoid scribbling on initial_gs in common_cpu_up(), and to allow all
    24 bits of the physical X2APIC ID to be used. That patch still needs
    a Signed-off-by from its original author, who once claimed not to
    remember writing it at all. But now we've fixed it, hopefully he'll
    admit it now :)
v5: rebase to v6.1 and remeasure performance, disable parallel bringup
    for AMD CPUs.
v6: rebase to v6.2-rc6, disabled parallel boot on AMD as a CPU bug and
    reused timer calibration for secondary CPUs.
v7: [David Woodhouse] iterate over all possible CPUs to find any existing
    cluster mask in alloc_clustermask. (patch 1/9)
    Keep parallel bringup support enabled on AMD, using APIC ID in CPUID leaf
    0x0B (for x2APIC mode) or CPUID leaf 0x01 where 8 bits are sufficient.
    Included sanity checks for APIC id from 0x0B. (patch 6/9)
    Removed patch for reusing timer calibration for secondary CPUs.
    commit message and code improvements.
v8: Fix CPU0 hotplug by setting up the initial_gs, initial_stack and
    early_gdt_descr.
    Drop trampoline lock and bail if APIC ID not found in find_cpunr.
    Code comments improved and debug prints added.
v9: Drop patch to avoid repeated saves of MTRR at boot time.
    rebased and retested at v6.2-rc8.
    added kernel doc for no_parallel_bringup and made do_parallel_bringup
    __ro_after_init.
v10: Fixed suspend/resume not working with parallel smpboot.
     rebased and retested to 6.2.
     fixed checkpatch errors.
v11: Added patches from Brian Gerst to remove the global variables initial_gs,
     initial_stack, and early_gdt_descr from the 64-bit boot code
     (https://lore.kernel.org/all/20230222221301.245890-1-brgerst@gmail.com/).
v12: Fixed compilation errors, acquire tr_lock for every stack setup in
     trampoline_64.S.
     Rearranged commits for a cleaner git history.
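
As a rough illustration of the v7 note about the two CPUID sources for the APIC ID, here is a minimal sketch in C. It only decodes register values that are passed in; the helper names and the `x2apic_mode` flag are hypothetical and are not taken from the patches. It relies on the architectural layout: leaf 0x01 reports an 8-bit initial APIC ID in EBX[31:24], and leaf 0x0B reports the 32-bit x2APIC ID in EDX.

```c
#include <stdint.h>

/* CPUID leaf 0x01: the initial APIC ID is EBX bits 31:24 (8 bits only). */
static uint32_t apic_id_from_leaf_01(uint32_t ebx)
{
	return ebx >> 24;
}

/* CPUID leaf 0x0B: the full 32-bit x2APIC ID is returned in EDX. */
static uint32_t apic_id_from_leaf_0b(uint32_t edx)
{
	return edx;
}

/*
 * Hypothetical selector: use leaf 0x0B when running in x2APIC mode,
 * otherwise fall back to the 8-bit ID from leaf 0x01.
 */
static uint32_t pick_apic_id(int x2apic_mode, uint32_t leaf01_ebx,
			     uint32_t leaf0b_edx)
{
	if (x2apic_mode)
		return apic_id_from_leaf_0b(leaf0b_edx);
	return apic_id_from_leaf_01(leaf01_ebx);
}
```

The sanity checks mentioned for leaf 0x0B are not modelled here, since the cover letter does not describe them.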

Brian Gerst (3):
  x86/smpboot: Remove initial_stack on 64-bit
  x86/smpboot: Remove early_gdt_descr on 64-bit
  x86/smpboot: Remove initial_gs

David Woodhouse (8):
  x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
  cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
  cpu/hotplug: Add dynamic parallel bringup states before
    CPUHP_BRINGUP_CPU
  x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
  x86/smpboot: Split up native_cpu_up into separate phases and document
    them
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
  x86/smpboot: Serialize topology updates for secondary bringup

 .../admin-guide/kernel-parameters.txt         |   3 +
 arch/x86/include/asm/processor.h              |   6 +-
 arch/x86/include/asm/realmode.h               |   4 +-
 arch/x86/include/asm/smp.h                    |  15 +-
 arch/x86/include/asm/topology.h               |   2 -
 arch/x86/kernel/acpi/sleep.c                  |  15 +-
 arch/x86/kernel/apic/apic.c                   |   2 +-
 arch/x86/kernel/apic/x2apic_cluster.c         | 126 ++++---
 arch/x86/kernel/asm-offsets.c                 |   1 +
 arch/x86/kernel/cpu/common.c                  |   6 +-
 arch/x86/kernel/head_64.S                     | 129 +++++--
 arch/x86/kernel/smpboot.c                     | 350 +++++++++++++-----
 arch/x86/realmode/init.c                      |   3 +
 arch/x86/realmode/rm/trampoline_64.S          |  27 +-
 arch/x86/xen/smp_pv.c                         |   4 +-
 arch/x86/xen/xen-head.S                       |   2 +-
 include/linux/cpuhotplug.h                    |   2 +
 include/linux/smpboot.h                       |   7 +
 kernel/cpu.c                                  |  31 +-
 kernel/smpboot.h                              |   2 -
 20 files changed, 537 insertions(+), 200 deletions(-)

Comments

Oleksandr Natalenko Feb. 26, 2023, 6:31 p.m. UTC | #1
Hello.

With `CONFIG_FORCE_NR_CPUS=y` this results in:

```
ld: vmlinux.o: in function `secondary_startup_64_no_verify':
(.head.text+0x10c): undefined reference to `nr_cpu_ids'
```

That's because in `arch/x86/kernel/head_64.S` `secondary_startup_64_no_verify()` refers to `nr_cpu_ids` under `#ifdef CONFIG_SMP`, but this symbol is available under the following conditions:

```
/* include/linux/cpumask.h */
#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
#define nr_cpu_ids ((unsigned int)NR_CPUS)
#else
extern unsigned int nr_cpu_ids;
#endif

/* kernel/smp.c */
#if (NR_CPUS > 1) && !defined(CONFIG_FORCE_NR_CPUS)
/* Setup number of possible processor ids */
unsigned int nr_cpu_ids __read_mostly = NR_CPUS;
EXPORT_SYMBOL(nr_cpu_ids);
#endif
```

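So with `CONFIG_SMP=y` and, for instance, `CONFIG_NR_CPUS=320`, setting `CONFIG_FORCE_NR_CPUS=y` compiles out the `nr_cpu_ids` variable together with its `EXPORT_SYMBOL(nr_cpu_ids);`, leaving only a preprocessor macro, so the reference in `head_64.S` has no symbol to link against.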
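
The effect can be reproduced outside the kernel with a small stand-alone sketch. The `#if` condition is copied from the snippet above; `CONFIG_FORCE_NR_CPUS` is defined by hand here to stand in for the Kconfig option, and `NR_CPU_IDS_HAS_STORAGE` and the two helper functions are hypothetical names added only for illustration.

```c
#define NR_CPUS 320
#define CONFIG_FORCE_NR_CPUS 1	/* stand-in for CONFIG_FORCE_NR_CPUS=y */

#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
/* Only a macro: no object named nr_cpu_ids exists for the linker. */
#define nr_cpu_ids ((unsigned int)NR_CPUS)
#define NR_CPU_IDS_HAS_STORAGE 0
#else
/* A real variable: assembly can reference it as nr_cpu_ids(%rip). */
unsigned int nr_cpu_ids = NR_CPUS;
#define NR_CPU_IDS_HAS_STORAGE 1
#endif

/* C code works in both cases, because the preprocessor resolves the name. */
static unsigned int cpu_limit(void)
{
	return nr_cpu_ids;
}

/* Reports whether a linkable nr_cpu_ids symbol was emitted. */
static int nr_cpu_ids_has_storage(void)
{
	return NR_CPU_IDS_HAS_STORAGE;
}
```

C callers never notice the difference, which is presumably why only the new assembly reference trips over it: a memory operand such as `nr_cpu_ids(%rip)` needs an actual symbol.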
Usama Arif Feb. 26, 2023, 8:59 p.m. UTC | #2
I think something like below diff should work in all scenarios?

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2aa0aa26b45..e3727dab9cab 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -35,7 +35,7 @@ typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t;
   */
  #define cpumask_pr_args(maskp)         nr_cpu_ids, cpumask_bits(maskp)

-#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
+#if ((NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)) && !defined(CONFIG_SMP)
  #define nr_cpu_ids ((unsigned int)NR_CPUS)
  #else
  extern unsigned int nr_cpu_ids;
@@ -43,7 +43,7 @@ extern unsigned int nr_cpu_ids;

  static inline void set_nr_cpu_ids(unsigned int nr)
  {
-#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
+#if ((NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)) && !defined(CONFIG_SMP)
         WARN_ON(nr != nr_cpu_ids);
  #else
         nr_cpu_ids = nr;
diff --git a/kernel/smp.c b/kernel/smp.c
index 06a413987a14..a051b16d4a24 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1087,11 +1087,9 @@ static int __init maxcpus(char *str)

  early_param("maxcpus", maxcpus);

-#if (NR_CPUS > 1) && !defined(CONFIG_FORCE_NR_CPUS)
  /* Setup number of possible processor ids */
  unsigned int nr_cpu_ids __read_mostly = NR_CPUS;
  EXPORT_SYMBOL(nr_cpu_ids);
-#endif

  /* An arch may set nr_cpu_ids earlier if needed, so this would be redundant */
  void __init setup_nr_cpu_ids(void)
David Woodhouse Feb. 27, 2023, 6:13 a.m. UTC | #3
On 26 February 2023 20:59:17 GMT, Usama Arif <usama.arif@bytedance.com> wrote:
>I think something like below diff should work in all scenarios?

I'd've changed the asm side to use the constant limit.
Usama Arif Feb. 27, 2023, 6:14 a.m. UTC | #4
Or better just do below?

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 17bdd6122dca..5d709aa67df4 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -273,7 +273,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
         cmpl    (%rbx,%rcx,4), %edx
         jz      .Lsetup_cpu
         inc     %ecx
+#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
+       cmpl    $NR_CPUS, %ecx
+#else
         cmpl    nr_cpu_ids(%rip), %ecx
+#endif
         jb      .Lfind_cpunr

         /*  APIC ID not found in the table. Drop the trampoline lock and bail. */
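
For readers less used to the assembly, the loop touched by this hunk can be modelled in C roughly as follows. This is a sketch under assumptions, not the kernel code: the table name, the function name and the starting index of 0 are hypothetical, and `NR_CPUS` is set to 4 with `CONFIG_FORCE_NR_CPUS` defined by hand so that the constant-limit branch is the one compiled.

```c
#define NR_CPUS 4
#define CONFIG_FORCE_NR_CPUS 1	/* stand-in for CONFIG_FORCE_NR_CPUS=y */

#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
/* Mirrors "cmpl $NR_CPUS, %ecx": the bound is an immediate constant. */
#define CPU_LIMIT ((unsigned int)NR_CPUS)
#else
/* Mirrors "cmpl nr_cpu_ids(%rip), %ecx": the bound is read from memory. */
static unsigned int nr_cpu_ids = NR_CPUS;
#define CPU_LIMIT nr_cpu_ids
#endif

/*
 * Walk a table of 32-bit APIC IDs indexed by CPU number and return the
 * index whose entry matches apic_id, or -1 if no entry matches (the
 * "drop the trampoline lock and bail" path in the assembly).
 */
static int find_cpunr(const unsigned int *apicid_table, unsigned int apic_id)
{
	unsigned int cpu = 0;

	do {
		if (apicid_table[cpu] == apic_id)	/* cmpl (%rbx,%rcx,4), %edx */
			return (int)cpu;		/* jz .Lsetup_cpu */
		cpu++;					/* inc %ecx */
	} while (cpu < CPU_LIMIT);			/* jb .Lfind_cpunr */

	return -1;
}
```

Either branch yields the same loop; the only difference is whether the upper bound needs a linkable `nr_cpu_ids` symbol.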
Usama Arif Feb. 27, 2023, 6:25 a.m. UTC | #5
On 27/02/2023 06:13, David Woodhouse wrote:
> 
> 
> On 26 February 2023 20:59:17 GMT, Usama Arif <usama.arif@bytedance.com> wrote:
>>
>>
>> On 26/02/2023 18:31, Oleksandr Natalenko wrote:
>>> Hello.
>>>
>>> On Sunday, 26 February 2023 12:07:51 CET Usama Arif wrote:
>>>> The main code change over v11 is the build error fix by Brian Gerst and
>>>> acquiring tr_lock in trampoline_64.S whenever the stack is setup.
>>>>
>>>> The git history is also rewritten to move the commits that removed
>>>> initial_stack, early_gdt_descr and initial_gs earlier in the patchset.
>>>>
>>>> Thanks,
>>>> Usama
>>>>
>>>> Changes across versions:
>>>> v2: Cut it back to just INIT/SIPI/SIPI in parallel for now, nothing more
>>>> v3: Clean up x2apic patch, add MTRR optimisation, lock topology update
>>>>       in preparation for more parallelisation.
>>>> v4: Fixes to the real mode parallelisation patch spotted by SeanC, to
>>>>       avoid scribbling on initial_gs in common_cpu_up(), and to allow all
>>>>       24 bits of the physical X2APIC ID to be used. That patch still needs
>>>>       a Signed-off-by from its original author, who once claimed not to
>>>>       remember writing it at all. But now we've fixed it, hopefully he'll
>>>>       admit it now :)
>>>> v5: rebase to v6.1 and remeasure performance, disable parallel bringup
>>>>       for AMD CPUs.
>>>> v6: rebase to v6.2-rc6, disabled parallel boot on amd as a cpu bug and
>>>>       reused timer calibration for secondary CPUs.
>>>> v7: [David Woodhouse] iterate over all possible CPUs to find any existing
>>>>       cluster mask in alloc_clustermask. (patch 1/9)
>>>>       Keep parallel AMD support enabled in AMD, using APIC ID in CPUID leaf
>>>>       0x0B (for x2APIC mode) or CPUID leaf 0x01 where 8 bits are sufficient.
>>>>       Included sanity checks for APIC id from 0x0B. (patch 6/9)
>>>>       Removed patch for reusing timer calibration for secondary CPUs.
>>>>       commit message and code improvements.
>>>> v8: Fix CPU0 hotplug by setting up the initial_gs, initial_stack and
>>>>       early_gdt_descr.
>>>>       Drop trampoline lock and bail if APIC ID not found in find_cpunr.
>>>>       Code comments improved and debug prints added.
>>>> v9: Drop patch to avoid repeated saves of MTRR at boot time.
>>>>       rebased and retested at v6.2-rc8.
>>>>       added kernel doc for no_parallel_bringup and made do_parallel_bringup
>>>>       __ro_after_init.
>>>> v10: Fixed suspend/resume not working with parallel smpboot.
>>>>        rebased and retested to 6.2.
>>>>        fixed checkpatch errors.
>>>> v11: Added patches from Brian Gerst to remove the global variables initial_gs,
>>>>        initial_stack, and early_gdt_descr from the 64-bit boot code
>>>>        (https://lore.kernel.org/all/20230222221301.245890-1-brgerst@gmail.com/).
>>>> v12: Fixed compilation errors, acquire tr_lock for every stack setup in
>>>>        trampoline_64.S.
>>>>        Rearranged commits for a cleaner git history.
>>>>
>>>> Brian Gerst (3):
>>>>     x86/smpboot: Remove initial_stack on 64-bit
>>>>     x86/smpboot: Remove early_gdt_descr on 64-bit
>>>>     x86/smpboot: Remove initial_gs
>>>>
>>>> David Woodhouse (8):
>>>>     x86/apic/x2apic: Allow CPU cluster_mask to be populated in parallel
>>>>     cpu/hotplug: Move idle_thread_get() to <linux/smpboot.h>
>>>>     cpu/hotplug: Add dynamic parallel bringup states before
>>>>       CPUHP_BRINGUP_CPU
>>>>     x86/smpboot: Reference count on smpboot_setup_warm_reset_vector()
>>>>     x86/smpboot: Split up native_cpu_up into separate phases and document
>>>>       them
>>>>     x86/smpboot: Support parallel startup of secondary CPUs
>>>>     x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel
>>>>     x86/smpboot: Serialize topology updates for secondary bringup
>>>>
>>>>    .../admin-guide/kernel-parameters.txt         |   3 +
>>>>    arch/x86/include/asm/processor.h              |   6 +-
>>>>    arch/x86/include/asm/realmode.h               |   4 +-
>>>>    arch/x86/include/asm/smp.h                    |  15 +-
>>>>    arch/x86/include/asm/topology.h               |   2 -
>>>>    arch/x86/kernel/acpi/sleep.c                  |  15 +-
>>>>    arch/x86/kernel/apic/apic.c                   |   2 +-
>>>>    arch/x86/kernel/apic/x2apic_cluster.c         | 126 ++++---
>>>>    arch/x86/kernel/asm-offsets.c                 |   1 +
>>>>    arch/x86/kernel/cpu/common.c                  |   6 +-
>>>>    arch/x86/kernel/head_64.S                     | 129 +++++--
>>>>    arch/x86/kernel/smpboot.c                     | 350 +++++++++++++-----
>>>>    arch/x86/realmode/init.c                      |   3 +
>>>>    arch/x86/realmode/rm/trampoline_64.S          |  27 +-
>>>>    arch/x86/xen/smp_pv.c                         |   4 +-
>>>>    arch/x86/xen/xen-head.S                       |   2 +-
>>>>    include/linux/cpuhotplug.h                    |   2 +
>>>>    include/linux/smpboot.h                       |   7 +
>>>>    kernel/cpu.c                                  |  31 +-
>>>>    kernel/smpboot.h                              |   2 -
>>>>    20 files changed, 537 insertions(+), 200 deletions(-)
>>>
>>> With `CONFIG_FORCE_NR_CPUS=y` this results in:
>>>
>>> ```
>>> ld: vmlinux.o: in function `secondary_startup_64_no_verify':
>>> (.head.text+0x10c): undefined reference to `nr_cpu_ids'
>>> ```
>>>
>>> That's because in `arch/x86/kernel/head_64.S` `secondary_startup_64_no_verify()` refers to `nr_cpu_ids` under `#ifdef CONFIG_SMP`, but this symbol is available under the following conditions:
>>>
>>> ```
>>> #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
>>> #define nr_cpu_ids ((unsigned int)NR_CPUS)
>>> #else
>>> extern unsigned int nr_cpu_ids;
>>> #endif
>>>
>>> #if (NR_CPUS > 1) && !defined(CONFIG_FORCE_NR_CPUS)
>>> /* Setup number of possible processor ids */
>>> unsigned int nr_cpu_ids __read_mostly = NR_CPUS;
>>> EXPORT_SYMBOL(nr_cpu_ids);
>>> #endif
>>> ```
>>>
>>> So having `CONFIG_SMP=y` and, for instance, `CONFIG_NR_CPUS=320`, it is possible to compile out `EXPORT_SYMBOL(nr_cpu_ids);` if `CONFIG_FORCE_NR_CPUS=y` is set.
>>>
>>
>> I think something like below diff should work in all scenarios?
> 
> I'd've changed the asm side to use the constant limit.

Yup, just needed the morning coffee :) Had sent the proper fix in 
https://lore.kernel.org/all/5e8ad90a-1dc6-95c2-e020-5e95da6f9eda@bytedance.com/#t

I guess the diff over v12 (including the cosmetic changes) is still too 
small to justify sending out a new version so soon; probably better to 
wait a couple of days in case something else comes up as well?

Thanks,
Usama
David Woodhouse Feb. 27, 2023, 3:29 p.m. UTC | #6
On Mon, 2023-02-27 at 06:14 +0000, Usama Arif wrote:
> 
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index 17bdd6122dca..5d709aa67df4 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -273,7 +273,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>          cmpl    (%rbx,%rcx,4), %edx
>          jz      .Lsetup_cpu
>          inc     %ecx
> +#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
> +       cmpl    $NR_CPUS, %ecx
> +#else
>          cmpl    nr_cpu_ids(%rip), %ecx
> +#endif
>          jb      .Lfind_cpunr
> 
>          /*  APIC ID not found in the table. Drop the trampoline lock
> and bail. */

The whitespace looks dodgy there but maybe that's just your mail client?

Given this code is already in #ifdef CONFIG_SMP, can NR_CPUS be 1?
Usama Arif Feb. 27, 2023, 4:32 p.m. UTC | #7
On 27/02/2023 15:29, David Woodhouse wrote:
> On Mon, 2023-02-27 at 06:14 +0000, Usama Arif wrote:
>>
>> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
>> index 17bdd6122dca..5d709aa67df4 100644
>> --- a/arch/x86/kernel/head_64.S
>> +++ b/arch/x86/kernel/head_64.S
>> @@ -273,7 +273,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
>>           cmpl    (%rbx,%rcx,4), %edx
>>           jz      .Lsetup_cpu
>>           inc     %ecx
>> +#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
>> +       cmpl    $NR_CPUS, %ecx
>> +#else
>>           cmpl    nr_cpu_ids(%rip), %ecx
>> +#endif
>>           jb      .Lfind_cpunr
>>
>>           /*  APIC ID not found in the table. Drop the trampoline lock
>> and bail. */
> 
> The whitespace looks dodgy there but maybe that's just your mail client?
> 
> Given this code is already in #ifdef CONFIG_SMP, can NR_CPUS be 1?

Ah yes, we have

config NR_CPUS_RANGE_BEGIN
	int
	default NR_CPUS_RANGE_END if MAXSMP
	default    1 if !SMP
	default    2

in arch/x86/Kconfig, which doesn't let us select 1 for NR_CPUS if SMP is 
enabled, so checking CONFIG_FORCE_NR_CPUS alone should be enough:

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 17bdd6122dca..c79ae67492e1 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -273,7 +273,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
         cmpl    (%rbx,%rcx,4), %edx
         jz      .Lsetup_cpu
         inc     %ecx
+#if defined(CONFIG_FORCE_NR_CPUS)
+       cmpl    $NR_CPUS, %ecx
+#else
         cmpl    nr_cpu_ids(%rip), %ecx
+#endif
         jb      .Lfind_cpunr
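
To double-check that simplification, here is a small Python model of the two preprocessor guards. It is only a sketch: the function names are made up for illustration, and it assumes (per the Kconfig snippet above) that NR_CPUS is at least 2 whenever SMP is enabled.

```python
from itertools import product


def header_uses_constant(nr_cpus: int, force_nr_cpus: bool) -> bool:
    # Models the header guard:
    #   #if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS)
    # When true, nr_cpu_ids is a macro and no linkable symbol exists.
    return nr_cpus == 1 or force_nr_cpus


def asm_uses_constant(force_nr_cpus: bool) -> bool:
    # Models the simplified guard proposed for head_64.S:
    #   #if defined(CONFIG_FORCE_NR_CPUS)
    return force_nr_cpus


def guards_disagree(smp_nr_cpus=(2, 8, 320)):
    # Assumption: these NR_CPUS values stand in for any SMP build
    # (NR_CPUS >= 2). Returns the configurations where the asm would
    # reference a symbol that the header has turned into a macro,
    # or the other way round.
    return [
        (nr_cpus, force)
        for nr_cpus, force in product(smp_nr_cpus, (False, True))
        if header_uses_constant(nr_cpus, force) != asm_uses_constant(force)
    ]
```

Under that assumption `guards_disagree()` comes back empty, i.e. the shorter guard picks the constant exactly when the header does. The two only diverge for NR_CPUS == 1, which the SMP-only code path never sees.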
Guilherme G. Piccoli Feb. 27, 2023, 9:39 p.m. UTC | #8
Hi Usama and David, thanks for the great series!

I've tested it on Steam Deck (with and without the "no_parallel_bringup"
parameter), it works fine - also tested S3/deep sleep-resume cycle.

Feel free to add (to the series):
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Also, just taking the opportunity since I'm already replying here: on
patch 09, found two typos:

s/correect/correct (commit message)
s/brinugp/bring-up (kernel-parameters.txt)

Cheers,


Guilherme
David Woodhouse Feb. 28, 2023, 9:07 a.m. UTC | #9
On Mon, 2023-02-27 at 18:39 -0300, Guilherme G. Piccoli wrote:
> Hi Usama and David, thanks for the great series!
> 
> I've tested it on Steam Deck (with and without the
> "no_parallel_bringup"
> parameter), it works fine - also tested S3/deep sleep-resume cycle.
> 
> Feel free to add (to the series):
> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> Also, just taking the opportunity since I'm already replying here: on
> patch 09, found two typos:
> 
> s/correect/correct (commit message)
> s/brinugp/bring-up (kernel-parameters.txt)
> 
> Cheers,

Thanks. I've done that and pushed it out to
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc8-v12bis
ready for the next round.
Paul Menzel Feb. 28, 2023, 10:13 a.m. UTC | #10
Dear Guilherme,


Am 27.02.23 um 22:39 schrieb Guilherme G. Piccoli:

> I've tested it on Steam Deck (with and without the "no_parallel_bringup"
> parameter), it works fine - also tested S3/deep sleep-resume cycle.
> 
> Feel free to add (to the series):
> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

Thank you for testing the series. It’d be great if you could share the 
timing differences.

[…]


Kind regards,

Paul
Guilherme G. Piccoli Feb. 28, 2023, 12:04 p.m. UTC | #11
On 28/02/2023 07:13, Paul Menzel wrote:
> Dear Guilherme,
> 
> 
> Am 27.02.23 um 22:39 schrieb Guilherme G. Piccoli:
> 
>> I've tested it on Steam Deck (with and without the "no_parallel_bringup"
>> parameter), it works fine - also tested S3/deep sleep-resume cycle.
>>
>> Feel free to add (to the series):
>> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> Thank you for testing the series. It’d be great if you could share the 
> timing differences.
> 
> […]
> 
> 
> Kind regards,
> 
> Paul

Hi Paul!

The results... weren't so great; I saw no noticeable difference, heh.
That's not necessarily bad: the series seems to favor big SMP systems,
and the Deck has only 8 CPUs.

But maybe the way I measured is not ideal? I just compared timestamps on
dmesg from the first SMP message up to the one that says the boot of
secondary CPUs is complete. Do you have a better suggestion? I can try
things here.
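
For reference, a minimal sketch of that measurement in Python. The marker strings are assumptions about what the first and last SMP messages look like and may need adjusting for a given kernel; `bringup_seconds` is just an illustrative name.

```python
import re

# Matches dmesg lines of the form "[    1.234567] message".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def bringup_seconds(dmesg_text,
                    start_marker="smp: Bringing up secondary CPUs",
                    end_marker="smp: Brought up"):
    """Return seconds between the first start marker and the next end marker.

    Returns None if either marker is missing. The default markers are
    assumptions, not something guaranteed by the series.
    """
    start = None
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        stamp, message = float(match.group(1)), match.group(2)
        if start is None:
            if start_marker in message:
                start = stamp
        elif end_marker in message:
            return stamp - start
    return None
```

Feeding it the saved output of `dmesg` from a boot with and without "no_parallel_bringup" gives two numbers to compare. It only covers the window between those two messages, so anything the series changes outside that window would not show up.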

Cheers,


Guilherme