diff mbox series

[v8,9/9] x86/smpboot: Serialize topology updates for secondary bringup

Message ID 20230209154156.266385-10-usama.arif@bytedance.com (mailing list archive)
State Superseded
Headers show
Series Parallel CPU bringup for x86_64 | expand

Commit Message

Usama Arif Feb. 9, 2023, 3:41 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

The toplogy update is performed by the AP via smp_callin() after the BSP
has called do_wait_cpu_initialized(), setting the AP's bit in
cpu_callout_mask to allow it to proceed.

In preparation to enable further parallelism of AP bringup, add locking to
serialize the update even if multiple APs are (in future) permitted to
proceed through the next stages of bringup in parallel.

Without such ordering (and with that future extra parallelism), confusion
ensues:

[    1.360149] x86: Booting SMP configuration:
[    1.360221] .... node  #0, CPUs:        #1  #2  #3  #4  #5  #6  #7  #8  #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23
[    1.366225] .... node  #1, CPUs:   #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47
[    1.370219] .... node  #0, CPUs:   #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71
[    1.378226] .... node  #1, CPUs:   #72 #73 #74 #75 #76 #77 #78 #79 #80 #81 #82 #83 #84 #85 #86 #87 #88 #89 #90 #91 #92 #93 #94 #95
[    1.382037] Brought 96 CPUs to x86/cpu:kick in 72232606 cycles
[    0.104104] smpboot: CPU 26 Converting physical 0 to logical die 1
[    0.104104] smpboot: CPU 27 Converting physical 1 to logical package 2
[    0.104104] smpboot: CPU 24 Converting physical 1 to logical package 3
[    0.104104] smpboot: CPU 27 Converting physical 0 to logical die 2
[    0.104104] smpboot: CPU 25 Converting physical 1 to logical package 4
[    1.385609] Brought 96 CPUs to x86/cpu:wait-init in 9269218 cycles
[    1.395285] Brought CPUs online in 28930764 cycles
[    1.395469] smp: Brought up 2 nodes, 96 CPUs
[    1.395689] smpboot: Max logical packages: 2
[    1.396222] smpboot: Total of 96 processors activated (576000.00 BogoMIPS)

[Usama Arif: fixed rebase conflict]
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
---
 arch/x86/include/asm/smp.h      |  4 +-
 arch/x86/include/asm/topology.h |  2 -
 arch/x86/kernel/cpu/common.c    |  6 +--
 arch/x86/kernel/smpboot.c       | 73 ++++++++++++++++++++-------------
 arch/x86/xen/smp_pv.c           |  4 +-
 5 files changed, 48 insertions(+), 41 deletions(-)

Comments

Thomas Gleixner Feb. 13, 2023, 8:43 p.m. UTC | #1
On Thu, Feb 09 2023 at 15:41, Usama Arif wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The toplogy update is performed by the AP via smp_callin() after the BSP
> has called do_wait_cpu_initialized(), setting the AP's bit in
> cpu_callout_mask to allow it to proceed.
>
> In preparation to enable further parallelism of AP bringup, add locking to
> serialize the update even if multiple APs are (in future) permitted to
> proceed through the next stages of bringup in parallel.

This one is also only relevant for further parallelisation, right?

Thanks,

        tglx
David Woodhouse Feb. 13, 2023, 8:53 p.m. UTC | #2
On 13 February 2023 21:43:13 CET, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Thu, Feb 09 2023 at 15:41, Usama Arif wrote:
>> From: David Woodhouse <dwmw@amazon.co.uk>
>>
>> The toplogy update is performed by the AP via smp_callin() after the BSP
>> has called do_wait_cpu_initialized(), setting the AP's bit in
>> cpu_callout_mask to allow it to proceed.
>>
>> In preparation to enable further parallelism of AP bringup, add locking to
>> serialize the update even if multiple APs are (in future) permitted to
>> proceed through the next stages of bringup in parallel.
>
>This one is also only relevant for further parallelisation, right?

I believe so, yes. But it's low-hanging fruit and might as well go in now.
Usama Arif Feb. 13, 2023, 10:30 p.m. UTC | #3
On 13/02/2023 20:53, David Woodhouse wrote:
> 
> 
> On 13 February 2023 21:43:13 CET, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Thu, Feb 09 2023 at 15:41, Usama Arif wrote:
>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>
>>> The toplogy update is performed by the AP via smp_callin() after the BSP
>>> has called do_wait_cpu_initialized(), setting the AP's bit in
>>> cpu_callout_mask to allow it to proceed.
>>>
>>> In preparation to enable further parallelism of AP bringup, add locking to
>>> serialize the update even if multiple APs are (in future) permitted to
>>> proceed through the next stages of bringup in parallel.
>>
>> This one is also only relevant for further parallelisation, right?
> 
> I believe so, yes. But it's low-hanging fruit and might as well go in now.

Yes, only needed if we parallelize further, i.e. after 
do_wait_cpu_initialized. As David said, its a simple enough and easy 
patch, but its not needed for parallelizing INIT/SIPI.

I tested Davids' part2 branch 
(https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc7) 
again just to be sure, and the only commit that makes a significant 
difference in smpboot time on top of part1 (this series) is reusing 
timer calibration (100ms to 34ms).

Parallelizing do_wait_cpu_initialized didn't significantly improve 
smpboot time (only reduced to 31ms on average of 4 runs so maybe within 
the margin of error?), so I think its better to move this patch into its 
own series with any further parallelization only if it shows a further 
improvement in smpboot time?

Thanks,
Usama
David Woodhouse Feb. 14, 2023, 6:57 a.m. UTC | #4
On 13 February 2023 23:30:20 CET, Usama Arif <usama.arif@bytedance.com> wrote:
>
>
>On 13/02/2023 20:53, David Woodhouse wrote:
>> 
>> 
>> On 13 February 2023 21:43:13 CET, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> On Thu, Feb 09 2023 at 15:41, Usama Arif wrote:
>>>> From: David Woodhouse <dwmw@amazon.co.uk>
>>>> 
>>>> The toplogy update is performed by the AP via smp_callin() after the BSP
>>>> has called do_wait_cpu_initialized(), setting the AP's bit in
>>>> cpu_callout_mask to allow it to proceed.
>>>> 
>>>> In preparation to enable further parallelism of AP bringup, add locking to
>>>> serialize the update even if multiple APs are (in future) permitted to
>>>> proceed through the next stages of bringup in parallel.
>>> 
>>> This one is also only relevant for further parallelisation, right?
>> 
>> I believe so, yes. But it's low-hanging fruit and might as well go in now.
>
>Yes, only needed if we parallelize further, i.e. after do_wait_cpu_initialized. As David said, its a simple enough and easy patch, but its not needed for parallelizing INIT/SIPI.
>
>I tested Davids' part2 branch (https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/parallel-6.2-rc7) again just to be sure, and the only commit that makes a significant difference in smpboot time on top of part1 (this series) is reusing timer calibration (100ms to 34ms).
>
>Parallelizing do_wait_cpu_initialized didn't significantly improve smpboot time (only reduced to 31ms on average of 4 runs so maybe within the margin of error?), so I think its better to move this patch into its own series with any further parallelization only if it shows a further improvement in smpboot time?

Let's do it anyway; it's harmless to make it thread-safe even if it doesn't get invoked that way today. There is further parallelism to be had with letting the APs run themselves all the way to the online state, and the "part2" in my tree is only the next step towards that.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 33c0d5fd8af6..b4b29e052b6e 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -48,8 +48,6 @@  struct smp_ops {
 };
 
 /* Globals due to paravirt */
-extern void set_cpu_sibling_map(int cpu);
-
 #ifdef CONFIG_SMP
 extern struct smp_ops smp_ops;
 
@@ -137,7 +135,7 @@  void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
 void smp_store_boot_cpu_info(void);
-void smp_store_cpu_info(int id);
+void smp_store_cpu_info(int id, bool force_single_core);
 
 asmlinkage __visible void smp_reboot_interrupt(void);
 __visible void smp_reschedule_interrupt(struct pt_regs *regs);
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 458c891a8273..4bccbd949a99 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -136,8 +136,6 @@  static inline int topology_max_smt_threads(void)
 	return __max_smt_threads;
 }
 
-int topology_update_package_map(unsigned int apicid, unsigned int cpu);
-int topology_update_die_map(unsigned int dieid, unsigned int cpu);
 int topology_phys_to_logical_pkg(unsigned int pkg);
 int topology_phys_to_logical_die(unsigned int die, unsigned int cpu);
 bool topology_is_primary_thread(unsigned int cpu);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9cfca3d7d0e2..cf1a4eff7a76 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1766,7 +1766,7 @@  static void generic_identify(struct cpuinfo_x86 *c)
  * Validate that ACPI/mptables have the same information about the
  * effective APIC id and update the package map.
  */
-static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
+static void validate_apic_id(struct cpuinfo_x86 *c)
 {
 #ifdef CONFIG_SMP
 	unsigned int apicid, cpu = smp_processor_id();
@@ -1777,8 +1777,6 @@  static void validate_apic_and_package_id(struct cpuinfo_x86 *c)
 		pr_err(FW_BUG "CPU%u: APIC id mismatch. Firmware: %x APIC: %x\n",
 		       cpu, apicid, c->initial_apicid);
 	}
-	BUG_ON(topology_update_package_map(c->phys_proc_id, cpu));
-	BUG_ON(topology_update_die_map(c->cpu_die_id, cpu));
 #else
 	c->logical_proc_id = 0;
 #endif
@@ -1969,7 +1967,7 @@  void identify_secondary_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_32
 	enable_sep_cpu();
 #endif
-	validate_apic_and_package_id(c);
+	validate_apic_id(c);
 	x86_spec_ctrl_setup_ap();
 	update_srbds_msr();
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index df839264266b..3ec5182d9698 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -180,16 +180,12 @@  static void smp_callin(void)
 	apic_ap_setup();
 
 	/*
-	 * Save our processor parameters. Note: this information
-	 * is needed for clock calibration.
-	 */
-	smp_store_cpu_info(cpuid);
-
-	/*
+	 * Save our processor parameters and update topology.
+	 * Note: this information is needed for clock calibration.
 	 * The topology information must be up to date before
 	 * calibrate_delay() and notify_cpu_starting().
 	 */
-	set_cpu_sibling_map(raw_smp_processor_id());
+	smp_store_cpu_info(cpuid, false);
 
 	ap_init_aperfmperf();
 
@@ -243,6 +239,12 @@  static void notrace start_secondary(void *unused)
 	 * its bit bit in cpu_callout_mask to release it.
 	 */
 	cpu_init_secondary();
+
+	/*
+	 * Even though notify_cpu_starting() will do this, it does so too late
+	 * as the AP may already have triggered lockdep splats by then. See
+	 * commit 29368e093 ("x86/smpboot:  Move rcu_cpu_starting() earlier").
+	 */
 	rcu_cpu_starting(raw_smp_processor_id());
 	x86_cpuinit.early_percpu_clock_init();
 
@@ -351,7 +353,7 @@  EXPORT_SYMBOL(topology_phys_to_logical_die);
  * @pkg:	The physical package id as retrieved via CPUID
  * @cpu:	The cpu for which this is updated
  */
-int topology_update_package_map(unsigned int pkg, unsigned int cpu)
+static int topology_update_package_map(unsigned int pkg, unsigned int cpu)
 {
 	int new;
 
@@ -374,7 +376,7 @@  int topology_update_package_map(unsigned int pkg, unsigned int cpu)
  * @die:	The die id as retrieved via CPUID
  * @cpu:	The cpu for which this is updated
  */
-int topology_update_die_map(unsigned int die, unsigned int cpu)
+static int topology_update_die_map(unsigned int die, unsigned int cpu)
 {
 	int new;
 
@@ -405,25 +407,7 @@  void __init smp_store_boot_cpu_info(void)
 	c->initialized = true;
 }
 
-/*
- * The bootstrap kernel entry code has set these up. Save them for
- * a given CPU
- */
-void smp_store_cpu_info(int id)
-{
-	struct cpuinfo_x86 *c = &cpu_data(id);
-
-	/* Copy boot_cpu_data only on the first bringup */
-	if (!c->initialized)
-		*c = boot_cpu_data;
-	c->cpu_index = id;
-	/*
-	 * During boot time, CPU0 has this setup already. Save the info when
-	 * bringing up AP or offlined CPU0.
-	 */
-	identify_secondary_cpu(c);
-	c->initialized = true;
-}
+static arch_spinlock_t topology_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 
 static bool
 topology_same_node(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
@@ -629,7 +613,7 @@  static struct sched_domain_topology_level x86_topology[] = {
  */
 static bool x86_has_numa_in_package;
 
-void set_cpu_sibling_map(int cpu)
+static void set_cpu_sibling_map(int cpu)
 {
 	bool has_smt = smp_num_siblings > 1;
 	bool has_mp = has_smt || boot_cpu_data.x86_max_cores > 1;
@@ -708,6 +692,37 @@  void set_cpu_sibling_map(int cpu)
 	}
 }
 
+/*
+ * The bootstrap kernel entry code has set these up. Save them for
+ * a given CPU
+ */
+void smp_store_cpu_info(int id, bool force_single_core)
+{
+	struct cpuinfo_x86 *c = &cpu_data(id);
+
+	/* Copy boot_cpu_data only on the first bringup */
+	if (!c->initialized)
+		*c = boot_cpu_data;
+	c->cpu_index = id;
+	/*
+	 * During boot time, CPU0 has this setup already. Save the info when
+	 * bringing up AP or offlined CPU0.
+	 */
+	identify_secondary_cpu(c);
+
+	arch_spin_lock(&topology_lock);
+	BUG_ON(topology_update_package_map(c->phys_proc_id, id));
+	BUG_ON(topology_update_die_map(c->cpu_die_id, id));
+	c->initialized = true;
+
+	/* For Xen PV */
+	if (force_single_core)
+		c->x86_max_cores = 1;
+
+	set_cpu_sibling_map(id);
+	arch_spin_unlock(&topology_lock);
+}
+
 /* maps the cpu to the sched domain representing multi-core */
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index 6175f2c5c822..09f94f940689 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -71,9 +71,7 @@  static void cpu_bringup(void)
 		xen_enable_syscall();
 	}
 	cpu = smp_processor_id();
-	smp_store_cpu_info(cpu);
-	cpu_data(cpu).x86_max_cores = 1;
-	set_cpu_sibling_map(cpu);
+	smp_store_cpu_info(cpu, true);
 
 	speculative_store_bypass_ht_init();