[v14,10/12] x86/smpboot: Send INIT/SIPI/SIPI to secondary CPUs in parallel

Message ID: 20230308171328.1562857-11-usama.arif@bytedance.com (mailing list archive)
State: Superseded
Series: Parallel CPU bringup for x86_64

Commit Message

Usama Arif March 8, 2023, 5:13 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

When the APs can find their own APIC ID without assistance, perform the
AP bringup in parallel.

Register a CPUHP_BP_PARALLEL_DYN stage "x86/cpu:kick" which just calls
do_boot_cpu() to deliver INIT/SIPI/SIPI to each AP in turn before the
normal native_cpu_up() does the rest of the hand-holding.

The APs will then take turns through the real mode code (which has its
own bitlock for exclusion) until they make it to their own stack, then
proceed through the first few lines of start_secondary() and execute
these parts in parallel:

 start_secondary()
    -> cr4_init()
    -> (some 32-bit only stuff so not in the parallel cases)
    -> cpu_init_secondary()
       -> cpu_init_exception_handling()
       -> cpu_init()
          -> wait_for_master_cpu()

At this point they wait for the BSP to set their bit in cpu_callout_mask
(from do_wait_cpu_initialized()), and release them to continue through
the rest of cpu_init() and beyond.
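
For reference, the rendezvous in question is the existing
cpu_initialized_mask/cpu_callout_mask handshake; roughly (simplified
from arch/x86/kernel/cpu/common.c at the time of this series):

	static void wait_for_master_cpu(int cpu)
	{
	#ifdef CONFIG_SMP
		/* Tell the BSP this CPU reached cpu_init(), then wait to be called out. */
		cpumask_set_cpu(cpu, cpu_initialized_mask);
		while (!cpumask_test_cpu(cpu, cpu_callout_mask))
			cpu_relax();
	#endif
	}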

This reduces the time taken for bringup on my 28-thread Haswell system
from about 120ms to 80ms. On a 2-socket, 96-thread Skylake it takes the
bringup time from 500ms to 100ms.

There is more speedup to be had by doing the remaining parts in parallel
too — especially notify_cpu_starting() in which the AP takes itself
through all the stages from CPUHP_BRINGUP_CPU to CPUHP_ONLINE. But those
require careful auditing to ensure they are reentrant, before we can go
that far.
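
For reference, the STARTING leg of that walk is driven by
notify_cpu_starting(); roughly (simplified from kernel/cpu.c around
v6.2, not part of this patch):

	void notify_cpu_starting(unsigned int cpu)
	{
		struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
		enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
		int ret;

		rcu_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
		cpumask_set_cpu(cpu, &cpus_booted_once_mask);

		/* The AP walks itself through the STARTING callbacks. */
		while (st->state < target) {
			st->state++;
			ret = cpuhp_invoke_callback(cpu, st->state, true, NULL, NULL);
			/* STARTING callbacks must not fail. */
			WARN_ON_ONCE(ret);
		}
	}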

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 arch/x86/kernel/smpboot.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

Comments

Thomas Gleixner March 11, 2023, 8:54 a.m. UTC | #1
On Wed, Mar 08 2023 at 17:13, Usama Arif wrote:
>  
> +/* Bringup step one: Send INIT/SIPI to the target AP */
> +static int native_cpu_kick(unsigned int cpu)
> +{
> +	return do_cpu_up(cpu, idle_thread_get(cpu));

This idle_thread_get() is not sufficient. bringup_cpu() does:

	struct task_struct *idle = idle_thread_get(cpu);

	/*
	 * Reset stale stack state from the last time this CPU was online.
	 */
	scs_task_reset(idle);
	kasan_unpoison_task_stack(idle);

But with this new model neither the shadow stack gets reset nor the
kasan unpoisoning happens _before_ the to be kicked CPU starts
executing.

That needs a new function which does the get() and the above.

Thanks,

        tglx
David Woodhouse March 11, 2023, 9:55 a.m. UTC | #2
On Sat, 2023-03-11 at 10:54 +0200, Thomas Gleixner wrote:
> On Wed, Mar 08 2023 at 17:13, Usama Arif wrote:
> >  
> > +/* Bringup step one: Send INIT/SIPI to the target AP */
> > +static int native_cpu_kick(unsigned int cpu)
> > +{
> > +       return do_cpu_up(cpu, idle_thread_get(cpu));
> 
> This idle_thread_get() is not sufficient. bringup_cpu() does:
> 
>         struct task_struct *idle = idle_thread_get(cpu);
> 
>         /*
>          * Reset stale stack state from the last time this CPU was online.
>          */
>         scs_task_reset(idle);
>         kasan_unpoison_task_stack(idle);
> 
> But with this new model neither the shadow stack gets reset nor the
> kasan unpoisoning happens _before_ the to be kicked CPU starts
> executing.
> 
> That needs a new function which does the get() and the above.

Ah, good catch. Those were added after we started on this journey :)

I think I'll do it with a 'bool unpoison' argument to
idle_thread_get(). Or just make it unconditional; they're idempotent
anyway and cheap enough? Kind of weird to be doing it from finish_cpu()
though, so I'll probably stick with the argument.


....*types*....

Erm, there are circumstances (!CONFIG_GENERIC_SMP_IDLE_THREAD) when
idle_thread_get() just unconditionally returns NULL.

At first glance, it doesn't look like scs_task_reset() copes with being
passed a NULL. Am I missing something?

 $ grep -c GENERIC_SMP_IDLE_THREAD `grep -l SMP arch/*/Kconfig`
arch/alpha/Kconfig:1
arch/arc/Kconfig:1
arch/arm64/Kconfig:1
arch/arm/Kconfig:1
arch/csky/Kconfig:1
arch/hexagon/Kconfig:1
arch/ia64/Kconfig:1
arch/loongarch/Kconfig:1
arch/mips/Kconfig:1
arch/openrisc/Kconfig:1
arch/parisc/Kconfig:1
arch/powerpc/Kconfig:1
arch/riscv/Kconfig:1
arch/s390/Kconfig:1
arch/sh/Kconfig:1
arch/sparc/Kconfig:1
arch/um/Kconfig:0
arch/x86/Kconfig:1
arch/xtensa/Kconfig:1

Maybe just nobody but UM cares?
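
For reference, with CONFIG_SHADOW_CALL_STACK enabled, scs_task_reset()
in include/linux/scs.h is roughly the following (it compiles to an
empty stub when SCS is disabled), and it has no NULL check:

	static inline void scs_task_reset(struct task_struct *tsk)
	{
		/* Point the shadow call stack pointer back at its base. */
		task_scs_sp(tsk) = task_scs(tsk);
	}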
David Woodhouse March 11, 2023, 10:17 a.m. UTC | #3
On Sat, 2023-03-11 at 10:54 +0200, Thomas Gleixner wrote:
> On Wed, Mar 08 2023 at 17:13, Usama Arif wrote:
> >  
> > +/* Bringup step one: Send INIT/SIPI to the target AP */
> > +static int native_cpu_kick(unsigned int cpu)
> > +{
> > +       return do_cpu_up(cpu, idle_thread_get(cpu));
> 
> This idle_thread_get() is not sufficient. bringup_cpu() does:
> 
>         struct task_struct *idle = idle_thread_get(cpu);
> 
>         /*
>          * Reset stale stack state from the last time this CPU was online.
>          */
>         scs_task_reset(idle);
>         kasan_unpoison_task_stack(idle);
> 
> But with this new model neither the shadow stack gets reset nor the
> kasan unpoisoning happens _before_ the to be kicked CPU starts
> executing.
> 
> That needs a new function which does the get() and the above.

From f2ea5c62be5f63a8c701e0eda8accd177939e087 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Thu, 23 Feb 2023 19:11:30 +0000
Subject: [PATCH 02/10] cpu/hotplug: Move idle_thread_get() to
 <linux/smpboot.h>

Instead of relying purely on the special-case wrapper in bringup_cpu()
to pass the idle thread to __cpu_up(), expose idle_thread_get() so that
the architecture code can obtain it directly when necessary.

This will be useful when the existing __cpu_up() is split into multiple
phases, only *one* of which will actually need the idle thread.

If the architecture code is to register its new pre-bringup states with
the cpuhp core, having a special-case wrapper to pass extra arguments is
non-trivial and it's easier just to let the arch register its function
pointer to be invoked with the standard API.

To reduce duplication, move the shadow stack reset and kasan unpoisoning
into idle_thread_get() too.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
 include/linux/smpboot.h | 10 ++++++++++
 kernel/cpu.c            | 13 +++----------
 kernel/smpboot.c        | 11 ++++++++++-
 kernel/smpboot.h        |  2 --
 4 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/include/linux/smpboot.h b/include/linux/smpboot.h
index 9d1bc65d226c..df6417703e4c 100644
--- a/include/linux/smpboot.h
+++ b/include/linux/smpboot.h
@@ -5,6 +5,16 @@
 #include <linux/types.h>
 
 struct task_struct;
+
+#ifdef CONFIG_GENERIC_SMP_IDLE_THREAD
+struct task_struct *idle_thread_get(unsigned int cpu, bool unpoison);
+#else
+static inline struct task_struct *idle_thread_get(unsigned int cpu, bool unpoison)
+{
+	return NULL;
+}
+#endif
+
 /* Cookie handed to the thread_fn*/
 struct smpboot_thread_data;
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 6c0a92ca6bb5..6b3dccb4a888 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -31,7 +31,6 @@
 #include <linux/smpboot.h>
 #include <linux/relay.h>
 #include <linux/slab.h>
-#include <linux/scs.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/cpuset.h>
 #include <linux/random.h>
@@ -588,15 +587,9 @@ static int bringup_wait_for_ap(unsigned int cpu)
 
 static int bringup_cpu(unsigned int cpu)
 {
-	struct task_struct *idle = idle_thread_get(cpu);
+	struct task_struct *idle = idle_thread_get(cpu, true);
 	int ret;
 
-	/*
-	 * Reset stale stack state from the last time this CPU was online.
-	 */
-	scs_task_reset(idle);
-	kasan_unpoison_task_stack(idle);
-
 	/*
 	 * Some architectures have to walk the irq descriptors to
 	 * setup the vector space for the cpu which comes online.
@@ -614,7 +607,7 @@ static int bringup_cpu(unsigned int cpu)
 
 static int finish_cpu(unsigned int cpu)
 {
-	struct task_struct *idle = idle_thread_get(cpu);
+	struct task_struct *idle = idle_thread_get(cpu, false);
 	struct mm_struct *mm = idle->active_mm;
 
 	/*
@@ -1378,7 +1371,7 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 
 	if (st->state == CPUHP_OFFLINE) {
 		/* Let it fail before we try to bring the cpu up */
-		idle = idle_thread_get(cpu);
+		idle = idle_thread_get(cpu, false);
 		if (IS_ERR(idle)) {
 			ret = PTR_ERR(idle);
 			goto out;
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 2c7396da470c..24e81c725e7b 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/task.h>
+#include <linux/scs.h>
 #include <linux/export.h>
 #include <linux/percpu.h>
 #include <linux/kthread.h>
@@ -27,12 +28,20 @@
  */
 static DEFINE_PER_CPU(struct task_struct *, idle_threads);
 
-struct task_struct *idle_thread_get(unsigned int cpu)
+struct task_struct *idle_thread_get(unsigned int cpu, bool unpoison)
 {
 	struct task_struct *tsk = per_cpu(idle_threads, cpu);
 
 	if (!tsk)
 		return ERR_PTR(-ENOMEM);
+
+	if (unpoison) {
+		/*
+		 * Reset stale stack state from last time this CPU was online.
+		 */
+		scs_task_reset(tsk);
+		kasan_unpoison_task_stack(tsk);
+	}
 	return tsk;
 }
 
diff --git a/kernel/smpboot.h b/kernel/smpboot.h
index 34dd3d7ba40b..60c609318ad6 100644
--- a/kernel/smpboot.h
+++ b/kernel/smpboot.h
@@ -5,11 +5,9 @@
 struct task_struct;
 
 #ifdef CONFIG_GENERIC_SMP_IDLE_THREAD
-struct task_struct *idle_thread_get(unsigned int cpu);
 void idle_thread_set_boot_cpu(void);
 void idle_threads_init(void);
 #else
-static inline struct task_struct *idle_thread_get(unsigned int cpu) { return NULL; }
 static inline void idle_thread_set_boot_cpu(void) { }
 static inline void idle_threads_init(void) { }
 #endif
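
With that helper change in place, the kick step from the parent patch
would presumably end up as something like the sketch below (not taken
from a posted patch), so the stack reset/unpoisoning happens on the
control CPU before INIT/SIPI is sent:

	/* Bringup step one: Send INIT/SIPI to the target AP */
	static int native_cpu_kick(unsigned int cpu)
	{
		/* 'true': reset shadow stack and unpoison before the AP runs. */
		return do_cpu_up(cpu, idle_thread_get(cpu, true));
	}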
Thomas Gleixner March 11, 2023, 2:14 p.m. UTC | #4
On Sat, Mar 11 2023 at 09:55, David Woodhouse wrote:
> On Sat, 2023-03-11 at 10:54 +0200, Thomas Gleixner wrote:
> I think I'll do it with a 'bool unpoison' argument to
> idle_thread_get(). Or just make it unconditional; they're idempotent
> anyway and cheap enough? Kind of weird to be doing it from finish_cpu()
> though, so I'll probably stick with the argument.

Eew.

> ....*types*....
>
> Erm, there are circumstances (!CONFIG_GENERIC_SMP_IDLE_THREAD) when
> idle_thread_get() just unconditionally returns NULL.
>
> At first glance, it doesn't look like scs_task_reset() copes with being
> passed a NULL. Am I missing something?

Shadow call stacks are only enabled by arm64 today, and that uses
the generic idle threads.

Thanks,

        tglx
David Woodhouse March 11, 2023, 2:25 p.m. UTC | #5
On 11 March 2023 14:14:53 GMT, Thomas Gleixner <tglx@linutronix.de> wrote:
>On Sat, Mar 11 2023 at 09:55, David Woodhouse wrote:
>> On Sat, 2023-03-11 at 10:54 +0200, Thomas Gleixner wrote:
>> I think I'll do it with a 'bool unpoison' argument to
>> idle_thread_get(). Or just make it unconditional; they're idempotent
>> anyway and cheap enough? Kind of weird to be doing it from finish_cpu()
>> though, so I'll probably stick with the argument.
>
>Eew.

Hm? I prefer the idea that idle_thread_get() is able to just return a *usable* one, and that we don't rely on architectures to have the *same* set of functions to unpoison/prepare it, and keep those duplicates in sync...

I suppose we could make a separate make_that_idle_thread_you_gave_me_actually_useful() function and avoid the duplication of anything but *that* call... but meh.
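
A minimal sketch of that alternative, with a hypothetical
idle_thread_prepare() helper that every caller (bringup_cpu() and the
arch kick function) would have to remember to invoke before the AP runs:

	/* Hypothetical helper, not in any posted patch. */
	static inline void idle_thread_prepare(struct task_struct *idle)
	{
		/* Reset stale stack state from the last time this CPU was online. */
		scs_task_reset(idle);
		kasan_unpoison_task_stack(idle);
	}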

Patch

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index fd4e678b6588..a3572b2ebfd3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@ 
 #include <linux/pgtable.h>
 #include <linux/overflow.h>
 #include <linux/stackprotector.h>
+#include <linux/smpboot.h>
 
 #include <asm/acpi.h>
 #include <asm/cacheinfo.h>
@@ -992,7 +993,8 @@  static void announce_cpu(int cpu, int apicid)
 		node_width = num_digits(num_possible_nodes()) + 1; /* + '#' */
 
 	if (cpu == 1)
-		printk(KERN_INFO "x86: Booting SMP configuration:\n");
+		printk(KERN_INFO "x86: Booting SMP configuration in %s:\n",
+		       do_parallel_bringup ? "parallel" : "series");
 
 	if (system_state < SYSTEM_RUNNING) {
 		if (node != current_node) {
@@ -1325,9 +1327,12 @@  int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
 	int ret;
 
-	ret = do_cpu_up(cpu, tidle);
-	if (ret)
-		return ret;
+	/* If parallel AP bringup isn't enabled, perform the first steps now. */
+	if (!do_parallel_bringup) {
+		ret = do_cpu_up(cpu, tidle);
+		if (ret)
+			return ret;
+	}
 
 	ret = do_wait_cpu_initialized(cpu);
 	if (ret)
@@ -1349,6 +1354,12 @@  int native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 	return ret;
 }
 
+/* Bringup step one: Send INIT/SIPI to the target AP */
+static int native_cpu_kick(unsigned int cpu)
+{
+	return do_cpu_up(cpu, idle_thread_get(cpu));
+}
+
 /**
  * arch_disable_smp_support() - disables SMP support for x86 at runtime
  */
@@ -1517,6 +1528,8 @@  static bool prepare_parallel_bringup(void)
 		smpboot_control = STARTUP_APICID_CPUID_01;
 	}
 
+	cpuhp_setup_state_nocalls(CPUHP_BP_PARALLEL_DYN, "x86/cpu:kick",
+				  native_cpu_kick, NULL);
 	return true;
 }