[v3,1/5] x86: initialize secondary CPU only if master CPU will wait for it

Message ID 1397150061-29735-2-git-send-email-imammedo@redhat.com (mailing list archive)
State Not Applicable, archived
Commit Message

Igor Mammedov April 10, 2014, 5:14 p.m. UTC
A hang is observed on virtual machines during CPU hotplug,
especially in big guests with many CPUs. (It is reproducible
more often if the host is over-committed.)

It happens because the master CPU gives up waiting on
the secondary CPU and allows it to run wild. As a result,
the AP can lock up or crash the system, for example
as described here: https://lkml.org/lkml/2014/3/6/257

If the master CPU has sent the STARTUP IPI successfully,
and the AP has signalled to the master CPU that it's ready
to start initialization, make the master CPU wait
indefinitely till the AP is onlined.
To ensure that the AP won't ever run wild, make it
wait at early startup till the master CPU confirms its
intention to wait for the AP.

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
---
v2:
 - amend comment in cpu_init()
v3:
 - leave timeouts in do_boot_cpu(), so that master CPU
   won't hang if AP doesn't respond, use cpu_initialized_mask
   as a way for AP to signal to master CPU that it's ready
   to start initialization.
---
 arch/x86/kernel/cpu/common.c |   28 +++++++-----
 arch/x86/kernel/smpboot.c    |  100 +++++++++++++-----------------------------
 2 files changed, 48 insertions(+), 80 deletions(-)
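
The ordering described in the commit message can be modelled outside the kernel. The following is only a single-threaded sketch, not kernel code: `struct ap_model`, `ap_step()`, `master_boot_ap()`, `ap_delay` and `max_polls` are hypothetical names invented for illustration, and the three booleans stand in for one CPU's bit in cpu_initialized_mask, cpu_callout_mask and cpu_callin_mask. It shows that an AP which responds too late stays parked instead of running on after the master has given up.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for one CPU's bit in the three masks. */
struct ap_model {
	bool initialized;	/* set by AP: ready to start initialization */
	bool callout;		/* set by master: go ahead, I will wait */
	bool callin;		/* set by AP: initialization finished */
};

/* One scheduling step of the modelled AP. */
static void ap_step(struct ap_model *ap)
{
	if (!ap->initialized) {
		ap->initialized = true;	/* announce readiness first */
		return;
	}
	if (!ap->callout)
		return;			/* keep spinning until master ACKs */
	ap->callin = true;		/* initialization completed */
}

/*
 * Modelled master side. The AP gets no CPU time for the first ap_delay
 * polls (an over-committed host). The master polls the ready flag at most
 * max_polls times. Returns 0 if the AP called in, -1 if the master gave up.
 */
static int master_boot_ap(struct ap_model *ap, int ap_delay, int max_polls)
{
	int poll;

	ap->initialized = false;	/* clean state before waking the AP */

	for (poll = 0; poll < max_polls; poll++) {
		if (poll >= ap_delay)
			ap_step(ap);
		if (ap->initialized) {
			ap->callout = true;	/* confirm intention to wait */
			break;
		}
	}
	if (!ap->callout)
		return -1;		/* AP never responded in time */

	while (!ap->callin)		/* from here on the wait is unbounded */
		ap_step(ap);
	return 0;
}
```

With `ap_delay` smaller than `max_polls` the model completes the handshake. With a larger `ap_delay` the master returns -1, and any later `ap_step()` calls only set the ready flag and then spin, which is the property the patch is after.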

Comments

Ingo Molnar April 14, 2014, 9:16 a.m. UTC | #1
* Igor Mammedov <imammedo@redhat.com> wrote:

>  	/*
> +	 * wait for ACK from master CPU before continuing
> +	 * with AP initialization
> +	 */
> +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> +		cpu_relax();

> +	/*
> +	 * wait for ACK from master CPU before continuing
> +	 * with AP initialization
> +	 */
> +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> +		cpu_relax();

That repetitive pattern could be stuck into a properly named helper 
inline function.

(Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
if the bit is already set.)

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
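
For readers following the review, the helper being suggested might look roughly like the user-space sketch below. It is an illustration under assumptions, not the code that was posted: `wait_for_master_ack()`, `mask_test_and_set()`, `mask_test()`, `warn_on()` and `warn_count` are hypothetical names, the two `atomic_ulong` variables stand in for cpu_initialized_mask and cpu_callout_mask, and `cpu` is assumed to be smaller than the number of bits in an unsigned long.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for the two kernel cpumasks. */
static atomic_ulong initialized_mask;
static atomic_ulong callout_mask;
static int warn_count;		/* number of warnings raised so far */

/* Atomically set the CPU's bit and report whether it was already set. */
static bool mask_test_and_set(atomic_ulong *mask, int cpu)
{
	unsigned long bit = 1UL << cpu;

	return (atomic_fetch_or(mask, bit) & bit) != 0;
}

static bool mask_test(atomic_ulong *mask, int cpu)
{
	return (atomic_load(mask) & (1UL << cpu)) != 0;
}

/* Stand-in for WARN_ON(): record and report, but keep running. */
static void warn_on(bool cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

/*
 * Announce readiness, warn if the bit was unexpectedly set already,
 * then spin until the master sets the callout bit for this CPU.
 */
static inline void wait_for_master_ack(int cpu)
{
	warn_on(mask_test_and_set(&initialized_mask, cpu),
		"CPU already initialized");
	while (!mask_test(&callout_mask, cpu))
		;	/* the kernel loop calls cpu_relax() here */
}
```

Folding the test and the set into one atomic operation keeps the warning check free of a window between the two, and both call sites in cpu_init() would collapse to a single call.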
Igor Mammedov April 14, 2014, 9:52 a.m. UTC | #2
On Mon, 14 Apr 2014 11:16:00 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Igor Mammedov <imammedo@redhat.com> wrote:
> 
> >  	/*
> > +	 * wait for ACK from master CPU before continuing
> > +	 * with AP initialization
> > +	 */
> > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > +		cpu_relax();
> 
> > +	/*
> > +	 * wait for ACK from master CPU before continuing
> > +	 * with AP initialization
> > +	 */
> > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > +		cpu_relax();
> 
> That repetitive pattern could be stuck into a properly named helper 
> inline function.
sure

> (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> if the bit is already set.)
The reason why there is no any WARN_ON or likes is that printk is quite
complicated, takes looks and so on. So it's not safe at this point since
CPU could be shot down by any time by INIT/SIPI until it's out of 
cpu_callout_mask loop.
That said it's possible to add WARN_ON in do_boot_cpu() before
cpu_initialized_mask is cleared, to achieve the same effect,
so I'll stick it there.


> 
> Thanks,
> 
> 	Ingo
Ingo Molnar April 14, 2014, 10:03 a.m. UTC | #3
* Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 14 Apr 2014 11:16:00 +0200
> Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > >  	/*
> > > +	 * wait for ACK from master CPU before continuing
> > > +	 * with AP initialization
> > > +	 */
> > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > +		cpu_relax();
> > 
> > > +	/*
> > > +	 * wait for ACK from master CPU before continuing
> > > +	 * with AP initialization
> > > +	 */
> > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > +		cpu_relax();
> > 
> > That repetitive pattern could be stuck into a properly named helper 
> > inline function.
> sure
> 
> > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> > if the bit is already set.)
> The reason why there is no any WARN_ON or likes is that printk is quite
> complicated, takes looks and so on. [...]

[ Yeah, I too heard that printk(), like a pretty girl, is complicated
  and makes people look twice. ]

> [...] So it's not safe at this point since
> CPU could be shot down by any time by INIT/SIPI until it's out of 
> cpu_callout_mask loop.

Not sure where you got that from, but it's not a valid concern really: 
the only place where we don't want to do a printk() is in printk code 
itself.

Debug warnings, by definition, should never trigger. If they trigger 
then they will very likely not cause lockups, but will cause the bug 
to be fixed.

Thanks,

	Ingo
Igor Mammedov April 14, 2014, 10:21 a.m. UTC | #4
On Mon, 14 Apr 2014 12:03:35 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Mon, 14 Apr 2014 11:16:00 +0200
> > Ingo Molnar <mingo@kernel.org> wrote:
> > 
> > > 
> > > * Igor Mammedov <imammedo@redhat.com> wrote:
> > > 
> > > >  	/*
> > > > +	 * wait for ACK from master CPU before continuing
> > > > +	 * with AP initialization
> > > > +	 */
> > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > +		cpu_relax();
> > > 
> > > > +	/*
> > > > +	 * wait for ACK from master CPU before continuing
> > > > +	 * with AP initialization
> > > > +	 */
> > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > +		cpu_relax();
> > > 
> > > That repetitive pattern could be stuck into a properly named helper 
> > > inline function.
> > sure
> > 
> > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> > > if the bit is already set.)
> > The reason why there is no any WARN_ON or likes is that printk is quite
> > complicated, takes looks and so on. [...]
> 
> [ Yeah, I too heard that printk(), like a pretty girl, is complicated
>   and makes people look twice. ]
> 
> > [...] So it's not safe at this point since
> > CPU could be shot down by any time by INIT/SIPI until it's out of 
> > cpu_callout_mask loop.
> 
> Not sure where you got that from, but it's not a valid concern really: 
> the only place where we don't want to do a printk() is in printk code 
> itself.
> 
> Debug warnings, by definition, should never trigger. If they trigger 
> then they will very likely not cause lockups, but will cause the bug 
> to be fixed.
ok, I'll add WARN_ON in cpu_init() as you've suggested.

> Thanks,
> 
> 	Ingo
Igor Mammedov April 14, 2014, 12:50 p.m. UTC | #5
On Mon, 14 Apr 2014 12:03:35 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Mon, 14 Apr 2014 11:16:00 +0200
> > Ingo Molnar <mingo@kernel.org> wrote:
> > 
> > > 
> > > * Igor Mammedov <imammedo@redhat.com> wrote:
> > > 
> > > >  	/*
> > > > +	 * wait for ACK from master CPU before continuing
> > > > +	 * with AP initialization
> > > > +	 */
> > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > +		cpu_relax();
> > > 
> > > > +	/*
> > > > +	 * wait for ACK from master CPU before continuing
> > > > +	 * with AP initialization
> > > > +	 */
> > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > +		cpu_relax();
> > > 
> > > That repetitive pattern could be stuck into a properly named helper 
> > > inline function.
> > sure
> > 
> > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> > > if the bit is already set.)
WARN_ON will never be triggered here since the bit is always cleared by the
master CPU before the AP gets here. There is no harm in keeping WARN_ON though;
do you still want it to be here?

It could be useful to put WARN_ON in do_boot_cpu() before the bit is cleared,
so that the user would see that he is trying to online an AP which failed
the previous time. It's not really necessary, since a failed online attempt
is now reported in the logs at ERR level, see patch 2/5.

Thanks,
   Igor

Ingo Molnar April 14, 2014, 2:51 p.m. UTC | #6
* Igor Mammedov <imammedo@redhat.com> wrote:

> On Mon, 14 Apr 2014 12:03:35 +0200
> Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > > On Mon, 14 Apr 2014 11:16:00 +0200
> > > Ingo Molnar <mingo@kernel.org> wrote:
> > > 
> > > > 
> > > > * Igor Mammedov <imammedo@redhat.com> wrote:
> > > > 
> > > > >  	/*
> > > > > +	 * wait for ACK from master CPU before continuing
> > > > > +	 * with AP initialization
> > > > > +	 */
> > > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > +		cpu_relax();
> > > > 
> > > > > +	/*
> > > > > +	 * wait for ACK from master CPU before continuing
> > > > > +	 * with AP initialization
> > > > > +	 */
> > > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > +		cpu_relax();
> > > > 
> > > > That repetitive pattern could be stuck into a properly named helper 
> > > > inline function.
> > > sure
> > > 
> > > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> > > > if the bit is already set.)
>
> WARN_ON will never be triggered here since bit is always cleared by 
> master CPU before AP gets here. There is no harm keeping WARN_ON 
> though, do you still want it be here?

The previous code panic()ed on this condition - so it makes sense to 
at least keep a WARN_ON(). That it won't ever trigger is good:

> It could be useful to put WARN_ON in do_boot_cpu() before bit is 
> cleared, so that user would see that he tries to online AP which has 
> failed previous time. It's not really necessary since failed to 
> online attempt reported in logs at ERR level now, see patch 2/5.

WARN_ON()s are not used to communicate with users, they are used to 
show developers that there's a _bug_ in the code!

So a WARN_ON() not triggering, ever, is a good thing.

Thanks,

	Ingo
Igor Mammedov April 14, 2014, 3:03 p.m. UTC | #7
On Mon, 14 Apr 2014 16:51:19 +0200
Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Mon, 14 Apr 2014 12:03:35 +0200
> > Ingo Molnar <mingo@kernel.org> wrote:
> > 
> > > 
> > > * Igor Mammedov <imammedo@redhat.com> wrote:
> > > 
> > > > On Mon, 14 Apr 2014 11:16:00 +0200
> > > > Ingo Molnar <mingo@kernel.org> wrote:
> > > > 
> > > > > 
> > > > > * Igor Mammedov <imammedo@redhat.com> wrote:
> > > > > 
> > > > > >  	/*
> > > > > > +	 * wait for ACK from master CPU before continuing
> > > > > > +	 * with AP initialization
> > > > > > +	 */
> > > > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > > +		cpu_relax();
> > > > > 
> > > > > > +	/*
> > > > > > +	 * wait for ACK from master CPU before continuing
> > > > > > +	 * with AP initialization
> > > > > > +	 */
> > > > > > +	cpumask_set_cpu(cpu, cpu_initialized_mask);
> > > > > > +	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
> > > > > > +		cpu_relax();
> > > > > 
> > > > > That repetitive pattern could be stuck into a properly named helper 
> > > > > inline function.
> > > > sure
> > > > 
> > > > > (Also, before the cpumask_set_cpu() we should probably do a WARN_ON() 
> > > > > if the bit is already set.)
> >
> > WARN_ON will never be triggered here since bit is always cleared by 
> > master CPU before AP gets here. There is no harm keeping WARN_ON 
> > though, do you still want it be here?
> 
> The previous code panic()ed on this condition - so it makes sense to 
> at least keep a WARN_ON(). That it won't ever trigger is good:
> 
> > It could be useful to put WARN_ON in do_boot_cpu() before bit is 
> > cleared, so that user would see that he tries to online AP which has 
> > failed previous time. It's not really necessary since failed to 
> > online attempt reported in logs at ERR level now, see patch 2/5.
> 
> WARN_ON()s are not used to communicate with users, they are used to 
> show developers that there's a _bug_ in the code!
> 
> So a WARN_ON() not triggering, ever, is a good thing.

Thanks for your patience
I'll repost fixed and tested series in a minute

> 
> Thanks,
> 
> 	Ingo

Patch

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a135239..6650110 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1236,16 +1236,23 @@  void cpu_init(void)
 	struct task_struct *me;
 	struct tss_struct *t;
 	unsigned long v;
-	int cpu;
+	int cpu = stack_smp_processor_id();
 	int i;
 
 	/*
+	 * wait for ACK from master CPU before continuing
+	 * with AP initialization
+	 */
+	cpumask_set_cpu(cpu, cpu_initialized_mask);
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+		cpu_relax();
+
+	/*
 	 * Load microcode on this cpu if a valid microcode is available.
 	 * This is early microcode loading procedure.
 	 */
 	load_ucode_ap();
 
-	cpu = stack_smp_processor_id();
 	t = &per_cpu(init_tss, cpu);
 	oist = &per_cpu(orig_ist, cpu);
 
@@ -1257,9 +1264,6 @@  void cpu_init(void)
 
 	me = current;
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask))
-		panic("CPU#%d already initialized!\n", cpu);
-
 	pr_debug("Initializing CPU#%d\n", cpu);
 
 	clear_in_cr4(X86_CR4_VME|X86_CR4_PVI|X86_CR4_TSD|X86_CR4_DE);
@@ -1336,13 +1340,15 @@  void cpu_init(void)
 	struct tss_struct *t = &per_cpu(init_tss, cpu);
 	struct thread_struct *thread = &curr->thread;
 
-	show_ucode_info_early();
+	/*
+	 * wait for ACK from master CPU before continuing
+	 * with AP initialization
+	 */
+	cpumask_set_cpu(cpu, cpu_initialized_mask);
+	while (!cpumask_test_cpu(cpu, cpu_callout_mask))
+		cpu_relax();
 
-	if (cpumask_test_and_set_cpu(cpu, cpu_initialized_mask)) {
-		printk(KERN_WARNING "CPU#%d already initialized!\n", cpu);
-		for (;;)
-			local_irq_enable();
-	}
+	show_ucode_info_early();
 
 	printk(KERN_INFO "Initializing CPU#%d\n", cpu);
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3482693..5e57a0a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -111,7 +111,6 @@  atomic_t init_deasserted;
 static void smp_callin(void)
 {
 	int cpuid, phys_id;
-	unsigned long timeout;
 
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
@@ -130,37 +129,6 @@  static void smp_callin(void)
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
-		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
-					phys_id, cpuid);
-	}
-	pr_debug("CPU#%d (phys ID: %d) waiting for CALLOUT\n", cpuid, phys_id);
-
-	/*
-	 * STARTUP IPIs are fragile beasts as they might sometimes
-	 * trigger some glue motherboard logic. Complete APIC bus
-	 * silence for 1 second, this overestimates the time the
-	 * boot CPU is spending to send the up to 2 STARTUP IPIs
-	 * by a factor of two. This should be enough.
-	 */
-
-	/*
-	 * Waiting 2s total for startup (udelay is not yet working)
-	 */
-	timeout = jiffies + 2*HZ;
-	while (time_before(jiffies, timeout)) {
-		/*
-		 * Has the boot CPU finished it's STARTUP sequence?
-		 */
-		if (cpumask_test_cpu(cpuid, cpu_callout_mask))
-			break;
-		cpu_relax();
-	}
-
-	if (!time_before(jiffies, timeout)) {
-		panic("%s: CPU%d started up but did not get a callout!\n",
-		      __func__, cpuid);
-	}
 
 	/*
 	 * the boot CPU has finished the init stage and is spinning
@@ -750,8 +718,8 @@  static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	unsigned long start_ip = real_mode_header->trampoline_start;
 
 	unsigned long boot_error = 0;
-	int timeout;
 	int cpu0_nmi_registered = 0;
+	unsigned long timeout;
 
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
@@ -799,6 +767,14 @@  static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 
 	/*
+	 * AP might wait on cpu_callout_mask in cpu_init() with
+	 * cpu_initialized_mask set if previous attempt to online
+	 * it timed-out. Clear cpu_initialized_mask so that after
+	 * INIT/SIPI it could start with a clean state.
+	 */
+	cpumask_clear_cpu(cpu, cpu_initialized_mask);
+
+	/*
 	 * Wake up a CPU in difference cases:
 	 * - Use the method in the APIC driver if it's defined
 	 * Otherwise,
@@ -810,56 +786,42 @@  static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
 						     &cpu0_nmi_registered);
 
+
 	if (!boot_error) {
 		/*
-		 * allow APs to start initializing.
+		 * Wait 10s total for a response from AP
 		 */
-		pr_debug("Before Callout %d\n", cpu);
-		cpumask_set_cpu(cpu, cpu_callout_mask);
-		pr_debug("After Callout %d\n", cpu);
+		boot_error = -1;
+		timeout = jiffies + 10*HZ;
+		while (time_before(jiffies, timeout)) {
+			if (cpumask_test_cpu(cpu, cpu_initialized_mask)) {
+				/*
+				 * Tell AP to proceed with initialization
+				 */
+				cpumask_set_cpu(cpu, cpu_callout_mask);
+				boot_error = 0;
+				break;
+			}
+			udelay(100);
+			schedule();
+		}
+	}
 
+	if (!boot_error) {
 		/*
-		 * Wait 5s total for a response
+		 * Wait till AP completes initial initialization
 		 */
-		for (timeout = 0; timeout < 50000; timeout++) {
-			if (cpumask_test_cpu(cpu, cpu_callin_mask))
-				break;	/* It has booted */
-			udelay(100);
+		while (!cpumask_test_cpu(cpu, cpu_callin_mask)) {
 			/*
 			 * Allow other tasks to run while we wait for the
 			 * AP to come online. This also gives a chance
 			 * for the MTRR work(triggered by the AP coming online)
 			 * to be completed in the stop machine context.
 			 */
+			udelay(100);
 			schedule();
 		}
-
-		if (cpumask_test_cpu(cpu, cpu_callin_mask)) {
-			print_cpu_msr(&cpu_data(cpu));
-			pr_debug("CPU%d: has booted.\n", cpu);
-		} else {
-			boot_error = 1;
-			if (*trampoline_status == 0xA5A5A5A5)
-				/* trampoline started but...? */
-				pr_err("CPU%d: Stuck ??\n", cpu);
-			else
-				/* trampoline code not run */
-				pr_err("CPU%d: Not responding\n", cpu);
-			if (apic->inquire_remote_apic)
-				apic->inquire_remote_apic(apicid);
-		}
-	}
-
-	if (boot_error) {
-		/* Try to put things back the way they were before ... */
-		numa_remove_cpu(cpu); /* was set by numa_add_cpu */
-
-		/* was set by do_boot_cpu() */
-		cpumask_clear_cpu(cpu, cpu_callout_mask);
-
-		/* was set by cpu_init() */
-		cpumask_clear_cpu(cpu, cpu_initialized_mask);
-
+	} else {
 		set_cpu_present(cpu, false);
 		per_cpu(x86_cpu_to_apicid, cpu) = BAD_APICID;
 	}
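
A note on the new deadline in do_boot_cpu(): it is computed as jiffies + 10*HZ and checked with time_before(), and the patch changes `timeout` to unsigned long to match. The sketch below models that comparison in user space to show why it keeps working when the counter wraps around. `model_time_before()` is a hypothetical name, and the body assumes the usual signed-difference form of the comparison on a two's-complement machine.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * User-space model of a wrap-safe "a is before b" test on a free-running
 * unsigned counter: subtract first, then look at the sign of the result.
 * Valid as long as the two values are less than LONG_MAX ticks apart.
 */
static bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}
```

For example, with `now = ULONG_MAX - 5` and `deadline = now + 10` the deadline wraps to a small number, so a plain `now < deadline` test would report that the deadline has already passed, while the signed-difference test still reports that `now` is before it.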