[v2,01/14] smp: Create a new function to shutdown nonboot cpus

Message ID 20191125112754.25223-2-qais.yousef@arm.com (mailing list archive)
State New, archived

Commit Message

Qais Yousef Nov. 25, 2019, 11:27 a.m. UTC
This function will be used later in machine_shutdown() for some archs.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: "Peter Zijlstra (Intel)" <peterz@infradead.org>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Nicholas Piggin <npiggin@gmail.com>
CC: Daniel Lezcano <daniel.lezcano@linaro.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Eiichi Tsukata <devel@etsukata.com>
CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
CC: Nadav Amit <namit@vmware.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
CC: Tony Luck <tony.luck@intel.com>
CC: Fenghua Yu <fenghua.yu@intel.com>
CC: Russell King <linux@armlinux.org.uk>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will@kernel.org>
CC: linux-arm-kernel@lists.infradead.org
CC: linux-ia64@vger.kernel.org
CC: linux-kernel@vger.kernel.org
---
 include/linux/cpu.h |  2 ++
 kernel/cpu.c        | 17 +++++++++++++++++
 2 files changed, 19 insertions(+)

Comments

Russell King (Oracle) Jan. 21, 2020, 5:03 p.m. UTC | #1
On Mon, Nov 25, 2019 at 11:27:41AM +0000, Qais Yousef wrote:
> This function will be used later in machine_shutdown() for some archs.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Josh Poimboeuf <jpoimboe@redhat.com>
> CC: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> CC: Jiri Kosina <jkosina@suse.cz>
> CC: Nicholas Piggin <npiggin@gmail.com>
> CC: Daniel Lezcano <daniel.lezcano@linaro.org>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Eiichi Tsukata <devel@etsukata.com>
> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com>
> CC: Nadav Amit <namit@vmware.com>
> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> CC: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> CC: Tony Luck <tony.luck@intel.com>
> CC: Fenghua Yu <fenghua.yu@intel.com>
> CC: Russell King <linux@armlinux.org.uk>
> CC: Catalin Marinas <catalin.marinas@arm.com>
> CC: Will Deacon <will@kernel.org>
> CC: linux-arm-kernel@lists.infradead.org
> CC: linux-ia64@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> ---
>  include/linux/cpu.h |  2 ++
>  kernel/cpu.c        | 17 +++++++++++++++++
>  2 files changed, 19 insertions(+)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index bc6c879bd110..8229932fb053 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -118,6 +118,7 @@ extern void cpu_hotplug_disable(void);
>  extern void cpu_hotplug_enable(void);
>  void clear_tasks_mm_cpumask(int cpu);
>  int cpu_down(unsigned int cpu);
> +extern void smp_shutdown_nonboot_cpus(unsigned int primary_cpu);
>  
>  #else /* CONFIG_HOTPLUG_CPU */
>  
> @@ -129,6 +130,7 @@ static inline int  cpus_read_trylock(void) { return true; }
>  static inline void lockdep_assert_cpus_held(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
> +static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { }
>  #endif	/* !CONFIG_HOTPLUG_CPU */
>  
>  /* Wrappers which go away once all code is converted */
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index e2cad3ee2ead..94055a0d989e 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -1058,6 +1058,23 @@ int cpu_down(unsigned int cpu)
>  }
>  EXPORT_SYMBOL(cpu_down);
>  
> +void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> +{
> +	unsigned int cpu;
> +
> +	if (!cpu_online(primary_cpu)) {
> +		pr_info("Attempting to shutdodwn nonboot cpus while boot cpu is offline!\n");
> +		cpu_online(primary_cpu);
> +	}
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == primary_cpu)
> +			continue;
> +		if (cpu_online(cpu))
> +			cpu_down(cpu);
> +	}

How does this avoid racing with userspace attempting to restart CPUs
that have already been taken down by this function?
Qais Yousef Jan. 21, 2020, 5:47 p.m. UTC | #2
On 01/21/20 17:03, Russell King - ARM Linux admin wrote:
> On Mon, Nov 25, 2019 at 11:27:41AM +0000, Qais Yousef wrote:
> > +void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> > +{
> > +	unsigned int cpu;
> > +
> > +	if (!cpu_online(primary_cpu)) {
> > +		pr_info("Attempting to shutdodwn nonboot cpus while boot cpu is offline!\n");
> > +		cpu_online(primary_cpu);

Eh, that should be cpu_up(primary_cpu)!

Which, I have to say, I'm not sure is the right thing to do.
migrate_to_reboot_cpu() picks the first online cpu if reboot_cpu (assumed 0) is
offline:

migrate_to_reboot_cpu():
	/* Make certain the cpu I'm about to reboot on is online */
	if (!cpu_online(cpu))
		cpu = cpumask_first(cpu_online_mask);
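
To make the fallback concrete, here is a minimal userspace sketch of the selection logic quoted from migrate_to_reboot_cpu(). It is a toy model, not kernel code: struct cpu_model, MODEL_CPUS, first_online_cpu() and pick_reboot_cpu() are hypothetical names used only for illustration, and a plain bool array stands in for cpu_online_mask.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_CPUS 8	/* hypothetical CPU count for this model */

/* Toy stand-in for cpu_online_mask: online[i] is true if CPU i is online. */
struct cpu_model {
	bool online[MODEL_CPUS];
};

/* Models cpumask_first(cpu_online_mask): lowest online CPU, or MODEL_CPUS if none. */
static unsigned int first_online_cpu(const struct cpu_model *m)
{
	for (unsigned int cpu = 0; cpu < MODEL_CPUS; cpu++)
		if (m->online[cpu])
			return cpu;
	return MODEL_CPUS;
}

/* Keep the requested CPU if it is online, otherwise fall back to the first online one. */
static unsigned int pick_reboot_cpu(const struct cpu_model *m, unsigned int requested)
{
	assert(requested < MODEL_CPUS);
	if (!m->online[requested])
		return first_online_cpu(m);
	return requested;
}
```

In this model, asking for CPU 0 while only CPUs 2 and 3 are online yields CPU 2. The requested CPU is never brought up, which is the behaviour being contrasted with calling cpu_up(primary_cpu).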

> > +	}
> > +
> > +	for_each_present_cpu(cpu) {
> > +		if (cpu == primary_cpu)
> > +			continue;
> > +		if (cpu_online(cpu))
> > +			cpu_down(cpu);
> > +	}
> 
> How does this avoid racing with userspace attempting to restart CPUs
> that have already been taken down by this function?

This is meant to be called from machine_shutdown() only.

But you've got a point.

The previous logic, which used disable_nonboot_cpus() (which in turn called
freeze_secondary_cpus()), didn't hold the hotplug lock. So I assumed the
higher-level logic of machine_shutdown() ensured that the hotplug lock was held
to synchronize with other potential hotplug operations.

But I can see now that it doesn't.

Since this series migrates users to device_{online,offline}(), holding
lock_device_hotplug() should protect against such races.

Worth noting that this is an existing problem in the code and not something
I introduced. Of course, it makes sense to fix it properly as part of this
series.

I'm not sure how the other archs deal with this TBH.

Thanks for having a look!

Cheers

--
Qais Yousef
Russell King (Oracle) Jan. 21, 2020, 6:09 p.m. UTC | #3
On Tue, Jan 21, 2020 at 05:47:52PM +0000, Qais Yousef wrote:
> On 01/21/20 17:03, Russell King - ARM Linux admin wrote:
> > On Mon, Nov 25, 2019 at 11:27:41AM +0000, Qais Yousef wrote:
> > > +void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> > > +{
> > > +	unsigned int cpu;
> > > +
> > > +	if (!cpu_online(primary_cpu)) {
> > > +		pr_info("Attempting to shutdodwn nonboot cpus while boot cpu is offline!\n");
> > > +		cpu_online(primary_cpu);
> 
> Eh, that should be cpu_up(primary_cpu)!
> 
> Which, I have to say, I'm not sure is the right thing to do.
> migrate_to_reboot_cpu() picks the first online cpu if reboot_cpu (assumed 0) is
> offline:
> 
> migrate_to_reboot_cpu():
> 	/* Make certain the cpu I'm about to reboot on is online */
> 	if (!cpu_online(cpu))
> 		cpu = cpumask_first(cpu_online_mask);
> 
> > > +	}
> > > +
> > > +	for_each_present_cpu(cpu) {
> > > +		if (cpu == primary_cpu)
> > > +			continue;
> > > +		if (cpu_online(cpu))
> > > +			cpu_down(cpu);
> > > +	}
> > 
> > How does this avoid racing with userspace attempting to restart CPUs
> > that have already been taken down by this function?
> 
> This is meant to be called from machine_shutdown() only.
> 
> But you've got a point.
> 
> The previous logic, which used disable_nonboot_cpus() (which in turn called
> freeze_secondary_cpus()), didn't hold the hotplug lock. So I assumed the
> higher-level logic of machine_shutdown() ensured that the hotplug lock was held
> to synchronize with other potential hotplug operations.

freeze_secondary_cpus() takes the CPU maps lock while it takes CPUs
down, and then disables cpu hotplug by incrementing
cpu_hotplug_disabled.  Incrementing that prevents cpu_up() and
cpu_down() being used, thereby preventing userspace from changing the
online state of any CPU in the system.
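
The mechanism described above can be sketched as a small userspace model. It is a hedged illustration under stated assumptions, not the kernel implementation: a pthread mutex stands in for the CPU maps lock, the counter hotplug_disabled models cpu_hotplug_disabled, and model_cpu_up() and model_shutdown_nonboot_cpus() are hypothetical names. The -EBUSY return for a refused request is likewise part of the model.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

static pthread_mutex_t maps_lock = PTHREAD_MUTEX_INITIALIZER;	/* models the CPU maps lock */
static int hotplug_disabled;					/* models cpu_hotplug_disabled */
static bool cpu_is_online[MODEL_NR_CPUS] = { true, true, true, true };

/* Models a userspace-triggered online request; refused once hotplug is disabled. */
static int model_cpu_up(unsigned int cpu)
{
	int ret = 0;

	assert(cpu < MODEL_NR_CPUS);
	pthread_mutex_lock(&maps_lock);
	if (hotplug_disabled)
		ret = -EBUSY;
	else
		cpu_is_online[cpu] = true;
	pthread_mutex_unlock(&maps_lock);
	return ret;
}

/* Take every CPU except primary_cpu down, then disable hotplug under the same lock. */
static void model_shutdown_nonboot_cpus(unsigned int primary_cpu)
{
	assert(primary_cpu < MODEL_NR_CPUS);
	pthread_mutex_lock(&maps_lock);
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != primary_cpu)
			cpu_is_online[cpu] = false;
	/* No window between the last offline and the disable: both happen under the lock. */
	hotplug_disabled++;
	pthread_mutex_unlock(&maps_lock);
}
```

Because the counter is incremented before the lock is dropped, a later model_cpu_up() call cannot bring back a CPU that the shutdown path has already taken down.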

> But I can see now that it doesn't.
> 
> Since this series migrates users to device_{online,offline}(), holding
> lock_device_hotplug() should protect against such races.
> 
> Worth noting that this is an existing problem in the code and not something
> I introduced. Of course, it makes sense to fix it properly as part of this
> series.
> 
> I'm not sure how the other archs deal with this TBH.
> 
> Thanks for having a look!
> 
> Cheers
> 
> --
> Qais Yousef
>
Qais Yousef Jan. 22, 2020, 10:32 a.m. UTC | #4
On 01/21/20 18:09, Russell King - ARM Linux admin wrote:
> On Tue, Jan 21, 2020 at 05:47:52PM +0000, Qais Yousef wrote:
> > On 01/21/20 17:03, Russell King - ARM Linux admin wrote:
> > > On Mon, Nov 25, 2019 at 11:27:41AM +0000, Qais Yousef wrote:
> > > > +void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
> > > > +{
> > > > +	unsigned int cpu;
> > > > +
> > > > +	if (!cpu_online(primary_cpu)) {
> > > > +		pr_info("Attempting to shutdodwn nonboot cpus while boot cpu is offline!\n");
> > > > +		cpu_online(primary_cpu);
> > 
> > Eh, that should be cpu_up(primary_cpu)!
> > 
> > Which, I have to say, I'm not sure is the right thing to do.
> > migrate_to_reboot_cpu() picks the first online cpu if reboot_cpu (assumed 0) is
> > offline:
> > 
> > migrate_to_reboot_cpu():
> > 	/* Make certain the cpu I'm about to reboot on is online */
> > 	if (!cpu_online(cpu))
> > 		cpu = cpumask_first(cpu_online_mask);
> > 
> > > > +	}
> > > > +
> > > > +	for_each_present_cpu(cpu) {
> > > > +		if (cpu == primary_cpu)
> > > > +			continue;
> > > > +		if (cpu_online(cpu))
> > > > +			cpu_down(cpu);
> > > > +	}
> > > 
> > > How does this avoid racing with userspace attempting to restart CPUs
> > > that have already been taken down by this function?
> > 
> > This is meant to be called from machine_shutdown() only.
> > 
> > But you've got a point.
> > 
> > The previous logic, which used disable_nonboot_cpus() (which in turn called
> > freeze_secondary_cpus()), didn't hold the hotplug lock. So I assumed the
> > higher-level logic of machine_shutdown() ensured that the hotplug lock was held
> > to synchronize with other potential hotplug operations.
> 
> freeze_secondary_cpus() takes the CPU maps lock while it takes CPUs
> down, and then disables cpu hotplug by incrementing
> cpu_hotplug_disabled.  Incrementing that prevents cpu_up() and
> cpu_down() being used, thereby preventing userspace from changing the
> online state of any CPU in the system.

I see. Sorry I missed the CPU maps lock.

Yes this makes sense and should work here too.

Thanks for the help.

Thomas, I'll wait for your comment on this and potentially other patches before
sending v3.

Thanks

--
Qais Yousef

Patch

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index bc6c879bd110..8229932fb053 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -118,6 +118,7 @@  extern void cpu_hotplug_disable(void);
 extern void cpu_hotplug_enable(void);
 void clear_tasks_mm_cpumask(int cpu);
 int cpu_down(unsigned int cpu);
+extern void smp_shutdown_nonboot_cpus(unsigned int primary_cpu);
 
 #else /* CONFIG_HOTPLUG_CPU */
 
@@ -129,6 +130,7 @@  static inline int  cpus_read_trylock(void) { return true; }
 static inline void lockdep_assert_cpus_held(void) { }
 static inline void cpu_hotplug_disable(void) { }
 static inline void cpu_hotplug_enable(void) { }
+static inline void smp_shutdown_nonboot_cpus(unsigned int primary_cpu) { }
 #endif	/* !CONFIG_HOTPLUG_CPU */
 
 /* Wrappers which go away once all code is converted */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index e2cad3ee2ead..94055a0d989e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1058,6 +1058,23 @@  int cpu_down(unsigned int cpu)
 }
 EXPORT_SYMBOL(cpu_down);
 
+void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
+{
+	unsigned int cpu;
+
+	if (!cpu_online(primary_cpu)) {
+		pr_info("Attempting to shutdodwn nonboot cpus while boot cpu is offline!\n");
+		cpu_online(primary_cpu);
+	}
+
+	for_each_present_cpu(cpu) {
+		if (cpu == primary_cpu)
+			continue;
+		if (cpu_online(cpu))
+			cpu_down(cpu);
+	}
+}
+
 #else
 #define takedown_cpu		NULL
 #endif /*CONFIG_HOTPLUG_CPU*/