diff mbox series

[v8,15/19] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system

Message ID 20210602164719.31777-16-will@kernel.org (mailing list archive)
State New, archived
Headers show
Series Add support for 32-bit tasks on asymmetric AArch32 systems | expand

Commit Message

Will Deacon June 2, 2021, 4:47 p.m. UTC
If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.

On detecting a mismatch, prevent offlining of either the mismatching CPU
if it is 32-bit capable, or find the first active 32-bit capable CPU
otherwise.

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Comments

Mark Rutland June 3, 2021, 12:58 p.m. UTC | #1
On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> If we want to support 32-bit applications, then when we identify a CPU
> with mismatched 32-bit EL0 support we must ensure that we will always
> have an active 32-bit CPU available to us from then on. This is important
> for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> CPUs for compat tasks and forced migration due to a hotplug event will
> hang if no 32-bit CPUs are available.
> 
> On detecting a mismatch, prevent offlining of either the mismatching CPU
> if it is 32-bit capable, or find the first active 32-bit capable CPU
> otherwise.
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 4194a47de62d..b31d7a1eaed6 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
>  
>  static int enable_mismatched_32bit_el0(unsigned int cpu)
>  {
> +	static int lucky_winner = -1;

This is cute, but could we please give it a meaningful name, e.g.
`pinned_cpu` ?

> +
>  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
>  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
>  
>  	if (cpu_32bit) {
>  		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
>  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> -		setup_elf_hwcaps(compat_elf_hwcaps);
>  	}
>  
> +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> +		return 0;
> +
> +	if (lucky_winner >= 0)
> +		return 0;
> +
> +	/*
> +	 * We've detected a mismatch. We need to keep one of our CPUs with
> +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> +	 * every CPU in the system for a 32-bit task.
> +	 */
> +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> +							 cpu_active_mask);
> +	get_cpu_device(lucky_winner)->offline_disabled = true;
> +	setup_elf_hwcaps(compat_elf_hwcaps);
> +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> +		cpu, lucky_winner);
>  	return 0;
>  }

I guess this is going to play havoc with kexec and hibernate. :/

Thanks,
Mark.
Will Deacon June 3, 2021, 5:40 p.m. UTC | #2
On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote:
> On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> > If we want to support 32-bit applications, then when we identify a CPU
> > with mismatched 32-bit EL0 support we must ensure that we will always
> > have an active 32-bit CPU available to us from then on. This is important
> > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > CPUs for compat tasks and forced migration due to a hotplug event will
> > hang if no 32-bit CPUs are available.
> > 
> > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > otherwise.
> > 
> > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 4194a47de62d..b31d7a1eaed6 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
> >  
> >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> >  {
> > +	static int lucky_winner = -1;
> 
> This is cute, but could we please give it a meaningful name, e.g.
> `pinned_cpu` ?

I really don't see the problem, nor why it's "cute".

Tell you what, I'll add a comment instead:

	/*
	 * The first 32-bit-capable CPU we detected and so can no longer
	 * be offlined by userspace. -1 indicates we haven't yet onlined
	 * a 32-bit-capable CPU.
	 */

> >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> >  
> >  	if (cpu_32bit) {
> >  		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
> >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > -		setup_elf_hwcaps(compat_elf_hwcaps);
> >  	}
> >  
> > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > +		return 0;
> > +
> > +	if (lucky_winner >= 0)
> > +		return 0;
> > +
> > +	/*
> > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > +	 * every CPU in the system for a 32-bit task.
> > +	 */
> > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > +							 cpu_active_mask);
> > +	get_cpu_device(lucky_winner)->offline_disabled = true;
> > +	setup_elf_hwcaps(compat_elf_hwcaps);
> > +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> > +		cpu, lucky_winner);
> >  	return 0;
> >  }
> 
> I guess this is going to play havoc with kexec and hibernate. :/

The kernel can still offline the CPUs (see the whole freezer mess that I
linked to in the cover letter). What specific havoc are you thinking of?

Will
Mark Rutland June 4, 2021, 9:49 a.m. UTC | #3
On Thu, Jun 03, 2021 at 06:40:57PM +0100, Will Deacon wrote:
> On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote:
> > On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> > > If we want to support 32-bit applications, then when we identify a CPU
> > > with mismatched 32-bit EL0 support we must ensure that we will always
> > > have an active 32-bit CPU available to us from then on. This is important
> > > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > > CPUs for compat tasks and forced migration due to a hotplug event will
> > > hang if no 32-bit CPUs are available.
> > > 
> > > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > > otherwise.
> > > 
> > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index 4194a47de62d..b31d7a1eaed6 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
> > >  
> > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > >  {
> > > +	static int lucky_winner = -1;
> > 
> > This is cute, but could we please give it a meaningful name, e.g.
> > `pinned_cpu` ?
> 
> I really don't see the problem, nor why it's "cute".
> 
> Tell you what, I'll add a comment instead:
> 
> 	/*
> 	 * The first 32-bit-capable CPU we detected and so can no longer
> 	 * be offlined by userspace. -1 indicates we haven't yet onlined
> 	 * a 32-bit-capable CPU.
> 	 */

Thanks for the comment; that's helpful.

However, my concern here is that when we inevitably have to discuss this
with others in future, "lucky winner" is jarring (and also unclear to
those where English is not their native language). For clarity, it would
be really nice to use a term like "cpu", "chosen_cpu", "pinned_cpu",
etc.

However, you're the maintainer; choose what you think is appropriate.

> > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > >  
> > >  	if (cpu_32bit) {
> > >  		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
> > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > > -		setup_elf_hwcaps(compat_elf_hwcaps);
> > >  	}
> > >  
> > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > +		return 0;
> > > +
> > > +	if (lucky_winner >= 0)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > > +	 * every CPU in the system for a 32-bit task.
> > > +	 */
> > > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > > +							 cpu_active_mask);
> > > +	get_cpu_device(lucky_winner)->offline_disabled = true;
> > > +	setup_elf_hwcaps(compat_elf_hwcaps);
> > > +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> > > +		cpu, lucky_winner);
> > >  	return 0;
> > >  }
> > 
> > I guess this is going to play havoc with kexec and hibernate. :/
> 
> The kernel can still offline the CPUs (see the whole freezer mess that I
> linked to in the cover letter). What specific havoc are you thinking of?

Ah. If this is just inhibiting userspace-driven offlining, that sounds
fine.

For kexec, I was concerned that either this would inhibit kexec, or
smp_shutdown_nonboot_cpus() would fail to offline the pinned CPU, and
that'd trigger a BUG(), which would be unfortunate.

For hibernate, the equivalent is freeze_secondary_cpus(), which I guess
is dealt with by the freezer bits you mention.

Thanks,
Mark.
Qais Yousef June 4, 2021, 12:14 p.m. UTC | #4
On 06/04/21 10:49, Mark Rutland wrote:
> On Thu, Jun 03, 2021 at 06:40:57PM +0100, Will Deacon wrote:
> > On Thu, Jun 03, 2021 at 01:58:56PM +0100, Mark Rutland wrote:
> > > On Wed, Jun 02, 2021 at 05:47:15PM +0100, Will Deacon wrote:
> > > > If we want to support 32-bit applications, then when we identify a CPU
> > > > with mismatched 32-bit EL0 support we must ensure that we will always
> > > > have an active 32-bit CPU available to us from then on. This is important
> > > > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > > > CPUs for compat tasks and forced migration due to a hotplug event will
> > > > hang if no 32-bit CPUs are available.
> > > > 
> > > > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > > > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > > > otherwise.
> > > > 
> > > > Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
> > > > Signed-off-by: Will Deacon <will@kernel.org>
> > > > ---
> > > >  arch/arm64/kernel/cpufeature.c | 20 +++++++++++++++++++-
> > > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index 4194a47de62d..b31d7a1eaed6 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -2877,15 +2877,33 @@ void __init setup_cpu_features(void)
> > > >  
> > > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > >  {
> > > > +	static int lucky_winner = -1;
> > > 
> > > This is cute, but could we please give it a meaningful name, e.g.
> > > `pinned_cpu` ?
> > 
> > I really don't see the problem, nor why it's "cute".
> > 
> > Tell you what, I'll add a comment instead:
> > 
> > 	/*
> > 	 * The first 32-bit-capable CPU we detected and so can no longer
> > 	 * be offlined by userspace. -1 indicates we haven't yet onlined
> > 	 * a 32-bit-capable CPU.
> > 	 */
> 
> Thanks for the comment; that's helpful.
> 
> However, my concern here is that when we inevitably have to discuss this
> with others in future, "lucky winner" is jarring (and also unclear to
> those where English is not their native language). For clarity, it would
> be really nice to use a term like "cpu", "chosen_cpu", "pinned_cpu",
> etc.
> 
> However, you're the maintainer; choose what you think is appropriate.
> 
> > > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > > >  
> > > >  	if (cpu_32bit) {
> > > >  		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
> > > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > > > -		setup_elf_hwcaps(compat_elf_hwcaps);
> > > >  	}
> > > >  
> > > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > > +		return 0;
> > > > +
> > > > +	if (lucky_winner >= 0)
> > > > +		return 0;
> > > > +
> > > > +	/*
> > > > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > > > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > > > +	 * every CPU in the system for a 32-bit task.
> > > > +	 */
> > > > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > > > +							 cpu_active_mask);
> > > > +	get_cpu_device(lucky_winner)->offline_disabled = true;
> > > > +	setup_elf_hwcaps(compat_elf_hwcaps);
> > > > +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> > > > +		cpu, lucky_winner);
> > > >  	return 0;
> > > >  }
> > > 
> > > I guess this is going to play havoc with kexec and hibernate. :/
> > 
> > The kernel can still offline the CPUs (see the whole freezer mess that I
> > linked to in the cover letter). What specific havoc are you thinking of?
> 
> Ah. If this is just inhibiting userspace-driven offlining, that sounds
> fine.
> 
> For kexec, I was concerned that either this would inhibit kexec, or
> smp_shutdown_nonboot_cpus() would fail to offline the pinned CPU, and
> that'd trigger a BUG(), which would be unfortunate.
> 
> For hibernate, the equivalent is freeze_secondary_cpus(), which I guess
> is dealt with by the freezer bits you mention.

()->offline_disabled will only block offline requests performed by
device_offline(). kexec, hibernate, suspend/resume use cpu_online/offline()
directly so won't be impacted by that. I have sent patches that make
cpu_online/offline() 'private' and not used or exported outside of cpu
subsystem and the odd support function for arch code. All other users use
device_offline() or add/remove_cpu() now.

I have made sure to test kexec, suspend to disk and ram in my older similar
implementation in the past. So we should be good.

Cheers

--
Qais Yousef
diff mbox series

Patch

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 4194a47de62d..b31d7a1eaed6 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2877,15 +2877,33 @@  void __init setup_cpu_features(void)
 
 static int enable_mismatched_32bit_el0(unsigned int cpu)
 {
+	static int lucky_winner = -1;
+
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
 	if (cpu_32bit) {
 		cpumask_set_cpu(cpu, cpu_32bit_el0_mask);
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
-		setup_elf_hwcaps(compat_elf_hwcaps);
 	}
 
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+		return 0;
+
+	if (lucky_winner >= 0)
+		return 0;
+
+	/*
+	 * We've detected a mismatch. We need to keep one of our CPUs with
+	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
+	 * every CPU in the system for a 32-bit task.
+	 */
+	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+							 cpu_active_mask);
+	get_cpu_device(lucky_winner)->offline_disabled = true;
+	setup_elf_hwcaps(compat_elf_hwcaps);
+	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
+		cpu, lucky_winner);
 	return 0;
 }