
[v4,12/14] arm64: Prevent offlining first CPU with 32-bit EL0 on mismatched system

Message ID: 20201124155039.13804-13-will@kernel.org
State: New, archived
Series: An alternative series for asymmetric AArch32 systems

Commit Message

Will Deacon Nov. 24, 2020, 3:50 p.m. UTC
If we want to support 32-bit applications, then when we identify a CPU
with mismatched 32-bit EL0 support we must ensure that we will always
have an active 32-bit CPU available to us from then on. This is important
for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
CPUs for compat tasks and forced migration due to a hotplug event will
hang if no 32-bit CPUs are available.

On detecting a mismatch, prevent offlining of either the mismatching CPU
if it is 32-bit capable, or find the first active 32-bit capable CPU
otherwise.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Qais Yousef Nov. 27, 2020, 1:41 p.m. UTC | #1
On 11/24/20 15:50, Will Deacon wrote:
> If we want to support 32-bit applications, then when we identify a CPU
> with mismatched 32-bit EL0 support we must ensure that we will always
> have an active 32-bit CPU available to us from then on. This is important
> for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> CPUs for compat tasks and forced migration due to a hotplug event will
> hang if no 32-bit CPUs are available.
> 
> On detecting a mismatch, prevent offlining of either the mismatching CPU
> if it is 32-bit capable, or find the first active 32-bit capable CPU
> otherwise.
                                       ^^^^^

You use cpumask_any_and(). Better use cpumask_first_and()? We have a truly
random function now, cpumask_any_and_distribute(), if you'd like to pick
something 'truly' random.
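
To spell out the difference, a rough sketch against the ~v5.10 cpumask API
(exact return types may differ between kernel versions):

	/* Both of these deterministically pick the lowest CPU set in both masks. */
	unsigned int a = cpumask_first_and(cpu_32bit_el0_mask, cpu_active_mask);
	unsigned int b = cpumask_any_and(cpu_32bit_el0_mask, cpu_active_mask);

	/*
	 * This one remembers where the previous search stopped and resumes
	 * from there, so repeated calls spread the result across the
	 * intersection of the two masks.
	 */
	int c = cpumask_any_and_distribute(cpu_32bit_el0_mask, cpu_active_mask);

	/* All of them return >= nr_cpu_ids if the masks don't intersect. */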

I tried to clean up the cpumask_any* API mess, but ended up with too much
headache instead. Killing off cpumask_any*() might be the easier option, but
that's something different.

> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kernel/cpufeature.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 29017cbb6c8e..fe470683b43e 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
>  
>  static int enable_mismatched_32bit_el0(unsigned int cpu)
>  {
> +	static int lucky_winner = -1;
> +
>  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
>  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
>  
> @@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
>  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
>  	}
>  
> +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> +		return 0;

Hmm I'm struggling to get what you're doing here. You're treating CPU0 (the
boot CPU) specially here, but I don't get why?

> +
> +	if (lucky_winner >= 0)
> +		return 0;
> +
> +	/*
> +	 * We've detected a mismatch. We need to keep one of our CPUs with
> +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> +	 * every CPU in the system for a 32-bit task.
> +	 */
> +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> +							 cpu_active_mask);

cpumask_any_and() could return an error. It could be hard or even impossible to
trigger, but better to check that lucky_winner is not >= nr_cpu_ids before
calling get_cpu_device(lucky_winner), to stay on the safe side and avoid a
potential splat?
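
Something like this untested sketch is what I have in mind:

	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
							 cpu_active_mask);
	/*
	 * cpumask_any_and() returns >= nr_cpu_ids on an empty intersection,
	 * and get_cpu_device() returns NULL for such a CPU number, so the
	 * assignment below would be a NULL dereference.
	 */
	if (WARN_ON(lucky_winner >= nr_cpu_ids)) {
		lucky_winner = -1;
		return -ENODEV;
	}

	get_cpu_device(lucky_winner)->offline_disabled = true;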

We can do better, by the way, and do a smarter check in remove_cpu() to block
offlining the last aarch32-capable CPU without 'hardcoding' a specific CPU. But
I won't insist, and am happy to wait for someone to come complaining that this
is not good enough first.

Some vendors play games with hotplug to help with saving power. They might want
to dynamically nominate the last man standing 32bit capable CPU. Again, we can
wait for someone to complain first I guess.

Cheers

--
Qais Yousef

> +	get_cpu_device(lucky_winner)->offline_disabled = true;
> +	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
> +		cpu, lucky_winner);
>  	return 0;
>  }
>  
> -- 
> 2.29.2.454.gaff20da3a2-goog
>
Will Deacon Dec. 1, 2020, 10:13 p.m. UTC | #2
On Fri, Nov 27, 2020 at 01:41:22PM +0000, Qais Yousef wrote:
> On 11/24/20 15:50, Will Deacon wrote:
> > If we want to support 32-bit applications, then when we identify a CPU
> > with mismatched 32-bit EL0 support we must ensure that we will always
> > have an active 32-bit CPU available to us from then on. This is important
> > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > CPUs for compat tasks and forced migration due to a hotplug event will
> > hang if no 32-bit CPUs are available.
> > 
> > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > otherwise.
>                                        ^^^^^
> 
> You use cpumask_any_and(). Better use cpumask_first_and()? We have a truly
> random function now, cpumask_any_and_distribute(), if you'd like to pick
> something 'truly' random.

I think cpumask_any_and() is better, because it makes it clear that I don't
care about which CPU is chosen (and under the hood it ends up calling
cpumask_first_and() _anyway_). So this is purely cosmetic.
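
For what it's worth, include/linux/cpumask.h currently has literally (assuming
I'm looking at the right tree):

	#define cpumask_any_and(mask1, mask2) cpumask_first_and((mask1), (mask2))

so the only difference between the two is the intent they document.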

> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 29017cbb6c8e..fe470683b43e 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
> >  
> >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> >  {
> > +	static int lucky_winner = -1;
> > +
> >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> >  
> > @@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
> >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> >  	}
> >  
> > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > +		return 0;
> 
> Hmm I'm struggling to get what you're doing here. You're treating CPU0 (the
> boot CPU) specially here, but I don't get why?

If our ability to execute 32-bit code is the same as the boot CPU then we
don't have to do anything. That way, we can postpone nominating the lucky
winner until we really need to.
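
Spelling that out: bit 0 of cpu_32bit_el0_mask records whether the boot CPU is
32-bit capable, since the boot CPU goes through this hook first. So the check
boils down to:

	boot CPU 32-bit | this CPU 32-bit | test == cpu_32bit | action
	----------------+-----------------+-------------------+-------------------
	yes             | yes             | true              | return 0
	no              | no              | true              | return 0
	yes             | no              | false             | nominate a winner
	no              | yes             | false             | nominate a winner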

> > +	if (lucky_winner >= 0)
> > +		return 0;
> > +
> > +	/*
> > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > +	 * every CPU in the system for a 32-bit task.
> > +	 */
> > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > +							 cpu_active_mask);
> 
> cpumask_any_and() could return an error. It could be hard or even impossible to
> trigger, but better to check that lucky_winner is not >= nr_cpu_ids before
> calling get_cpu_device(lucky_winner), to stay on the safe side and avoid a
> potential splat?

I don't see how it can return an error here. There are two cases to
consider:

  1. The CPU being brought online is the first 32-bit-capable CPU. In which
     case, we don't use cpumask_any_and() at all.

  2. The CPU being brought online is the first 64-bit-only CPU. In which
     case, the CPU doing the onlining is 32-bit capable and will be in
     the active mask.

> We can do better, by the way, and do a smarter check in remove_cpu() to block
> offlining the last aarch32-capable CPU without 'hardcoding' a specific CPU. But
> I won't insist, and am happy to wait for someone to come complaining that this
> is not good enough first.

I couldn't find a satisfactory way to do this without the possibility of
subtle races, so I'd prefer to keep it simple for the moment. In particular,
I wanted to make sure that somebody iterating over the cpu_possible_mask
and calling is_cpu_allowed(p, cpu) for each CPU and a 32-bit task can not
reach the end of the mask without ever getting a value of 'true'.
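
That is, the invariant is that a walk like this sketch (is_cpu_allowed() being
the scheduler-internal helper in kernel/sched/core.c) always succeeds for a
32-bit task once the mismatch has been detected:

	int cpu;

	for_each_cpu(cpu, cpu_possible_mask) {
		if (is_cpu_allowed(p, cpu))
			break;	/* must be reached for at least one CPU */
	}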

I'm open to revisiting this once some of this is merged, but right now
I don't think it's needed and it certainly adds complexity.

> Some vendors play games with hotplug to help with saving power. They might want
> to dynamically nominate the last man standing 32bit capable CPU. Again, we can
> wait for someone to complain first I guess.

The reality is that either all "big" cores or all "little" cores will be the
ones that are 32-bit capable, so I doubt it matters an awful lot which one
of the cluster is left online from a PM perspective. The real problem is
that a core has to be left online at all, but I don't think we can avoid
that.

Will
Qais Yousef Dec. 2, 2020, 12:59 p.m. UTC | #3
On 12/01/20 22:13, Will Deacon wrote:
> On Fri, Nov 27, 2020 at 01:41:22PM +0000, Qais Yousef wrote:
> > On 11/24/20 15:50, Will Deacon wrote:
> > > If we want to support 32-bit applications, then when we identify a CPU
> > > with mismatched 32-bit EL0 support we must ensure that we will always
> > > have an active 32-bit CPU available to us from then on. This is important
> > > for the scheduler, because is_cpu_allowed() will be constrained to 32-bit
> > > CPUs for compat tasks and forced migration due to a hotplug event will
> > > hang if no 32-bit CPUs are available.
> > > 
> > > On detecting a mismatch, prevent offlining of either the mismatching CPU
> > > if it is 32-bit capable, or find the first active 32-bit capable CPU
> > > otherwise.
> >                                        ^^^^^
> > 
> > You use cpumask_any_and(). Better use cpumask_first_and()? We have a truly
> > random function now, cpumask_any_and_distribute(), if you'd like to pick
> > something 'truly' random.
> 
> I think cpumask_any_and() is better, because it makes it clear that I don't
> care about which CPU is chosen (and under the hood it ends up calling
> cpumask_first_and() _anyway_). So this is purely cosmetic.
> 
> > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > index 29017cbb6c8e..fe470683b43e 100644
> > > --- a/arch/arm64/kernel/cpufeature.c
> > > +++ b/arch/arm64/kernel/cpufeature.c
> > > @@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
> > >  
> > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > >  {
> > > +	static int lucky_winner = -1;
> > > +
> > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > >  
> > > @@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
> > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > >  	}
> > >  
> > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > +		return 0;
> > 
> > Hmm I'm struggling to get what you're doing here. You're treating CPU0 (the
> > boot CPU) specially here, but I don't get why?
> 
> If our ability to execute 32-bit code is the same as the boot CPU then we
> don't have to do anything. That way, we can postpone nominating the lucky
> winner until we really need to.

Okay I see what you're doing now. The '== cpu_32bit' part of the check gave me
trouble. If the first N cpus are 64bit only, we'll skip them here. Worth
a comment?

Wouldn't it be better to replace this with a check if cpu_32bit_el0_mask is
empty instead? That would be a lot easier to read.

> 
> > > +	if (lucky_winner >= 0)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * We've detected a mismatch. We need to keep one of our CPUs with
> > > +	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
> > > +	 * every CPU in the system for a 32-bit task.
> > > +	 */
> > > +	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
> > > +							 cpu_active_mask);
> > 
> > cpumask_any_and() could return an error. It could be hard or even impossible to
> > trigger, but better to check that lucky_winner is not >= nr_cpu_ids before
> > calling get_cpu_device(lucky_winner), to stay on the safe side and avoid a
> > potential splat?
> 
> I don't see how it can return an error here. There are two cases to
> consider:
> 
>   1. The CPU being brought online is the first 32-bit-capable CPU. In which
>      case, we don't use cpumask_any_and() at all.
> 
>   2. The CPU being brought online is the first 64-bit-only CPU. In which
>      case, the CPU doing the onlining is 32-bit capable and will be in
>      the active mask.

Now I understand the if condition above; yeah I think this will not hit.
The condition above simply guarantees cpu_32bit_el0_mask is not empty. And
since this is the online path, there's a guarantee that at least one bit is
shared between the two masks, since the same path must have set this shared
bit.

> 
> > We can do better, by the way, and do a smarter check in remove_cpu() to block
> > offlining the last aarch32-capable CPU without 'hardcoding' a specific CPU. But
> > I won't insist, and am happy to wait for someone to come complaining that this
> > is not good enough first.
> 
> I couldn't find a satisfactory way to do this without the possibility of
> subtle races, so I'd prefer to keep it simple for the moment. In particular,
> I wanted to make sure that somebody iterating over the cpu_possible_mask
> and calling is_cpu_allowed(p, cpu) for each CPU and a 32-bit task can not
> reach the end of the mask without ever getting a value of 'true'.
> 
> I'm open to revisiting this once some of this is merged, but right now
> I don't think it's needed and it certainly adds complexity.

Agreed. I just wanted to share some awareness. Let's not make this series more
complicated than it needs to be.

> 
> > Some vendors play games with hotplug to help with saving power. They might want
> > to dynamically nominate the last man standing 32bit capable CPU. Again, we can
> > wait for someone to complain first I guess.
> 
> The reality is that either all "big" cores or all "little" cores will be the
> ones that are 32-bit capable, so I doubt it matters an awful lot which one
> of the cluster is left online from a PM perspective. The real problem is
> that a core has to be left online at all, but I don't think we can avoid
> that.

I don't have specific info, but in theory it could matter. We enforce that
a specific core is always online, rather than allowing any core to be offlined
until there's a single one left in the mask. I agree we shouldn't worry about
this for now.

And yes, we can't avoid keeping the last 32-bit capable CPU online, the same
way we can't offline the last CPU in a normal system.

Thanks

--
Qais Yousef
Will Deacon Dec. 2, 2020, 5:42 p.m. UTC | #4
On Wed, Dec 02, 2020 at 12:59:52PM +0000, Qais Yousef wrote:
> On 12/01/20 22:13, Will Deacon wrote:
> > On Fri, Nov 27, 2020 at 01:41:22PM +0000, Qais Yousef wrote:
> > > On 11/24/20 15:50, Will Deacon wrote:
> > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > index 29017cbb6c8e..fe470683b43e 100644
> > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > @@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
> > > >  
> > > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > >  {
> > > > +	static int lucky_winner = -1;
> > > > +
> > > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > > >  
> > > > @@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > > >  	}
> > > >  
> > > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > > +		return 0;
> > > 
> > > Hmm I'm struggling to get what you're doing here. You're treating CPU0 (the
> > > boot CPU) specially here, but I don't get why?
> > 
> > If our ability to execute 32-bit code is the same as the boot CPU then we
> > don't have to do anything. That way, we can postpone nominating the lucky
> > winner until we really need to.
> 
> Okay I see what you're doing now. The '== cpu_32bit' part of the check gave me
> trouble. If the first N cpus are 64bit only, we'll skip them here. Worth
> a comment?
> 
> Wouldn't it be better to replace this with a check if cpu_32bit_el0_mask is
> empty instead? That would be a lot easier to read.

Sorry, but I don't follow. What if all the CPUs are 32-bit capable?

I'll leave the code as-is, since I don't think it's particularly hard to
understand, and it does the right thing.

Will
Qais Yousef Dec. 2, 2020, 6:08 p.m. UTC | #5
On 12/02/20 17:42, Will Deacon wrote:
> On Wed, Dec 02, 2020 at 12:59:52PM +0000, Qais Yousef wrote:
> > On 12/01/20 22:13, Will Deacon wrote:
> > > On Fri, Nov 27, 2020 at 01:41:22PM +0000, Qais Yousef wrote:
> > > > On 11/24/20 15:50, Will Deacon wrote:
> > > > > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > > > > index 29017cbb6c8e..fe470683b43e 100644
> > > > > --- a/arch/arm64/kernel/cpufeature.c
> > > > > +++ b/arch/arm64/kernel/cpufeature.c
> > > > > @@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
> > > > >  
> > > > >  static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > > >  {
> > > > > +	static int lucky_winner = -1;
> > > > > +
> > > > >  	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
> > > > >  	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
> > > > >  
> > > > > @@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
> > > > >  		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
> > > > >  	}
> > > > >  
> > > > > +	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
> > > > > +		return 0;
> > > > 
> > > > Hmm I'm struggling to get what you're doing here. You're treating CPU0 (the
> > > > boot CPU) specially here, but I don't get why?
> > > 
> > > If our ability to execute 32-bit code is the same as the boot CPU then we
> > > don't have to do anything. That way, we can postpone nominating the lucky
> > > winner until we really need to.
> > 
> > Okay I see what you're doing now. The '== cpu_32bit' part of the check gave me
> > trouble. If the first N cpus are 64bit only, we'll skip them here. Worth
> > a comment?
> > 
> > Wouldn't it be better to replace this with a check if cpu_32bit_el0_mask is
> > empty instead? That would be a lot easier to read.
> 
> Sorry, but I don't follow. What if all the CPUs are 32-bit capable?

You're right, I missed this case.

--
Qais Yousef

Patch

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 29017cbb6c8e..fe470683b43e 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1237,6 +1237,8 @@ has_cpuid_feature(const struct arm64_cpu_capabilities *entry, int scope)
 
 static int enable_mismatched_32bit_el0(unsigned int cpu)
 {
+	static int lucky_winner = -1;
+
 	struct cpuinfo_arm64 *info = &per_cpu(cpu_data, cpu);
 	bool cpu_32bit = id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0);
 
@@ -1245,6 +1247,22 @@ static int enable_mismatched_32bit_el0(unsigned int cpu)
 		static_branch_enable_cpuslocked(&arm64_mismatched_32bit_el0);
 	}
 
+	if (cpumask_test_cpu(0, cpu_32bit_el0_mask) == cpu_32bit)
+		return 0;
+
+	if (lucky_winner >= 0)
+		return 0;
+
+	/*
+	 * We've detected a mismatch. We need to keep one of our CPUs with
+	 * 32-bit EL0 online so that is_cpu_allowed() doesn't end up rejecting
+	 * every CPU in the system for a 32-bit task.
+	 */
+	lucky_winner = cpu_32bit ? cpu : cpumask_any_and(cpu_32bit_el0_mask,
+							 cpu_active_mask);
+	get_cpu_device(lucky_winner)->offline_disabled = true;
+	pr_info("Asymmetric 32-bit EL0 support detected on CPU %u; CPU hot-unplug disabled on CPU %u\n",
+		cpu, lucky_winner);
 	return 0;
 }