
arm64: smp: do not allocate CPU IDs to invalid CPU nodes

Message ID 20240621075045.249798-1-lingyue@xiaomi.com (mailing list archive)
State New, archived
Series arm64: smp: do not allocate CPU IDs to invalid CPU nodes

Commit Message

Lingyue June 21, 2024, 7:50 a.m. UTC
Many modules, such as arch topology, rely on num_possible_cpus() to
allocate memory and then access the allocated space using CPU IDs.
These modules assume that there are no gaps in cpu_possible_mask.
However, in of_parse_and_init_cpus(), CPU IDs are still allocated
for invalid CPU nodes, leading to gaps in cpu_possible_mask and
resulting in out-of-bounds memory access. So it is crucial to avoid
allocating CPU IDs to invalid CPU nodes.

This issue can be reproduced easily on QEMU with KASAN enabled, by
modifying the reg property of a CPU node to 0xFFFFFFFF:

[    0.197756] BUG: KASAN: slab-out-of-bounds in topology_normalize_cpu_scale.part.0+0x2cc/0x34c
[    0.199518] Read of size 4 at addr ffff000007ebe924 by task swapper/0/1
[    0.200087]
[    0.200739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc4 #3
[    0.201647] Hardware name: linux,dummy-virt (DT)
[    0.203067] Call trace:
[    0.203404]  dump_backtrace+0x90/0xe8
[    0.203974]  show_stack+0x18/0x24
[    0.204424]  dump_stack_lvl+0x78/0x90
[    0.205090]  print_report+0x114/0x5cc
[    0.205908]  kasan_report+0xa4/0xf0
[    0.206488]  __asan_report_load4_noabort+0x20/0x2c
[    0.207427]  topology_normalize_cpu_scale.part.0+0x2cc/0x34c
[    0.208275]  init_cpu_topology+0x254/0x430
[    0.209518]  smp_prepare_cpus+0x20/0x25c
[    0.210824]  kernel_init_freeable+0x1dc/0x4fc
[    0.212047]  kernel_init+0x24/0x1ec
[    0.213143]  ret_from_fork+0x10/0x20

Signed-off-by: Lingyue <lingyue@xiaomi.com>
---
 arch/arm64/kernel/smp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Mark Rutland June 21, 2024, 10:12 a.m. UTC | #1
On Fri, Jun 21, 2024 at 03:50:45PM +0800, Lingyue wrote:
> Many modules, such as arch topology, rely on num_possible_cpus() to
> allocate memory and then access the allocated space using CPU IDs.
> These modules assume that there are no gaps in cpu_possible_mask.

Is there any documented requirement that cpu_possible_mask has no gaps?

It looks like other architectures can have gaps in their
cpu_possible_mask, there are no documented requirements AFAICT, and there
are a bunch of commits handling cpu_possible_mask having gaps, e.g.

  bc75e99983df1efd ("rcu: Correctly handle sparse possible cpus")
  3da43104d3187184 ("ARC: Adjust cpuinfo for non-continuous cpu ids")
  72917235fd5f0863 ("tracing: Fix for non-continuous cpu ids")

... so I don't think that the topology code should assume that there are
no gaps in cpu_possible_mask.

> However, in of_parse_and_init_cpus(), CPU IDs are still allocated
> for invalid CPU nodes, leading to gaps in cpu_possible_mask and
> resulting in out-of-bounds memory access. So it is crucial to avoid
> allocating CPU IDs to invalid CPU nodes.

AFAICT the topology code could use 'nr_cpu_ids' instead of
'num_possible_cpus()', like the tracing commit above, or it could use a
per-cpu allocation to avoid this.

> This issue can be reproduced easily on QEMU with KASAN enabled, by
> modifying the reg property of a CPU node to 0xFFFFFFFF:
> 
> [    0.197756] BUG: KASAN: slab-out-of-bounds in topology_normalize_cpu_scale.part.0+0x2cc/0x34c
> [    0.199518] Read of size 4 at addr ffff000007ebe924 by task swapper/0/1
> [    0.200087]
> [    0.200739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc4 #3
> [    0.201647] Hardware name: linux,dummy-virt (DT)
> [    0.203067] Call trace:
> [    0.203404]  dump_backtrace+0x90/0xe8
> [    0.203974]  show_stack+0x18/0x24
> [    0.204424]  dump_stack_lvl+0x78/0x90
> [    0.205090]  print_report+0x114/0x5cc
> [    0.205908]  kasan_report+0xa4/0xf0
> [    0.206488]  __asan_report_load4_noabort+0x20/0x2c
> [    0.207427]  topology_normalize_cpu_scale.part.0+0x2cc/0x34c
> [    0.208275]  init_cpu_topology+0x254/0x430
> [    0.209518]  smp_prepare_cpus+0x20/0x25c
> [    0.210824]  kernel_init_freeable+0x1dc/0x4fc
> [    0.212047]  kernel_init+0x24/0x1ec
> [    0.213143]  ret_from_fork+0x10/0x20
> 
> Signed-off-by: Lingyue <lingyue@xiaomi.com>
> ---
>  arch/arm64/kernel/smp.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 31c8b3094dd7..5b4178145920 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -638,12 +638,12 @@ static void __init of_parse_and_init_cpus(void)
>  		u64 hwid = of_get_cpu_hwid(dn, 0);
>  
>  		if (hwid & ~MPIDR_HWID_BITMASK)
> -			goto next;
> +			continue;
>  
>  		if (is_mpidr_duplicate(cpu_count, hwid)) {
>  			pr_err("%pOF: duplicate cpu reg properties in the DT\n",
>  				dn);
> -			goto next;
> +			continue;
>  		}
>  
>  		/*
> @@ -656,7 +656,7 @@ static void __init of_parse_and_init_cpus(void)
>  			if (bootcpu_valid) {
>  				pr_err("%pOF: duplicate boot cpu reg property in DT\n",
>  					dn);
> -				goto next;
> +				continue;
>  			}
>  

People get very upset when CPU numbering changes, so I'd prefer to avoid
this if possible.

Mark.

>  			bootcpu_valid = true;
> -- 
> 2.34.1
>
Lingyue June 22, 2024, 8:31 a.m. UTC | #2
On Fri, Jun 21, 2024 at 11:12:06AM +0100, Mark Rutland wrote:
> On Fri, Jun 21, 2024 at 03:50:45PM +0800, Lingyue wrote:
> > Many modules, such as arch topology, rely on num_possible_cpus() to
> > allocate memory and then access the allocated space using CPU IDs.
> > These modules assume that there are no gaps in cpu_possible_mask.
> 
> Is there any documented requirement that cpu_possible_mask has no gaps?
> 
> It looks like other architectures can have gaps in their
> cpu_possible_mask, there are no documented requirements AFAICT, and there
> are a bunch of commits handling cpu_possible_mask having gaps, e.g.
> 
>   bc75e99983df1efd ("rcu: Correctly handle sparse possible cpus")
>   3da43104d3187184 ("ARC: Adjust cpuinfo for non-continuous cpu ids")
>   72917235fd5f0863 ("tracing: Fix for non-continuous cpu ids")
> 
> ... so I don't think that the topology code should assume that there are
> no gaps in cpu_possible_mask.
>
Yes, I also cannot find any documented requirement for it.

> > However, in of_parse_and_init_cpus(), CPU IDs are still allocated
> > for invalid CPU nodes, leading to gaps in cpu_possible_mask and
> > resulting in out-of-bounds memory access. So it is crucial to avoid
> > allocating CPU IDs to invalid CPU nodes.
> 
> AFAICT the topology code could use 'nr_cpu_ids' instead of
> 'num_possible_cpus()', like the tracing commit above, or it could use a
> per-cpu allocation to avoid this.
>
In this case, of course, we could modify the arch topology code to solve the problem.
However, I propose that if we ensure there are no gaps in cpu_possible_mask,
we can solve this class of misuse once and for all, without having to dig up and
fix other potential similar problems one by one.

> > This issue can be reproduced easily on QEMU with KASAN enabled, by
> > modifying the reg property of a CPU node to 0xFFFFFFFF:
> > 
> > [    0.197756] BUG: KASAN: slab-out-of-bounds in topology_normalize_cpu_scale.part.0+0x2cc/0x34c
> > [    0.199518] Read of size 4 at addr ffff000007ebe924 by task swapper/0/1
> > [    0.200087]
> > [    0.200739] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-rc4 #3
> > [    0.201647] Hardware name: linux,dummy-virt (DT)
> > [    0.203067] Call trace:
> > [    0.203404]  dump_backtrace+0x90/0xe8
> > [    0.203974]  show_stack+0x18/0x24
> > [    0.204424]  dump_stack_lvl+0x78/0x90
> > [    0.205090]  print_report+0x114/0x5cc
> > [    0.205908]  kasan_report+0xa4/0xf0
> > [    0.206488]  __asan_report_load4_noabort+0x20/0x2c
> > [    0.207427]  topology_normalize_cpu_scale.part.0+0x2cc/0x34c
> > [    0.208275]  init_cpu_topology+0x254/0x430
> > [    0.209518]  smp_prepare_cpus+0x20/0x25c
> > [    0.210824]  kernel_init_freeable+0x1dc/0x4fc
> > [    0.212047]  kernel_init+0x24/0x1ec
> > [    0.213143]  ret_from_fork+0x10/0x20
> > 
> > Signed-off-by: Lingyue <lingyue@xiaomi.com>
> > ---
> >  arch/arm64/kernel/smp.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 31c8b3094dd7..5b4178145920 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -638,12 +638,12 @@ static void __init of_parse_and_init_cpus(void)
> >  		u64 hwid = of_get_cpu_hwid(dn, 0);
> >  
> >  		if (hwid & ~MPIDR_HWID_BITMASK)
> > -			goto next;
> > +			continue;
> >  
> >  		if (is_mpidr_duplicate(cpu_count, hwid)) {
> >  			pr_err("%pOF: duplicate cpu reg properties in the DT\n",
> >  				dn);
> > -			goto next;
> > +			continue;
> >  		}
> >  
> >  		/*
> > @@ -656,7 +656,7 @@ static void __init of_parse_and_init_cpus(void)
> >  			if (bootcpu_valid) {
> >  				pr_err("%pOF: duplicate boot cpu reg property in DT\n",
> >  					dn);
> > -				goto next;
> > +				continue;
> >  			}
> >  
> 
> People get very upset when CPU numbering changes, so I'd prefer to avoid
> this if possible.
> 
> Mark.
> 
This modification only affects CPU ID allocation when there are invalid or
duplicate CPU nodes in the device tree. IMO that is not a typical use case, but
please let me know if there are any other use cases.

Many thanks for your response.

Lingyue.

> >  			bootcpu_valid = true;
> > -- 
> > 2.34.1
> >

Patch

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 31c8b3094dd7..5b4178145920 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -638,12 +638,12 @@  static void __init of_parse_and_init_cpus(void)
 		u64 hwid = of_get_cpu_hwid(dn, 0);
 
 		if (hwid & ~MPIDR_HWID_BITMASK)
-			goto next;
+			continue;
 
 		if (is_mpidr_duplicate(cpu_count, hwid)) {
 			pr_err("%pOF: duplicate cpu reg properties in the DT\n",
 				dn);
-			goto next;
+			continue;
 		}
 
 		/*
@@ -656,7 +656,7 @@  static void __init of_parse_and_init_cpus(void)
 			if (bootcpu_valid) {
 				pr_err("%pOF: duplicate boot cpu reg property in DT\n",
 					dn);
-				goto next;
+				continue;
 			}
 
 			bootcpu_valid = true;