Message ID | 20240126064451.5465-1-shijie@os.amperecomputing.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Series | [v3] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id |
On Thu, 25 Jan 2024 22:44:51 PST (-0800), shijie@os.amperecomputing.com wrote:
> During the kernel booting, the generic cpu_to_node() is called too early in
> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>
> There are at least four places in the common code where
> the generic cpu_to_node() is called before it is initialized:
>    1.) early_trace_init()         in kernel/trace/trace.c
>    2.) sched_init()               in kernel/sched/core.c
>    3.) init_sched_fair_class()    in kernel/sched/fair.c
>    4.) workqueue_init_early()     in kernel/workqueue.c
>
> In order to fix the bug, the patch introduces early_numa_node_init()
> which is called after smp_prepare_boot_cpu() in start_kernel.
> early_numa_node_init will initialize the "numa_node" as soon as
> the early_cpu_to_node() is ready, before the cpu_to_node() is called
> at the first time.
>
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
> v2 --> v3:
>     Do not change the cpu_to_node to function pointer.
>     Introduce early_numa_node_init() which initialize
>     the numa_node at an early stage.
>
>     v2: https://lore.kernel.org/all/20240123045843.75969-1-shijie@os.amperecomputing.com/
>
> v1 --> v2:
>     In order to fix the x86 compiling error, move the cpu_to_node()
>     from driver/base/arch_numa.c to driver/base/node.c.
>
>     v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/896160.html
>
>     An old different title patch:
>     http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/895963.html
>
> ---
>  init/main.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/init/main.c b/init/main.c
> index e24b0780fdff..39efe5ed58a0 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -870,6 +870,19 @@ static void __init print_unknown_bootoptions(void)
>  	memblock_free(unknown_options, len);
>  }
>
> +static void __init early_numa_node_init(void)
> +{
> +#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
> +#ifndef cpu_to_node
> +	int cpu;
> +
> +	/* The early_cpu_to_node() should be ready here. */
> +	for_each_possible_cpu(cpu)
> +		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
> +#endif
> +#endif
> +}
> +
>  asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
>  void start_kernel(void)
>  {
> @@ -900,6 +913,7 @@ void start_kernel(void)
>  	setup_nr_cpu_ids();
>  	setup_per_cpu_areas();
>  	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
> +	early_numa_node_init();
>  	boot_cpu_hotplug_init();
>
>  	pr_notice("Kernel command line: %s\n", saved_command_line);

Acked-by: Palmer Dabbelt <palmer@rivosinc.com> # RISC-V

I don't really understand the init/main.c stuff all that well, I'm
adding Andrew as it looks like he's been merging stuff here.

On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie <shijie@os.amperecomputing.com> wrote:

> During the kernel booting, the generic cpu_to_node() is called too early in
> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>
> There are at least four places in the common code where
> the generic cpu_to_node() is called before it is initialized:
>    1.) early_trace_init()         in kernel/trace/trace.c
>    2.) sched_init()               in kernel/sched/core.c
>    3.) init_sched_fair_class()    in kernel/sched/fair.c
>    4.) workqueue_init_early()     in kernel/workqueue.c
>
> In order to fix the bug, the patch introduces early_numa_node_init()
> which is called after smp_prepare_boot_cpu() in start_kernel.
> early_numa_node_init will initialize the "numa_node" as soon as
> the early_cpu_to_node() is ready, before the cpu_to_node() is called
> at the first time.

What are the userspace-visible runtime effects of this bug?

On 2024/3/28 2:17, Andrew Morton wrote:
> On Fri, 26 Jan 2024 14:44:51 +0800 Huang Shijie <shijie@os.amperecomputing.com> wrote:
>
>> During the kernel booting, the generic cpu_to_node() is called too early in
>> arm64, powerpc and riscv when CONFIG_NUMA is enabled.
>>
>> There are at least four places in the common code where
>> the generic cpu_to_node() is called before it is initialized:
>>    1.) early_trace_init()         in kernel/trace/trace.c
>>    2.) sched_init()               in kernel/sched/core.c
>>    3.) init_sched_fair_class()    in kernel/sched/fair.c
>>    4.) workqueue_init_early()     in kernel/workqueue.c
>>
>> In order to fix the bug, the patch introduces early_numa_node_init()
>> which is called after smp_prepare_boot_cpu() in start_kernel.
>> early_numa_node_init will initialize the "numa_node" as soon as
>> the early_cpu_to_node() is ready, before the cpu_to_node() is called
>> at the first time.
> What are the userspace-visible runtime effects of this bug?
>

For this bug, I do not see much performance impact on userspace
applications. It just pollutes the CPU caches on NUMA systems.

Thanks
Huang Shijie

On Wed, 27 Mar 2024, Andrew Morton wrote:

>> In order to fix the bug, the patch introduces early_numa_node_init()
>> which is called after smp_prepare_boot_cpu() in start_kernel.
>> early_numa_node_init will initialize the "numa_node" as soon as
>> the early_cpu_to_node() is ready, before the cpu_to_node() is called
>> at the first time.
>
> What are the userspace-visible runtime effects of this bug?

Performance is reduced since there is an increase in off-node accesses.

diff --git a/init/main.c b/init/main.c
index e24b0780fdff..39efe5ed58a0 100644
--- a/init/main.c
+++ b/init/main.c
@@ -870,6 +870,19 @@ static void __init print_unknown_bootoptions(void)
 	memblock_free(unknown_options, len);
 }

+static void __init early_numa_node_init(void)
+{
+#ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
+#ifndef cpu_to_node
+	int cpu;
+
+	/* The early_cpu_to_node() should be ready here. */
+	for_each_possible_cpu(cpu)
+		set_cpu_numa_node(cpu, early_cpu_to_node(cpu));
+#endif
+#endif
+}
+
 asmlinkage __visible __init __no_sanitize_address __noreturn __no_stack_protector
 void start_kernel(void)
 {
@@ -900,6 +913,7 @@ void start_kernel(void)
 	setup_nr_cpu_ids();
 	setup_per_cpu_areas();
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
+	early_numa_node_init();
 	boot_cpu_hotplug_init();

 	pr_notice("Kernel command line: %s\n", saved_command_line);

During the kernel booting, the generic cpu_to_node() is called too early in
arm64, powerpc and riscv when CONFIG_NUMA is enabled.

There are at least four places in the common code where
the generic cpu_to_node() is called before it is initialized:
   1.) early_trace_init()         in kernel/trace/trace.c
   2.) sched_init()               in kernel/sched/core.c
   3.) init_sched_fair_class()    in kernel/sched/fair.c
   4.) workqueue_init_early()     in kernel/workqueue.c

In order to fix the bug, the patch introduces early_numa_node_init()
which is called after smp_prepare_boot_cpu() in start_kernel.
early_numa_node_init will initialize the "numa_node" as soon as
the early_cpu_to_node() is ready, before the cpu_to_node() is called
at the first time.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
v2 --> v3:
    Do not change the cpu_to_node to function pointer.
    Introduce early_numa_node_init() which initialize
    the numa_node at an early stage.

    v2: https://lore.kernel.org/all/20240123045843.75969-1-shijie@os.amperecomputing.com/

v1 --> v2:
    In order to fix the x86 compiling error, move the cpu_to_node()
    from driver/base/arch_numa.c to driver/base/node.c.

    v1: http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/896160.html

    An old different title patch:
    http://lists.infradead.org/pipermail/linux-arm-kernel/2024-January/895963.html

---
 init/main.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
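
The ordering problem described in the commit message can be illustrated with a small userspace simulation. This is only a sketch, not kernel code: every sim_* name is a hypothetical stand-in for the kernel symbol it mimics, and the 4-CPU, 2-node layout is invented for illustration. It assumes the per-CPU "numa_node" value reads as 0 until something writes it, which is why early callers see node 0.

```c
/*
 * Userspace simulation of the early cpu_to_node() problem.
 * All sim_* names are hypothetical stand-ins, not kernel symbols.
 */
#include <assert.h>
#include <stdio.h>

#define SIM_NR_CPUS 4

/* Stand-in for the per-CPU "numa_node" variable: reads as 0 until set. */
static int sim_numa_node[SIM_NR_CPUS];

/* Stand-in for the table behind early_cpu_to_node(); layout is invented. */
static const int sim_early_map[SIM_NR_CPUS] = { 0, 0, 1, 1 };

static int sim_early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	return sim_early_map[cpu];
}

/* Generic lookup: returns whatever the per-CPU variable currently holds. */
static int sim_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	return sim_numa_node[cpu];
}

static void sim_set_cpu_numa_node(int cpu, int node)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);
	sim_numa_node[cpu] = node;
}

/* Mirrors the loop in the patch: copy the early mapping into "numa_node". */
static void sim_early_numa_node_init(void)
{
	int cpu;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		sim_set_cpu_numa_node(cpu, sim_early_cpu_to_node(cpu));
}

/* Count CPUs whose generic lookup disagrees with the early mapping. */
static int sim_count_stale_cpus(void)
{
	int cpu, stale = 0;

	for (cpu = 0; cpu < SIM_NR_CPUS; cpu++)
		if (sim_cpu_to_node(cpu) != sim_early_cpu_to_node(cpu))
			stale++;
	return stale;
}

/* Report the number of mismatching CPUs before and after initialization. */
static void sim_boot_demo(void)
{
	printf("stale before init: %d\n", sim_count_stale_cpus());
	sim_early_numa_node_init();
	printf("stale after init:  %d\n", sim_count_stale_cpus());
}
```

Calling sim_boot_demo() from a main() shows the effect: any caller that runs before sim_early_numa_node_init() gets node 0 for the CPUs that really sit on node 1, which corresponds to the off-node accesses mentioned in the replies.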