
[v2,00/25] mm: introduce numa_memblks

Message ID 20240723064156.4009477-1-rppt@kernel.org
Series mm: introduce numa_memblks

Message

Mike Rapoport July 23, 2024, 6:41 a.m. UTC
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

Following the discussion about handling of CXL fixed memory windows on
arm64 [1], I decided to bite the bullet and move numa_memblks from x86 to
the generic code so they will be available on arm64/riscv and maybe on
loongarch sometime later.

While it could be possible to use memblock to describe CXL memory windows,
memblock currently lacks a notion of unpopulated memory ranges, whereas
numa_memblks does implement this.
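For readers less familiar with numa_memblks, the following is a minimal user-space sketch of the bookkeeping involved, modeled loosely on the x86 struct numa_memblk, struct numa_meminfo and numa_add_memblk_to() code that this series moves to mm/. The limits (MAX_NUMNODES, NR_NODE_MEMBLKS), the -1 error value and the example addresses are illustrative assumptions. Unlike memblock, such a table can describe a range for a node whether or not memory is present there, which is what an unpopulated CXL window needs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative limits; the kernel derives these from its configuration. */
#define MAX_NUMNODES    64
#define NR_NODE_MEMBLKS (MAX_NUMNODES * 2)

/* One physical address range [start, end) that belongs to node nid. */
struct numa_memblk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* A flat table of ranges; populated and reserved ranges can use separate tables. */
struct numa_meminfo {
	int nr_blks;
	struct numa_memblk blk[NR_NODE_MEMBLKS];
};

/*
 * Append [start, end) for node nid to mi. Zero-length and invalid ranges
 * are skipped (invalid ones with a warning); a full table is an error.
 */
static int numa_add_memblk_to(int nid, uint64_t start, uint64_t end,
			      struct numa_meminfo *mi)
{
	assert(mi != NULL);

	if (start == end)
		return 0;

	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
		fprintf(stderr, "invalid memblk node %d [%#llx-%#llx)\n", nid,
			(unsigned long long)start, (unsigned long long)end);
		return 0;
	}

	if (mi->nr_blks >= NR_NODE_MEMBLKS)
		return -1;	/* stands in for -EINVAL */

	mi->blk[mi->nr_blks].start = start;
	mi->blk[mi->nr_blks].end = end;
	mi->blk[mi->nr_blks].nid = nid;
	mi->nr_blks++;
	return 0;
}
```

For example, DRAM at 0x80000000-0x17fffffff on node 0 and a hypothetical CXL window on node 1 could be recorded in two such tables, with only the DRAM range also being present in memblock.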

Another reason to make numa_memblks generic is that both arch_numa (arm64
and riscv) and loongarch use a trimmed copy of the x86 code, although there
is no fundamental reason why the same code cannot be used on all these
platforms. Having numa_memblks in mm/ will make its interaction with ACPI
and FDT more consistent and, I believe, will reduce the maintenance burden.

And with generic numa_memblks it is (almost) straightforward to enable NUMA
emulation on arm64 and riscv.
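As a rough illustration of what NUMA emulation (numa=fake=N) does, the sketch below carves one physical range into N equally sized fake nodes. It is a deliberate simplification, not the x86 algorithm being moved: the real code also deals with memory holes, DMA32 boundaries and interleaving fake nodes across physical nodes. The 32 MiB FAKE_NODE_MIN_SIZE granularity reflects x86 but should be treated as an assumption, and split_evenly() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum fake node granularity (assumed to match x86: 32 MiB). */
#define FAKE_NODE_MIN_SIZE      ((uint64_t)32 << 20)
#define FAKE_NODE_MIN_HASH_MASK (~(FAKE_NODE_MIN_SIZE - 1))

struct fake_node {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/*
 * Split [start, end) into up to nr_nodes contiguous fake nodes of equal
 * size, rounded down to FAKE_NODE_MIN_SIZE. The last node also takes the
 * remainder. If the range is too small for nr_nodes nodes, fewer
 * minimum-sized nodes are created. Returns the number of nodes written.
 */
static int split_evenly(uint64_t start, uint64_t end, int nr_nodes,
			struct fake_node *out)
{
	assert(nr_nodes > 0 && end > start);

	uint64_t size = ((end - start) / nr_nodes) & FAKE_NODE_MIN_HASH_MASK;

	if (size == 0) {
		size = FAKE_NODE_MIN_SIZE;
		nr_nodes = (int)((end - start) / FAKE_NODE_MIN_SIZE);
		if (nr_nodes == 0)
			nr_nodes = 1;
	}

	for (int i = 0; i < nr_nodes; i++) {
		out[i].start = start + (uint64_t)i * size;
		out[i].end = (i == nr_nodes - 1) ? end : out[i].start + size;
	}
	return nr_nodes;
}
```

With a 4 GiB range starting at 0x80000000 and N = 3, this sketch produces fake nodes of 1344 MiB, 1344 MiB and 1408 MiB; the real kernel may lay them out differently.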

The first 9 commits in this series are cleanups that are not strictly
related to numa_memblks.
Commits 10-16 slightly reorder code in x86 to allow extracting numa_memblks
and NUMA emulation to the generic code.
Commits 17-19 actually move the code from arch/x86/ to mm/ and commits 20-22
do some follow-up cleanups.
Commit 23 switches arch_numa to numa_memblks.
Commit 24 enables usage of phys_to_target_node() and
memory_add_physaddr_to_nid() with numa_memblks.
Commit 25 moves the description for numa=fake from x86 to admin-guide.

[1] https://lore.kernel.org/all/20240529171236.32002-1-Jonathan.Cameron@huawei.com/

v1: https://lore.kernel.org/all/20240716111346.3676969-1-rppt@kernel.org
* add cleanup for arch_alloc_nodedata and HAVE_ARCH_NODEDATA_EXTENSION
* add patch that moves description of numa=fake kernel parameter from
  x86 to admin-guide
* reduce rounding up of node_data allocations from PAGE_SIZE to
  SMP_CACHE_BYTES
* restore single allocation attempt of numa_distance
* fix several comments
* add review tags
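The node_data rounding change is plain arithmetic: each allocation is padded to the next multiple of the alignment, so a smaller alignment wastes less memory per node. The numbers below are hypothetical (a 5000-byte pg_data_t, 4 KiB pages, 64-byte cache lines); real sizes depend on the architecture and configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's roundup(): smallest multiple of y that is >= x. */
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical sizes, for illustration only. */
#define EXAMPLE_PGDAT_SIZE   5000u
#define EXAMPLE_PAGE_SIZE    4096u
#define EXAMPLE_CACHE_BYTES  64u

/* Bytes actually reserved when an object of `size` bytes is padded to `align`. */
static size_t padded_size(size_t size, size_t align)
{
	assert(align > 0);
	return ROUNDUP(size, align);
}
```

Under these assumptions, padding to PAGE_SIZE reserves 8192 bytes per node, while padding to SMP_CACHE_BYTES reserves 5056 bytes.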

Mike Rapoport (Microsoft) (25):
  mm: move kernel/numa.c to mm/
  MIPS: sgi-ip27: make NODE_DATA() the same as on all other architectures
  MIPS: sgi-ip27: ensure node_possible_map only contains valid nodes
  MIPS: sgi-ip27: drop HAVE_ARCH_NODEDATA_EXTENSION
  MIPS: loongson64: rename __node_data to node_data
  MIPS: loongson64: drop HAVE_ARCH_NODEDATA_EXTENSION
  mm: drop CONFIG_HAVE_ARCH_NODEDATA_EXTENSION
  arch, mm: move definition of node_data to generic code
  arch, mm: pull out allocation of NODE_DATA to generic code
  x86/numa: simplify numa_distance allocation
  x86/numa: use get_pfn_range_for_nid to verify that node spans memory
  x86/numa: move FAKE_NODE_* defines to numa_emu
  x86/numa_emu: simplify allocation of phys_dist
  x86/numa_emu: split __apicid_to_node update to a helper function
  x86/numa_emu: use a helper function to get MAX_DMA32_PFN
  x86/numa: numa_{add,remove}_cpu: make cpu parameter unsigned
  mm: introduce numa_memblks
  mm: move numa_distance and related code from x86 to numa_memblks
  mm: introduce numa_emulation
  mm: numa_memblks: introduce numa_memblks_init
  mm: numa_memblks: make several functions and variables static
  mm: numa_memblks: use memblock_{start,end}_of_DRAM() when sanitizing
    meminfo
  arch_numa: switch over to numa_memblks
  mm: make range-to-target_node lookup facility a part of numa_memblks
  docs: move numa=fake description to kernel-parameters.txt

 .../admin-guide/kernel-parameters.txt         |  15 +
 .../arch/x86/x86_64/boot-options.rst          |  12 -
 arch/arm64/include/asm/Kbuild                 |   1 +
 arch/arm64/include/asm/mmzone.h               |  13 -
 arch/arm64/include/asm/topology.h             |   1 +
 arch/loongarch/include/asm/Kbuild             |   1 +
 arch/loongarch/include/asm/mmzone.h           |  16 -
 arch/loongarch/include/asm/topology.h         |   1 +
 arch/loongarch/kernel/numa.c                  |  21 -
 arch/mips/Kconfig                             |   5 -
 arch/mips/include/asm/mach-ip27/mmzone.h      |   1 -
 .../mips/include/asm/mach-loongson64/mmzone.h |   4 -
 arch/mips/loongson64/numa.c                   |  28 +-
 arch/mips/sgi-ip27/ip27-memory.c              |  12 +-
 arch/mips/sgi-ip27/ip27-smp.c                 |   2 +
 arch/powerpc/include/asm/mmzone.h             |   6 -
 arch/powerpc/mm/numa.c                        |  26 +-
 arch/riscv/include/asm/Kbuild                 |   1 +
 arch/riscv/include/asm/mmzone.h               |  13 -
 arch/riscv/include/asm/topology.h             |   4 +
 arch/s390/include/asm/Kbuild                  |   1 +
 arch/s390/include/asm/mmzone.h                |  17 -
 arch/s390/kernel/numa.c                       |   3 -
 arch/sh/include/asm/mmzone.h                  |   3 -
 arch/sh/mm/init.c                             |   7 +-
 arch/sh/mm/numa.c                             |   3 -
 arch/sparc/include/asm/mmzone.h               |   4 -
 arch/sparc/mm/init_64.c                       |  11 +-
 arch/x86/Kconfig                              |   9 +-
 arch/x86/include/asm/Kbuild                   |   1 +
 arch/x86/include/asm/mmzone.h                 |   6 -
 arch/x86/include/asm/mmzone_32.h              |  17 -
 arch/x86/include/asm/mmzone_64.h              |  18 -
 arch/x86/include/asm/numa.h                   |  26 +-
 arch/x86/include/asm/sparsemem.h              |   9 -
 arch/x86/mm/Makefile                          |   1 -
 arch/x86/mm/amdtopology.c                     |   1 +
 arch/x86/mm/numa.c                            | 618 +-----------------
 arch/x86/mm/numa_internal.h                   |  24 -
 drivers/acpi/numa/srat.c                      |   1 +
 drivers/base/Kconfig                          |   1 +
 drivers/base/arch_numa.c                      | 223 ++-----
 drivers/cxl/Kconfig                           |   2 +-
 drivers/dax/Kconfig                           |   2 +-
 drivers/of/of_numa.c                          |   1 +
 include/asm-generic/mmzone.h                  |   5 +
 include/asm-generic/numa.h                    |   6 +-
 include/linux/memory_hotplug.h                |  48 --
 include/linux/numa.h                          |   5 +
 include/linux/numa_memblks.h                  |  58 ++
 kernel/Makefile                               |   1 -
 kernel/numa.c                                 |  26 -
 mm/Kconfig                                    |  11 +
 mm/Makefile                                   |   3 +
 mm/mm_init.c                                  |   3 +-
 mm/numa.c                                     |  57 ++
 {arch/x86/mm => mm}/numa_emulation.c          |  42 +-
 mm/numa_memblks.c                             | 568 ++++++++++++++++
 58 files changed, 867 insertions(+), 1158 deletions(-)
 delete mode 100644 arch/arm64/include/asm/mmzone.h
 delete mode 100644 arch/loongarch/include/asm/mmzone.h
 delete mode 100644 arch/riscv/include/asm/mmzone.h
 delete mode 100644 arch/s390/include/asm/mmzone.h
 delete mode 100644 arch/x86/include/asm/mmzone.h
 delete mode 100644 arch/x86/include/asm/mmzone_32.h
 delete mode 100644 arch/x86/include/asm/mmzone_64.h
 create mode 100644 include/asm-generic/mmzone.h
 create mode 100644 include/linux/numa_memblks.h
 delete mode 100644 kernel/numa.c
 create mode 100644 mm/numa.c
 rename {arch/x86/mm => mm}/numa_emulation.c (94%)
 create mode 100644 mm/numa_memblks.c


base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826

Comments

Zi Yan July 25, 2024, 12:35 a.m. UTC | #1
On 24 Jul 2024, at 18:44, Zi Yan wrote:

> Hi,
>
> I have tested this series on both x86_64 and arm64. It works fine on x86_64.
> All numa=fake= options work as they did before the series.
>
> But I am not able to boot the kernel (no printout at all) on an arm64 VM
> (Mac mini M1, VMware). By git bisecting, I found that "arch_numa: switch over
> to numa_memblks" is the first patch causing the boot failure. I see the warning:
>
> WARNING: modpost: vmlinux: section mismatch in reference: numa_add_cpu+0x1c (section: .text) -> early_cpu_to_node (section: .init.text)
>
> I am not sure if it is a red herring or not, since changing early_cpu_to_node
> to cpu_to_node in numa_add_cpu() from mm/numa_emulation.c did get rid of the
> warning, but the system still failed to boot.
>
> Please note that you need binutils 2.40 to build the arm64 kernel, since there
> is a bug (https://sourceware.org/bugzilla/show_bug.cgi?id=31924) in 2.42 that
> prevents the arm64 kernel from booting as well.
>
> My config is attached.

I got more info after adding earlycon to the boot options.
pgdat is NULL, which causes a fault when free_area_init_node() dereferences
it at the first WARN_ON.

FYI, my build is this series on top of v6.10 instead of the base commit;
the series applies cleanly on top of v6.10.

[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000080000000-0x00000000ffd82fff]
[    0.000000]   node   0: [mem 0x00000000ffd83000-0x00000000fffb5fff]
[    0.000000]   node   0: [mem 0x00000000fffb6000-0x000000017befffff]
[    0.000000]   node   0: [mem 0x000000017bf00000-0x000000017bfbffff]
[    0.000000]   node   0: [mem 0x000000017bfc0000-0x000000017c02ffff]
[    0.000000]   node   0: [mem 0x000000017c030000-0x000000017c03ffff]
[    0.000000]   node   0: [mem 0x000000017c040000-0x000000017c09ffff]
[    0.000000]   node   0: [mem 0x000000017c0a0000-0x000000017c13ffff]
[    0.000000]   node   0: [mem 0x000000017c140000-0x000000017f41ffff]
[    0.000000]   node   0: [mem 0x000000017f420000-0x000000017f4affff]
[    0.000000]   node   0: [mem 0x000000017f4b0000-0x000000017f5bffff]
[    0.000000]   node   0: [mem 0x000000017f5c0000-0x000000017f5dffff]
[    0.000000]   node   0: [mem 0x000000017f5e0000-0x000000017fffffff]
[    0.000000] pgdat: 0000000000000000, nid: 0
[    0.000000] Unable to handle kernel paging request at virtual address 0000000000002220
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x0000000096000004
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x04: level 0 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    0.000000]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.000000]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.000000] [0000000000002220] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 0000000096000004 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.10.0+ #17
[    0.000000] pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : free_area_init+0x720/0xf90
[    0.000000] lr : free_area_init+0x714/0xf90
[    0.000000] sp : ffff800081eb3c20
[    0.000000] x29: ffff800081eb3c20 x28: 000000017b5e710c x27: ffff800082158000
[    0.000000] x26: 000000017ffff168 x25: ffff800081ecc000 x24: 0000000000000000
[    0.000000] x23: 0000000000000000 x22: 0000000000000000 x21: ffff8000821f0480
[    0.000000] x20: 0000000000000000 x19: ffff8000818863f0 x18: 0000000000000006
[    0.000000] x17: 00000000007fb000 x16: 000000017f805000 x15: ffff800081eb36b0
[    0.000000] x14: 0000000000000000 x13: 30203a64696e202c x12: ffff800081f3ef10
[    0.000000] x11: 0000000000000001 x10: 0000000000000001 x9 : 0000000000017fe8
[    0.000000] x8 : c0000000ffffefff x7 : ffff800081ee6d40 x6 : 0000000000057fa8
[    0.000000] x5 : ffff800081f3eeb8 x4 : 0000000000000000 x3 : 0000000000000000
[    0.000000] x2 : 0000000000000000 x1 : ffff800081ec8c40 x0 : 000000000000001f
[    0.000000] Call trace:
[    0.000000]  free_area_init+0x720/0xf90
[    0.000000]  bootmem_init+0x158/0x218
[    0.000000]  setup_arch+0x220/0x650
[    0.000000]  start_kernel+0x74/0x7e0
[    0.000000]  __primary_switched+0x80/0x90
[    0.000000] Code: 97a64606 b940b7f6 a90dffff f876dab4 (b9622280)
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

--
Best Regards,
Yan, Zi
Zi Yan July 25, 2024, 2:48 a.m. UTC | #2
OK, the issue comes from the fact that my arm64 VM has no ACPI while my x86_64
VM does, so on the arm64 VM numa_init(arch_acpi_numa_init) fails in
arch_numa_init() and the code falls back to numa_init(dummy_numa_init). In
dummy_numa_init(), before patch 23 "arch_numa: switch over to numa_memblks",
arm64 called the numa_add_memblk() from drivers/base/arch_numa.c, which
unconditionally set node 0 in numa_nodes_parsed. This is missing in the x86
version of numa_add_memblk(), which is now used by all architectures. With the
patch below applied, my arm64 kernel boots in the VM.


diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
index 806550239d08..354f15b8d9b7 100644
--- a/drivers/base/arch_numa.c
+++ b/drivers/base/arch_numa.c
@@ -279,6 +279,7 @@ static int __init dummy_numa_init(void)
                pr_err("NUMA init failed\n");
                return ret;
        }
+       node_set(0, numa_nodes_parsed);

        numa_off = true;
        return 0;


Feel free to add

Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64

after you incorporate the fix.
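The analysis above boils down to which code marks node 0 in numa_nodes_parsed. The toy model below contrasts the two numa_add_memblk() behaviours described in the message with the explicit node_set() added by the fix. Everything in it is a hypothetical stand-in (a 64-bit mask instead of nodemask_t, callbacks instead of the real functions); it does not model how the kernel reacts to an empty numa_nodes_parsed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy nodemask: bit n set means "node n was found while parsing". */
typedef uint64_t toy_nodemask_t;

static toy_nodemask_t numa_nodes_parsed;

static void toy_node_set(int nid, toy_nodemask_t *mask)
{
	assert(nid >= 0 && nid < 64);
	*mask |= (uint64_t)1 << nid;
}

static bool toy_node_isset(int nid, toy_nodemask_t mask)
{
	return (mask >> nid) & 1;
}

/* Old arch_numa behaviour as described above: adding a range also marks its node. */
static void add_memblk_arch_numa(int nid)
{
	toy_node_set(nid, &numa_nodes_parsed);
}

/* x86-style behaviour: only the range is recorded; the node is not marked. */
static void add_memblk_x86(int nid)
{
	(void)nid;	/* range bookkeeping omitted in this model */
}

/*
 * Dummy fallback: a single node 0 spanning all memory. with_fix models
 * the node_set(0, numa_nodes_parsed) line added by the patch.
 */
static toy_nodemask_t dummy_numa_init(void (*add_memblk)(int), bool with_fix)
{
	numa_nodes_parsed = 0;
	add_memblk(0);
	if (with_fix)
		toy_node_set(0, &numa_nodes_parsed);
	return numa_nodes_parsed;
}
```

Without the fix, the x86-style path leaves node 0 unmarked; with it, node 0 is marked exactly as the old arch_numa helper did.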


--
Best Regards,
Yan, Zi
Mike Rapoport July 26, 2024, 9:40 a.m. UTC | #3
On Wed, Jul 24, 2024 at 10:48:42PM -0400, Zi Yan wrote:
> OK, the issue comes from the fact that my arm64 VM has no ACPI while my x86_64
> VM does, so on the arm64 VM numa_init(arch_acpi_numa_init) fails in
> arch_numa_init() and the code falls back to numa_init(dummy_numa_init). In
> dummy_numa_init(), before patch 23 "arch_numa: switch over to numa_memblks",
> arm64 called the numa_add_memblk() from drivers/base/arch_numa.c, which
> unconditionally set node 0 in numa_nodes_parsed. This is missing in the x86
> version of numa_add_memblk(), which is now used by all architectures. With the
> patch below applied, my arm64 kernel boots in the VM.
> 
> 
> diff --git a/drivers/base/arch_numa.c b/drivers/base/arch_numa.c
> index 806550239d08..354f15b8d9b7 100644
> --- a/drivers/base/arch_numa.c
> +++ b/drivers/base/arch_numa.c
> @@ -279,6 +279,7 @@ static int __init dummy_numa_init(void)
>                 pr_err("NUMA init failed\n");
>                 return ret;
>         }
> +       node_set(0, numa_nodes_parsed);
> 
>         numa_off = true;
>         return 0;
> 
> 
> Feel free to add
> 
> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64
> 
> after you incorporate the fix.

Thanks a lot for testing, debugging and fixing!