[00/17] mm: introduce numa_memblks

Message ID 20240716111346.3676969-1-rppt@kernel.org (mailing list archive)

Message

Mike Rapoport July 16, 2024, 11:13 a.m. UTC
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

Hi,

Following the discussion about handling of CXL fixed memory windows on
arm64 [1] I decided to bite the bullet and move numa_memblks from x86 to
the generic code so they will be available on arm64/riscv and maybe on
loongarch sometime later.

While it could be possible to use memblock to describe CXL memory windows,
it currently lacks a notion of unpopulated memory ranges, which numa_memblks
does implement.

Another reason to make numa_memblks generic is that both arch_numa (arm64
and riscv) and loongarch use a trimmed copy of the x86 code, although there
is no fundamental reason why the same code cannot be used on all these
platforms. Having numa_memblks in mm/ will make its interaction with ACPI
and FDT more consistent and, I believe, will reduce the maintenance burden.

And with generic numa_memblks it is (almost) straightforward to enable NUMA
emulation on arm64 and riscv.

The first 5 commits in this series are cleanups that are not strictly
related to numa_memblks.

Commits 6-11 slightly reorder code in x86 to allow extracting numa_memblks
and NUMA emulation to the generic code.

Commits 12-14 actually move the code from arch/x86/ to mm/ and commit 15
does some follow-up cleanups.

Commit 16 switches arch_numa to numa_memblks.

Commit 17 enables usage of phys_to_target_node() and
memory_add_physaddr_to_nid() with numa_memblks.
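
As a rough illustration of the idea, and not code taken from this series,
a range-to-node lookup over tables of memory blocks could look like the
userspace sketch below. All names in it (struct blk, struct blk_table,
table_add(), table_to_nid(), addr_to_target_node(), NO_NODE, MAX_BLKS) are
hypothetical stand-ins chosen for the example. The assumption is that one
table holds populated ranges and a second one holds ranges that are
described by firmware but have no memory in them yet, such as a CXL fixed
memory window.

```c
#include <stdint.h>

#define NO_NODE  (-1)  /* hypothetical stand-in for "no node found" */
#define MAX_BLKS 8     /* arbitrary capacity for this sketch */

/* One physical address range [start, end) and the node it belongs to. */
struct blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/* A fixed-size table of ranges. */
struct blk_table {
	int nr_blks;
	struct blk blk[MAX_BLKS];
};

/* Append a range; returns 0 on success, -1 if it is empty or the table is full. */
int table_add(struct blk_table *t, int nid, uint64_t start, uint64_t end)
{
	if (start >= end || t->nr_blks >= MAX_BLKS)
		return -1;

	t->blk[t->nr_blks].start = start;
	t->blk[t->nr_blks].end = end;
	t->blk[t->nr_blks].nid = nid;
	t->nr_blks++;
	return 0;
}

/* Linear search for the range that contains addr. */
int table_to_nid(const struct blk_table *t, uint64_t addr)
{
	for (int i = 0; i < t->nr_blks; i++)
		if (t->blk[i].start <= addr && addr < t->blk[i].end)
			return t->blk[i].nid;

	return NO_NODE;
}

/*
 * Prefer populated ranges; if none matches, fall back to ranges that
 * are described but not populated.
 */
int addr_to_target_node(const struct blk_table *populated,
			const struct blk_table *unpopulated,
			uint64_t addr)
{
	int nid = table_to_nid(populated, addr);

	if (nid != NO_NODE)
		return nid;

	return table_to_nid(unpopulated, addr);
}
```

With such a layout, an address inside an unpopulated window still resolves
to a node, which is the property plain memblock does not provide today.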

[1] https://lore.kernel.org/all/20240529171236.32002-1-Jonathan.Cameron@huawei.com/

Mike Rapoport (Microsoft) (17):
  mm: move kernel/numa.c to mm/
  MIPS: sgi-ip27: make NODE_DATA() the same as on all other
    architectures
  MIPS: loongson64: rename __node_data to node_data
  arch, mm: move definition of node_data to generic code
  arch, mm: pull out allocation of NODE_DATA to generic code
  x86/numa: simplify numa_distance allocation
  x86/numa: move FAKE_NODE_* defines to numa_emu
  x86/numa_emu: simplify allocation of phys_dist
  x86/numa_emu: split __apicid_to_node update to a helper function
  x86/numa_emu: use a helper function to get MAX_DMA32_PFN
  x86/numa: numa_{add,remove}_cpu: make cpu parameter unsigned
  mm: introduce numa_memblks
  mm: move numa_distance and related code from x86 to numa_memblks
  mm: introduce numa_emulation
  mm: make numa_memblks more self-contained
  arch_numa: switch over to numa_memblks
  mm: make range-to-target_node lookup facility a part of numa_memblks

 arch/arm64/include/asm/Kbuild                 |   1 +
 arch/arm64/include/asm/mmzone.h               |  13 -
 arch/arm64/include/asm/topology.h             |   1 +
 arch/loongarch/include/asm/Kbuild             |   1 +
 arch/loongarch/include/asm/mmzone.h           |  16 -
 arch/loongarch/include/asm/topology.h         |   1 +
 arch/loongarch/kernel/numa.c                  |  21 -
 arch/mips/include/asm/mach-ip27/mmzone.h      |   1 -
 .../mips/include/asm/mach-loongson64/mmzone.h |   4 -
 arch/mips/loongson64/numa.c                   |  20 +-
 arch/mips/sgi-ip27/ip27-memory.c              |   2 +-
 arch/powerpc/include/asm/mmzone.h             |   6 -
 arch/powerpc/mm/numa.c                        |  26 +-
 arch/riscv/include/asm/Kbuild                 |   1 +
 arch/riscv/include/asm/mmzone.h               |  13 -
 arch/riscv/include/asm/topology.h             |   4 +
 arch/s390/include/asm/Kbuild                  |   1 +
 arch/s390/include/asm/mmzone.h                |  17 -
 arch/s390/kernel/numa.c                       |   3 -
 arch/sh/include/asm/mmzone.h                  |   3 -
 arch/sh/mm/init.c                             |   7 +-
 arch/sh/mm/numa.c                             |   3 -
 arch/sparc/include/asm/mmzone.h               |   4 -
 arch/sparc/mm/init_64.c                       |  11 +-
 arch/x86/Kconfig                              |   9 +-
 arch/x86/include/asm/Kbuild                   |   1 +
 arch/x86/include/asm/mmzone.h                 |   6 -
 arch/x86/include/asm/mmzone_32.h              |  17 -
 arch/x86/include/asm/mmzone_64.h              |  18 -
 arch/x86/include/asm/numa.h                   |  24 +-
 arch/x86/include/asm/sparsemem.h              |   9 -
 arch/x86/mm/Makefile                          |   1 -
 arch/x86/mm/amdtopology.c                     |   1 +
 arch/x86/mm/numa.c                            | 618 +-----------------
 arch/x86/mm/numa_internal.h                   |  24 -
 drivers/acpi/numa/srat.c                      |   1 +
 drivers/base/Kconfig                          |   1 +
 drivers/base/arch_numa.c                      | 223 ++-----
 drivers/cxl/Kconfig                           |   2 +-
 drivers/dax/Kconfig                           |   2 +-
 drivers/of/of_numa.c                          |   1 +
 include/asm-generic/mmzone.h                  |   5 +
 include/asm-generic/numa.h                    |   6 +-
 include/linux/numa.h                          |   5 +
 include/linux/numa_memblks.h                  |  58 ++
 kernel/Makefile                               |   1 -
 kernel/numa.c                                 |  26 -
 mm/Kconfig                                    |  11 +
 mm/Makefile                                   |   3 +
 mm/numa.c                                     |  57 ++
 {arch/x86/mm => mm}/numa_emulation.c          |  42 +-
 mm/numa_memblks.c                             | 565 ++++++++++++++++
 52 files changed, 847 insertions(+), 1070 deletions(-)
 delete mode 100644 arch/arm64/include/asm/mmzone.h
 delete mode 100644 arch/loongarch/include/asm/mmzone.h
 delete mode 100644 arch/riscv/include/asm/mmzone.h
 delete mode 100644 arch/s390/include/asm/mmzone.h
 delete mode 100644 arch/x86/include/asm/mmzone.h
 delete mode 100644 arch/x86/include/asm/mmzone_32.h
 delete mode 100644 arch/x86/include/asm/mmzone_64.h
 create mode 100644 include/asm-generic/mmzone.h
 create mode 100644 include/linux/numa_memblks.h
 delete mode 100644 kernel/numa.c
 create mode 100644 mm/numa.c
 rename {arch/x86/mm => mm}/numa_emulation.c (94%)
 create mode 100644 mm/numa_memblks.c


base-commit: 22a40d14b572deb80c0648557f4bd502d7e83826

Comments

Jonathan Cameron July 19, 2024, 1:33 p.m. UTC | #1
On Tue, 16 Jul 2024 14:13:29 +0300
Mike Rapoport <rppt@kernel.org> wrote:

> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> Hi,
> 
> Following the discussion about handling of CXL fixed memory windows on
> arm64 [1] I decided to bite the bullet and move numa_memblks from x86 to
> the generic code so they will be available on arm64/riscv and maybe on
> loongarch sometime later.
> 
> While it could be possible to use memblock to describe CXL memory windows,
> it currently lacks notion of unpopulated memory ranges and numa_memblks
> does implement this.
> 
> Another reason to make numa_memblks generic is that both arch_numa (arm64
> and riscv) and loongarch use trimmed copy of x86 code although there is no
> fundamental reason why the same code cannot be used on all these platforms.
> Having numa_memblks in mm/ will make its interaction with ACPI and FDT
> more consistent and I believe will reduce maintenance burden.
> 
> And with generic numa_memblks it is (almost) straightforward to enable NUMA
> emulation on arm64 and riscv.
> 
> The first 5 commits in this series are cleanups that are not strictly
> related to numa_memblks.
> 
> Commits 6-11 slightly reorder code in x86 to allow extracting numa_memblks
> and NUMA emulation to the generic code.
> 
> Commits 12-14 actually move the code from arch/x86/ to mm/ and commit 15
> does some aftermath cleanups.
> 
> Commit 16 switches arch_numa to numa_memblks.
> 
> Commit 17 enables usage of phys_to_target_node() and
> memory_add_physaddr_to_nid() with numa_memblks.

Hi Mike,

I've lightly tested with emulated CXL + Generic Ports and Generic
Initiators, as well as more normal CPUs and memory, via QEMU on arm64 and
it's looking good.

From my earlier series, patch 4 is probably still needed to avoid
presenting nodes with nothing in them at boot (but not if we hotplug
memory then remove it again in which case they disappear)
https://lore.kernel.org/all/20240529171236.32002-5-Jonathan.Cameron@huawei.com/
However that was broken/inconsistent before your rework so I can send that
patch separately. 

Thanks for getting this sorted!  I should get time to do more extensive
testing and review in the next week or so.

Jonathan

Mike Rapoport July 22, 2024, 8:08 a.m. UTC | #2
On Fri, Jul 19, 2024 at 02:33:47PM +0100, Jonathan Cameron wrote:
> Hi Mike,
> 
> I've lightly tested with emulated CXL + Generic Ports and Generic
> Initiators, as well as more normal CPUs and memory, via QEMU on arm64 and
> it's looking good.
> 
> From my earlier series, patch 4 is probably still needed to avoid
> presenting nodes with nothing in them at boot (but not if we hotplug
> memory then remove it again in which case they disappear)
> https://lore.kernel.org/all/20240529171236.32002-5-Jonathan.Cameron@huawei.com/
> However that was broken/inconsistent before your rework so I can send that
> patch separately. 

I'd appreciate it :)
 
> Thanks for getting this sorted!  I should get time to do more extensive
> testing and review in the next week or so.

Thanks, you may want to wait for v2, I'm planning to send it this week.
 
> Jonathan