mm/hugetlb: Fix hugepage allocation for interleaved memory nodes

Message ID 7e0ca1e8acd7dd5c1fe7cbb252de4eb55a8e851b.1727984881.git.ritesh.list@gmail.com (mailing list archive)
State New
Series mm/hugetlb: Fix hugepage allocation for interleaved memory nodes

Commit Message

Ritesh Harjani (IBM) Oct. 3, 2024, 8 p.m. UTC
gather_bootmem_prealloc() assumes the start nid is 0 and the size is
num_node_state(N_MEMORY). Since memory-attached NUMA nodes can be
interleaved in any fashion, ensure gather_bootmem_prealloc_parallel()
checks all online NUMA nodes. Let's keep max_threads at N_MEMORY so
that we can still get a roughly uniform distribution of online nodes
among the parallel threads.
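
A minimal sketch of the failure mode (illustrative, not the verbatim
kernel code; gather_bootmem_prealloc_node() is assumed to be the
per-node helper introduced by the Fixes commit):

	/*
	 * The parallel worker walks node ids in [start, end).  With
	 * .size set to num_node_state(N_MEMORY), a topology whose only
	 * memory node is nid 1 yields size == 1, so the walk covers
	 * nid 0 only and the bootmem huge pages staged on nid 1 are
	 * never gathered.
	 */
	static void __init gather_bootmem_prealloc_parallel(unsigned long start,
							    unsigned long end,
							    void *arg)
	{
		int nid;

		for (nid = start; nid < end; nid++)
			gather_bootmem_prealloc_node(nid);
	}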

e.g. qemu cmdline
========================
numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
mem_cmd="-object memory-backend-ram,id=mem1,size=16G"

w/o this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
==========================
~ # cat /proc/meminfo  |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:               0 kB

with this patch for cmdline (default_hugepagesz=1GB hugepagesz=1GB hugepages=2):
===========================
~ # cat /proc/meminfo |grep -i huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       2
HugePages_Free:        2
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:         2097152 kB

Fixes: b78b27d02930 ("hugetlb: parallelize 1G hugetlb initialization")
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Donet Tom <donettom@linux.ibm.com>
Cc: Gang Li <gang.li@linux.dev>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org
---

==== Additional data ====

w/o this patch:
================
~ # dmesg |grep -Ei "numa|node|huge"
[    0.000000][    T0] numa: Partition configured for 2 NUMA nodes.
[    0.000000][    T0]  memory[0x0]     [0x0000000000000000-0x00000003ffffffff], 0x0000000400000000 bytes on node 1 flags: 0x0
[    0.000000][    T0] numa:   NODE_DATA [mem 0x3dde50800-0x3dde57fff]
[    0.000000][    T0] numa:     NODE_DATA(0) on node 1
[    0.000000][    T0] numa:   NODE_DATA [mem 0x3dde49000-0x3dde507ff]
[    0.000000][    T0] Movable zone start for each node
[    0.000000][    T0] Early memory node ranges
[    0.000000][    T0]   node   1: [mem 0x0000000000000000-0x00000003ffffffff]
[    0.000000][    T0] Initmem setup node 0 as memoryless
[    0.000000][    T0] Initmem setup node 1 [mem 0x0000000000000000-0x00000003ffffffff]
[    0.000000][    T0] Kernel command line: root=/dev/vda1 console=ttyS0 nokaslr slub_max_order=0 norandmaps memblock=debug noreboot default_hugepagesz=1GB hugepagesz=1GB hugepages=2
[    0.000000][    T0] memblock_alloc_try_nid_raw: 1073741824 bytes align=0x40000000 nid=1 from=0x0000000000000000 max_addr=0x0000000000000000 __alloc_bootmem_huge_page+0x1ac/0x2c8
[    0.000000][    T0] memblock_alloc_try_nid_raw: 1073741824 bytes align=0x40000000 nid=1 from=0x0000000000000000 max_addr=0x0000000000000000 __alloc_bootmem_huge_page+0x1ac/0x2c8
[    0.000000][    T0] Inode-cache hash table entries: 1048576 (order: 7, 8388608 bytes, linear)
[    0.000000][    T0] Fallback order for Node 0: 1
[    0.000000][    T0] Fallback order for Node 1: 1
[    0.000000][    T0] SLUB: HWalign=128, Order=0-0, MinObjects=0, CPUs=4, Nodes=2
[    0.044978][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.209159][    T1] Timer migration: 2 hierarchy levels; 8 children per group; 1 crossnode level
[    0.414281][    T1] smp: Brought up 2 nodes, 4 CPUs
[    0.415268][    T1] numa: Node 0 CPUs: 0-1
[    0.416030][    T1] numa: Node 1 CPUs: 2-3
[   13.644459][   T41] node 1 deferred pages initialised in 12040ms
[   14.241701][    T1] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[   14.242781][    T1] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[   14.243806][    T1] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[   14.244753][    T1] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[   16.490452][    T1] pci_bus 0000:00: Unknown NUMA node; performance will be reduced
[   27.804266][    T1] Demotion targets for Node 1: null

with this patch:
=================
~ # dmesg |grep -Ei "numa|node|huge"
[    0.000000][    T0] numa: Partition configured for 2 NUMA nodes.
[    0.000000][    T0]  memory[0x0]     [0x0000000000000000-0x00000003ffffffff], 0x0000000400000000 bytes on node 1 flags: 0x0
[    0.000000][    T0] numa:   NODE_DATA [mem 0x3dde50800-0x3dde57fff]
[    0.000000][    T0] numa:     NODE_DATA(0) on node 1
[    0.000000][    T0] numa:   NODE_DATA [mem 0x3dde49000-0x3dde507ff]
[    0.000000][    T0] Movable zone start for each node
[    0.000000][    T0] Early memory node ranges
[    0.000000][    T0]   node   1: [mem 0x0000000000000000-0x00000003ffffffff]
[    0.000000][    T0] Initmem setup node 0 as memoryless
[    0.000000][    T0] Initmem setup node 1 [mem 0x0000000000000000-0x00000003ffffffff]
[    0.000000][    T0] Kernel command line: root=/dev/vda1 console=ttyS0 nokaslr slub_max_order=0 norandmaps memblock=debug noreboot default_hugepagesz=1GB hugepagesz=1GB hugepages=2
[    0.000000][    T0] memblock_alloc_try_nid_raw: 1073741824 bytes align=0x40000000 nid=1 from=0x0000000000000000 max_addr=0x0000000000000000 __alloc_bootmem_huge_page+0x1ac/0x2c8
[    0.000000][    T0] memblock_alloc_try_nid_raw: 1073741824 bytes align=0x40000000 nid=1 from=0x0000000000000000 max_addr=0x0000000000000000 __alloc_bootmem_huge_page+0x1ac/0x2c8
[    0.000000][    T0] Inode-cache hash table entries: 1048576 (order: 7, 8388608 bytes, linear)
[    0.000000][    T0] Fallback order for Node 0: 1
[    0.000000][    T0] Fallback order for Node 1: 1
[    0.000000][    T0] SLUB: HWalign=128, Order=0-0, MinObjects=0, CPUs=4, Nodes=2
[    0.048825][    T0] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[    0.204211][    T1] Timer migration: 2 hierarchy levels; 8 children per group; 1 crossnode level
[    0.378821][    T1] smp: Brought up 2 nodes, 4 CPUs
[    0.379642][    T1] numa: Node 0 CPUs: 0-1
[    0.380302][    T1] numa: Node 1 CPUs: 2-3
[   11.577527][   T41] node 1 deferred pages initialised in 10250ms
[   12.557856][    T1] HugeTLB: registered 1.00 GiB page size, pre-allocated 2 pages
[   12.574197][    T1] HugeTLB: 0 KiB vmemmap can be freed for a 1.00 GiB page
[   12.576339][    T1] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[   12.577262][    T1] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
[   15.102445][    T1] pci_bus 0000:00: Unknown NUMA node; performance will be reduced
[   26.173888][    T1] Demotion targets for Node 1: null

 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.39.5

Comments

Ritesh Harjani (IBM) Oct. 7, 2024, 6:45 p.m. UTC | #1
"Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:

> gather_bootmem_prealloc() assumes the start nid is 0 and the size is
> num_node_state(N_MEMORY). Since memory-attached NUMA nodes can be
> interleaved in any fashion, ensure gather_bootmem_prealloc_parallel()
> checks all online NUMA nodes. Let's keep max_threads at N_MEMORY so
> that we can still get a roughly uniform distribution of online nodes
> among the parallel threads.
>
> e.g. qemu cmdline
> ========================
> numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
> mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
>

I think this patch still might not work for the below NUMA config,
because there we have an offline node 0, node 1 with only CPUs, and
node 2 with both CPUs and memory.

numa_cmd="-numa node,nodeid=2,memdev=mem1,cpus=2-3 -numa node,nodeid=1,cpus=0-1 -numa node,nodeid=0"
mem_cmd="-object memory-backend-ram,id=mem1,size=32G"

Maybe N_POSSIBLE will help instead of N_MEMORY in the patch below, but
let me give this some thought before posting v2.
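
A sketch of why N_ONLINE can still fall short here, assuming node 0
really ends up offline as described (illustrative, not kernel code):

	/*
	 * Assumed topology from the qemu config above:
	 *   nid 0: offline            -> not counted in N_ONLINE
	 *   nid 1: CPUs only, online  -> counted in N_ONLINE
	 *   nid 2: CPUs + memory      -> counted in N_ONLINE and N_MEMORY
	 *
	 * num_node_state(N_ONLINE) == 2, so this walk stops after nid 1
	 * and never reaches the memory node, nid 2.
	 */
	for (nid = 0; nid < num_node_state(N_ONLINE); nid++)
		gather_bootmem_prealloc_node(nid);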

-ritesh


> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 9a3a6e2dee97..60f45314c151 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3443,7 +3443,7 @@ static void __init gather_bootmem_prealloc(void)
>  		.thread_fn	= gather_bootmem_prealloc_parallel,
>  		.fn_arg		= NULL,
>  		.start		= 0,
> -		.size		= num_node_state(N_MEMORY),
> +		.size		= num_node_state(N_ONLINE),
>  		.align		= 1,
>  		.min_chunk	= 1,
>  		.max_threads	= num_node_state(N_MEMORY),
> --
> 2.39.5
Muchun Song Oct. 8, 2024, 7:59 a.m. UTC | #2
> On Oct 8, 2024, at 02:45, Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> 
> "Ritesh Harjani (IBM)" <ritesh.list@gmail.com> writes:
> 
>> gather_bootmem_prealloc() assumes the start nid is 0 and the size is
>> num_node_state(N_MEMORY). Since memory-attached NUMA nodes can be
>> interleaved in any fashion, ensure gather_bootmem_prealloc_parallel()
>> checks all online NUMA nodes. Let's keep max_threads at N_MEMORY so
>> that we can still get a roughly uniform distribution of online nodes
>> among the parallel threads.
>> 
>> e.g. qemu cmdline
>> ========================
>> numa_cmd="-numa node,nodeid=1,memdev=mem1,cpus=2-3 -numa node,nodeid=0,cpus=0-1 -numa dist,src=0,dst=1,val=20"
>> mem_cmd="-object memory-backend-ram,id=mem1,size=16G"
>> 
> 
> I think this patch still might not work for the below NUMA config,
> because there we have an offline node 0, node 1 with only CPUs, and
> node 2 with both CPUs and memory.
> 
> numa_cmd="-numa node,nodeid=2,memdev=mem1,cpus=2-3 -numa node,nodeid=1,cpus=0-1 -numa node,nodeid=0"
> mem_cmd="-object memory-backend-ram,id=mem1,size=32G"
> 
> Maybe N_POSSIBLE will help instead of N_MEMORY in the patch below, but
> let me give this some thought before posting v2.

How about setting .size to nr_node_ids?

Thanks,
Muchun
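
For context, nr_node_ids is the number of possible node ids (one past
the highest possible nid), so sizing the walk with it covers every nid
regardless of which nodes happen to be online. As a sketch, the hunk
would then read:

	-		.size		= num_node_state(N_MEMORY),
	+		.size		= nr_node_ids,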


Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9a3a6e2dee97..60f45314c151 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3443,7 +3443,7 @@ static void __init gather_bootmem_prealloc(void)
 		.thread_fn	= gather_bootmem_prealloc_parallel,
 		.fn_arg		= NULL,
 		.start		= 0,
-		.size		= num_node_state(N_MEMORY),
+		.size		= num_node_state(N_ONLINE),
 		.align		= 1,
 		.min_chunk	= 1,
 		.max_threads	= num_node_state(N_MEMORY),