Message ID | 20231225093025.23215-1-huangpei@loongson.cn (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | MIPS: loongson64: fix boot failure | expand |
On Mon, Dec 25, 2023 at 05:30:25PM +0800, Huang Pei wrote: > arch/mips/loongson64/numa.c | 2 -- > 2 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > index 246c6a6b0261..9b632b4c10c3 100644 > --- a/arch/mips/kernel/traps.c > +++ b/arch/mips/kernel/traps.c > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > { > - memblock_reserve(addr, size); > + if(memblock_is_region_memory(addr, size)) > + memblock_reserve(addr, size); > } I don't think this is a good idea, reserve_exception_space() is called in cpu_probe() and at that point memblock setup hasn't been done. So we end up in a situation commit bd67b711bfaa ("MIPS: kernel: Reserve exception base early to prevent corruption") fixed. Thomas.
There are two issues here: + one is that misuse of memblock_reserve, memblock_reserve is allocating memory (owned by kernel) for kernel use, but the misuse is reserving memory (not owned by kernel). Actually, not memblock_add{,_note}ing memory is right use of reserving memory not owned by kernel. + the other one is the using memblock_reserve without knowing whether the memory is owned by kernel, but this is an issue for MIPS R2+, so just bypass it;
Hi mm folks, Just a quick question, what is the expected behavior of memblock_reserve a region that is not added to memblock with memblock_add? I'm unable to find any documentation about memblock_reserve in comments and boot-time-mm, but as per my understanding to the code, this should be a legit usage? In practical we run into uninitialized nid of reserved block problem, should we fix it in our usage, or on memblock side? Thanks 在 2023/12/25 09:30, Huang Pei 写道: > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > this: > ---------------------------------------------------------------------- > Call Trace: > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > 64420070 00003025 24040003 > > ---[ end trace 0000000000000000 ]--- > Kernel panic - not syncing: Attempted to kill the idle task! > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > ---------------------------------------------------------------------- > > The root cause is no memory region "0x0-0x1fffff" paired with > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > =debug": > > ---------------------------------------------------------------------- > memory[0x0] [0x0000000000200000-0x000000000effffff], > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > 0x000000006e000000 bytes on node 0 flags: 0x0 > memory[0x2] [0x0000000100000000-0x000000027fffffff], > 0x0000000180000000 bytes on node 0 flags: 0x0 > memory[0x3] [0x0000100000000000-0x000010000fffffff], > 0x0000000010000000 bytes on node 1 flags: 0x0 > memory[0x4] [0x0000100090000000-0x000010027fffffff], > 0x00000001f0000000 bytes on node 1 flags: 0x0 > reserved.cnt = 0x1f > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > 0x0000000000002694 bytes flags: 0x0 > ---------------------------------------------------------------------- > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > accessing "node_data" out of bound. > > To fix this bug, we should remove unnecessary memory block reservation. > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > it is not registered by "memblock_add_node" > > +. no need to reserve 0x0-0xfff for exception handling if it is not > registered by "memblock_add" either. > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > Signed-off-by: Huang Pei <huangpei@loongson.cn> > --- > arch/mips/kernel/traps.c | 3 ++- > arch/mips/loongson64/numa.c | 2 -- > 2 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > index 246c6a6b0261..9b632b4c10c3 100644 > --- a/arch/mips/kernel/traps.c > +++ b/arch/mips/kernel/traps.c > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > { > - memblock_reserve(addr, size); > + if(memblock_is_region_memory(addr, size)) > + memblock_reserve(addr, size); > } > > void __init *set_except_vector(int n, void *addr) > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > index 8f61e93c0c5b..0f516dde81da 100644 > --- a/arch/mips/loongson64/numa.c > +++ b/arch/mips/loongson64/numa.c > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > memblock_reserve((node_addrspace_offset | 0xfe000000), > 32 << 20); > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > - memblock_reserve(0, PAGE_SIZE * start_pfn); > } > } >
On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > Hi mm folks, > > Just a quick question, what is the expected behavior of memblock_reserve > a region that is not added to memblock with memblock_add? > > I'm unable to find any documentation about memblock_reserve in comments and > boot-time-mm, but as per my understanding to the code, this should be a > legit usage? > > In practical we run into uninitialized nid of reserved block problem, should > we fix it > in our usage, or on memblock side? > > Thanks I have same question. Kernel make a dmi info(see dmi_string, or add memblock=debug) copy. If reserved SMIBIOS memory belong to kernel, no need to make a copy. But if not, how could it be included in a range of memblock.memory? > > 在 2023/12/25 09:30, Huang Pei 写道: > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > this: > > ---------------------------------------------------------------------- > > Call Trace: > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > 64420070 00003025 24040003 > > > > ---[ end trace 0000000000000000 ]--- > > Kernel panic - not syncing: Attempted to kill the idle task! > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > ---------------------------------------------------------------------- > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > =debug": > > > > ---------------------------------------------------------------------- > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > reserved.cnt = 0x1f > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > 0x0000000000002694 bytes flags: 0x0 > > ---------------------------------------------------------------------- > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > accessing "node_data" out of bound. > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > it is not registered by "memblock_add_node" > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > registered by "memblock_add" either. > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > --- > > arch/mips/kernel/traps.c | 3 ++- > > arch/mips/loongson64/numa.c | 2 -- > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > index 246c6a6b0261..9b632b4c10c3 100644 > > --- a/arch/mips/kernel/traps.c > > +++ b/arch/mips/kernel/traps.c > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > { > > - memblock_reserve(addr, size); > > + if(memblock_is_region_memory(addr, size)) > > + memblock_reserve(addr, size); > > } > > void __init *set_except_vector(int n, void *addr) > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > index 8f61e93c0c5b..0f516dde81da 100644 > > --- a/arch/mips/loongson64/numa.c > > +++ b/arch/mips/loongson64/numa.c > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > 32 << 20); > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > } > > } > > -- > --- > Jiaxun Yang
On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > Hi mm folks, > > Just a quick question, what is the expected behavior of memblock_reserve > a region that is not added to memblock with memblock_add? > > I'm unable to find any documentation about memblock_reserve in comments and > boot-time-mm, but as per my understanding to the code, this should be a > legit usage? Yes, memblock allows reserving memory that was not added to memblock with memblock_add(). > In practical we run into uninitialized nid of reserved block problem, should > we fix it > in our usage, or on memblock side? Apparently it's a bug in memblock :( If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") does the issue disappear? > Thanks > > 在 2023/12/25 09:30, Huang Pei 写道: > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > this: > > ---------------------------------------------------------------------- > > Call Trace: > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > 64420070 00003025 24040003 > > > > ---[ end trace 0000000000000000 ]--- > > Kernel panic - not syncing: Attempted to kill the idle task! > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > ---------------------------------------------------------------------- > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > =debug": > > > > ---------------------------------------------------------------------- > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > reserved.cnt = 0x1f > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > 0x0000000000002694 bytes flags: 0x0 > > ---------------------------------------------------------------------- > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > accessing "node_data" out of bound. > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > it is not registered by "memblock_add_node" > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > registered by "memblock_add" either. > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > --- > > arch/mips/kernel/traps.c | 3 ++- > > arch/mips/loongson64/numa.c | 2 -- > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > index 246c6a6b0261..9b632b4c10c3 100644 > > --- a/arch/mips/kernel/traps.c > > +++ b/arch/mips/kernel/traps.c > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > { > > - memblock_reserve(addr, size); > > + if(memblock_is_region_memory(addr, size)) > > + memblock_reserve(addr, size); > > } > > void __init *set_except_vector(int n, void *addr) > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > index 8f61e93c0c5b..0f516dde81da 100644 > > --- a/arch/mips/loongson64/numa.c > > +++ b/arch/mips/loongson64/numa.c > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > 32 << 20); > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > } > > } > > -- > --- > Jiaxun Yang >
On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > Hi mm folks, > > > > Just a quick question, what is the expected behavior of memblock_reserve > > a region that is not added to memblock with memblock_add? > > > > I'm unable to find any documentation about memblock_reserve in comments and > > boot-time-mm, but as per my understanding to the code, this should be a > > legit usage? > > Yes, memblock allows reserving memory that was not added to memblock with > memblock_add(). I think arch/platform specific code should fix this bug, like, -------------------------------------------------------------------------- //for loongson64 memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); -------------------------------------------------------------------------- or maybe memblock provide something like memblock_reserve_node > > > In practical we run into uninitialized nid of reserved block problem, should > > we fix it > > in our usage, or on memblock side? > > Apparently it's a bug in memblock :( > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > does the issue disappear? Yes, I git bisect this commit. But I don't think it is a bug in memblock. IMO, memblock_reserve under NUMA set nid of reserved region to MAX_NUMNODES, which is the point that cause the "memblock_get_region_node from memmap_init_reserved_pages " passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT initialize the nid of reserved region(only it know that), or the reserved region NOT within a memblock added by memblock_add, memblock can not give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") just reveals the embarrassment case by an out of bound memory access. > > > Thanks > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > this: > > > ---------------------------------------------------------------------- > > > Call Trace: > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > 64420070 00003025 24040003 > > > > > > ---[ end trace 0000000000000000 ]--- > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > ---------------------------------------------------------------------- > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > =debug": > > > > > > ---------------------------------------------------------------------- > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > reserved.cnt = 0x1f > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > 0x0000000000002694 bytes flags: 0x0 > > > ---------------------------------------------------------------------- > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > accessing "node_data" out of bound. > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > it is not registered by "memblock_add_node" > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > registered by "memblock_add" either. > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > --- > > > arch/mips/kernel/traps.c | 3 ++- > > > arch/mips/loongson64/numa.c | 2 -- > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > --- a/arch/mips/kernel/traps.c > > > +++ b/arch/mips/kernel/traps.c > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > { > > > - memblock_reserve(addr, size); > > > + if(memblock_is_region_memory(addr, size)) > > > + memblock_reserve(addr, size); > > > } > > > void __init *set_except_vector(int n, void *addr) > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > --- a/arch/mips/loongson64/numa.c > > > +++ b/arch/mips/loongson64/numa.c > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > 32 << 20); > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > } > > > } > > > > -- > > --- > > Jiaxun Yang > > > > -- > Sincerely yours, > Mike.
On 2024/1/16 20:23, Huang Pei wrote: > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: >> On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: >>> Hi mm folks, >>> >>> Just a quick question, what is the expected behavior of memblock_reserve >>> a region that is not added to memblock with memblock_add? >>> >>> I'm unable to find any documentation about memblock_reserve in comments and >>> boot-time-mm, but as per my understanding to the code, this should be a >>> legit usage? >> Yes, memblock allows reserving memory that was not added to memblock with >> memblock_add(). > I think arch/platform specific code should fix this bug, like, > -------------------------------------------------------------------------- > //for loongson64 > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > -------------------------------------------------------------------------- > > or maybe memblock provide something like memblock_reserve_node Hi pei, Can you test the following patch to see if it fixes this bug? diff --git a/mm/mm_init.c b/mm/mm_init.c index 2c19f5515e36..97721d99fdce 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long pfn, int nid) pg_data_t *pgdat; int zid; + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) + nid = early_pfn_to_nid(pfn); + if (early_page_initialised(pfn, nid)) return; >> >>> In practical we run into uninitialized nid of reserved block problem, should >>> we fix it >>> in our usage, or on memblock side? >> Apparently it's a bug in memblock :( >> >> If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") >> does the issue disappear? > Yes, I git bisect this commit. > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > initialize the nid of reserved region(only it know that), or the reserved > region NOT within a memblock added by memblock_add, memblock can not > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > reserve_bootmem_region()") just reveals the embarrassment case by an > out of bound memory access. > >> >>> Thanks >>> >>> 在 2023/12/25 09:30, Huang Pei 写道: >>>> Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), >>>> loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like >>>> this: >>>> ---------------------------------------------------------------------- >>>> Call Trace: >>>> [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 >>>> [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 >>>> [<ffffffff8231d8e4>] mem_init+0x84/0x94 >>>> [<ffffffff82330958>] mm_core_init+0xf8/0x308 >>>> [<ffffffff82318c38>] start_kernel+0x43c/0x86c >>>> >>>> Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 >>>> 64420070 00003025 24040003 >>>> >>>> ---[ end trace 0000000000000000 ]--- >>>> Kernel panic - not syncing: Attempted to kill the idle task! >>>> ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- >>>> ---------------------------------------------------------------------- >>>> >>>> The root cause is no memory region "0x0-0x1fffff" paired with >>>> memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock >>>> =debug": >>>> >>>> ---------------------------------------------------------------------- >>>> memory[0x0] [0x0000000000200000-0x000000000effffff], >>>> 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here >>>> memory[0x1] [0x0000000090000000-0x00000000fdffffff], >>>> 0x000000006e000000 bytes on node 0 flags: 0x0 >>>> memory[0x2] [0x0000000100000000-0x000000027fffffff], >>>> 0x0000000180000000 bytes on node 0 flags: 0x0 >>>> memory[0x3] [0x0000100000000000-0x000010000fffffff], >>>> 0x0000000010000000 bytes on node 1 flags: 0x0 >>>> memory[0x4] [0x0000100090000000-0x000010027fffffff], >>>> 0x00000001f0000000 bytes on node 1 flags: 0x0 >>>> reserved.cnt = 0x1f >>>> reserved[0x0] [0x0000000000000000-0x000000000190c80a], >>>> 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] >>>> reserved[0x1] [0x000000000190c810-0x000000000190eea3], >>>> 0x0000000000002694 bytes flags: 0x0 >>>> ---------------------------------------------------------------------- >>>> >>>> It caused memory-reserved region "0x0-0x1fffff" without valid node id >>>> in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to >>>> "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" >>>> accessing "node_data" out of bound. >>>> >>>> To fix this bug, we should remove unnecessary memory block reservation. >>>> >>>> +. no need to reserve 0x0-0x1fffff below kernel loading address, since >>>> it is not registered by "memblock_add_node" >>>> >>>> +. no need to reserve 0x0-0xfff for exception handling if it is not >>>> registered by "memblock_add" either. >>>> >>>> Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) >>>> Signed-off-by: Huang Pei <huangpei@loongson.cn> >>>> --- >>>> arch/mips/kernel/traps.c | 3 ++- >>>> arch/mips/loongson64/numa.c | 2 -- >>>> 2 files changed, 2 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c >>>> index 246c6a6b0261..9b632b4c10c3 100644 >>>> --- a/arch/mips/kernel/traps.c >>>> +++ b/arch/mips/kernel/traps.c >>>> @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; >>>> void reserve_exception_space(phys_addr_t addr, unsigned long size) >>>> { >>>> - memblock_reserve(addr, size); >>>> + if(memblock_is_region_memory(addr, size)) >>>> + memblock_reserve(addr, size); >>>> } >>>> void __init *set_except_vector(int n, void *addr) >>>> diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c >>>> index 8f61e93c0c5b..0f516dde81da 100644 >>>> --- a/arch/mips/loongson64/numa.c >>>> +++ b/arch/mips/loongson64/numa.c >>>> @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) >>>> memblock_reserve((node_addrspace_offset | 0xfe000000), >>>> 32 << 20); >>>> - /* Reserve pfn range 0~node[0]->node_start_pfn */ >>>> - memblock_reserve(0, PAGE_SIZE * start_pfn); >>>> } >>>> } >>> -- >>> --- >>> Jiaxun Yang >>> >> -- >> Sincerely yours, >> Mike.
On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > On 2024/1/16 20:23, Huang Pei wrote: > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > Hi mm folks, > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > a region that is not added to memblock with memblock_add? > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > legit usage? > > > Yes, memblock allows reserving memory that was not added to memblock with > > > memblock_add(). > > I think arch/platform specific code should fix this bug, like, > > -------------------------------------------------------------------------- > > //for loongson64 > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > -------------------------------------------------------------------------- > > > > or maybe memblock provide something like memblock_reserve_node > > Hi pei, > > Can you test the following patch to see if it fixes this bug? > > diff --git a/mm/mm_init.c b/mm/mm_init.c > index 2c19f5515e36..97721d99fdce 100644 > --- a/mm/mm_init.c > +++ b/mm/mm_init.c > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > pfn, int nid) > pg_data_t *pgdat; > int zid; > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > + nid = early_pfn_to_nid(pfn); > + > if (early_page_initialised(pfn, nid)) > return; I do not think this fix set the right nid, ONLY arch/platform know that nid int __meminit early_pfn_to_nid(unsigned long pfn) { static DEFINE_SPINLOCK(early_pfn_lock); int nid; spin_lock(&early_pfn_lock); nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache); if (nid < 0) //!!!first_online_node MAY NOT be the node the pfn belong to!!! nid = first_online_node; spin_unlock(&early_pfn_lock); return nid; } > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > we fix it > > > > in our usage, or on memblock side? > > > Apparently it's a bug in memblock :( > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > does the issue disappear? > > Yes, I git bisect this commit. > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > initialize the nid of reserved region(only it know that), or the reserved > > region NOT within a memblock added by memblock_add, memblock can not > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > reserve_bootmem_region()") just reveals the embarrassment case by an > > out of bound memory access. > > > > > > Thanks > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > this: > > > > > ---------------------------------------------------------------------- > > > > > Call Trace: > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > 64420070 00003025 24040003 > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > ---------------------------------------------------------------------- > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > =debug": > > > > > > > > > > ---------------------------------------------------------------------- > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > reserved.cnt = 0x1f > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > ---------------------------------------------------------------------- > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > accessing "node_data" out of bound. > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > registered by "memblock_add" either. > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > --- > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > --- a/arch/mips/kernel/traps.c > > > > > +++ b/arch/mips/kernel/traps.c > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > { > > > > > - memblock_reserve(addr, size); > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > + memblock_reserve(addr, size); > > > > > } > > > > > void __init *set_except_vector(int n, void *addr) > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > --- a/arch/mips/loongson64/numa.c > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > 32 << 20); > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > } > > > > > } > > > > -- > > > > --- > > > > Jiaxun Yang > > > > > > > -- > > > Sincerely yours, > > > Mike.
On 2024/1/17 11:01, Huang Pei wrote: > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: >> On 2024/1/16 20:23, Huang Pei wrote: >>> On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: >>>> On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: >>>>> Hi mm folks, >>>>> >>>>> Just a quick question, what is the expected behavior of memblock_reserve >>>>> a region that is not added to memblock with memblock_add? >>>>> >>>>> I'm unable to find any documentation about memblock_reserve in comments and >>>>> boot-time-mm, but as per my understanding to the code, this should be a >>>>> legit usage? >>>> Yes, memblock allows reserving memory that was not added to memblock with >>>> memblock_add(). >>> I think arch/platform specific code should fix this bug, like, >>> -------------------------------------------------------------------------- >>> //for loongson64 >>> memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); >>> >>> -------------------------------------------------------------------------- >>> >>> or maybe memblock provide something like memblock_reserve_node >> Hi pei, >> >> Can you test the following patch to see if it fixes this bug? >> >> diff --git a/mm/mm_init.c b/mm/mm_init.c >> index 2c19f5515e36..97721d99fdce 100644 >> --- a/mm/mm_init.c >> +++ b/mm/mm_init.c >> @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long >> pfn, int nid) >> pg_data_t *pgdat; >> int zid; >> >> + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) >> + nid = early_pfn_to_nid(pfn); >> + >> if (early_page_initialised(pfn, nid)) >> return; > I do not think this fix set the right nid, ONLY arch/platform know that > nid > > int __meminit early_pfn_to_nid(unsigned long pfn) > { > static DEFINE_SPINLOCK(early_pfn_lock); > int nid; > > spin_lock(&early_pfn_lock); > nid = __early_pfn_to_nid(pfn, > &early_pfnnid_cache); > if (nid < 0) > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > nid = first_online_node; > > spin_unlock(&early_pfn_lock); > > return > nid; > } Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()"), because even if you revert this commit, it will still get nid by early_pfn_to_nid(). Did I get that right? >> >>>>> In practical we run into uninitialized nid of reserved block problem, should >>>>> we fix it >>>>> in our usage, or on memblock side? >>>> Apparently it's a bug in memblock :( >>>> >>>> If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") >>>> does the issue disappear? >>> Yes, I git bisect this commit. >>> >>> But I don't think it is a bug in memblock. IMO, memblock_reserve under >>> NUMA set nid of reserved region to MAX_NUMNODES, which is the point >>> that cause the "memblock_get_region_node from memmap_init_reserved_pages " >>> passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region >>> -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT >>> initialize the nid of reserved region(only it know that), or the reserved >>> region NOT within a memblock added by memblock_add, memblock can not >>> give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to >>> reserve_bootmem_region()") just reveals the embarrassment case by an >>> out of bound memory access. >>> >>>>> Thanks >>>>> >>>>> 在 2023/12/25 09:30, Huang Pei 写道: >>>>>> Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), >>>>>> loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like >>>>>> this: >>>>>> ---------------------------------------------------------------------- >>>>>> Call Trace: >>>>>> [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 >>>>>> [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 >>>>>> [<ffffffff8231d8e4>] mem_init+0x84/0x94 >>>>>> [<ffffffff82330958>] mm_core_init+0xf8/0x308 >>>>>> [<ffffffff82318c38>] start_kernel+0x43c/0x86c >>>>>> >>>>>> Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 >>>>>> 64420070 00003025 24040003 >>>>>> >>>>>> ---[ end trace 0000000000000000 ]--- >>>>>> Kernel panic - not syncing: Attempted to kill the idle task! >>>>>> ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- >>>>>> ---------------------------------------------------------------------- >>>>>> >>>>>> The root cause is no memory region "0x0-0x1fffff" paired with >>>>>> memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock >>>>>> =debug": >>>>>> >>>>>> ---------------------------------------------------------------------- >>>>>> memory[0x0] [0x0000000000200000-0x000000000effffff], >>>>>> 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here >>>>>> memory[0x1] [0x0000000090000000-0x00000000fdffffff], >>>>>> 0x000000006e000000 bytes on node 0 flags: 0x0 >>>>>> memory[0x2] [0x0000000100000000-0x000000027fffffff], >>>>>> 0x0000000180000000 bytes on node 0 flags: 0x0 >>>>>> memory[0x3] [0x0000100000000000-0x000010000fffffff], >>>>>> 0x0000000010000000 bytes on node 1 flags: 0x0 >>>>>> memory[0x4] [0x0000100090000000-0x000010027fffffff], >>>>>> 0x00000001f0000000 bytes on node 1 flags: 0x0 >>>>>> reserved.cnt = 0x1f >>>>>> reserved[0x0] [0x0000000000000000-0x000000000190c80a], >>>>>> 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] >>>>>> reserved[0x1] [0x000000000190c810-0x000000000190eea3], >>>>>> 0x0000000000002694 bytes flags: 0x0 >>>>>> ---------------------------------------------------------------------- >>>>>> >>>>>> It caused memory-reserved region "0x0-0x1fffff" without valid node id >>>>>> in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to >>>>>> "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" >>>>>> accessing "node_data" out of bound. >>>>>> >>>>>> To fix this bug, we should remove unnecessary memory block reservation. >>>>>> >>>>>> +. no need to reserve 0x0-0x1fffff below kernel loading address, since >>>>>> it is not registered by "memblock_add_node" >>>>>> >>>>>> +. no need to reserve 0x0-0xfff for exception handling if it is not >>>>>> registered by "memblock_add" either. >>>>>> >>>>>> Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) >>>>>> Signed-off-by: Huang Pei <huangpei@loongson.cn> >>>>>> --- >>>>>> arch/mips/kernel/traps.c | 3 ++- >>>>>> arch/mips/loongson64/numa.c | 2 -- >>>>>> 2 files changed, 2 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c >>>>>> index 246c6a6b0261..9b632b4c10c3 100644 >>>>>> --- a/arch/mips/kernel/traps.c >>>>>> +++ b/arch/mips/kernel/traps.c >>>>>> @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; >>>>>> void reserve_exception_space(phys_addr_t addr, unsigned long size) >>>>>> { >>>>>> - memblock_reserve(addr, size); >>>>>> + if(memblock_is_region_memory(addr, size)) >>>>>> + memblock_reserve(addr, size); >>>>>> } >>>>>> void __init *set_except_vector(int n, void *addr) >>>>>> diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c >>>>>> index 8f61e93c0c5b..0f516dde81da 100644 >>>>>> --- a/arch/mips/loongson64/numa.c >>>>>> +++ b/arch/mips/loongson64/numa.c >>>>>> @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) >>>>>> memblock_reserve((node_addrspace_offset | 0xfe000000), >>>>>> 32 << 20); >>>>>> - /* Reserve pfn range 0~node[0]->node_start_pfn */ >>>>>> - memblock_reserve(0, PAGE_SIZE * start_pfn); >>>>>> } >>>>>> } >>>>> -- >>>>> --- >>>>> Jiaxun Yang >>>>> >>>> -- >>>> Sincerely yours, >>>> Mike.
On Wed, Jan 17, 2024 at 11:17:18AM +0800, Yajun Deng wrote: > > On 2024/1/17 11:01, Huang Pei wrote: > > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > > On 2024/1/16 20:23, Huang Pei wrote: > > > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > > > Hi mm folks, > > > > > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > > > a region that is not added to memblock with memblock_add? > > > > > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > > > legit usage? > > > > > Yes, memblock allows reserving memory that was not added to memblock with > > > > > memblock_add(). > > > > I think arch/platform specific code should fix this bug, like, > > > > -------------------------------------------------------------------------- > > > > //for loongson64 > > > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > or maybe memblock provide something like memblock_reserve_node > > > Hi pei, > > > > > > Can you test the following patch to see if it fixes this bug? > > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > > index 2c19f5515e36..97721d99fdce 100644 > > > --- a/mm/mm_init.c > > > +++ b/mm/mm_init.c > > > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > > > pfn, int nid) > > > pg_data_t *pgdat; > > > int zid; > > > > > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > > > + nid = early_pfn_to_nid(pfn); > > > + > > > if (early_page_initialised(pfn, nid)) > > > return; > > I do not think this fix set the right nid, ONLY arch/platform know that > > nid > > > > int __meminit early_pfn_to_nid(unsigned long pfn) > > { > > static DEFINE_SPINLOCK(early_pfn_lock); > > int nid; > > > > spin_lock(&early_pfn_lock); > > nid = __early_pfn_to_nid(pfn, > > &early_pfnnid_cache); > > if (nid < 0) > > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > > nid = first_online_node; > > > > spin_unlock(&early_pfn_lock); > > > > return > > nid; > > } > > > Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid > to reserve_bootmem_region()"), > > because even if you revert this commit, it will still get nid by > early_pfn_to_nid(). Did I get that right? Yes, more accurately, this bug is exposed by commit 61167ad5fecd. My previous fix is based on presumptions that memory_reserve should reserve memory added by memblock_add{,_node}, if going across this limitation, there need to set the valid nid for reserved memory region. > > > > > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > > > we fix it > > > > > > in our usage, or on memblock side? > > > > > Apparently it's a bug in memblock :( > > > > > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > > > does the issue disappear? > > > > Yes, I git bisect this commit. > > > > > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > > > initialize the nid of reserved region(only it know that), or the reserved > > > > region NOT within a memblock added by memblock_add, memblock can not > > > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > > > reserve_bootmem_region()") just reveals the embarrassment case by an > > > > out of bound memory access. > > > > > > > > > > Thanks > > > > > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > > > this: > > > > > > > ---------------------------------------------------------------------- > > > > > > > Call Trace: > > > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > > > 64420070 00003025 24040003 > > > > > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > > > =debug": > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > > > reserved.cnt = 0x1f > > > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > > > accessing "node_data" out of bound. > > > > > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > > > registered by "memblock_add" either. > > > > > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > > > --- > > > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > > > --- a/arch/mips/kernel/traps.c > > > > > > > +++ b/arch/mips/kernel/traps.c > > > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > > > { > > > > > > > - memblock_reserve(addr, size); > > > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > > > + memblock_reserve(addr, size); > > > > > > > } > > > > > > > void __init *set_except_vector(int n, void *addr) > > > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > > > --- a/arch/mips/loongson64/numa.c > > > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > > > 32 << 20); > > > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > > > } > > > > > > > } > > > > > > -- > > > > > > --- > > > > > > Jiaxun Yang > > > > > > > > > > > -- > > > > > Sincerely yours, > > > > > Mike.
On Wed, Jan 17, 2024 at 11:59:10AM +0800, Huang Pei wrote: > On Wed, Jan 17, 2024 at 11:17:18AM +0800, Yajun Deng wrote: > > > > On 2024/1/17 11:01, Huang Pei wrote: > > > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > > > On 2024/1/16 20:23, Huang Pei wrote: > > > > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > > > > Hi mm folks, > > > > > > > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > > > > a region that is not added to memblock with memblock_add? > > > > > > > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > > > > legit usage? > > > > > > Yes, memblock allows reserving memory that was not added to memblock with > > > > > > memblock_add(). > > > > > I think arch/platform specific code should fix this bug, like, > > > > > -------------------------------------------------------------------------- > > > > > //for loongson64 > > > > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > > > or maybe memblock provide something like memblock_reserve_node > > > > Hi pei, > > > > > > > > Can you test the following patch to see if it fixes this bug? > > > > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > > > index 2c19f5515e36..97721d99fdce 100644 > > > > --- a/mm/mm_init.c > > > > +++ b/mm/mm_init.c > > > > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > > > > pfn, int nid) > > > > pg_data_t *pgdat; > > > > int zid; > > > > > > > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > > > > + nid = early_pfn_to_nid(pfn); > > > > + > > > > if (early_page_initialised(pfn, nid)) > > > > return; IMO this will fix the bug. Before 61167ad5fecd we had nid = first_online_node for reserved pages that didn't have nid set in memblock.reserved. After 61167ad5fecd we try to initialize these pages with MAX_NUMNODES and obviously crash when accessing node structure. I think that the check for a valid nid should be moved to memmap_init_reserved_pages() though. An entire reserved region will have nid set to MAX_NUMNODES, so there is no point to check every page in it. > > > I do not think this fix set the right nid, ONLY arch/platform know that > > > nid Why does it matter to have the right nid in a reserved page that is not part of usable memory? That's true that only arch knows on which node those reserved pages are, but core mm does not care about reserved pages that are not in memory. > > > int __meminit early_pfn_to_nid(unsigned long pfn) > > > { > > > static DEFINE_SPINLOCK(early_pfn_lock); > > > int nid; > > > > > > spin_lock(&early_pfn_lock); > > > nid = __early_pfn_to_nid(pfn, > > > &early_pfnnid_cache); > > > if (nid < 0) > > > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > > > nid = first_online_node; > > > > > > spin_unlock(&early_pfn_lock); > > > > > > return > > > nid; > > > } > > > > > > Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid > > to reserve_bootmem_region()"), > > > > because even if you revert this commit, it will still get nid by > > early_pfn_to_nid(). Did I get that right? > > Yes, more accurately, this bug is exposed by commit 61167ad5fecd. My > previous fix is based on presumptions that memory_reserve should reserve memory > added by memblock_add{,_node}, if going across this limitation, there need > to set the valid nid for reserved memory region. > > > > > > > > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > > > > we fix it > > > > > > > in our usage, or on memblock side? > > > > > > Apparently it's a bug in memblock :( > > > > > > > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > > > > does the issue disappear? > > > > > Yes, I git bisect this commit. > > > > > > > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > > > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > > > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > > > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > > > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > > > > initialize the nid of reserved region(only it know that), or the reserved > > > > > region NOT within a memblock added by memblock_add, memblock can not > > > > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > > > > reserve_bootmem_region()") just reveals the embarrassment case by an > > > > > out of bound memory access. > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > > > > this: > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > Call Trace: > > > > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > > > > 64420070 00003025 24040003 > > > > > > > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > > > > =debug": > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > > > > reserved.cnt = 0x1f > > > > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > > > > accessing "node_data" out of bound. > > > > > > > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > > > > registered by "memblock_add" either. > > > > > > > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > > > > --- > > > > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > > > > --- a/arch/mips/kernel/traps.c > > > > > > > > +++ b/arch/mips/kernel/traps.c > > > > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > > > > { > > > > > > > > - memblock_reserve(addr, size); > > > > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > > > > + memblock_reserve(addr, size); > > > > > > > > } > > > > > > > > void __init *set_except_vector(int n, void *addr) > > > > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > > > > --- a/arch/mips/loongson64/numa.c > > > > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > > > > 32 << 20); > > > > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > > > > } > > > > > > > > } > > > > > > > -- > > > > > > > --- > > > > > > > Jiaxun Yang > > > > > > > > > > > > > -- > > > > > > Sincerely yours, > > > > > > Mike. >
On Wed, Jan 17, 2024 at 08:46:34AM +0200, Mike Rapoport wrote: > On Wed, Jan 17, 2024 at 11:59:10AM +0800, Huang Pei wrote: > > On Wed, Jan 17, 2024 at 11:17:18AM +0800, Yajun Deng wrote: > > > > > > On 2024/1/17 11:01, Huang Pei wrote: > > > > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > > > > On 2024/1/16 20:23, Huang Pei wrote: > > > > > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > > > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > > > > > Hi mm folks, > > > > > > > > > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > > > > > a region that is not added to memblock with memblock_add? > > > > > > > > > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > > > > > legit usage? > > > > > > > Yes, memblock allows reserving memory that was not added to memblock with > > > > > > > memblock_add(). > > > > > > I think arch/platform specific code should fix this bug, like, > > > > > > -------------------------------------------------------------------------- > > > > > > //for loongson64 > > > > > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > > > > > or maybe memblock provide something like memblock_reserve_node > > > > > Hi pei, > > > > > > > > > > Can you test the following patch to see if it fixes this bug? > > > > > > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > > > > index 2c19f5515e36..97721d99fdce 100644 > > > > > --- a/mm/mm_init.c > > > > > +++ b/mm/mm_init.c > > > > > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > > > > > pfn, int nid) > > > > > pg_data_t *pgdat; > > > > > int zid; > > > > > > > > > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > > > > > + nid = early_pfn_to_nid(pfn); > > > > > + > > > > > if (early_page_initialised(pfn, nid)) > > > > > return; > > IMO this will fix the bug. Before 61167ad5fecd we had nid = first_online_node > for reserved pages that didn't have nid set in memblock.reserved. After > 61167ad5fecd we try to initialize these pages with MAX_NUMNODES and > obviously crash when accessing node structure. > > I think that the check for a valid nid should be moved to > memmap_init_reserved_pages() though. An entire reserved region will have > nid set to MAX_NUMNODES, so there is no point to check every page in it. > > > > > I do not think this fix set the right nid, ONLY arch/platform know that > > > > nid > > Why does it matter to have the right nid in a reserved page that is not > part of usable memory? > IMO, if a reserved page DO have a valid nid, and archs knows that, archs should set it right, and this is just the case of loongson64(and loongarch). > That's true that only arch knows on which node those reserved pages are, > but core mm does not care about reserved pages that are not in memory. > > > > > int __meminit early_pfn_to_nid(unsigned long pfn) > > > > { > > > > static DEFINE_SPINLOCK(early_pfn_lock); > > > > int nid; > > > > > > > > spin_lock(&early_pfn_lock); > > > > nid = __early_pfn_to_nid(pfn, > > > > &early_pfnnid_cache); > > > > if (nid < 0) > > > > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > > > > nid = first_online_node; > > > > > > > > spin_unlock(&early_pfn_lock); > > > > > > > > return > > > > nid; > > > > } > > > > > > > > > Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid > > > to reserve_bootmem_region()"), > > > > > > because even if you revert this commit, it will still get nid by > > > early_pfn_to_nid(). Did I get that right? > > > > Yes, more accurately, this bug is exposed by commit 61167ad5fecd. My > > previous fix is based on presumptions that memory_reserve should reserve memory > > added by memblock_add{,_node}, if going across this limitation, there need > > to set the valid nid for reserved memory region. > > > > > > > > > > > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > > > > > we fix it > > > > > > > > in our usage, or on memblock side? > > > > > > > Apparently it's a bug in memblock :( > > > > > > > > > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > > > > > does the issue disappear? > > > > > > Yes, I git bisect this commit. > > > > > > > > > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > > > > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > > > > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > > > > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > > > > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > > > > > initialize the nid of reserved region(only it know that), or the reserved > > > > > > region NOT within a memblock added by memblock_add, memblock can not > > > > > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > > > > > reserve_bootmem_region()") just reveals the embarrassment case by an > > > > > > out of bound memory access. > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > > > > > this: > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > Call Trace: > > > > > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > > > > > 64420070 00003025 24040003 > > > > > > > > > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > > > > > =debug": > > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > > > > > reserved.cnt = 0x1f > > > > > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > > > > > accessing "node_data" out of bound. > > > > > > > > > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > > > > > registered by "memblock_add" either. > > > > > > > > > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > > > > > --- > > > > > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > > > > > --- a/arch/mips/kernel/traps.c > > > > > > > > > +++ b/arch/mips/kernel/traps.c > > > > > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > > > > > { > > > > > > > > > - memblock_reserve(addr, size); > > > > > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > > > > > + memblock_reserve(addr, size); > > > > > > > > > } > > > > > > > > > void __init *set_except_vector(int n, void *addr) > > > > > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > > > > > --- a/arch/mips/loongson64/numa.c > > > > > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > > > > > 32 << 20); > > > > > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > > > > > } > > > > > > > > > } > > > > > > > > -- > > > > > > > > --- > > > > > > > > Jiaxun Yang > > > > > > > > > > > > > > > -- > > > > > > > Sincerely yours, > > > > > > > Mike. > > > > -- > Sincerely yours, > Mike.
On Wed, Jan 17, 2024 at 03:45:46PM +0800, Huang Pei wrote: > On Wed, Jan 17, 2024 at 08:46:34AM +0200, Mike Rapoport wrote: > > On Wed, Jan 17, 2024 at 11:59:10AM +0800, Huang Pei wrote: > > > On Wed, Jan 17, 2024 at 11:17:18AM +0800, Yajun Deng wrote: > > > > > > > > On 2024/1/17 11:01, Huang Pei wrote: > > > > > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > > > > > On 2024/1/16 20:23, Huang Pei wrote: > > > > > > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > > > > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > > > > > > Hi mm folks, > > > > > > > > > > > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > > > > > > a region that is not added to memblock with memblock_add? > > > > > > > > > > > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > > > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > > > > > > legit usage? > > > > > > > > Yes, memblock allows reserving memory that was not added to memblock with > > > > > > > > memblock_add(). > > > > > > > I think arch/platform specific code should fix this bug, like, > > > > > > > -------------------------------------------------------------------------- > > > > > > > //for loongson64 > > > > > > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > > > > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > > > > > > > or maybe memblock provide something like memblock_reserve_node > > > > > > Hi pei, > > > > > > > > > > > > Can you test the following patch to see if it fixes this bug? > > > > > > > > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > > > > > index 2c19f5515e36..97721d99fdce 100644 > > > > > > --- a/mm/mm_init.c > > > > > > +++ b/mm/mm_init.c > > > > > > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > > > > > > pfn, int nid) > > > > > > pg_data_t *pgdat; > > > > > > int zid; > > > > > > > > > > > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > > > > > > + nid = early_pfn_to_nid(pfn); > > > > > > + > > > > > > if (early_page_initialised(pfn, nid)) > > > > > > return; > > > > IMO this will fix the bug. Before 61167ad5fecd we had nid = first_online_node > > for reserved pages that didn't have nid set in memblock.reserved. After > > 61167ad5fecd we try to initialize these pages with MAX_NUMNODES and > > obviously crash when accessing node structure. > > > > I think that the check for a valid nid should be moved to > > memmap_init_reserved_pages() though. An entire reserved region will have > > nid set to MAX_NUMNODES, so there is no point to check every page in it. > > > > > > > I do not think this fix set the right nid, ONLY arch/platform know that > > > > > nid > > > > Why does it matter to have the right nid in a reserved page that is not > > part of usable memory? > > > IMO, if a reserved page DO have a valid nid, and archs knows that, archs > should set it right, and this is just the case of loongson64(and > loongarch). An arch may choose to set nids for reserved regions that are never added to memblock.memory, but mm shouldn't crash if it didn't. > > That's true that only arch knows on which node those reserved pages are, > > but core mm does not care about reserved pages that are not in memory. > > > > > > > int __meminit early_pfn_to_nid(unsigned long pfn) > > > > > { > > > > > static DEFINE_SPINLOCK(early_pfn_lock); > > > > > int nid; > > > > > > > > > > spin_lock(&early_pfn_lock); > > > > > nid = __early_pfn_to_nid(pfn, > > > > > &early_pfnnid_cache); > > > > > if (nid < 0) > > > > > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > > > > > nid = first_online_node; > > > > > > > > > > spin_unlock(&early_pfn_lock); > > > > > > > > > > return > > > > > nid; > > > > > } > > > > > > > > > > > > Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid > > > > to reserve_bootmem_region()"), > > > > > > > > because even if you revert this commit, it will still get nid by > > > > early_pfn_to_nid(). Did I get that right? > > > > > > Yes, more accurately, this bug is exposed by commit 61167ad5fecd. My > > > previous fix is based on presumptions that memory_reserve should reserve memory > > > added by memblock_add{,_node}, if going across this limitation, there need > > > to set the valid nid for reserved memory region. > > > > > > > > > > > > > > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > > > > > > we fix it > > > > > > > > > in our usage, or on memblock side? > > > > > > > > Apparently it's a bug in memblock :( > > > > > > > > > > > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > > > > > > does the issue disappear? > > > > > > > Yes, I git bisect this commit. > > > > > > > > > > > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > > > > > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > > > > > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > > > > > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > > > > > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > > > > > > initialize the nid of reserved region(only it know that), or the reserved > > > > > > > region NOT within a memblock added by memblock_add, memblock can not > > > > > > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > > > > > > reserve_bootmem_region()") just reveals the embarrassment case by an > > > > > > > out of bound memory access. > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > > > > > > this: > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > Call Trace: > > > > > > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > > > > > > 64420070 00003025 24040003 > > > > > > > > > > > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > > > > > > =debug": > > > > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > > > > > > reserved.cnt = 0x1f > > > > > > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > > > > > > accessing "node_data" out of bound. > > > > > > > > > > > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > > > > > > registered by "memblock_add" either. > > > > > > > > > > > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > > > > > > --- > > > > > > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > > > > > > --- a/arch/mips/kernel/traps.c > > > > > > > > > > +++ b/arch/mips/kernel/traps.c > > > > > > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > > > > > > { > > > > > > > > > > - memblock_reserve(addr, size); > > > > > > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > > > > > > + memblock_reserve(addr, size); > > > > > > > > > > } > > > > > > > > > > void __init *set_except_vector(int n, void *addr) > > > > > > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > > > > > > --- a/arch/mips/loongson64/numa.c > > > > > > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > > > > > > 32 << 20); > > > > > > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > > > > > > } > > > > > > > > > > } > > > > > > > > > -- > > > > > > > > > --- > > > > > > > > > Jiaxun Yang > > > > > > > > > > > > > > > > > -- > > > > > > > > Sincerely yours, > > > > > > > > Mike. > > > > > > > -- > > Sincerely yours, > > Mike. >
On Wed, Jan 17, 2024 at 01:08:48PM +0200, Mike Rapoport wrote: > On Wed, Jan 17, 2024 at 03:45:46PM +0800, Huang Pei wrote: > > On Wed, Jan 17, 2024 at 08:46:34AM +0200, Mike Rapoport wrote: > > > On Wed, Jan 17, 2024 at 11:59:10AM +0800, Huang Pei wrote: > > > > On Wed, Jan 17, 2024 at 11:17:18AM +0800, Yajun Deng wrote: > > > > > > > > > > On 2024/1/17 11:01, Huang Pei wrote: > > > > > > On Wed, Jan 17, 2024 at 10:20:00AM +0800, Yajun Deng wrote: > > > > > > > On 2024/1/16 20:23, Huang Pei wrote: > > > > > > > > On Tue, Jan 16, 2024 at 10:39:04AM +0200, Mike Rapoport wrote: > > > > > > > > > On Mon, Jan 15, 2024 at 02:08:21PM +0000, Jiaxun Yang wrote: > > > > > > > > > > Hi mm folks, > > > > > > > > > > > > > > > > > > > > Just a quick question, what is the expected behavior of memblock_reserve > > > > > > > > > > a region that is not added to memblock with memblock_add? > > > > > > > > > > > > > > > > > > > > I'm unable to find any documentation about memblock_reserve in comments and > > > > > > > > > > boot-time-mm, but as per my understanding to the code, this should be a > > > > > > > > > > legit usage? > > > > > > > > > Yes, memblock allows reserving memory that was not added to memblock with > > > > > > > > > memblock_add(). > > > > > > > > I think arch/platform specific code should fix this bug, like, > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > //for loongson64 > > > > > > > > memblock_set_node(0, 1ULL << 44, &memblock.reserved, 0); > > > > > > > > > > > > > > > > -------------------------------------------------------------------------- > > > > > > > > > > > > > > > > or maybe memblock provide something like memblock_reserve_node > > > > > > > Hi pei, > > > > > > > > > > > > > > Can you test the following patch to see if it fixes this bug? > > > > > > > > > > > > > > diff --git a/mm/mm_init.c b/mm/mm_init.c > > > > > > > index 2c19f5515e36..97721d99fdce 100644 > > > > > > > --- a/mm/mm_init.c > > > > > > > +++ b/mm/mm_init.c > > > > > > > @@ -708,6 +708,9 @@ static void __meminit init_reserved_page(unsigned long > > > > > > > pfn, int nid) > > > > > > > pg_data_t *pgdat; > > > > > > > int zid; > > > > > > > > > > > > > > + if (unlikely(nid == NUMA_NO_NODE || nid >= MAX_NUMNODES)) > > > > > > > + nid = early_pfn_to_nid(pfn); > > > > > > > + > > > > > > > if (early_page_initialised(pfn, nid)) > > > > > > > return; > > > > > > IMO this will fix the bug. Before 61167ad5fecd we had nid = first_online_node > > > for reserved pages that didn't have nid set in memblock.reserved. After > > > 61167ad5fecd we try to initialize these pages with MAX_NUMNODES and > > > obviously crash when accessing node structure. > > > > > > I think that the check for a valid nid should be moved to > > > memmap_init_reserved_pages() though. An entire reserved region will have > > > nid set to MAX_NUMNODES, so there is no point to check every page in it. > > > > > > > > > I do not think this fix set the right nid, ONLY arch/platform know that > > > > > > nid > > > > > > Why does it matter to have the right nid in a reserved page that is not > > > part of usable memory? > > > > > IMO, if a reserved page DO have a valid nid, and archs knows that, archs > > should set it right, and this is just the case of loongson64(and > > loongarch). I will set the nid for reserved page on loongson64, just like what loongarch did. > An arch may choose to set nids for reserved regions that are never added to > memblock.memory, but mm shouldn't crash if it didn't. > I agree. > > > That's true that only arch knows on which node those reserved pages are, > > > but core mm does not care about reserved pages that are not in memory. > > > > > > > > > int __meminit early_pfn_to_nid(unsigned long pfn) > > > > > > { > > > > > > static DEFINE_SPINLOCK(early_pfn_lock); > > > > > > int nid; > > > > > > > > > > > > spin_lock(&early_pfn_lock); > > > > > > nid = __early_pfn_to_nid(pfn, > > > > > > &early_pfnnid_cache); > > > > > > if (nid < 0) > > > > > > //!!!first_online_node MAY NOT be the node the pfn belong to!!! > > > > > > nid = first_online_node; > > > > > > > > > > > > spin_unlock(&early_pfn_lock); > > > > > > > > > > > > return > > > > > > nid; > > > > > > } > > > > > > > > > > > > > > > Okay, I don't think this bug is caused by commit 61167ad5fecd ("mm: pass nid > > > > > to reserve_bootmem_region()"), > > > > > > > > > > because even if you revert this commit, it will still get nid by > > > > > early_pfn_to_nid(). Did I get that right? > > > > > > > > Yes, more accurately, this bug is exposed by commit 61167ad5fecd. My > > > > previous fix is based on presumptions that memory_reserve should reserve memory > > > > added by memblock_add{,_node}, if going across this limitation, there need > > > > to set the valid nid for reserved memory region. > > > > > > > > > > > > > > > > > > > > > > In practical we run into uninitialized nid of reserved block problem, should > > > > > > > > > > we fix it > > > > > > > > > > in our usage, or on memblock side? > > > > > > > > > Apparently it's a bug in memblock :( > > > > > > > > > > > > > > > > > > If you revert 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") > > > > > > > > > does the issue disappear? > > > > > > > > Yes, I git bisect this commit. > > > > > > > > > > > > > > > > But I don't think it is a bug in memblock. IMO, memblock_reserve under > > > > > > > > NUMA set nid of reserved region to MAX_NUMNODES, which is the point > > > > > > > > that cause the "memblock_get_region_node from memmap_init_reserved_pages " > > > > > > > > passing a invalid node id(aka MAX_NUMNODES) to "reserver_bootmem_region > > > > > > > > -> init_reserved_page -> early_pfn_to_nid". If arch-specific code DOES NOT > > > > > > > > initialize the nid of reserved region(only it know that), or the reserved > > > > > > > > region NOT within a memblock added by memblock_add, memblock can not > > > > > > > > give a valid node id to the reserved region. Commit 61167ad5fecd ("mm: pass nid to > > > > > > > > reserve_bootmem_region()") just reveals the embarrassment case by an > > > > > > > > out of bound memory access. > > > > > > > > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > 在 2023/12/25 09:30, Huang Pei 写道: > > > > > > > > > > > Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), > > > > > > > > > > > loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like > > > > > > > > > > > this: > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > Call Trace: > > > > > > > > > > > [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 > > > > > > > > > > > [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 > > > > > > > > > > > [<ffffffff8231d8e4>] mem_init+0x84/0x94 > > > > > > > > > > > [<ffffffff82330958>] mm_core_init+0xf8/0x308 > > > > > > > > > > > [<ffffffff82318c38>] start_kernel+0x43c/0x86c > > > > > > > > > > > > > > > > > > > > > > Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 > > > > > > > > > > > 64420070 00003025 24040003 > > > > > > > > > > > > > > > > > > > > > > ---[ end trace 0000000000000000 ]--- > > > > > > > > > > > Kernel panic - not syncing: Attempted to kill the idle task! > > > > > > > > > > > ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > > > The root cause is no memory region "0x0-0x1fffff" paired with > > > > > > > > > > > memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock > > > > > > > > > > > =debug": > > > > > > > > > > > > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > memory[0x0] [0x0000000000200000-0x000000000effffff], > > > > > > > > > > > 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here > > > > > > > > > > > memory[0x1] [0x0000000090000000-0x00000000fdffffff], > > > > > > > > > > > 0x000000006e000000 bytes on node 0 flags: 0x0 > > > > > > > > > > > memory[0x2] [0x0000000100000000-0x000000027fffffff], > > > > > > > > > > > 0x0000000180000000 bytes on node 0 flags: 0x0 > > > > > > > > > > > memory[0x3] [0x0000100000000000-0x000010000fffffff], > > > > > > > > > > > 0x0000000010000000 bytes on node 1 flags: 0x0 > > > > > > > > > > > memory[0x4] [0x0000100090000000-0x000010027fffffff], > > > > > > > > > > > 0x00000001f0000000 bytes on node 1 flags: 0x0 > > > > > > > > > > > reserved.cnt = 0x1f > > > > > > > > > > > reserved[0x0] [0x0000000000000000-0x000000000190c80a], > > > > > > > > > > > 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] > > > > > > > > > > > reserved[0x1] [0x000000000190c810-0x000000000190eea3], > > > > > > > > > > > 0x0000000000002694 bytes flags: 0x0 > > > > > > > > > > > ---------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > > > It caused memory-reserved region "0x0-0x1fffff" without valid node id > > > > > > > > > > > in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to > > > > > > > > > > > "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" > > > > > > > > > > > accessing "node_data" out of bound. > > > > > > > > > > > > > > > > > > > > > > To fix this bug, we should remove unnecessary memory block reservation. > > > > > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0x1fffff below kernel loading address, since > > > > > > > > > > > it is not registered by "memblock_add_node" > > > > > > > > > > > > > > > > > > > > > > +. no need to reserve 0x0-0xfff for exception handling if it is not > > > > > > > > > > > registered by "memblock_add" either. > > > > > > > > > > > > > > > > > > > > > > Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) > > > > > > > > > > > Signed-off-by: Huang Pei <huangpei@loongson.cn> > > > > > > > > > > > --- > > > > > > > > > > > arch/mips/kernel/traps.c | 3 ++- > > > > > > > > > > > arch/mips/loongson64/numa.c | 2 -- > > > > > > > > > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > > > > > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > > > > > > > > > > > index 246c6a6b0261..9b632b4c10c3 100644 > > > > > > > > > > > --- a/arch/mips/kernel/traps.c > > > > > > > > > > > +++ b/arch/mips/kernel/traps.c > > > > > > > > > > > @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; > > > > > > > > > > > void reserve_exception_space(phys_addr_t addr, unsigned long size) > > > > > > > > > > > { > > > > > > > > > > > - memblock_reserve(addr, size); > > > > > > > > > > > + if(memblock_is_region_memory(addr, size)) > > > > > > > > > > > + memblock_reserve(addr, size); > > > > > > > > > > > } > > > > > > > > > > > void __init *set_except_vector(int n, void *addr) > > > > > > > > > > > diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c > > > > > > > > > > > index 8f61e93c0c5b..0f516dde81da 100644 > > > > > > > > > > > --- a/arch/mips/loongson64/numa.c > > > > > > > > > > > +++ b/arch/mips/loongson64/numa.c > > > > > > > > > > > @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) > > > > > > > > > > > memblock_reserve((node_addrspace_offset | 0xfe000000), > > > > > > > > > > > 32 << 20); > > > > > > > > > > > - /* Reserve pfn range 0~node[0]->node_start_pfn */ > > > > > > > > > > > - memblock_reserve(0, PAGE_SIZE * start_pfn); > > > > > > > > > > > } > > > > > > > > > > > } > > > > > > > > > > -- > > > > > > > > > > --- > > > > > > > > > > Jiaxun Yang > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Sincerely yours, > > > > > > > > > Mike. > > > > > > > > > > -- > > > Sincerely yours, > > > Mike. > > > > -- > Sincerely yours, > Mike.
Patch 1: Just reserve exception vector ONLY ONCE. Patch 2: workaround the boot failure. See [1] for the real fix [1]: https://lore.kernel.org/all/Zad3ynuuu8S-51eI@kernel.org
Hi, Thomas (Sorry, the Subject is not appropriate now, [1] fixed the bug) Patch 1: adjust comment. Patch 2: fix v3's bug. Test passed on 3B1500, with and without [1] [1]:https://lore.kernel.org/all/ZajIQGtqUNWgBWkk@kernel.org/
Patch 1: simplify the code, thanks Thomas. Patch 2: correct comment and commit message.
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c index 246c6a6b0261..9b632b4c10c3 100644 --- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2007,7 +2007,8 @@ unsigned long vi_handlers[64]; void reserve_exception_space(phys_addr_t addr, unsigned long size) { - memblock_reserve(addr, size); + if(memblock_is_region_memory(addr, size)) + memblock_reserve(addr, size); } void __init *set_except_vector(int n, void *addr) diff --git a/arch/mips/loongson64/numa.c b/arch/mips/loongson64/numa.c index 8f61e93c0c5b..0f516dde81da 100644 --- a/arch/mips/loongson64/numa.c +++ b/arch/mips/loongson64/numa.c @@ -130,8 +130,6 @@ static void __init node_mem_init(unsigned int node) memblock_reserve((node_addrspace_offset | 0xfe000000), 32 << 20); - /* Reserve pfn range 0~node[0]->node_start_pfn */ - memblock_reserve(0, PAGE_SIZE * start_pfn); } }
Since commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()), loongson64 booting failed with CONFIG_DEFERRED_STRUCT_PAGE_INIT like this: ---------------------------------------------------------------------- Call Trace: [<ffffffff8235d088>] reserve_bootmem_region+0xa8/0x184 [<ffffffff82333940>] memblock_free_all+0x104/0x2a8 [<ffffffff8231d8e4>] mem_init+0x84/0x94 [<ffffffff82330958>] mm_core_init+0xf8/0x308 [<ffffffff82318c38>] start_kernel+0x43c/0x86c Code: 10400028 2402fff0 de420000 <dc432880> 0203182b 14600022 64420070 00003025 24040003 ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Attempted to kill the idle task! ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- ---------------------------------------------------------------------- The root cause is no memory region "0x0-0x1fffff" paired with memory-reserved region "0x0-0x1fffff" and "0x0-0xfff", with "memblock =debug": ---------------------------------------------------------------------- memory[0x0] [0x0000000000200000-0x000000000effffff], 0x000000000ee00000 bytes on node 0 flags: 0x0 !!!!here memory[0x1] [0x0000000090000000-0x00000000fdffffff], 0x000000006e000000 bytes on node 0 flags: 0x0 memory[0x2] [0x0000000100000000-0x000000027fffffff], 0x0000000180000000 bytes on node 0 flags: 0x0 memory[0x3] [0x0000100000000000-0x000010000fffffff], 0x0000000010000000 bytes on node 1 flags: 0x0 memory[0x4] [0x0000100090000000-0x000010027fffffff], 0x00000001f0000000 bytes on node 1 flags: 0x0 reserved.cnt = 0x1f reserved[0x0] [0x0000000000000000-0x000000000190c80a], 0x000000000190c80b bytes flags: 0x0 !!!!oops 0x0-0x1fffff not in memory[0] reserved[0x1] [0x000000000190c810-0x000000000190eea3], 0x0000000000002694 bytes flags: 0x0 ---------------------------------------------------------------------- It caused memory-reserved region "0x0-0x1fffff" without valid node id in "memblock_get_region_node" from "memmap_init_reserved_pages", lead to "reserve_bootmem_region-> init_reserved_page -> early_pfn_to_nid()" accessing "node_data" out of bound. To fix this bug, we should remove unnecessary memory block reservation. +. no need to reserve 0x0-0x1fffff below kernel loading address, since it is not registered by "memblock_add_node" +. no need to reserve 0x0-0xfff for exception handling if it is not registered by "memblock_add" either. Fixes: commit 61167ad5fecd("mm: pass nid to reserve_bootmem_region()) Signed-off-by: Huang Pei <huangpei@loongson.cn> --- arch/mips/kernel/traps.c | 3 ++- arch/mips/loongson64/numa.c | 2 -- 2 files changed, 2 insertions(+), 3 deletions(-)