mbox series

[v2,0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

Message ID 1562887528-5896-1-git-send-email-Hoan@os.amperecomputing.com (mailing list archive)
Headers show
Series mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA | expand

Message

Hoan Tran July 11, 2019, 11:25 p.m. UTC
In NUMA layout which nodes have memory ranges that span across other nodes,
the mm driver can detect the memory node id incorrectly.

For example, with layout below
Node 0 address: 0000 xxxx 0000 xxxx
Node 1 address: xxxx 1111 xxxx 1111

Note:
 - Memory from low to high
 - 0/1: Node id
 - x: Invalid memory of a node

When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
config, mm only checks the memory validity but not the node id.
Because of that, Node 1 also detects the memory from node 0 as below
when it scans from the start address to the end address of node 1.

Node 0 address: 0000 xxxx xxxx xxxx
Node 1 address: xxxx 1111 1111 1111

This layout could occur on any architecture. This patch enables
CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA to fix this issue.

V2:
 * Revise the patch description

Hoan Tran (5):
  mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
  powerpc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  x86: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  sparc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  s390: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES

 arch/powerpc/Kconfig | 9 ---------
 arch/s390/Kconfig    | 8 --------
 arch/sparc/Kconfig   | 9 ---------
 arch/x86/Kconfig     | 9 ---------
 mm/page_alloc.c      | 2 +-
 5 files changed, 1 insertion(+), 36 deletions(-)

Comments

Matthew Wilcox July 12, 2019, 2:45 a.m. UTC | #1
On Thu, Jul 11, 2019 at 11:25:44PM +0000, Hoan Tran OS wrote:
> In NUMA layout which nodes have memory ranges that span across other nodes,
> the mm driver can detect the memory node id incorrectly.
> 
> For example, with layout below
> Node 0 address: 0000 xxxx 0000 xxxx
> Node 1 address: xxxx 1111 xxxx 1111
> 
> Note:
>  - Memory from low to high
>  - 0/1: Node id
>  - x: Invalid memory of a node
> 
> When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
> config, mm only checks the memory validity but not the node id.
> Because of that, Node 1 also detects the memory from node 0 as below
> when it scans from the start address to the end address of node 1.
> 
> Node 0 address: 0000 xxxx xxxx xxxx
> Node 1 address: xxxx 1111 1111 1111
> 
> This layout could occur on any architecture. This patch enables
> CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA to fix this issue.

How do you know it could occur on any architecture?  Surely you should
just enable this for the architecture where you've noticed the problem.
Michal Hocko July 12, 2019, 7:02 a.m. UTC | #2
On Thu 11-07-19 23:25:44, Hoan Tran OS wrote:
> In NUMA layout which nodes have memory ranges that span across other nodes,
> the mm driver can detect the memory node id incorrectly.
> 
> For example, with layout below
> Node 0 address: 0000 xxxx 0000 xxxx
> Node 1 address: xxxx 1111 xxxx 1111
> 
> Note:
>  - Memory from low to high
>  - 0/1: Node id
>  - x: Invalid memory of a node
> 
> When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
> config, mm only checks the memory validity but not the node id.
> Because of that, Node 1 also detects the memory from node 0 as below
> when it scans from the start address to the end address of node 1.
> 
> Node 0 address: 0000 xxxx xxxx xxxx
> Node 1 address: xxxx 1111 1111 1111
> 
> This layout could occur on any architecture. This patch enables
> CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA to fix this issue.

Yes it can occur on any arch but most sane platforms simply do not
overlap physical ranges. So I do not really see any reason to
unconditionally enable the config for everybody. What is an advantage?
Hoan Tran July 12, 2019, 10:56 a.m. UTC | #3
Hi,

On 7/12/19 2:02 PM, Michal Hocko wrote:
> On Thu 11-07-19 23:25:44, Hoan Tran OS wrote:
>> In NUMA layout which nodes have memory ranges that span across other nodes,
>> the mm driver can detect the memory node id incorrectly.
>>
>> For example, with layout below
>> Node 0 address: 0000 xxxx 0000 xxxx
>> Node 1 address: xxxx 1111 xxxx 1111
>>
>> Note:
>>   - Memory from low to high
>>   - 0/1: Node id
>>   - x: Invalid memory of a node
>>
>> When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
>> config, mm only checks the memory validity but not the node id.
>> Because of that, Node 1 also detects the memory from node 0 as below
>> when it scans from the start address to the end address of node 1.
>>
>> Node 0 address: 0000 xxxx xxxx xxxx
>> Node 1 address: xxxx 1111 1111 1111
>>
>> This layout could occur on any architecture. This patch enables
>> CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA to fix this issue.
> 
> Yes it can occur on any arch but most sane platforms simply do not
> overlap physical ranges. So I do not really see any reason to
> unconditionally enable the config for everybody. What is an advantage?
> 

As I observed from arch folder, there are 9 arch support NUMA config.

./arch/ia64/Kconfig:387:config NUMA
./arch/powerpc/Kconfig:582:config NUMA
./arch/sparc/Kconfig:281:config NUMA
./arch/alpha/Kconfig:557:config NUMA
./arch/sh/mm/Kconfig:112:config NUMA
./arch/arm64/Kconfig:841:config NUMA
./arch/x86/Kconfig:1531:config NUMA
./arch/mips/Kconfig:2646:config NUMA
./arch/s390/Kconfig:441:config NUMA

And there are 5 arch enables CONFIG_NODES_SPAN_OTHER_NODES with NUMA

arch/powerpc/Kconfig:637:config NODES_SPAN_OTHER_NODES
arch/sparc/Kconfig:299:config NODES_SPAN_OTHER_NODES
arch/x86/Kconfig:1575:config NODES_SPAN_OTHER_NODES
arch/s390/Kconfig:446:config NODES_SPAN_OTHER_NODES
arch/arm64 (which I intended to enable in the original patch)

It would be good if we can enable it by-default. Otherwise, let arch 
enables it by them-self. Do you have any suggestions?

Thanks
Hoan
Michal Hocko July 12, 2019, 12:12 p.m. UTC | #4
On Fri 12-07-19 10:56:47, Hoan Tran OS wrote:
[...]
> It would be good if we can enable it by-default. Otherwise, let arch 
> enables it by them-self. Do you have any suggestions?

I can hardly make any suggestions when it is not really clear _why_ you
want to remove this config option in the first place. Please explain
what motivated you to make this change.
Will Deacon July 12, 2019, 2:37 p.m. UTC | #5
Hi all,

On Fri, Jul 12, 2019 at 02:12:23PM +0200, Michal Hocko wrote:
> On Fri 12-07-19 10:56:47, Hoan Tran OS wrote:
> [...]
> > It would be good if we can enable it by-default. Otherwise, let arch 
> > enables it by them-self. Do you have any suggestions?
> 
> I can hardly make any suggestions when it is not really clear _why_ you
> want to remove this config option in the first place. Please explain
> what motivated you to make this change.

Sorry, I think this confusion might actually be my fault and Hoan has just
been implementing my vague suggestion here:

https://lore.kernel.org/linux-arm-kernel/20190625101245.s4vxfosoop52gl4e@willie-the-truck/

If the preference of the mm folks is to leave CONFIG_NODES_SPAN_OTHER_NODES
as it is, then we can define it for arm64. I just find it a bit weird that
the majority of NUMA-capable architectures have to add a symbol in the arch
Kconfig file, for what appears to be a performance optimisation applicable
only to ia64, mips and sh.

At the very least we could make the thing selectable.

Will
Michal Hocko July 12, 2019, 3 p.m. UTC | #6
On Fri 12-07-19 15:37:30, Will Deacon wrote:
> Hi all,
> 
> On Fri, Jul 12, 2019 at 02:12:23PM +0200, Michal Hocko wrote:
> > On Fri 12-07-19 10:56:47, Hoan Tran OS wrote:
> > [...]
> > > It would be good if we can enable it by-default. Otherwise, let arch 
> > > enables it by them-self. Do you have any suggestions?
> > 
> > I can hardly make any suggestions when it is not really clear _why_ you
> > want to remove this config option in the first place. Please explain
> > what motivated you to make this change.
> 
> Sorry, I think this confusion might actually be my fault and Hoan has just
> been implementing my vague suggestion here:
> 
> https://lore.kernel.org/linux-arm-kernel/20190625101245.s4vxfosoop52gl4e@willie-the-truck/
> 
> If the preference of the mm folks is to leave CONFIG_NODES_SPAN_OTHER_NODES
> as it is, then we can define it for arm64. I just find it a bit weird that
> the majority of NUMA-capable architectures have to add a symbol in the arch
> Kconfig file, for what appears to be a performance optimisation applicable
> only to ia64, mips and sh.
> 
> At the very least we could make the thing selectable.

Hmm, I thought this was selectable. But I am obviously wrong here.
Looking more closely, it seems that this is indeed only about
__early_pfn_to_nid and as such not something that should add a config
symbol. This should have been called out in the changelog though.

Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
bucket? Do we have any NUMA architecture that doesn't enable it?

Thanks!
Hoan Tran July 15, 2019, 5:55 p.m. UTC | #7
Hi,

On 7/12/19 10:00 PM, Michal Hocko wrote:
> On Fri 12-07-19 15:37:30, Will Deacon wrote:
>> Hi all,
>>
>> On Fri, Jul 12, 2019 at 02:12:23PM +0200, Michal Hocko wrote:
>>> On Fri 12-07-19 10:56:47, Hoan Tran OS wrote:
>>> [...]
>>>> It would be good if we can enable it by-default. Otherwise, let arch
>>>> enables it by them-self. Do you have any suggestions?
>>>
>>> I can hardly make any suggestions when it is not really clear _why_ you
>>> want to remove this config option in the first place. Please explain
>>> what motivated you to make this change.
>>
>> Sorry, I think this confusion might actually be my fault and Hoan has just
>> been implementing my vague suggestion here:
>>
>> https://lore.kernel.org/linux-arm-kernel/20190625101245.s4vxfosoop52gl4e@willie-the-truck/
>>
>> If the preference of the mm folks is to leave CONFIG_NODES_SPAN_OTHER_NODES
>> as it is, then we can define it for arm64. I just find it a bit weird that
>> the majority of NUMA-capable architectures have to add a symbol in the arch
>> Kconfig file, for what appears to be a performance optimisation applicable
>> only to ia64, mips and sh.
>>
>> At the very least we could make the thing selectable.
> 
> Hmm, I thought this was selectable. But I am obviously wrong here.
> Looking more closely, it seems that this is indeed only about
> __early_pfn_to_nid and as such not something that should add a config
> symbol. This should have been called out in the changelog though.

Yes, do you have any other comments about my patch?

> 
> Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> bucket? Do we have any NUMA architecture that doesn't enable it?
> 

As I checked with arch Kconfig files, there are 2 architectures, riscv 
and microblaze, do not support NUMA but enable this config.

And 1 architecture, alpha, supports NUMA but does not enable this config.

Thanks and Regards
Hoan

> Thanks!
>
Michal Hocko July 30, 2019, 8:14 a.m. UTC | #8
[Sorry for a late reply]

On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> Hi,
> 
> On 7/12/19 10:00 PM, Michal Hocko wrote:
[...]
> > Hmm, I thought this was selectable. But I am obviously wrong here.
> > Looking more closely, it seems that this is indeed only about
> > __early_pfn_to_nid and as such not something that should add a config
> > symbol. This should have been called out in the changelog though.
> 
> Yes, do you have any other comments about my patch?

Not really. Just make sure to explicitly state that
CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
doesn't really deserve it's own config and can be pulled under NUMA.

> > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > bucket? Do we have any NUMA architecture that doesn't enable it?
> > 
> 
> As I checked with arch Kconfig files, there are 2 architectures, riscv 
> and microblaze, do not support NUMA but enable this config.
> 
> And 1 architecture, alpha, supports NUMA but does not enable this config.

Care to have a look and clean this up please?
Hoan Tran July 30, 2019, 5:31 p.m. UTC | #9
Hi,

On 7/30/19 1:14 AM, Michal Hocko wrote:
> [Sorry for a late reply]
> 
> On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
>> Hi,
>>
>> On 7/12/19 10:00 PM, Michal Hocko wrote:
> [...]
>>> Hmm, I thought this was selectable. But I am obviously wrong here.
>>> Looking more closely, it seems that this is indeed only about
>>> __early_pfn_to_nid and as such not something that should add a config
>>> symbol. This should have been called out in the changelog though.
>>
>> Yes, do you have any other comments about my patch?
> 
> Not really. Just make sure to explicitly state that
> CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> doesn't really deserve it's own config and can be pulled under NUMA.

Yes, I will add this info into the patch description.

> 
>>> Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
>>> bucket? Do we have any NUMA architecture that doesn't enable it?
>>>
>>
>> As I checked with arch Kconfig files, there are 2 architectures, riscv
>> and microblaze, do not support NUMA but enable this config.
>>
>> And 1 architecture, alpha, supports NUMA but does not enable this config.
> 
> Care to have a look and clean this up please?

Sure, I'll take a look.

Thanks
Hoan
>
Mike Rapoport July 31, 2019, 6:24 a.m. UTC | #10
[ sorry for a late reply too, somehow I missed this thread before ]

On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote:
> [Sorry for a late reply]
> 
> On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> > Hi,
> > 
> > On 7/12/19 10:00 PM, Michal Hocko wrote:
> [...]
> > > Hmm, I thought this was selectable. But I am obviously wrong here.
> > > Looking more closely, it seems that this is indeed only about
> > > __early_pfn_to_nid and as such not something that should add a config
> > > symbol. This should have been called out in the changelog though.
> > 
> > Yes, do you have any other comments about my patch?
> 
> Not really. Just make sure to explicitly state that
> CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> doesn't really deserve it's own config and can be pulled under NUMA.
> 
> > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > > bucket? Do we have any NUMA architecture that doesn't enable it?
> > > 

HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization
sequence so it's not only about a singe function.

> > As I checked with arch Kconfig files, there are 2 architectures, riscv 
> > and microblaze, do not support NUMA but enable this config.

My take would be that riscv will support NUMA some day.
 
> > And 1 architecture, alpha, supports NUMA but does not enable this config.

alpha's NUMA support is BROKEN for more than a decade now, I doubt it'll
ever get fixed.
 
> Care to have a look and clean this up please?
> 
> -- 
> Michal Hocko
> SUSE Labs
>
Michal Hocko July 31, 2019, 8:03 a.m. UTC | #11
On Wed 31-07-19 09:24:21, Mike Rapoport wrote:
> [ sorry for a late reply too, somehow I missed this thread before ]
> 
> On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote:
> > [Sorry for a late reply]
> > 
> > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> > > Hi,
> > > 
> > > On 7/12/19 10:00 PM, Michal Hocko wrote:
> > [...]
> > > > Hmm, I thought this was selectable. But I am obviously wrong here.
> > > > Looking more closely, it seems that this is indeed only about
> > > > __early_pfn_to_nid and as such not something that should add a config
> > > > symbol. This should have been called out in the changelog though.
> > > 
> > > Yes, do you have any other comments about my patch?
> > 
> > Not really. Just make sure to explicitly state that
> > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> > doesn't really deserve it's own config and can be pulled under NUMA.
> > 
> > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > > > bucket? Do we have any NUMA architecture that doesn't enable it?
> > > > 
> 
> HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization
> sequence so it's not only about a singe function.

The question is whether we want to have this a config option or enable
it unconditionally for each NUMA system.

> > > As I checked with arch Kconfig files, there are 2 architectures, riscv 
> > > and microblaze, do not support NUMA but enable this config.
> 
> My take would be that riscv will support NUMA some day.
>  
> > > And 1 architecture, alpha, supports NUMA but does not enable this config.
> 
> alpha's NUMA support is BROKEN for more than a decade now, I doubt it'll
> ever get fixed.

I can see Al has marked it BROKEN in 2005. Maybe time to rip it out?
Although it doesn't seem to be a lot of code in arch/alpha at first
glance so maybe not worth an effort.
Mike Rapoport July 31, 2019, 11:14 a.m. UTC | #12
On Wed, Jul 31, 2019 at 10:03:09AM +0200, Michal Hocko wrote:
> On Wed 31-07-19 09:24:21, Mike Rapoport wrote:
> > [ sorry for a late reply too, somehow I missed this thread before ]
> > 
> > On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote:
> > > [Sorry for a late reply]
> > > 
> > > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> > > > Hi,
> > > > 
> > > > On 7/12/19 10:00 PM, Michal Hocko wrote:
> > > [...]
> > > > > Hmm, I thought this was selectable. But I am obviously wrong here.
> > > > > Looking more closely, it seems that this is indeed only about
> > > > > __early_pfn_to_nid and as such not something that should add a config
> > > > > symbol. This should have been called out in the changelog though.
> > > > 
> > > > Yes, do you have any other comments about my patch?
> > > 
> > > Not really. Just make sure to explicitly state that
> > > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> > > doesn't really deserve it's own config and can be pulled under NUMA.
> > > 
> > > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > > > > bucket? Do we have any NUMA architecture that doesn't enable it?
> > > > > 
> > 
> > HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization
> > sequence so it's not only about a singe function.
> 
> The question is whether we want to have this a config option or enable
> it unconditionally for each NUMA system.

We can make it 'default NUMA', but we can't drop it completely because
microblaze uses sparse_memory_present_with_active_regions() which is
unavailable when HAVE_MEMBLOCK_NODE_MAP=n.

> > > > As I checked with arch Kconfig files, there are 2 architectures, riscv 
> > > > and microblaze, do not support NUMA but enable this config.
> > 
> > My take would be that riscv will support NUMA some day.
> >  
> > > > And 1 architecture, alpha, supports NUMA but does not enable this config.
> > 
> > alpha's NUMA support is BROKEN for more than a decade now, I doubt it'll
> > ever get fixed.
> 
> I can see Al has marked it BROKEN in 2005. Maybe time to rip it out?
> Although it doesn't seem to be a lot of code in arch/alpha at first
> glance so maybe not worth an effort.
> -- 
> Michal Hocko
> SUSE Labs
>
Michal Hocko July 31, 2019, 11:40 a.m. UTC | #13
On Wed 31-07-19 14:14:22, Mike Rapoport wrote:
> On Wed, Jul 31, 2019 at 10:03:09AM +0200, Michal Hocko wrote:
> > On Wed 31-07-19 09:24:21, Mike Rapoport wrote:
> > > [ sorry for a late reply too, somehow I missed this thread before ]
> > > 
> > > On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote:
> > > > [Sorry for a late reply]
> > > > 
> > > > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> > > > > Hi,
> > > > > 
> > > > > On 7/12/19 10:00 PM, Michal Hocko wrote:
> > > > [...]
> > > > > > Hmm, I thought this was selectable. But I am obviously wrong here.
> > > > > > Looking more closely, it seems that this is indeed only about
> > > > > > __early_pfn_to_nid and as such not something that should add a config
> > > > > > symbol. This should have been called out in the changelog though.
> > > > > 
> > > > > Yes, do you have any other comments about my patch?
> > > > 
> > > > Not really. Just make sure to explicitly state that
> > > > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> > > > doesn't really deserve it's own config and can be pulled under NUMA.
> > > > 
> > > > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > > > > > bucket? Do we have any NUMA architecture that doesn't enable it?
> > > > > > 
> > > 
> > > HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization
> > > sequence so it's not only about a singe function.
> > 
> > The question is whether we want to have this a config option or enable
> > it unconditionally for each NUMA system.
> 
> We can make it 'default NUMA', but we can't drop it completely because
> microblaze uses sparse_memory_present_with_active_regions() which is
> unavailable when HAVE_MEMBLOCK_NODE_MAP=n.

I suppose you mean that microblaze is using
sparse_memory_present_with_active_regions even without CONFIG_NUMA,
right? I have to confess I do not understand that code. What is the deal
with setting node id there?
Mike Rapoport July 31, 2019, 12:26 p.m. UTC | #14
On Wed, Jul 31, 2019 at 01:40:16PM +0200, Michal Hocko wrote:
> On Wed 31-07-19 14:14:22, Mike Rapoport wrote:
> > On Wed, Jul 31, 2019 at 10:03:09AM +0200, Michal Hocko wrote:
> > > On Wed 31-07-19 09:24:21, Mike Rapoport wrote:
> > > > [ sorry for a late reply too, somehow I missed this thread before ]
> > > > 
> > > > On Tue, Jul 30, 2019 at 10:14:15AM +0200, Michal Hocko wrote:
> > > > > [Sorry for a late reply]
> > > > > 
> > > > > On Mon 15-07-19 17:55:07, Hoan Tran OS wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On 7/12/19 10:00 PM, Michal Hocko wrote:
> > > > > [...]
> > > > > > > Hmm, I thought this was selectable. But I am obviously wrong here.
> > > > > > > Looking more closely, it seems that this is indeed only about
> > > > > > > __early_pfn_to_nid and as such not something that should add a config
> > > > > > > symbol. This should have been called out in the changelog though.
> > > > > > 
> > > > > > Yes, do you have any other comments about my patch?
> > > > > 
> > > > > Not really. Just make sure to explicitly state that
> > > > > CONFIG_NODES_SPAN_OTHER_NODES is only about __early_pfn_to_nid and that
> > > > > doesn't really deserve it's own config and can be pulled under NUMA.
> > > > > 
> > > > > > > Also while at it, does HAVE_MEMBLOCK_NODE_MAP fall into a similar
> > > > > > > bucket? Do we have any NUMA architecture that doesn't enable it?
> > > > > > > 
> > > > 
> > > > HAVE_MEMBLOCK_NODE_MAP makes huge difference in node/zone initialization
> > > > sequence so it's not only about a singe function.
> > > 
> > > The question is whether we want to have this a config option or enable
> > > it unconditionally for each NUMA system.
> > 
> > We can make it 'default NUMA', but we can't drop it completely because
> > microblaze uses sparse_memory_present_with_active_regions() which is
> > unavailable when HAVE_MEMBLOCK_NODE_MAP=n.
> 
> I suppose you mean that microblaze is using
> sparse_memory_present_with_active_regions even without CONFIG_NUMA,
> right?

Yes.

> I have to confess I do not understand that code. What is the deal
> with setting node id there?

The sparse_memory_present_with_active_regions() iterates over
memblock.memory regions and uses the node id of each region as the
parameter to memory_present(). The assumption here is that sometime before
each region was assigned a proper non-negative node id. 

microblaze uses device tree for memory enumeration and the current FDT code
does memblock_add() that implicitly sets nid in memblock.memory regions to -1.

So in order to have proper node id passed to memory_present() microblaze
has to call memblock_set_node() before it can use
sparse_memory_present_with_active_regions().

> -- 
> Michal Hocko
> SUSE Labs