arm64: NUMA: Kconfig: Increase max number of nodes

Message ID 20201020173409.1266576-1-vanshikonda@os.amperecomputing.com (mailing list archive)
State New, archived

Commit Message

Vanshidhar Konda Oct. 20, 2020, 5:34 p.m. UTC
The arm64 maximum number of NUMA nodes currently defaults to 4. Today's
arm64 systems can reach or exceed 16 nodes. Increase the default to 64
(matching x86_64).

Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
---
 arch/arm64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Valentin Schneider Oct. 20, 2020, 6:09 p.m. UTC | #1
Hi,

Nit on the subject: this only increases the default, the max is still 2¹⁰.

On 20/10/20 18:34, Vanshidhar Konda wrote:
> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
> reach or exceed 16. Increase the number to 64 (matching x86_64).
>
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>  arch/arm64/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 893130ce1626..3e69d3c981be 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -980,7 +980,7 @@ config NUMA
>  config NODES_SHIFT
>       int "Maximum NUMA Nodes (as a power of 2)"
>       range 1 10
> -	default "2"
> +	default "6"

This leads to more statically allocated memory for things like node to CPU
maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
issue.
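
For a rough sense of the static cost, here is a minimal userspace sketch. It assumes, as in the kernel headers, that MAX_NUMNODES is 1 << NODES_SHIFT and that a node mask is a bitmap with one bit per possible node. The helper names (max_numnodes, nodemask_bytes, per_node_ptr_array_bytes) are made up for illustration and are not kernel APIs.

```c
#include <stddef.h>

/* Number of possible nodes for a given NODES_SHIFT (i.e. MAX_NUMNODES). */
static size_t max_numnodes(unsigned int nodes_shift)
{
	return (size_t)1 << nodes_shift;
}

/* Bytes used by a bitmap with one bit per possible node, rounded up to whole longs. */
static size_t nodemask_bytes(unsigned int nodes_shift)
{
	size_t bits_per_long = sizeof(unsigned long) * 8;
	size_t longs = (max_numnodes(nodes_shift) + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* Bytes used by a static array holding one pointer per possible node. */
static size_t per_node_ptr_array_bytes(unsigned int nodes_shift)
{
	return max_numnodes(nodes_shift) * sizeof(void *);
}
```

On an LP64 build, a single node mask stays at 8 bytes for both NODES_SHIFT=2 and NODES_SHIFT=6 and grows to 128 bytes at NODES_SHIFT=10, while a per-node pointer array goes from 32 to 512 to 8192 bytes. The total depends on how many such objects a given configuration instantiates, which this sketch does not try to count.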

AIUI this also directly correlates to how many more page->flags bits are
required: are we sure the max 10 works on any aarch64 platform? I'm
genuinely asking here, given that I'm mostly a stranger to the mm
world. The default should be something we're somewhat confident works
everywhere.

>       depends on NEED_MULTIPLE_NODES
>       help
>         Specify the maximum number of NUMA Nodes available on the target
Anshuman Khandual Oct. 21, 2020, 4:13 a.m. UTC | #2
On 10/20/2020 11:39 PM, Valentin Schneider wrote:
> 
> Hi,
> 
> Nit on the subject: this only increases the default, the max is still 2¹⁰.

Agreed.

> 
> On 20/10/20 18:34, Vanshidhar Konda wrote:
>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>
>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>> ---
>>  arch/arm64/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 893130ce1626..3e69d3c981be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -980,7 +980,7 @@ config NUMA
>>  config NODES_SHIFT
>>       int "Maximum NUMA Nodes (as a power of 2)"
>>       range 1 10
>> -	default "2"
>> +	default "6"
> 
> This leads to more statically allocated memory for things like node to CPU
> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> issue.

Smaller systems should not be required to waste that memory in the
default case, unless there is a real and available larger system
with that many nodes.

> 
> AIUI this also directly correlates to how many more page->flags bits are
> required: are we sure the max 10 works on any aarch64 platform? I'm

We will have to test that. Besides, 256 (2^8) is the first threshold
to be crossed here.

> genuinely asking here, given that I'm mostly a stranger to the mm
> world. The default should be something we're somewhat confident works
> everywhere.

Agreed. Do we really need to match x86 right now? Do we really have
systems that have 64 nodes? We should not increase the default node
value and then try to solve some new problems, when there might not
be any system which could even use that. I would suggest increasing
the NODES_SHIFT value only up to what a real and available system requires.
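
As a minimal sketch of how a node count maps to the Kconfig value: the smallest usable NODES_SHIFT is the smallest shift whose power of two covers the node count. The helper name min_nodes_shift is hypothetical, and the sketch does not enforce the Kconfig range of 1 to 10.

```c
/*
 * Smallest shift such that (1 << shift) >= nr_nodes.
 * The loop stops before the shift could overflow an unsigned long.
 */
static unsigned int min_nodes_shift(unsigned long nr_nodes)
{
	unsigned int max_shift = sizeof(unsigned long) * 8 - 1;
	unsigned int shift = 0;

	while (shift < max_shift && ((unsigned long)1 << shift) < nr_nodes)
		shift++;
	return shift;
}
```

Under that reading, 4 nodes need a shift of 2 (the current default), 16 nodes need 4, and 64 nodes need 6 (the proposed default).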

> 
>>       depends on NEED_MULTIPLE_NODES
>>       help
>>         Specify the maximum number of NUMA Nodes available on the target
>
Jonathan Cameron Oct. 21, 2020, 11:02 a.m. UTC | #3
On Wed, 21 Oct 2020 09:43:21 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:

> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
> > 
> > Hi,
> > 
> > Nit on the subject: this only increases the default, the max is still 2¹⁰.  
> 
> Agreed.
> 
> > 
> > On 20/10/20 18:34, Vanshidhar Konda wrote:  
> >> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
> >> reach or exceed 16. Increase the number to 64 (matching x86_64).
> >>
> >> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> >> ---
> >>  arch/arm64/Kconfig | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> >> index 893130ce1626..3e69d3c981be 100644
> >> --- a/arch/arm64/Kconfig
> >> +++ b/arch/arm64/Kconfig
> >> @@ -980,7 +980,7 @@ config NUMA
> >>  config NODES_SHIFT
> >>       int "Maximum NUMA Nodes (as a power of 2)"
> >>       range 1 10
> >> -	default "2"
> >> +	default "6"  
> > 
> > This leads to more statically allocated memory for things like node to CPU
> > maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> > issue.  
> 
> The smaller systems should not be required to waste those memory in
> a default case, unless there is a real and available larger system
> with those increased nodes.
> 
> > 
> > AIUI this also directly correlates to how many more page->flags bits are
> > required: are we sure the max 10 works on any aarch64 platform? I'm  
> 
> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
> to be crossed here.
> 
> > genuinely asking here, given that I'm mostly a stranger to the mm
> > world. The default should be something we're somewhat confident works
> > everywhere.  
> 
> Agreed. Do we really need to match X86 right now ? Do we really have
> systems that has 64 nodes ? We should not increase the default node
> value and then try to solve some new problems, when there might not
> be any system which could even use that. I would suggest increase
> NODES_SHIFT value upto as required by a real and available system.

I'm not going to give precise numbers on near future systems but it is public
that we ship 8 NUMA node ARM64 systems today.  Things will get more
interesting as CXL and CCIX enter the market on ARM systems,
given chances are every CXL device will look like another NUMA
node (CXL spec says they should be presented as such) and you
may be able to rack up lots of them.

So I'd argue the minimum that makes sense today is 16 nodes, but looking
forward even a little, 64 is not a great stretch.
I'd make the jump to 64 so we can forget about this again for a year or two.
People will want to run today's distros on these new machines and we'd
rather not have to go around all the distros asking them to carry a patch
increasing this count (I assume they are already carrying such a patch
due to those 8 node systems).

Jonathan

> 
> >   
> >>       depends on NEED_MULTIPLE_NODES
> >>       help
> >>         Specify the maximum number of NUMA Nodes available on the target  
> >  
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Vanshidhar Konda Oct. 21, 2020, 4:02 p.m. UTC | #4
On Tue, Oct 20, 2020 at 07:09:36PM +0100, Valentin Schneider wrote:
>
>Hi,
>
>Nit on the subject: this only increases the default, the max is still 2¹⁰.
>
>On 20/10/20 18:34, Vanshidhar Konda wrote:
>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>
>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>> ---
>>  arch/arm64/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 893130ce1626..3e69d3c981be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -980,7 +980,7 @@ config NUMA
>>  config NODES_SHIFT
>>       int "Maximum NUMA Nodes (as a power of 2)"
>>       range 1 10
>> -	default "2"
>> +	default "6"
>
>This leads to more statically allocated memory for things like node to CPU
>maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>issue.
>
>AIUI this also directly correlates to how many more page->flags bits are
>required: are we sure the max 10 works on any aarch64 platform? I'm

I created an experimental setup in which I enabled 1024 NUMA nodes in 
SRAT, SLIT and configured NODES_SHIFT=10 for the kernel. 1022 of these 
nodes were memory-only NUMA nodes. This configuration booted and 
recognized the NUMA nodes correctly.

>genuinely asking here, given that I'm mostly a stranger to the mm
>world. The default should be something we're somewhat confident works
>everywhere.
>
>>       depends on NEED_MULTIPLE_NODES
>>       help
>>         Specify the maximum number of NUMA Nodes available on the target
Valentin Schneider Oct. 21, 2020, 10:29 p.m. UTC | #5
Hi,

On 21/10/20 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>> Agreed. Do we really need to match X86 right now ? Do we really have
>> systems that has 64 nodes ? We should not increase the default node
>> value and then try to solve some new problems, when there might not
>> be any system which could even use that. I would suggest increase
>> NODES_SHIFT value upto as required by a real and available system.
>
> I'm not going to give precise numbers on near future systems but it is public
> that we ship 8 NUMA node ARM64 systems today.  Things will get more
> interesting as CXL and CCIX enter the market on ARM systems,
> given chances are every CXL device will look like another NUMA
> node (CXL spec says they should be presented as such) and you
> may be able to rack up lots of them.
>
> So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> even a little and 64 is not a great stretch.
> I'd make the jump to 64 so we can forget about this again for a year or two.
> People will want to run today's distros on these new machines and we'd
> rather not have to go around all the distros asking them to carry a patch
> increasing this count (I assume they are already carrying such a patch
> due to those 8 node systems)
>

I agree that 4 nodes is somewhat anemic; I've had to bump that just to
run some scheduler tests under QEMU. However I still believe we should
exercise caution before cranking it too high, especially when seeing things
like:

  ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")

To give some numbers, a defconfig build gives me:

  SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
  BITS_PER_LONG=64 NR_PAGEFLAGS=24

IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
plenty, however this can get cramped fairly easily with any combination of:

  CONFIG_SPARSEMEM_VMEMMAP=n (-18)
  CONFIG_IDLE_PAGE_TRACKING=y (-2)
  CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)

Taking Arnd's above example, a randconfig build picking !VMEMMAP already
limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
the flags (it gets a dedicated field at the tail of struct page
otherwise). If that is something we don't care too much about, then
consider my concerns taken care of.
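
The bit budget above can be sketched as plain arithmetic. This is a userspace illustration, not the kernel's actual layout code: the struct and function names are made up, and mapping each "(-N)" to a particular field (SECTIONS_WIDTH for !VMEMMAP, NR_PAGEFLAGS for idle page tracking, KASAN_TAG_WIDTH for SW_TAGS) is an assumption about how those bits are spent.

```c
/* Widths of the fields that share the page->flags word. */
struct page_flags_layout {
	int bits_per_long;
	int nr_pageflags;
	int sections_width;
	int zones_width;
	int last_cpupid_shift;
	int kasan_tag_width;
};

/* Largest NODES_SHIFT that fits next to every other field; negative means no room. */
static int max_nodes_shift(const struct page_flags_layout *l)
{
	return l->bits_per_long - l->nr_pageflags - l->sections_width -
	       l->zones_width - l->last_cpupid_shift - l->kasan_tag_width;
}

/* The defconfig numbers quoted above. */
static struct page_flags_layout defconfig_layout(void)
{
	struct page_flags_layout l = {
		.bits_per_long = 64,
		.nr_pageflags = 24,
		.sections_width = 0,
		.zones_width = 2,
		.last_cpupid_shift = 8 + 8,
		.kasan_tag_width = 0,
	};

	return l;
}
```

With the defconfig values this gives 22, and setting sections_width to 18 (the !VMEMMAP case) gives 4, matching the numbers above. Stacking all three options on top of each other would leave a negative budget, which is the case where last-cpupid has to move out of the flags word.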


One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
it to 256 IIRC to support the TX2. From that PoV, I agree with Anshuman
that we should set the NODES_SHIFT default to match the max encountered on
platforms that are in use right now.

> Jonathan
>
>>
>> >
>> >>       depends on NEED_MULTIPLE_NODES
>> >>       help
>> >>         Specify the maximum number of NUMA Nodes available on the target
>> >
>>
Robin Murphy Oct. 21, 2020, 11:44 p.m. UTC | #6
On 2020-10-21 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> 
>> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>
>>> Hi,
>>>
>>> Nit on the subject: this only increases the default, the max is still 2¹⁰.
>>
>> Agreed.
>>
>>>
>>> On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>>>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>
>>>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>>> ---
>>>>   arch/arm64/Kconfig | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 893130ce1626..3e69d3c981be 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -980,7 +980,7 @@ config NUMA
>>>>   config NODES_SHIFT
>>>>        int "Maximum NUMA Nodes (as a power of 2)"
>>>>        range 1 10
>>>> -	default "2"
>>>> +	default "6"
>>>
>>> This leads to more statically allocated memory for things like node to CPU
>>> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>> issue.
>>
>> The smaller systems should not be required to waste those memory in
>> a default case, unless there is a real and available larger system
>> with those increased nodes.
>>
>>>
>>> AIUI this also directly correlates to how many more page->flags bits are
>>> required: are we sure the max 10 works on any aarch64 platform? I'm
>>
>> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>> to be crossed here.
>>
>>> genuinely asking here, given that I'm mostly a stranger to the mm
>>> world. The default should be something we're somewhat confident works
>>> everywhere.
>>
>> Agreed. Do we really need to match X86 right now ? Do we really have
>> systems that has 64 nodes ? We should not increase the default node
>> value and then try to solve some new problems, when there might not
>> be any system which could even use that. I would suggest increase
>> NODES_SHIFT value upto as required by a real and available system.
> 
> I'm not going to give precise numbers on near future systems but it is public
> that we ship 8 NUMA node ARM64 systems today.  Things will get more
> interesting as CXL and CCIX enter the market on ARM systems,
> given chances are every CXL device will look like another NUMA
> node (CXL spec says they should be presented as such) and you
> may be able to rack up lots of them.
> 
> So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> even a little and 64 is not a great stretch.
> I'd make the jump to 64 so we can forget about this again for a year or two.
> People will want to run today's distros on these new machines and we'd
> rather not have to go around all the distros asking them to carry a patch
> increasing this count (I assume they are already carrying such a patch
> due to those 8 node systems)

Nit: I doubt any sane distro is going to carry a patch to adjust the 
*default* value of a Kconfig option. They might tune the actual value in 
their config, but, well, isn't that the whole point of configs? ;)

Robin.

> 
> Jonathan
> 
>>
>>>    
>>>>        depends on NEED_MULTIPLE_NODES
>>>>        help
>>>>          Specify the maximum number of NUMA Nodes available on the target
>>>   
>>
Vanshidhar Konda Oct. 22, 2020, 1:07 a.m. UTC | #7
On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>On 2020-10-21 12:02, Jonathan Cameron wrote:
>>On Wed, 21 Oct 2020 09:43:21 +0530
>>Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>>>On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>>
>>>>Hi,
>>>>
>>>>Nit on the subject: this only increases the default, the max is still 2¹⁰.
>>>
>>>Agreed.
>>>
>>>>
>>>>On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>>>The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>>>>>reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>>
>>>>>Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>>>>---
>>>>>  arch/arm64/Kconfig | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>index 893130ce1626..3e69d3c981be 100644
>>>>>--- a/arch/arm64/Kconfig
>>>>>+++ b/arch/arm64/Kconfig
>>>>>@@ -980,7 +980,7 @@ config NUMA
>>>>>  config NODES_SHIFT
>>>>>       int "Maximum NUMA Nodes (as a power of 2)"
>>>>>       range 1 10
>>>>>-	default "2"
>>>>>+	default "6"
>>>>
>>>>This leads to more statically allocated memory for things like node to CPU
>>>>maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>>>issue.
>>>
>>>The smaller systems should not be required to waste those memory in
>>>a default case, unless there is a real and available larger system
>>>with those increased nodes.
>>>
>>>>
>>>>AIUI this also directly correlates to how many more page->flags bits are
>>>>required: are we sure the max 10 works on any aarch64 platform? I'm
>>>
>>>We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>>>to be crossed here.
>>>
>>>>genuinely asking here, given that I'm mostly a stranger to the mm
>>>>world. The default should be something we're somewhat confident works
>>>>everywhere.
>>>
>>>Agreed. Do we really need to match X86 right now ? Do we really have
>>>systems that has 64 nodes ? We should not increase the default node
>>>value and then try to solve some new problems, when there might not
>>>be any system which could even use that. I would suggest increase
>>>NODES_SHIFT value upto as required by a real and available system.
>>
>>I'm not going to give precise numbers on near future systems but it is public
>>that we ship 8 NUMA node ARM64 systems today.  Things will get more
>>interesting as CXL and CCIX enter the market on ARM systems,
>>given chances are every CXL device will look like another NUMA
>>node (CXL spec says they should be presented as such) and you
>>may be able to rack up lots of them.
>>
>>So I'd argue minimum that makes sense today is 16 nodes, but looking forward
>>even a little and 64 is not a great stretch.
>>I'd make the jump to 64 so we can forget about this again for a year or two.
>>People will want to run today's distros on these new machines and we'd
>>rather not have to go around all the distros asking them to carry a patch
>>increasing this count (I assume they are already carrying such a patch
>>due to those 8 node systems)

To echo Jonathan's statement above, we are looking at systems that will
need approximately 64 NUMA nodes over the next 5-6 years - the time for
which an LTS kernel would be maintained. Some of the reasons for
increasing NUMA nodes during this time period include CXL, CCIX and
NVDIMM (as Jonathan pointed out).

The main argument against increasing the NODES_SHIFT seems to be a
concern that it negatively impacts other ARM64 systems. Could anyone
share what kind of systems we are talking about? For a system that has
NEED_MULTIPLE_NODES set, would the impact be noticeable?

Vanshi

>
>Nit: I doubt any sane distro is going to carry a patch to adjust the 
>*default* value of a Kconfig option. They might tune the actual value 
>in their config, but, well, isn't that the whole point of configs? ;)
>
>Robin.
>
>>
>>Jonathan
>>
>>>
>>>>>       depends on NEED_MULTIPLE_NODES
>>>>>       help
>>>>>         Specify the maximum number of NUMA Nodes available on the target
>>>
Robin Murphy Oct. 22, 2020, 11:21 a.m. UTC | #8
On 2020-10-22 02:07, Vanshi Konda wrote:
> On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>> On 2020-10-21 12:02, Jonathan Cameron wrote:
>>> On Wed, 21 Oct 2020 09:43:21 +0530
>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>
>>>> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Nit on the subject: this only increases the default, the max is 
>>>>> still 2¹⁰.
>>>>
>>>> Agreed.
>>>>
>>>>>
>>>>> On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>>>> The current arm64 max NUMA nodes default to 4. Today's arm64 
>>>>>> systems can
>>>>>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>>>
>>>>>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>>>>> ---
>>>>>>  arch/arm64/Kconfig | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>> index 893130ce1626..3e69d3c981be 100644
>>>>>> --- a/arch/arm64/Kconfig
>>>>>> +++ b/arch/arm64/Kconfig
>>>>>> @@ -980,7 +980,7 @@ config NUMA
>>>>>>  config NODES_SHIFT
>>>>>>       int "Maximum NUMA Nodes (as a power of 2)"
>>>>>>       range 1 10
>>>>>> -    default "2"
>>>>>> +    default "6"
>>>>>
>>>>> This leads to more statically allocated memory for things like node 
>>>>> to CPU
>>>>> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>>>> issue.
>>>>
>>>> The smaller systems should not be required to waste those memory in
>>>> a default case, unless there is a real and available larger system
>>>> with those increased nodes.
>>>>
>>>>>
>>>>> AIUI this also directly correlates to how many more page->flags 
>>>>> bits are
>>>>> required: are we sure the max 10 works on any aarch64 platform? I'm
>>>>
>>>> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>>>> to be crossed here.
>>>>
>>>>> genuinely asking here, given that I'm mostly a stranger to the mm
>>>>> world. The default should be something we're somewhat confident works
>>>>> everywhere.
>>>>
>>>> Agreed. Do we really need to match X86 right now ? Do we really have
>>>> systems that has 64 nodes ? We should not increase the default node
>>>> value and then try to solve some new problems, when there might not
>>>> be any system which could even use that. I would suggest increase
>>>> NODES_SHIFT value upto as required by a real and available system.
>>>
>>> I'm not going to give precise numbers on near future systems but it 
>>> is public
>>> that we ship 8 NUMA node ARM64 systems today.  Things will get more
>>> interesting as CXL and CCIX enter the market on ARM systems,
>>> given chances are every CXL device will look like another NUMA
>>> node (CXL spec says they should be presented as such) and you
>>> may be able to rack up lots of them.
>>>
>>> So I'd argue minimum that makes sense today is 16 nodes, but looking 
>>> forward
>>> even a little and 64 is not a great stretch.
>>> I'd make the jump to 64 so we can forget about this again for a year 
>>> or two.
>>> People will want to run today's distros on these new machines and we'd
>>> rather not have to go around all the distros asking them to carry a 
>>> patch
>>> increasing this count (I assume they are already carrying such a patch
>>> due to those 8 node systems)
> 
> To echo Jonathan's statement above we are looking at systems that will
> need approximately 64 NUMA nodes over the next 5-6 years - the time for
> which an LTS kernel would be maintained. Some of the reason's for
> increasing NUMA nodes during this time period include CXL, CCIX and
> NVDIMM (like Jonathan pointed out).
> 
> The main argument against increasing the NODES_SHIFT seems to be a
> concern that it negatively impacts other ARM64 systems. Could anyone
> share what kind of systems we are talking about? For a system that has
> NEED_MULTIPLE_NODES set, would the impact be noticeable?

Systems like the ESPRESSObin - sure, sane people aren't trying to run 
desktops or development environments in 1GB of RAM, but it's not 
uncommon for them to use a minimal headless install of their favourite 
generic arm64 distro rather than something more "embedded" like OpenWrt 
or Armbian. Increasing a generic kernel's memory footprint (and perhaps 
more importantly, cache footprint) more than necessary is going to have 
*some* impact.

Robin.

> 
> Vanshi
> 
>>
>> Nit: I doubt any sane distro is going to carry a patch to adjust the 
>> *default* value of a Kconfig option. They might tune the actual value 
>> in their config, but, well, isn't that the whole point of configs? ;)
>>
>> Robin.
>>
>>>
>>> Jonathan
>>>
>>>>
>>>>>>       depends on NEED_MULTIPLE_NODES
>>>>>>       help
>>>>>>         Specify the maximum number of NUMA Nodes available on the 
>>>>>> target
>>>>
Vanshidhar Konda Oct. 22, 2020, 4:25 p.m. UTC | #9
On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
>On 2020-10-22 02:07, Vanshi Konda wrote:
>>On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>>>On 2020-10-21 12:02, Jonathan Cameron wrote:
>>>>On Wed, 21 Oct 2020 09:43:21 +0530
>>>>Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>
>>>>>On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>Nit on the subject: this only increases the default, the max 
>>>>>>is still 2¹⁰.
>>>>>
>>>>>Agreed.
>>>>>
>>>>>>
>>>>>>On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>>>>>The current arm64 max NUMA nodes default to 4. Today's 
>>>>>>>arm64 systems can
>>>>>>>reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>>>>
>>>>>>>Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>>>>>>---
>>>>>>>  arch/arm64/Kconfig | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>>index 893130ce1626..3e69d3c981be 100644
>>>>>>>--- a/arch/arm64/Kconfig
>>>>>>>+++ b/arch/arm64/Kconfig
>>>>>>>@@ -980,7 +980,7 @@ config NUMA
>>>>>>>  config NODES_SHIFT
>>>>>>>       int "Maximum NUMA Nodes (as a power of 2)"
>>>>>>>       range 1 10
>>>>>>>-	default "2"
>>>>>>>+	default "6"
>>>>>>
>>>>>>This leads to more statically allocated memory for things 
>>>>>>like node to CPU
>>>>>>maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>>>>>issue.
>>>>>
>>>>>The smaller systems should not be required to waste those memory in
>>>>>a default case, unless there is a real and available larger system
>>>>>with those increased nodes.
>>>>>
>>>>>>
>>>>>>AIUI this also directly correlates to how many more 
>>>>>>page->flags bits are
>>>>>>required: are we sure the max 10 works on any aarch64 platform? I'm
>>>>>
>>>>>We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>>>>>to be crossed here.
>>>>>
>>>>>>genuinely asking here, given that I'm mostly a stranger to the mm
>>>>>>world. The default should be something we're somewhat confident works
>>>>>>everywhere.
>>>>>
>>>>>Agreed. Do we really need to match X86 right now ? Do we really have
>>>>>systems that has 64 nodes ? We should not increase the default node
>>>>>value and then try to solve some new problems, when there might not
>>>>>be any system which could even use that. I would suggest increase
>>>>>NODES_SHIFT value upto as required by a real and available system.
>>>>
>>>>I'm not going to give precise numbers on near future systems but 
>>>>it is public
>>>>that we ship 8 NUMA node ARM64 systems today.  Things will get more
>>>>interesting as CXL and CCIX enter the market on ARM systems,
>>>>given chances are every CXL device will look like another NUMA
>>>>node (CXL spec says they should be presented as such) and you
>>>>may be able to rack up lots of them.
>>>>
>>>>So I'd argue minimum that makes sense today is 16 nodes, but 
>>>>looking forward
>>>>even a little and 64 is not a great stretch.
>>>>I'd make the jump to 64 so we can forget about this again for a 
>>>>year or two.
>>>>People will want to run today's distros on these new machines and we'd
>>>>rather not have to go around all the distros asking them to 
>>>>carry a patch
>>>>increasing this count (I assume they are already carrying such a patch
>>>>due to those 8 node systems)
>>
>>To echo Jonathan's statement above we are looking at systems that will
>>need approximately 64 NUMA nodes over the next 5-6 years - the time for
>>which an LTS kernel would be maintained. Some of the reason's for
>>increasing NUMA nodes during this time period include CXL, CCIX and
>>NVDIMM (like Jonathan pointed out).
>>
>>The main argument against increasing the NODES_SHIFT seems to be a
>>concern that it negatively impacts other ARM64 systems. Could anyone
>>share what kind of systems we are talking about? For a system that has
>>NEED_MULTIPLE_NODES set, would the impact be noticeable?
>
>Systems like the ESPRESSObin - sure, sane people aren't trying to run 
>desktops or development environments in 1GB of RAM, but it's not 
>uncommon for them to use a minimal headless install of their favourite 
>generic arm64 distro rather than something more "embedded" like 

If someone is running a generic arm64 distro, at least some of them are
already paying the extra cost. NODES_SHIFT for Ubuntu and SUSE kernels
is already 6. CentOS/Red Hat and Oracle Linux set it to 3. I've only seen
Debian set it to 2.

Vanshi

>OpenWrt or Armbian. Increasing a generic kernel's memory footprint 
>(and perhaps more importantly, cache footprint) more than necessary is 
>going to have *some* impact.
>
>Robin.
>
>>
>>Vanshi
>>
>>>
>>>Nit: I doubt any sane distro is going to carry a patch to adjust 
>>>the *default* value of a Kconfig option. They might tune the 
>>>actual value in their config, but, well, isn't that the whole 
>>>point of configs? ;)
>>>
>>>Robin.
>>>
>>>>
>>>>Jonathan
>>>>
>>>>>
>>>>>>>       depends on NEED_MULTIPLE_NODES
>>>>>>>       help
>>>>>>>         Specify the maximum number of NUMA Nodes 
>>>>>>>available on the target
>>>>>
Dave Kleikamp Oct. 27, 2020, 10:46 p.m. UTC | #10
On 10/22/20 11:25 AM, Vanshi Konda wrote:
> On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
>> On 2020-10-22 02:07, Vanshi Konda wrote:
>>> On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>>>> On 2020-10-21 12:02, Jonathan Cameron wrote:
>>>>> On Wed, 21 Oct 2020 09:43:21 +0530
>>>>> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>>
>>>>>> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Nit on the subject: this only increases the default, the max is
>>>>>>> still 2¹⁰.
>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>>>
>>>>>>> On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>>>>>> The current arm64 max NUMA nodes default to 4. Today's arm64
>>>>>>>> systems can
>>>>>>>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>>>>>
>>>>>>>> Signed-off-by: Vanshidhar Konda
>>>>>>>> <vanshikonda@os.amperecomputing.com>
>>>>>>>> ---
>>>>>>>>  arch/arm64/Kconfig | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>>> index 893130ce1626..3e69d3c981be 100644
>>>>>>>> --- a/arch/arm64/Kconfig
>>>>>>>> +++ b/arch/arm64/Kconfig
>>>>>>>> @@ -980,7 +980,7 @@ config NUMA
>>>>>>>>  config NODES_SHIFT
>>>>>>>>       int "Maximum NUMA Nodes (as a power of 2)"
>>>>>>>>       range 1 10
>>>>>>>> -	default "2"
>>>>>>>> +	default "6"
>>>>>>>
>>>>>>> This leads to more statically allocated memory for things like
>>>>>>> node to CPU
>>>>>>> maps (see uses of MAX_NUMNODES), but that shouldn't be too much
>>>>>>> of an
>>>>>>> issue.
>>>>>>
>>>>>> The smaller systems should not be required to waste that memory in
>>>>>> a default case, unless there is a real and available larger system
>>>>>> with those increased nodes.
>>>>>>
>>>>>>>
>>>>>>> AIUI this also directly correlates to how many more page->flags
>>>>>>> bits are
>>>>>>> required: are we sure the max 10 works on any aarch64 platform? I'm
>>>>>>
>>>>>> We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>>>>>> to be crossed here.
>>>>>>
>>>>>>> genuinely asking here, given that I'm mostly a stranger to the mm
>>>>>>> world. The default should be something we're somewhat confident
>>>>>>> works
>>>>>>> everywhere.
>>>>>>
>>>>>> Agreed. Do we really need to match x86 right now? Do we really have
>>>>>> systems that have 64 nodes? We should not increase the default node
>>>>>> value and then try to solve some new problems, when there might not
>>>>>> be any system which could even use that. I would suggest increasing
>>>>>> NODES_SHIFT only up to what a real and available system requires.
>>>>>
>>>>> I'm not going to give precise numbers on near future systems but it
>>>>> is public
>>>>> that we ship 8 NUMA node ARM64 systems today.  Things will get more
>>>>> interesting as CXL and CCIX enter the market on ARM systems,
>>>>> given chances are every CXL device will look like another NUMA
>>>>> node (CXL spec says they should be presented as such) and you
>>>>> may be able to rack up lots of them.
>>>>>
>>>>> So I'd argue minimum that makes sense today is 16 nodes, but
>>>>> looking forward
>>>>> even a little and 64 is not a great stretch.
>>>>> I'd make the jump to 64 so we can forget about this again for a
>>>>> year or two.
>>>>> People will want to run today's distros on these new machines and we'd
>>>>> rather not have to go around all the distros asking them to carry a
>>>>> patch
>>>>> increasing this count (I assume they are already carrying such a patch
>>>>> due to those 8 node systems)
>>>
>>> To echo Jonathan's statement above we are looking at systems that will
>>> need approximately 64 NUMA nodes over the next 5-6 years - the time for
>>> which an LTS kernel would be maintained. Some of the reasons for
>>> increasing NUMA nodes during this time period include CXL, CCIX and
>>> NVDIMM (like Jonathan pointed out).

This is a very good point. It won't be long before systems push the
number of NUMA nodes higher, and increasing NODES_SHIFT only slightly
now will leave the default configuration unable to recognize all the
nodes. CONFIG_NODES_SHIFT=6 seems a reasonable step up for a generic
kernel that should run well on small to very large systems for a few
years to come.

>>> The main argument against increasing the NODES_SHIFT seems to be a
>>> concern that it negatively impacts other ARM64 systems. Could anyone
>>> share what kind of systems we are talking about? For a system that has
>>> NEED_MULTIPLE_NODES set, would the impact be noticeable?
>>
>> Systems like the ESPRESSObin - sure, sane people aren't trying to run
>> desktops or development environments in 1GB of RAM, but it's not
>> uncommon for them to use a minimal headless install of their favourite
>> generic arm64 distro rather than something more "embedded" like 
> 
> If someone is running a generic arm64 distro, at least some of them are
> already paying the extra cost. NODES_SHIFT for Ubuntu and SuSE kernels
> is already 6. CentOS/Redhat and Oracle Linux set it to 3. I've only seen
> Debian set it to 2.

Right. The distros may not agree or even care what the default is, but
it doesn't make sense for the mainline default to lag too far behind
what the major distros use.

Shaggy

> 
> Vanshi
> 
>> OpenWrt or Armbian. Increasing a generic kernel's memory footprint
>> (and perhaps more importantly, cache footprint) more than necessary is
>> going to have *some* impact.
>>
>> Robin.
>>
>>>
>>> Vanshi
>>>
>>>>
>>>> Nit: I doubt any sane distro is going to carry a patch to adjust the
>>>> *default* value of a Kconfig option. They might tune the actual
>>>> value in their config, but, well, isn't that the whole point of
>>>> configs? ;)
>>>>
>>>> Robin.
>>>>
>>>>>
>>>>> Jonathan
>>>>>
>>>>>>
>>>>>>>>       depends on NEED_MULTIPLE_NODES
>>>>>>>>       help
>>>>>>>>         Specify the maximum number of NUMA Nodes available on the target
Vanshidhar Konda Oct. 27, 2020, 11:14 p.m. UTC | #11
On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
>On 2020-10-22 02:07, Vanshi Konda wrote:
>>On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>>>On 2020-10-21 12:02, Jonathan Cameron wrote:
>>>>On Wed, 21 Oct 2020 09:43:21 +0530
>>>>Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>>>
>>>>>On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>>>>>>
>>>>>>Hi,
>>>>>>
>>>>>>Nit on the subject: this only increases the default, the max 
>>>>>>is still 2¹⁰.
>>>>>
>>>>>Agreed.
>>>>>
>>>>>>
>>>>>>On 20/10/20 18:34, Vanshidhar Konda wrote:
>>>>>>>The current arm64 max NUMA nodes default to 4. Today's 
>>>>>>>arm64 systems can
>>>>>>>reach or exceed 16. Increase the number to 64 (matching x86_64).
>>>>>>>
>>>>>>>Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>>>>>>>---
>>>>>>> arch/arm64/Kconfig | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>>diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>>>>>index 893130ce1626..3e69d3c981be 100644
>>>>>>>--- a/arch/arm64/Kconfig
>>>>>>>+++ b/arch/arm64/Kconfig
>>>>>>>@@ -980,7 +980,7 @@ config NUMA
>>>>>>> config NODES_SHIFT
>>>>>>>      int "Maximum NUMA Nodes (as a power of 2)"
>>>>>>>      range 1 10
>>>>>>>-	default "2"
>>>>>>>+	default "6"
>>>>>>
>>>>>>This leads to more statically allocated memory for things 
>>>>>>like node to CPU
>>>>>>maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
>>>>>>issue.
>>>>>
>>>>>The smaller systems should not be required to waste that memory in
>>>>>a default case, unless there is a real and available larger system
>>>>>with those increased nodes.
>>>>>
>>>>>>
>>>>>>AIUI this also directly correlates to how many more 
>>>>>>page->flags bits are
>>>>>>required: are we sure the max 10 works on any aarch64 platform? I'm
>>>>>
>>>>>We will have to test that. Besides 256 (2 ^ 8) is the first threshold
>>>>>to be crossed here.
>>>>>
>>>>>>genuinely asking here, given that I'm mostly a stranger to the mm
>>>>>>world. The default should be something we're somewhat confident works
>>>>>>everywhere.
>>>>>
>>>>>Agreed. Do we really need to match x86 right now? Do we really have
>>>>>systems that have 64 nodes? We should not increase the default node
>>>>>value and then try to solve some new problems, when there might not
>>>>>be any system which could even use that. I would suggest increasing
>>>>>NODES_SHIFT only up to what a real and available system requires.
>>>>
>>>>I'm not going to give precise numbers on near future systems but 
>>>>it is public
>>>>that we ship 8 NUMA node ARM64 systems today.  Things will get more
>>>>interesting as CXL and CCIX enter the market on ARM systems,
>>>>given chances are every CXL device will look like another NUMA
>>>>node (CXL spec says they should be presented as such) and you
>>>>may be able to rack up lots of them.
>>>>
>>>>So I'd argue minimum that makes sense today is 16 nodes, but 
>>>>looking forward
>>>>even a little and 64 is not a great stretch.
>>>>I'd make the jump to 64 so we can forget about this again for a 
>>>>year or two.
>>>>People will want to run today's distros on these new machines and we'd
>>>>rather not have to go around all the distros asking them to 
>>>>carry a patch
>>>>increasing this count (I assume they are already carrying such a patch
>>>>due to those 8 node systems)
>>
>>To echo Jonathan's statement above we are looking at systems that will
>>need approximately 64 NUMA nodes over the next 5-6 years - the time for
>>which an LTS kernel would be maintained. Some of the reasons for
>>increasing NUMA nodes during this time period include CXL, CCIX and
>>NVDIMM (like Jonathan pointed out).
>>
>>The main argument against increasing the NODES_SHIFT seems to be a
>>concern that it negatively impacts other ARM64 systems. Could anyone
>>share what kind of systems we are talking about? For a system that has
>>NEED_MULTIPLE_NODES set, would the impact be noticeable?
>
>Systems like the ESPRESSObin - sure, sane people aren't trying to run 
>desktops or development environments in 1GB of RAM, but it's not 
>uncommon for them to use a minimal headless install of their favourite 
>generic arm64 distro rather than something more "embedded" like 
>OpenWrt or Armbian. Increasing a generic kernel's memory footprint 
>(and perhaps more importantly, cache footprint) more than necessary is 
>going to have *some* impact.
>

Ampere’s platforms support multiple NUMA configuration options to meet
different customer requirements. Several of these configurations have more
than 4 (the current default) NUMA nodes. These fail to initialize NUMA with
the following errors in dmesg:

[ 0.000000] ACPI: SRAT: Too many proximity domains.
[ 0.000000] ACPI: SRAT: SRAT not used.

[ 0.000000] SRAT: Invalid NUMA node -1 in ITS affinity
[ 0.000000] SRAT: Invalid NUMA node -1 in ITS affinity
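
As a rough illustration of why these messages appear, here is a simplified C
sketch of handing out node IDs to firmware proximity domains. This is not the
kernel's code: struct pxm_map, pxm_to_node() and SKETCH_NO_NODE are names
invented for this example, and the only assumption carried over is that a
domain which cannot get a node ID is reported as node -1.

```c
#define SKETCH_NO_NODE (-1)	/* stand-in for "no node available" */

/* Hypothetical, simplified proximity-domain table. */
struct pxm_map {
	int max_nodes;	/* 1 << NODES_SHIFT */
	int used;	/* node IDs handed out so far */
	int pxm[1024];	/* pxm[i] = proximity domain mapped to node i */
};

/* Return the node ID for a proximity domain, allocating one if needed. */
static int pxm_to_node(struct pxm_map *m, int pxm)
{
	for (int i = 0; i < m->used; i++)
		if (m->pxm[i] == pxm)
			return i;		/* already mapped */

	if (m->used >= m->max_nodes)
		return SKETCH_NO_NODE;		/* out of node IDs */

	m->pxm[m->used] = pxm;
	return m->used++;
}
```

With max_nodes = 4 (NODES_SHIFT=2), the fifth distinct proximity domain gets
-1, which is the situation the SRAT messages above describe.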

If we look at the forecast for the next LTS kernel lifetime, the number
of NUMA nodes will increase significantly due to SoCs with much higher
core counts, more memory channels, and new devices such as CCIX-attached
memory. Supporting these platforms with a default kernel config will
require a NODES_SHIFT value of at least 6.

Vanshi

>Robin.
>
>>
>>Vanshi
>>
>>>
>>>Nit: I doubt any sane distro is going to carry a patch to adjust 
>>>the *default* value of a Kconfig option. They might tune the 
>>>actual value in their config, but, well, isn't that the whole 
>>>point of configs? ;)
>>>
>>>Robin.
>>>
>>>>
>>>>Jonathan
>>>>
>>>>>
>>>>>>>      depends on NEED_MULTIPLE_NODES
>>>>>>>      help
>>>>>>>        Specify the maximum number of NUMA Nodes available on the target
Catalin Marinas Oct. 29, 2020, 1:37 p.m. UTC | #12
On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
> On 21/10/20 12:02, Jonathan Cameron wrote:
> > On Wed, 21 Oct 2020 09:43:21 +0530
> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> >> Agreed. Do we really need to match x86 right now? Do we really have
> >> systems that have 64 nodes? We should not increase the default node
> >> value and then try to solve some new problems, when there might not
> >> be any system which could even use that. I would suggest increasing
> >> NODES_SHIFT only up to what a real and available system requires.
> >
> > I'm not going to give precise numbers on near future systems but it is public
> > that we ship 8 NUMA node ARM64 systems today.  Things will get more
> > interesting as CXL and CCIX enter the market on ARM systems,
> > given chances are every CXL device will look like another NUMA
> > node (CXL spec says they should be presented as such) and you
> > may be able to rack up lots of them.
> >
> > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> > even a little and 64 is not a great stretch.
> > I'd make the jump to 64 so we can forget about this again for a year or two.
> > People will want to run today's distros on these new machines and we'd
> > rather not have to go around all the distros asking them to carry a patch
> > increasing this count (I assume they are already carrying such a patch
> > due to those 8 node systems)
> 
> I agree that 4 nodes is somewhat anemic; I've had to bump that just to
> run some scheduler tests under QEMU. However I still believe we should
> exercise caution before cranking it too high, especially when seeing things
> like:
> 
>   ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
> 
> To give some numbers, a defconfig build gives me:
> 
>   SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
>   BITS_PER_LONG=64 NR_PAGEFLAGS=24
> 
> IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
> plenty, however this can get cramped fairly easily with any combination of:
> 
>   CONFIG_SPARSEMEM_VMEMMAP=n (-18)
>   CONFIG_IDLE_PAGE_TRACKING=y (-2)
>   CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
> 
> Taking Arnd's above example, a randconfig build picking !VMEMMAP already
> limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
> the flags (it gets a dedicated field at the tail of struct page
> otherwise). If that is something we don't care too much about, then
> consider my concerns taken care of.

I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
disabled but the option is in the core mm/Kconfig file. We could make
NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
but hopefully that's going away soon).
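
The bit budget quoted above can be written out as a short C sketch. The
struct and function names are invented for illustration, and the field
widths are simply the numbers Valentin reported (including 18 section bits
when SPARSEMEM_VMEMMAP is off); the kernel itself does this accounting in
the preprocessor rather than at run time.

```c
/* Field widths competing for space in page->flags, per the quoted numbers. */
struct flags_layout {
	int bits_per_long;	/* 64 on arm64 */
	int nr_pageflags;	/* 24 in the quoted defconfig build */
	int sections_width;	/* 0 with SPARSEMEM_VMEMMAP, 18 without */
	int zones_width;
	int last_cpupid_shift;
	int kasan_tag_width;
};

/*
 * Largest NODES_SHIFT that still keeps every field, including last_cpupid,
 * inside page->flags. A negative result means the other fields alone
 * already overflow the word.
 */
static int max_nodes_shift(const struct flags_layout *l)
{
	int budget = l->bits_per_long - l->nr_pageflags;
	int used = l->sections_width + l->zones_width +
		   l->last_cpupid_shift + l->kasan_tag_width;

	return budget - used;
}
```

Plugging in the defconfig numbers gives 22; setting sections_width to 18 for
the !VMEMMAP case gives 4, matching the limit mentioned in the quoted text.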

> One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
> it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
> Anshuman in that we should set it to match the max encountered on platforms
> that are in use right now.

I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
distros have a 10-year view, they can always ship a kernel configured to
64 nodes, no need to change Kconfig (distros never ship with defconfig).

It may have an impact on more memory constrained platforms but that's
not what defconfig is about. It should allow existing hardware to run
Linux but not necessarily run it in the most efficient way possible.
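
To put a rough number on the memory cost, here is a C sketch that sizes a
single statically allocated table holding one CPU mask per possible node.
The function names are hypothetical and this covers only one such table; the
kernel has many node-indexed structures, so treat it as an order-of-magnitude
illustration, using the NR_CPUS=256 figure mentioned above.

```c
#include <stddef.h>

/* Bytes in a bitmap of nr_cpus bits, rounded up to whole 64-bit words. */
static size_t cpumask_bytes(unsigned int nr_cpus)
{
	return ((nr_cpus + 63u) / 64u) * 8u;
}

/* Rough size of one table holding a CPU mask for each possible node. */
static size_t node_cpumask_table_bytes(unsigned int nodes_shift,
				       unsigned int nr_cpus)
{
	return ((size_t)1 << nodes_shift) * cpumask_bytes(nr_cpus);
}
```

With 256 CPUs, such a table takes 128 bytes at NODES_SHIFT=2, 512 bytes at 4
and 2048 bytes at 6.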
Vanshidhar Konda Oct. 29, 2020, 7:48 p.m. UTC | #13
On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
>On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
>> On 21/10/20 12:02, Jonathan Cameron wrote:
>> > On Wed, 21 Oct 2020 09:43:21 +0530
>> > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>> >> Agreed. Do we really need to match x86 right now? Do we really have
>> >> systems that have 64 nodes? We should not increase the default node
>> >> value and then try to solve some new problems, when there might not
>> >> be any system which could even use that. I would suggest increasing
>> >> NODES_SHIFT only up to what a real and available system requires.
>> >
>> > I'm not going to give precise numbers on near future systems but it is public
>> > that we ship 8 NUMA node ARM64 systems today.  Things will get more
>> > interesting as CXL and CCIX enter the market on ARM systems,
>> > given chances are every CXL device will look like another NUMA
>> > node (CXL spec says they should be presented as such) and you
>> > may be able to rack up lots of them.
>> >
>> > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
>> > even a little and 64 is not a great stretch.
>> > I'd make the jump to 64 so we can forget about this again for a year or two.
>> > People will want to run today's distros on these new machines and we'd
>> > rather not have to go around all the distros asking them to carry a patch
>> > increasing this count (I assume they are already carrying such a patch
>> > due to those 8 node systems)
>>
>> I agree that 4 nodes is somewhat anemic; I've had to bump that just to
>> run some scheduler tests under QEMU. However I still believe we should
>> exercise caution before cranking it too high, especially when seeing things
>> like:
>>
>>   ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
>>
>> To give some numbers, a defconfig build gives me:
>>
>>   SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
>>   BITS_PER_LONG=64 NR_PAGEFLAGS=24
>>
>> IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
>> plenty, however this can get cramped fairly easily with any combination of:
>>
>>   CONFIG_SPARSEMEM_VMEMMAP=n (-18)
>>   CONFIG_IDLE_PAGE_TRACKING=y (-2)
>>   CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
>>
>> Taking Arnd's above example, a randconfig build picking !VMEMMAP already
>> limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
>> the flags (it gets a dedicated field at the tail of struct page
>> otherwise). If that is something we don't care too much about, then
>> consider my concerns taken care of.
>
>I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
>disabled but the option is in the core mm/Kconfig file. We could make
>NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
>but hopefully that's going away soon).
>
>> One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
>> it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
>> Anshuman in that we should set it to match the max encountered on platforms
>> that are in use right now.
>
>I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
>distros have a 10-year view, they can always ship a kernel configured to
>64 nodes, no need to change Kconfig (distros never ship with defconfig).
>
>It may have an impact on more memory constrained platforms but that's
>not what defconfig is about. It should allow existing hardware to run
>Linux but not necessarily run it in the most efficient way possible.
>

From the discussion it looks like 4 is an acceptable number to support
current hardware. I'll send a patch with NODES_SHIFT set to 4. Is it 
still possible to add this change to the 5.10 kernel?

Vanshi

>-- 
>Catalin
Catalin Marinas Oct. 30, 2020, 10:21 a.m. UTC | #14
On Thu, Oct 29, 2020 at 12:48:50PM -0700, Vanshidhar Konda wrote:
> On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
> > On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
> > > On 21/10/20 12:02, Jonathan Cameron wrote:
> > > > On Wed, 21 Oct 2020 09:43:21 +0530
> > > > Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> > > >> Agreed. Do we really need to match x86 right now? Do we really have
> > > >> systems that have 64 nodes? We should not increase the default node
> > > >> value and then try to solve some new problems, when there might not
> > > >> be any system which could even use that. I would suggest increasing
> > > >> NODES_SHIFT only up to what a real and available system requires.
> > > >
> > > > I'm not going to give precise numbers on near future systems but it is public
> > > > that we ship 8 NUMA node ARM64 systems today.  Things will get more
> > > > interesting as CXL and CCIX enter the market on ARM systems,
> > > > given chances are every CXL device will look like another NUMA
> > > > node (CXL spec says they should be presented as such) and you
> > > > may be able to rack up lots of them.
> > > >
> > > > So I'd argue minimum that makes sense today is 16 nodes, but looking forward
> > > > even a little and 64 is not a great stretch.
> > > > I'd make the jump to 64 so we can forget about this again for a year or two.
> > > > People will want to run today's distros on these new machines and we'd
> > > > rather not have to go around all the distros asking them to carry a patch
> > > > increasing this count (I assume they are already carrying such a patch
> > > > due to those 8 node systems)
> > > 
> > > I agree that 4 nodes is somewhat anemic; I've had to bump that just to
> > > run some scheduler tests under QEMU. However I still believe we should
> > > exercise caution before cranking it too high, especially when seeing things
> > > like:
> > > 
> > >   ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
> > > 
> > > To give some numbers, a defconfig build gives me:
> > > 
> > >   SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8) KASAN_TAG_WIDTH=0
> > >   BITS_PER_LONG=64 NR_PAGEFLAGS=24
> > > 
> > > IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
> > > plenty, however this can get cramped fairly easily with any combination of:
> > > 
> > >   CONFIG_SPARSEMEM_VMEMMAP=n (-18)
> > >   CONFIG_IDLE_PAGE_TRACKING=y (-2)
> > >   CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS (-8)
> > > 
> > > Taking Arnd's above example, a randconfig build picking !VMEMMAP already
> > > limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
> > > the flags (it gets a dedicated field at the tail of struct page
> > > otherwise). If that is something we don't care too much about, then
> > > consider my concerns taken care of.
> > 
> > I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
> > disabled but the option is in the core mm/Kconfig file. We could make
> > NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well
> > but hopefully that's going away soon).
> > 
> > > One more thing though: NR_CPUS can be cranked up to 4096 but we've only set
> > > it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
> > > Anshuman in that we should set it to match the max encountered on platforms
> > > that are in use right now.
> > 
> > I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
> > distros have a 10-year view, they can always ship a kernel configured to
> > 64 nodes, no need to change Kconfig (distros never ship with defconfig).
> > 
> > It may have an impact on more memory constrained platforms but that's
> > not what defconfig is about. It should allow existing hardware to run
> > Linux but not necessarily run it in the most efficient way possible.
> > 
> 
> From the discussion it looks like 4 is an acceptable number to support
> current hardware. I'll send a patch with NODES_SHIFT set to 4. Is it still
> possible to add this change to the 5.10 kernel?

I think we can but I'll leave the decision to Will (and don't forget to
cc the arm64 maintainers on your next post).
Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 893130ce1626..3e69d3c981be 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -980,7 +980,7 @@  config NUMA
 config NODES_SHIFT
 	int "Maximum NUMA Nodes (as a power of 2)"
 	range 1 10
-	default "2"
+	default "6"
 	depends on NEED_MULTIPLE_NODES
 	help
 	  Specify the maximum number of NUMA Nodes available on the target