| Message ID | 20201020173409.1266576-1-vanshikonda@os.amperecomputing.com |
| --- | --- |
| State | New, archived |
| Series | arm64: NUMA: Kconfig: Increase max number of nodes |
Hi,

Nit on the subject: this only increases the default, the max is still 2¹⁰.

On 20/10/20 18:34, Vanshidhar Konda wrote:
> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
> reach or exceed 16. Increase the number to 64 (matching x86_64).
>
> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
> ---
>  arch/arm64/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 893130ce1626..3e69d3c981be 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -980,7 +980,7 @@ config NUMA
>  config NODES_SHIFT
>          int "Maximum NUMA Nodes (as a power of 2)"
>          range 1 10
> -        default "2"
> +        default "6"

This leads to more statically allocated memory for things like node to CPU
maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
issue.

AIUI this also directly correlates to how many more page->flags bits are
required: are we sure the max 10 works on any aarch64 platform? I'm
genuinely asking here, given that I'm mostly a stranger to the mm
world. The default should be something we're somewhat confident works
everywhere.

>          depends on NEED_MULTIPLE_NODES
>          help
>            Specify the maximum number of NUMA Nodes available on the target
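For readers less familiar with the mechanics Valentin refers to: MAX_NUMNODES
is derived directly from CONFIG_NODES_SHIFT, so every per-node table sized at
compile time grows with it, whether or not the running machine has that many
nodes. The following is a minimal, stand-alone sketch of that relationship;
the macro names follow the kernel's numbering scheme, but the cpumask layout,
the 256-CPU sizing and the array shown here are illustrative stand-ins, not
the actual kernel definitions.

#include <stdio.h>

/* Simplified illustration -- not the exact kernel source. */
#define CONFIG_NODES_SHIFT 6                  /* value proposed by this patch */
#define NODES_SHIFT        CONFIG_NODES_SHIFT
#define MAX_NUMNODES       (1 << NODES_SHIFT) /* 64 possible node IDs */

/* Stand-in for a cpumask on a 256-CPU configuration (32 bytes). */
struct cpumask { unsigned long bits[256 / (8 * sizeof(unsigned long))]; };

/* Any per-node table sized at compile time grows with NODES_SHIFT,
 * e.g. a node-to-CPU map, regardless of how many nodes really exist. */
static struct cpumask node_to_cpumask_map[MAX_NUMNODES];

int main(void)
{
        printf("MAX_NUMNODES = %d, node_to_cpumask_map uses %zu bytes\n",
               MAX_NUMNODES, sizeof(node_to_cpumask_map));
        return 0;
}

Bumping the default from 2 to 6 multiplies such statically sized tables by
16, which is the cost Valentin and Anshuman weigh in the replies below.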
On 10/20/2020 11:39 PM, Valentin Schneider wrote:
>
> Hi,
>
> Nit on the subject: this only increases the default, the max is still 2¹⁰.

Agreed.

> On 20/10/20 18:34, Vanshidhar Konda wrote:
>> The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
>> reach or exceed 16. Increase the number to 64 (matching x86_64).
>>
>> Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
>> ---
>>  arch/arm64/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 893130ce1626..3e69d3c981be 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -980,7 +980,7 @@ config NUMA
>>  config NODES_SHIFT
>>          int "Maximum NUMA Nodes (as a power of 2)"
>>          range 1 10
>> -        default "2"
>> +        default "6"
>
> This leads to more statically allocated memory for things like node to CPU
> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> issue.

Smaller systems should not be required to waste that memory in the default
case unless there is a real, available larger system with that many nodes.

> AIUI this also directly correlates to how many more page->flags bits are
> required: are we sure the max 10 works on any aarch64 platform? I'm

We will have to test that. Besides, 256 (2^8) is the first threshold to be
crossed here.

> genuinely asking here, given that I'm mostly a stranger to the mm
> world. The default should be something we're somewhat confident works
> everywhere.

Agreed. Do we really need to match x86 right now? Do we really have
systems that have 64 nodes? We should not increase the default node
value and then try to solve new problems when there might not be any
system that could even use it. I would suggest increasing the
NODES_SHIFT value only as far as required by a real, available system.

>>          depends on NEED_MULTIPLE_NODES
>>          help
>>            Specify the maximum number of NUMA Nodes available on the target
On Wed, 21 Oct 2020 09:43:21 +0530
Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> On 10/20/2020 11:39 PM, Valentin Schneider wrote:
> [...]
> Agreed. Do we really need to match x86 right now? Do we really have
> systems that have 64 nodes? We should not increase the default node
> value and then try to solve new problems when there might not be any
> system that could even use it. I would suggest increasing the
> NODES_SHIFT value only as far as required by a real, available system.

I'm not going to give precise numbers on near-future systems, but it is
public that we ship 8-NUMA-node ARM64 systems today. Things will get more
interesting as CXL and CCIX enter the market on ARM systems, given that
chances are every CXL device will look like another NUMA node (the CXL spec
says they should be presented as such) and you may be able to rack up lots
of them.

So I'd argue the minimum that makes sense today is 16 nodes, but looking
forward even a little, 64 is not a great stretch. I'd make the jump to 64 so
we can forget about this again for a year or two. People will want to run
today's distros on these new machines, and we'd rather not have to go around
all the distros asking them to carry a patch increasing this count (I assume
they are already carrying such a patch due to those 8-node systems).

Jonathan
On Tue, Oct 20, 2020 at 07:09:36PM +0100, Valentin Schneider wrote:
>
> Hi,
>
> Nit on the subject: this only increases the default, the max is still 2¹⁰.
>
> [...]
>
> This leads to more statically allocated memory for things like node to CPU
> maps (see uses of MAX_NUMNODES), but that shouldn't be too much of an
> issue.
>
> AIUI this also directly correlates to how many more page->flags bits are
> required: are we sure the max 10 works on any aarch64 platform? I'm

I created an experimental setup in which I enabled 1024 NUMA nodes in the
SRAT and SLIT tables and configured NODES_SHIFT=10 for the kernel. 1022 of
these nodes were memory-only NUMA nodes. This configuration booted and
recognized the NUMA nodes correctly.

> genuinely asking here, given that I'm mostly a stranger to the mm
> world. The default should be something we're somewhat confident works
> everywhere.
Hi,

On 21/10/20 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
>>
>> Agreed. Do we really need to match x86 right now? Do we really have
>> systems that have 64 nodes? We should not increase the default node
>> value and then try to solve new problems when there might not be any
>> system that could even use it. I would suggest increasing the
>> NODES_SHIFT value only as far as required by a real, available system.
>
> I'm not going to give precise numbers on near-future systems, but it is
> public that we ship 8-NUMA-node ARM64 systems today. Things will get more
> interesting as CXL and CCIX enter the market on ARM systems, given that
> chances are every CXL device will look like another NUMA node (the CXL
> spec says they should be presented as such) and you may be able to rack
> up lots of them.
>
> So I'd argue the minimum that makes sense today is 16 nodes, but looking
> forward even a little, 64 is not a great stretch. I'd make the jump to 64
> so we can forget about this again for a year or two. People will want to
> run today's distros on these new machines, and we'd rather not have to go
> around all the distros asking them to carry a patch increasing this count
> (I assume they are already carrying such a patch due to those 8-node
> systems).

I agree that 4 nodes is somewhat anemic; I've had to bump that just to run
some scheduler tests under QEMU. However, I still believe we should
exercise caution before cranking it too high, especially when seeing things
like:

  ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")

To give some numbers, a defconfig build gives me:

  SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8)
  KASAN_TAG_WIDTH=0 BITS_PER_LONG=64 NR_PAGEFLAGS=24

IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to be
plenty, however this can get cramped fairly easily with any combination of:

  CONFIG_SPARSEMEM_VMEMMAP=n   (-18)
  CONFIG_IDLE_PAGE_TRACKING=y   (-2)
  CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS   (-8)

Taking Arnd's above example, a randconfig build picking !VMEMMAP already
limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
the flags (it gets a dedicated field at the tail of struct page otherwise).
If that is something we don't care too much about, then consider my
concerns taken care of.

One more thing though: NR_CPUS can be cranked up to 4096, but we've only
set it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
Anshuman in that we should set it to match the max encountered on platforms
that are in use right now.

> Jonathan
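To make the arithmetic above easier to follow, here is a small stand-alone
program that redoes Valentin's bit-budget calculation. The widths are the
illustrative values he reports for an arm64 defconfig build; they are
hard-coded here for the example rather than taken from any kernel header.

#include <stdio.h>

/* Widths reported above for an arm64 defconfig build (illustrative). */
#define BITS_PER_LONG      64
#define NR_PAGEFLAGS       24      /* actual page flag bits */
#define SECTIONS_WIDTH      0
#define ZONES_WIDTH         2
#define LAST_CPUPID_SHIFT  (8 + 8)
#define KASAN_TAG_WIDTH     0

int main(void)
{
        /* Bits left in page->flags after the flag bits themselves. */
        int budget = BITS_PER_LONG - NR_PAGEFLAGS;            /* 40 */
        /* Everything that competes with the node number for those bits. */
        int fixed = SECTIONS_WIDTH + ZONES_WIDTH +
                    LAST_CPUPID_SHIFT + KASAN_TAG_WIDTH;      /* 18 */

        printf("NODES_SHIFT can be at most %d with this config\n",
               budget - fixed);                               /* 22 */

        /* Flipping options changes 'fixed': !SPARSEMEM_VMEMMAP adds a
         * sections field (about 18 bits), KASAN_SW_TAGS adds 8, and
         * IDLE_PAGE_TRACKING adds 2, which is how the headroom shrinks. */
        return 0;
}

Any option that widens one of the competing fields shrinks the headroom
left for NODES_SHIFT, which is the cramping effect described above.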
On 2020-10-21 12:02, Jonathan Cameron wrote:
> On Wed, 21 Oct 2020 09:43:21 +0530
> Anshuman Khandual <anshuman.khandual@arm.com> wrote:
> [...]
>
> So I'd argue the minimum that makes sense today is 16 nodes, but looking
> forward even a little, 64 is not a great stretch. I'd make the jump to 64
> so we can forget about this again for a year or two. People will want to
> run today's distros on these new machines, and we'd rather not have to go
> around all the distros asking them to carry a patch increasing this count
> (I assume they are already carrying such a patch due to those 8-node
> systems).

Nit: I doubt any sane distro is going to carry a patch to adjust the
*default* value of a Kconfig option. They might tune the actual value in
their config, but, well, isn't that the whole point of configs? ;)

Robin.
On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
> On 2020-10-21 12:02, Jonathan Cameron wrote:
>> [...]
>>
>> So I'd argue the minimum that makes sense today is 16 nodes, but looking
>> forward even a little, 64 is not a great stretch. I'd make the jump to
>> 64 so we can forget about this again for a year or two. People will want
>> to run today's distros on these new machines, and we'd rather not have
>> to go around all the distros asking them to carry a patch increasing
>> this count (I assume they are already carrying such a patch due to those
>> 8-node systems).

To echo Jonathan's statement above, we are looking at systems that will
need approximately 64 NUMA nodes over the next 5-6 years - the time for
which an LTS kernel would be maintained. Some of the reasons for increasing
the NUMA node count during this period include CXL, CCIX and NVDIMM (as
Jonathan pointed out).

The main argument against increasing NODES_SHIFT seems to be a concern that
it negatively impacts other ARM64 systems. Could anyone share what kind of
systems we are talking about? For a system that has NEED_MULTIPLE_NODES
set, would the impact be noticeable?

Vanshi

> Nit: I doubt any sane distro is going to carry a patch to adjust the
> *default* value of a Kconfig option. They might tune the actual value in
> their config, but, well, isn't that the whole point of configs? ;)
>
> Robin.
On 2020-10-22 02:07, Vanshi Konda wrote:
> On Thu, Oct 22, 2020 at 12:44:15AM +0100, Robin Murphy wrote:
>> [...]
>
> To echo Jonathan's statement above, we are looking at systems that will
> need approximately 64 NUMA nodes over the next 5-6 years - the time for
> which an LTS kernel would be maintained. Some of the reasons for
> increasing the NUMA node count during this period include CXL, CCIX and
> NVDIMM (as Jonathan pointed out).
>
> The main argument against increasing NODES_SHIFT seems to be a concern
> that it negatively impacts other ARM64 systems. Could anyone share what
> kind of systems we are talking about? For a system that has
> NEED_MULTIPLE_NODES set, would the impact be noticeable?

Systems like the ESPRESSObin - sure, sane people aren't trying to run
desktops or development environments in 1GB of RAM, but it's not uncommon
for them to use a minimal headless install of their favourite generic arm64
distro rather than something more "embedded" like OpenWrt or Armbian.
Increasing a generic kernel's memory footprint (and perhaps more
importantly, cache footprint) more than necessary is going to have *some*
impact.

Robin.
On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
> On 2020-10-22 02:07, Vanshi Konda wrote:
>> [...]
>>
>> The main argument against increasing NODES_SHIFT seems to be a concern
>> that it negatively impacts other ARM64 systems. Could anyone share what
>> kind of systems we are talking about? For a system that has
>> NEED_MULTIPLE_NODES set, would the impact be noticeable?
>
> Systems like the ESPRESSObin - sure, sane people aren't trying to run
> desktops or development environments in 1GB of RAM, but it's not uncommon
> for them to use a minimal headless install of their favourite generic
> arm64 distro rather than something more "embedded" like OpenWrt or
> Armbian. Increasing a generic kernel's memory footprint (and perhaps more
> importantly, cache footprint) more than necessary is going to have *some*
> impact.

If someone is running a generic arm64 distro, at least some of them are
already paying the extra cost: NODES_SHIFT in the Ubuntu and SUSE kernels
is already 6, CentOS/Red Hat and Oracle Linux set it to 3, and I've only
seen Debian set it to 2.

Vanshi
On 10/22/20 11:25 AM, Vanshi Konda wrote:
> On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
>> On 2020-10-22 02:07, Vanshi Konda wrote:
>>> [...]
>>>
>>> To echo Jonathan's statement above, we are looking at systems that will
>>> need approximately 64 NUMA nodes over the next 5-6 years - the time for
>>> which an LTS kernel would be maintained. Some of the reasons for
>>> increasing the NUMA node count during this period include CXL, CCIX and
>>> NVDIMM (as Jonathan pointed out).

This is a very good point. It won't be long until systems are pushing the
number of NUMA nodes, and increasing NODES_SHIFT only slightly now will
result in the default configuration not recognizing all the nodes.
CONFIG_NODES_SHIFT=6 seems a reasonable step up for a generic kernel that
should run well on small to very large systems for a few years to come.

>>> The main argument against increasing NODES_SHIFT seems to be a concern
>>> that it negatively impacts other ARM64 systems. Could anyone share what
>>> kind of systems we are talking about? For a system that has
>>> NEED_MULTIPLE_NODES set, would the impact be noticeable?
>>
>> Systems like the ESPRESSObin - sure, sane people aren't trying to run
>> desktops or development environments in 1GB of RAM, but it's not
>> uncommon for them to use a minimal headless install of their favourite
>> generic arm64 distro rather than something more "embedded" like OpenWrt
>> or Armbian.
>
> If someone is running a generic arm64 distro, at least some of them are
> already paying the extra cost: NODES_SHIFT in the Ubuntu and SUSE kernels
> is already 6, CentOS/Red Hat and Oracle Linux set it to 3, and I've only
> seen Debian set it to 2.

Right. The distros may not agree or even care what the default is, but it
doesn't make sense for the mainline default to lag too far behind what the
major distros use.

Shaggy
On Thu, Oct 22, 2020 at 12:21:27PM +0100, Robin Murphy wrote:
> On 2020-10-22 02:07, Vanshi Konda wrote:
>> [...]
>>
>> The main argument against increasing NODES_SHIFT seems to be a concern
>> that it negatively impacts other ARM64 systems. Could anyone share what
>> kind of systems we are talking about? For a system that has
>> NEED_MULTIPLE_NODES set, would the impact be noticeable?
>
> Systems like the ESPRESSObin - sure, sane people aren't trying to run
> desktops or development environments in 1GB of RAM, but it's not uncommon
> for them to use a minimal headless install of their favourite generic
> arm64 distro rather than something more "embedded" like OpenWrt or
> Armbian. Increasing a generic kernel's memory footprint (and perhaps more
> importantly, cache footprint) more than necessary is going to have *some*
> impact.

Ampere's platforms support multiple NUMA configuration options to meet
different customer requirements. Several configurations have more than 4
(the current default) NUMA nodes. These fail to initialize NUMA with the
following errors in dmesg:

[    0.000000] ACPI: SRAT: Too many proximity domains.
[    0.000000] ACPI: SRAT: SRAT not used.
[    0.000000] SRAT: Invalid NUMA node -1 in ITS affinity
[    0.000000] SRAT: Invalid NUMA node -1 in ITS affinity

If we look at the forecast for the next LTS kernel's lifetime, the number
of NUMA nodes will increase significantly due to SoCs with much higher core
counts, more memory channels, and new devices such as CCIX-attached memory.
Supporting these platforms with a default kernel config will require a
minimum NODES_SHIFT value of 6.

Vanshi
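For context on where those "Too many proximity domains" lines come from:
when the SRAT parser maps firmware proximity domains to Linux node IDs, it
cannot hand out more IDs than MAX_NUMNODES allows, so domains beyond the
configured limit fail to map and the SRAT is discarded. The sketch below is
a simplified, hypothetical illustration of that capping behaviour under the
old default of NODES_SHIFT=2; the function and variable names are invented
for the example and are not the kernel's ACPI code.

#include <stdio.h>

#define NODES_SHIFT   2                   /* old arm64 default */
#define MAX_NUMNODES  (1 << NODES_SHIFT)  /* only 4 node IDs available */
#define NUMA_NO_NODE  (-1)

/* Hypothetical proximity-domain -> node-ID map for this illustration. */
static int pxm_to_node_map[1024];
static int nodes_used;

/* Simplified stand-in for the SRAT mapping step: once all node IDs are
 * used, further proximity domains cannot be mapped, which is roughly
 * what the dmesg errors quoted above report. */
static int map_pxm_to_node(int pxm)
{
        if (pxm_to_node_map[pxm] != 0)
                return pxm_to_node_map[pxm] - 1;   /* already mapped */
        if (nodes_used >= MAX_NUMNODES) {
                printf("SRAT: Too many proximity domains.\n");
                return NUMA_NO_NODE;
        }
        pxm_to_node_map[pxm] = ++nodes_used;       /* store node + 1 */
        return nodes_used - 1;
}

int main(void)
{
        for (int pxm = 0; pxm < 6; pxm++)  /* a 6-node firmware table */
                printf("pxm %d -> node %d\n", pxm, map_pxm_to_node(pxm));
        return 0;
}

With NODES_SHIFT raised to 6, the same walk would map all six domains.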
On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
> On 21/10/20 12:02, Jonathan Cameron wrote:
> [...]
>
> I agree that 4 nodes is somewhat anemic; I've had to bump that just to
> run some scheduler tests under QEMU. However, I still believe we should
> exercise caution before cranking it too high, especially when seeing
> things like:
>
>   ee38d94a0ad8 ("page flags: prioritize kasan bits over last-cpuid")
>
> To give some numbers, a defconfig build gives me:
>
>   SECTIONS_WIDTH=0 ZONES_WIDTH=2 NODES_SHIFT=2 LAST_CPUPID_SHIFT=(8+8)
>   KASAN_TAG_WIDTH=0 BITS_PER_LONG=64 NR_PAGEFLAGS=24
>
> IOW, we need 18 + NODES_SHIFT <= 40 -> NODES_SHIFT <= 22. That looks to
> be plenty, however this can get cramped fairly easily with any
> combination of:
>
>   CONFIG_SPARSEMEM_VMEMMAP=n   (-18)
>   CONFIG_IDLE_PAGE_TRACKING=y   (-2)
>   CONFIG_KASAN=y + CONFIG_KASAN_SW_TAGS   (-8)
>
> Taking Arnd's above example, a randconfig build picking !VMEMMAP already
> limits the NODES_SHIFT to 4 *if* we want to keep the CPUPID thing within
> the flags (it gets a dedicated field at the tail of struct page
> otherwise). If that is something we don't care too much about, then
> consider my concerns taken care of.

I don't think there's any value in allowing SPARSEMEM_VMEMMAP to be
disabled, but the option is in the core mm/Kconfig file. We could make
NODES_SHIFT depend on SPARSEMEM_VMEMMAP (there's DISCONTIGMEM as well, but
hopefully that's going away soon).

> One more thing though: NR_CPUS can be cranked up to 4096, but we've only
> set it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
> Anshuman in that we should set it to match the max encountered on
> platforms that are in use right now.

I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
distros have a 10-year view, they can always ship a kernel configured for
64 nodes; there's no need to change Kconfig (distros never ship with
defconfig).

It may have an impact on more memory-constrained platforms, but that's not
what defconfig is about. It should allow existing hardware to run Linux,
but not necessarily run it in the most efficient way possible.
On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
> On Wed, Oct 21, 2020 at 11:29:41PM +0100, Valentin Schneider wrote:
>> [...]
>>
>> One more thing though: NR_CPUS can be cranked up to 4096, but we've only
>> set it to 256 IIRC to support the TX2. From that PoV, I'm agreeing with
>> Anshuman in that we should set it to match the max encountered on
>> platforms that are in use right now.
>
> I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms. If
> distros have a 10-year view, they can always ship a kernel configured for
> 64 nodes; there's no need to change Kconfig (distros never ship with
> defconfig).
>
> It may have an impact on more memory-constrained platforms, but that's
> not what defconfig is about. It should allow existing hardware to run
> Linux, but not necessarily run it in the most efficient way possible.

From the discussion it looks like 4 is an acceptable number to support
current hardware. I'll send a patch with NODES_SHIFT set to 4. Is it still
possible to add this change to the 5.10 kernel?

Vanshi

> --
> Catalin
On Thu, Oct 29, 2020 at 12:48:50PM -0700, Vanshidhar Konda wrote:
> On Thu, Oct 29, 2020 at 01:37:10PM +0000, Catalin Marinas wrote:
> > [...]
> >
> > I agree. Let's bump NODES_SHIFT to 4 now to cover existing platforms.
> > If distros have a 10-year view, they can always ship a kernel
> > configured for 64 nodes; there's no need to change Kconfig (distros
> > never ship with defconfig).
>
> From the discussion it looks like 4 is an acceptable number to support
> current hardware. I'll send a patch with NODES_SHIFT set to 4. Is it
> still possible to add this change to the 5.10 kernel?

I think we can, but I'll leave the decision to Will (and don't forget to cc
the arm64 maintainers on your next post).
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 893130ce1626..3e69d3c981be 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -980,7 +980,7 @@ config NUMA
 config NODES_SHIFT
 	int "Maximum NUMA Nodes (as a power of 2)"
 	range 1 10
-	default "2"
+	default "6"
 	depends on NEED_MULTIPLE_NODES
 	help
 	  Specify the maximum number of NUMA Nodes available on the target
The current arm64 max NUMA nodes default to 4. Today's arm64 systems can
reach or exceed 16. Increase the number to 64 (matching x86_64).

Signed-off-by: Vanshidhar Konda <vanshikonda@os.amperecomputing.com>
---
 arch/arm64/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)