Message ID | 20191216103522.32215-2-gonsolo@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix SH config error. | expand |
Hi Gon, Thanks for your patch! On Mon, Dec 16, 2019 at 11:35 AM Gon Solo <gonsolo@gmail.com> wrote: > According to https://lkml.org/lkml/2019/12/16/101 and > http://kisskb.ellerman.id.au/kisskb/buildresult/14067948/ building on > sh4 is broken due to a > > page_alloc.c:(.text+0x3148): undefined reference to `node_reclaim_distance'. > > This only happens with CONFIG_NUMA=y (variable used with #ifdef > CONFIG_NUMA at mm/page_alloc.c:3529) and CONFIG_SMP=n (variable defined at > kernel/sched/topology.c:2291 but the whole file to be built depends on > CONFIG_SMP in kernel/sched/Makefile:23. > > Follow the lead of arch/x86/Kconfig:1547 and depend on SMP. > > This assumes that there are no NUMA systems without SMP which is > reasonable I guess. Unfortunately that may be an x86-centric assumption: on other platforms, there do exist systems with multiple memory banks with different access performance figures. > Signed-off-by: Gon Solo <gonsolo@gmail.com> > --- > arch/sh/mm/Kconfig | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig > index 5c8a2ebfc720..cf655d8e8758 100644 > --- a/arch/sh/mm/Kconfig > +++ b/arch/sh/mm/Kconfig > @@ -108,7 +108,7 @@ config VSYSCALL > > config NUMA > bool "Non Uniform Memory Access (NUMA) Support" > - depends on MMU && SYS_SUPPORTS_NUMA > + depends on MMU && SMP && SYS_SUPPORTS_NUMA > select ARCH_WANT_NUMA_VARIABLE_LOCALITY > default n > help Gr{oetje,eeting}s, Geert
Hi! > Unfortunately that may be an x86-centric assumption: on other platforms, > there do exist systems with multiple memory banks with different access > performance figures. Does that mean that the line obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o in kernel/sched/topology/Makefile:23 is wrong? Because in topology.c:1284 the variable node_reclaim_distance ist defined which is used in mm/page_alloc.c:3529 without depending on SMP.
Hi Gon, On Mon, Dec 16, 2019 at 2:21 PM Gonsolo <gonsolo@gmail.com> wrote: > > Unfortunately that may be an x86-centric assumption: on other platforms, > > there do exist systems with multiple memory banks with different access > > performance figures. > > Does that mean that the line > > obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o > > in kernel/sched/topology/Makefile:23 is wrong? > > Because in topology.c:1284 the variable node_reclaim_distance ist defined > which is used in mm/page_alloc.c:3529 without depending on SMP. The offending commit seems to be a55c7454a8c887b2 ("sched/topology: Improve load balancing on AMD EPYC systems"). Probably the node_reclaim_distance variable should be moved from an SMP-specific file to a NUMA-specific file. Gr{oetje,eeting}s, Geert
> The offending commit seems to be a55c7454a8c887b2 ("sched/topology: > Improve load balancing on AMD EPYC systems"). > > Probably the node_reclaim_distance variable should be moved from > an SMP-specific file to a NUMA-specific file. There are two variables that are used elsewhere: int sched_max_numa_distance; Used in kernel/sched/fair.c and kernel/sched/topology.c. I would move it to fair.c. int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; Used in arch/x86/kernel/cpu/amd.c, line 894 kernel/sched/topology.c mm/khugepaged.c, line 725 mm/page_alloc.c, line 3529 I'm not sure where to move this one.
On Mon, 16 Dec, at 02:50:49PM, Gonsolo wrote: > > The offending commit seems to be a55c7454a8c887b2 ("sched/topology: > > Improve load balancing on AMD EPYC systems"). > > > > Probably the node_reclaim_distance variable should be moved from > > an SMP-specific file to a NUMA-specific file. > > There are two variables that are used elsewhere: > > int sched_max_numa_distance; > > Used in kernel/sched/fair.c and kernel/sched/topology.c. I would move > it to fair.c. > > int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; > > Used in > arch/x86/kernel/cpu/amd.c, line 894 > kernel/sched/topology.c > mm/khugepaged.c, line 725 > mm/page_alloc.c, line 3529 > > I'm not sure where to move this one. Can someone test out this patch on one of the failing architectures? (sh, ppc64) ---->8---- diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 6ec1e595b1d4..bf20e5883026 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1281,7 +1281,6 @@ static int sched_domains_curr_level; int sched_max_numa_distance; static int *sched_domains_numa_distance; static struct cpumask ***sched_domains_numa_masks; -int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; #endif /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4785a8a2040e..733890d913ea 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3523,6 +3523,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order, } #ifdef CONFIG_NUMA +int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone) { return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
On 12/23/19 8:42 AM, Matt Fleming wrote: > On Mon, 16 Dec, at 02:50:49PM, Gonsolo wrote: >>> The offending commit seems to be a55c7454a8c887b2 ("sched/topology: >>> Improve load balancing on AMD EPYC systems"). >>> >>> Probably the node_reclaim_distance variable should be moved from >>> an SMP-specific file to a NUMA-specific file. >> >> There are two variables that are used elsewhere: >> >> int sched_max_numa_distance; >> >> Used in kernel/sched/fair.c and kernel/sched/topology.c. I would move >> it to fair.c. >> >> int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; >> >> Used in >> arch/x86/kernel/cpu/amd.c, line 894 >> kernel/sched/topology.c >> mm/khugepaged.c, line 725 >> mm/page_alloc.c, line 3529 >> >> I'm not sure where to move this one. > > Can someone test out this patch on one of the failing architectures? > (sh, ppc64) Yes, it fixes the arch/sh/ build errors in my testing. I don't have a failing ppc64 .config file to test. thanks. > ---->8---- > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c > index 6ec1e595b1d4..bf20e5883026 100644 > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -1281,7 +1281,6 @@ static int sched_domains_curr_level; > int sched_max_numa_distance; > static int *sched_domains_numa_distance; > static struct cpumask ***sched_domains_numa_masks; > -int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; > #endif > > /* > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 4785a8a2040e..733890d913ea 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3523,6 +3523,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order, > } > > #ifdef CONFIG_NUMA > +int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; > static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone) > { > return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <= >
Hi! > Yes, it fixes the arch/sh/ build errors in my testing. > I don't have a failing ppc64 .config file to test. As of v5.7-rc4 this patch doesn't seem to have been applied and the build is still failing: http://kisskb.ellerman.id.au/kisskb/buildresult/14067948/ diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 6ec1e595b1d4..bf20e5883026 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -1281,7 +1281,6 @@ static int sched_domains_curr_level; int sched_max_numa_distance; static int *sched_domains_numa_distance; static struct cpumask ***sched_domains_numa_masks; -int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; #endif /* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4785a8a2040e..733890d913ea 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3523,6 +3523,7 @@ bool zone_watermark_ok_safe(struct zone *z, unsigned int order, } #ifdef CONFIG_NUMA +int __read_mostly node_reclaim_distance = RECLAIM_DISTANCE; static bool zone_allows_reclaim(struct zone *local_zone, struct zone *zone) { return node_distance(zone_to_nid(local_zone), zone_to_nid(zone)) <=
diff --git a/arch/sh/mm/Kconfig b/arch/sh/mm/Kconfig index 5c8a2ebfc720..cf655d8e8758 100644 --- a/arch/sh/mm/Kconfig +++ b/arch/sh/mm/Kconfig @@ -108,7 +108,7 @@ config VSYSCALL config NUMA bool "Non Uniform Memory Access (NUMA) Support" - depends on MMU && SYS_SUPPORTS_NUMA + depends on MMU && SMP && SYS_SUPPORTS_NUMA select ARCH_WANT_NUMA_VARIABLE_LOCALITY default n help
According to https://lkml.org/lkml/2019/12/16/101 and http://kisskb.ellerman.id.au/kisskb/buildresult/14067948/ building on sh4 is broken due to a page_alloc.c:(.text+0x3148): undefined reference to `node_reclaim_distance'. This only happens with CONFIG_NUMA=y (variable used with #ifdef CONFIG_NUMA at mm/page_alloc.c:3529) and CONFIG_SMP=n (variable defined at kernel/sched/topology.c:2291 but the whole file to be built depends on CONFIG_SMP in kernel/sched/Makefile:23. Follow the lead of arch/x86/Kconfig:1547 and depend on SMP. This assumes that there are no NUMA systems without SMP which is reasonable I guess. Signed-off-by: Gon Solo <gonsolo@gmail.com> --- arch/sh/mm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)