Message ID | 20170914184918.20406-5-jeremy.linton@arm.com (mailing list archive) |
---|---|
State | New, archived |

Hi Jeremy,

On 2017/9/15 2:49, Jeremy Linton wrote:
> [...]
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 8b57339823e9..bd7517960d39 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -7,13 +7,15 @@ struct cpu_topology {
>  	int thread_id;
>  	int core_id;
>  	int cluster_id;
> +	int package_id;
>  	cpumask_t thread_sibling;
>  	cpumask_t core_sibling;
>  };

'core_sibling' will be updated by 'update_siblings_masks()' to represent cores in a cluster.
Can we add a cpumask_t field to represent cores in a package, so that 'lstopo' can use this
cpumask_t to display the right information?

Thanks,
Xiongfeng Wang

> [...]

Hi,

On 09/17/2017 08:50 PM, Xiongfeng Wang wrote:
> Hi Jeremy,
>
> On 2017/9/15 2:49, Jeremy Linton wrote:
>> [...]
>> @@ -7,13 +7,15 @@ struct cpu_topology {
>>  	int thread_id;
>>  	int core_id;
>>  	int cluster_id;
>> +	int package_id;
>>  	cpumask_t thread_sibling;
>>  	cpumask_t core_sibling;
>>  };
>
> 'core_sibling' will be updated by 'update_siblings_masks()' to represent cores in a cluster.
> Can we add a cpumask_t field to represent cores in a package, so that 'lstopo' can use this
> cpumask_t to display the right information?

So, the change below modifies update_siblings_masks() to use the package_id.
Per the ABI, the CPUs listed in core_siblings/core_siblings_list share the same
..cpuX/topology/physical_package_id. What physical_package_id means can vary
per architecture, but the siblings list needs to be the cores with the same
physical_package (AFAIK, feel free to correct my understanding). That rule
should be enforced by this patch set.

I suspect that if you're running these patches and the lstopo output looks
strange, it's because you're on a machine where the thread_id has been assigned
the cluster_id in the later patch set.

> Thanks,
> Xiongfeng Wang
>
>> [...]
>> @@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
>>  	for_each_possible_cpu(cpu) {
>>  		cpu_topo = &cpu_topology[cpu];
>>
>> -		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
>> +		if (cpuid_topo->package_id != cpu_topo->package_id)

(Note here that core_siblings now reflects the package_id rather than the
cluster_id. This only matters if cluster_id != package_id.)

>>  			continue;
>>
>>  		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
>> [...]
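
The ABI rule described in this reply can be checked from userspace by comparing a CPU's physical_package_id with its core_siblings_list. What follows is a minimal sketch, not part of the patch: `parse_cpulist()` and `read_topology_attr()` are hypothetical helper names, the mask only covers CPUs 0..63, and the path assumes a Linux system that exposes /sys/devices/system/cpu/cpuN/topology/.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a sysfs cpulist string such as "0-3,8-11" into a 64-bit mask.
 * Hypothetical helper for illustration; handles CPUs 0..63 only.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1;
		if (*end == '-') {		/* range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s)
				return -1;
		}
		if (lo < 0 || hi < lo || hi > 63)
			return -1;
		for (long c = lo; c <= hi; c++)
			*mask |= UINT64_C(1) << c;

		if (*end == ',')
			s = end + 1;
		else if (*end == '\0' || *end == '\n')
			s = end;
		else
			return -1;
	}
	return 0;
}

/*
 * Read the first line of /sys/devices/system/cpu/cpu<N>/topology/<attr>
 * into buf. Hypothetical helper; returns 0 on success, -1 on failure.
 */
int read_topology_attr(unsigned int cpu, const char *attr,
		       char *buf, size_t len)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%u/topology/%s", cpu, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}
```

A caller could read `core_siblings_list` for cpu0, parse it, and then confirm that every CPU in the resulting mask reports the same `physical_package_id` as cpu0.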

Hi Jeremy,

On 2017/9/19 2:54, Jeremy Linton wrote:
> Hi,
>
> On 09/17/2017 08:50 PM, Xiongfeng Wang wrote:
>> Hi Jeremy,
>>
>> On 2017/9/15 2:49, Jeremy Linton wrote:
>>> [...]
>>> @@ -7,13 +7,15 @@ struct cpu_topology {
>>>  	int thread_id;
>>>  	int core_id;
>>>  	int cluster_id;
>>> +	int package_id;
>>>  	cpumask_t thread_sibling;
>>>  	cpumask_t core_sibling;
>>>  };
>>
>> 'core_sibling' will be updated by 'update_siblings_masks()' to represent cores in a cluster.
>> Can we add a cpumask_t field to represent cores in a package, so that 'lstopo' can use this
>> cpumask_t to display the right information?
>
> So, the change below modifies update_siblings_masks() to use the package_id.
> Per the ABI, the CPUs listed in core_siblings/core_siblings_list share the same
> ..cpuX/topology/physical_package_id. What physical_package_id means can vary
> per architecture, but the siblings list needs to be the cores with the same
> physical_package (AFAIK, feel free to correct my understanding). That rule
> should be enforced by this patch set.
>
> I suspect that if you're running these patches and the lstopo output looks
> strange, it's because you're on a machine where the thread_id has been assigned
> the cluster_id in the later patch set.

Sorry, I didn't notice your change in 'update_siblings_masks()' before, so
'core_sibling' now represents the cores in a package. But we may need another
cpumask_t field to represent the cores in a cluster, so that the scheduler can
use it to build a sched_domain containing only the cores in one cluster.

>> Thanks,
>> Xiongfeng Wang
>>
>>> [...]
>>> -		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
>>> +		if (cpuid_topo->package_id != cpu_topo->package_id)
>
> (Note here that core_siblings now reflects the package_id rather than the
> cluster_id. This only matters if cluster_id != package_id.)
>
>>> [...]
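
The suggestion of a second, cluster-level mask could look roughly like the userspace model below. This is a sketch under stated assumptions, not code from the patch set: `cluster_sibling`, `struct cpu_topology_model`, `model_init()` and `model_update_siblings_masks()` are hypothetical names, plain 64-bit masks stand in for `cpumask_t`, and the 8-CPU layout (two packages, two clusters per package, two cores per cluster) is invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Userspace stand-in for struct cpu_topology; cluster_sibling is hypothetical. */
struct cpu_topology_model {
	int cluster_id;
	int package_id;
	uint64_t core_sibling;		/* CPUs sharing package_id (as in the patch) */
	uint64_t cluster_sibling;	/* CPUs sharing cluster_id (the suggestion) */
};

struct cpu_topology_model model_topology[MODEL_NR_CPUS];

/* Invented layout: package = cpu / 4, cluster = cpu / 2. */
void model_init(void)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		model_topology[cpu].package_id = (int)(cpu / 4);
		model_topology[cpu].cluster_id = (int)(cpu / 2);
		/* Each CPU starts as its own sibling. */
		model_topology[cpu].core_sibling = UINT64_C(1) << cpu;
		model_topology[cpu].cluster_sibling = UINT64_C(1) << cpu;
	}
}

/* Same loop shape as update_siblings_masks(), plus the extra cluster mask. */
void model_update_siblings_masks(unsigned int cpuid)
{
	struct cpu_topology_model *cpuid_topo, *cpu_topo;

	assert(cpuid < MODEL_NR_CPUS);
	cpuid_topo = &model_topology[cpuid];

	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		cpu_topo = &model_topology[cpu];

		if (cpuid_topo->package_id != cpu_topo->package_id)
			continue;

		/* Package level: what core_sibling tracks after the patch. */
		cpu_topo->core_sibling |= UINT64_C(1) << cpuid;
		cpuid_topo->core_sibling |= UINT64_C(1) << cpu;

		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
			continue;

		/* Cluster level: the additional mask a scheduler domain could use. */
		cpu_topo->cluster_sibling |= UINT64_C(1) << cpuid;
		cpuid_topo->cluster_sibling |= UINT64_C(1) << cpu;
	}
}
```

After calling `model_init()` and then `model_update_siblings_masks()` for every CPU, each CPU's `core_sibling` covers the four CPUs of its package while `cluster_sibling` covers only the two CPUs of its cluster.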

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 8b57339823e9..bd7517960d39 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -7,13 +7,15 @@ struct cpu_topology {
 	int thread_id;
 	int core_id;
 	int cluster_id;
+	int package_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
 
-#define topology_physical_package_id(cpu)	(cpu_topology[cpu].cluster_id)
+#define topology_physical_package_id(cpu)	(cpu_topology[cpu].package_id)
+#define topology_cod_id(cpu)		(cpu_topology[cpu].cluster_id)
 #define topology_core_id(cpu)		(cpu_topology[cpu].core_id)
 #define topology_core_cpumask(cpu)	(&cpu_topology[cpu].core_sibling)
 #define topology_sibling_cpumask(cpu)	(&cpu_topology[cpu].thread_sibling)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b233e6ce..9147e5b6326d 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -67,6 +67,8 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			leaf = false;
 			cpu = get_cpu_for_node(t);
 			if (cpu >= 0) {
+				/* maintain DT cluster == package behavior */
+				cpu_topology[cpu].package_id = cluster_id;
 				cpu_topology[cpu].cluster_id = cluster_id;
 				cpu_topology[cpu].core_id = core_id;
 				cpu_topology[cpu].thread_id = i;
@@ -88,7 +90,7 @@ static int __init parse_core(struct device_node *core, int cluster_id,
 			       core);
 			return -EINVAL;
 		}
-
+		cpu_topology[cpu].package_id = cluster_id;
 		cpu_topology[cpu].cluster_id = cluster_id;
 		cpu_topology[cpu].core_id = core_id;
 	} else if (leaf) {
@@ -228,7 +230,7 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
-		if (cpuid_topo->cluster_id != cpu_topo->cluster_id)
+		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
 		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
@@ -273,6 +275,7 @@ void store_cpu_topology(unsigned int cpuid)
 					 MPIDR_AFFINITY_LEVEL(mpidr, 2) << 8 |
 					 MPIDR_AFFINITY_LEVEL(mpidr, 3) << 16;
 	}
+	cpuid_topo->package_id = cpuid_topo->cluster_id;
 
 	pr_debug("CPU%u: cluster %d core %d thread %d mpidr %#016llx\n",
 		 cpuid, cpuid_topo->cluster_id, cpuid_topo->core_id,
@@ -292,6 +295,7 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->thread_id = -1;
 		cpu_topo->core_id = 0;
 		cpu_topo->cluster_id = -1;
+		cpu_topo->package_id = -1;
 
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
diff --git a/include/linux/topology.h b/include/linux/topology.h
index cb0775e1ee4b..4660749a7303 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -184,6 +184,9 @@ static inline int cpu_to_mem(int cpu)
 #ifndef topology_physical_package_id
 #define topology_physical_package_id(cpu)	((void)(cpu), -1)
 #endif
+#ifndef topology_cod_id				/* cluster on die */
+#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
+#endif
 #ifndef topology_core_id
 #define topology_core_id(cpu)			((void)(cpu), 0)
 #endif
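
To make the store_cpu_topology() hunk easier to follow, here is a small userspace model of how the ids could be derived from an MPIDR value before package_id is copied from cluster_id. It is a sketch under assumptions: the diff only shows the tail of one branch plus the new assignment, so the multi-threaded and non-multi-threaded branches below are reconstructed from the architectural MPIDR_EL1 layout (Aff0 in bits 7:0, Aff1 in 15:8, Aff2 in 23:16, MT in bit 24, Aff3 in 39:32) and may not match the kernel source line for line. All `model_*` names and `struct topo_ids` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MT bit of MPIDR_EL1 (bit 24); the name is local to this sketch. */
#define MODEL_MPIDR_MT_BITMASK	(UINT64_C(1) << 24)

struct topo_ids {
	int thread_id;
	int core_id;
	int cluster_id;
	int package_id;
};

/* Extract affinity level 0..3 from an MPIDR_EL1 value. */
unsigned int model_affinity_level(uint64_t mpidr, unsigned int level)
{
	static const unsigned int shift[4] = { 0, 8, 16, 32 };

	assert(level < 4);
	return (unsigned int)((mpidr >> shift[level]) & 0xff);
}

/* Model of the id assignment, ending with the line the patch adds. */
struct topo_ids model_store_cpu_topology(uint64_t mpidr)
{
	struct topo_ids t;

	if (mpidr & MODEL_MPIDR_MT_BITMASK) {
		/* Multi-threaded: Aff0 is the thread, Aff1 the core. */
		t.thread_id = (int)model_affinity_level(mpidr, 0);
		t.core_id = (int)model_affinity_level(mpidr, 1);
		t.cluster_id = (int)(model_affinity_level(mpidr, 2) |
				     model_affinity_level(mpidr, 3) << 8);
	} else {
		/* No threads: Aff0 is the core, higher levels form the cluster. */
		t.thread_id = -1;
		t.core_id = (int)model_affinity_level(mpidr, 0);
		t.cluster_id = (int)(model_affinity_level(mpidr, 1) |
				     model_affinity_level(mpidr, 2) << 8 |
				     model_affinity_level(mpidr, 3) << 16);
	}
	/* The patch's addition: package_id initially mirrors cluster_id. */
	t.package_id = t.cluster_id;
	return t;
}
```

With this model, every CPU ends up with package_id equal to cluster_id, which matches the commit message's statement that the two are initially the same on arm64.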

Many modern machines have cluster on die (COD) non-uniformity
as well as the traditional multi-socket architectures. Reusing
the multi-socket or NUMA on die concepts for these (as arm64 does)
breaks down when presented with actual multi-socket/COD machines.
Similar problems are also visible on some x86 machines, so it
seems appropriate to start abstracting and making these topologies
visible.

To start, a topology_cod_id() macro is added which defaults to returning
the same information as topology_physical_package_id(). Moving forward
we can start to split out the differences.

For arm64, an additional package_id is added to the cpu_topology array.
Initially this will be equal to the cluster_id as well.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h | 4 +++-
 arch/arm64/kernel/topology.c      | 8 ++++++--
 include/linux/topology.h          | 3 +++
 3 files changed, 12 insertions(+), 3 deletions(-)
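
The "defaults to returning the same information" behavior comes from the `#ifndef` fallback pattern in include/linux/topology.h. The sketch below reproduces that pattern in a standalone file so it can be compiled outside the kernel. It assumes an architecture that defines only the package id: `demo_package_id` and `cod_id_of()` are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Stand-in for an architecture header that defines only the package id. */
static const int demo_package_id[4] = { 0, 0, 1, 1 };
#define topology_physical_package_id(cpu)	(demo_package_id[cpu])

/* Generic fallbacks, same shape as the hunk in include/linux/topology.h. */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
#endif
#ifndef topology_cod_id				/* cluster on die */
#define topology_cod_id(cpu)			topology_physical_package_id(cpu)
#endif

/* Because no topology_cod_id was defined above, this yields the package id. */
int cod_id_of(int cpu)
{
	assert(cpu >= 0 && cpu < 4);
	return topology_cod_id(cpu);
}
```

An architecture that wants a distinct COD id only needs to define `topology_cod_id` before the generic header is processed, which is what the arm64 hunk does by mapping it to `cluster_id`.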