Message ID | 20180425233121.13270-14-jeremy.linton@arm.com (mailing list archive) |
---|---|
State | New, archived |
On 26/04/18 00:31, Jeremy Linton wrote:
> Now that we have an accurate view of the physical topology
> we need to represent it correctly to the scheduler. Generally MC
> should equal the LLC in the system, but there are a number of
> special cases that need to be dealt with.
>
> In the case of NUMA in socket, we need to assure that the sched
> domain we build for the MC layer isn't larger than the DIE above it.
> Similarly for LLC's that might exist in cross socket interconnect or
> directory hardware we need to assure that MC is shrunk to the socket
> or NUMA node.
>
> This patch builds a sibling mask for the LLC, and then picks the
> smallest of LLC, socket siblings, or NUMA node siblings, which
> gives us the behavior described above. This is ever so slightly
> different than the similar alternative where we look for a cache
> layer less than or equal to the socket/NUMA siblings.
>
> The logic to pick the MC layer affects all arm64 machines, but
> only changes the behavior for DT/MPIDR systems if the NUMA domain
> is smaller than the core siblings (generally set to the cluster).
> Potentially this fixes a possible bug in DT systems, but really
> it only affects ACPI systems where the core siblings is correctly
> set to the socket siblings. Thus all currently available ACPI
> systems should have MC equal to LLC, including the NUMA in socket
> machines where the LLC is partitioned between the NUMA nodes.
>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
> ---
>  arch/arm64/include/asm/topology.h |  2 ++
>  arch/arm64/kernel/topology.c      | 32 +++++++++++++++++++++++++++++++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
> index 6b10459e6905..df48212f767b 100644
> --- a/arch/arm64/include/asm/topology.h
> +++ b/arch/arm64/include/asm/topology.h
> @@ -8,8 +8,10 @@ struct cpu_topology {
>  	int thread_id;
>  	int core_id;
>  	int package_id;
> +	int llc_id;
>  	cpumask_t thread_sibling;
>  	cpumask_t core_sibling;
> +	cpumask_t llc_siblings;
>  };
>
>  extern struct cpu_topology cpu_topology[NR_CPUS];
> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
> index bd1aae438a31..20b4341dc527 100644
> --- a/arch/arm64/kernel/topology.c
> +++ b/arch/arm64/kernel/topology.c
> @@ -13,6 +13,7 @@
>
>  #include <linux/acpi.h>
>  #include <linux/arch_topology.h>
> +#include <linux/cacheinfo.h>
>  #include <linux/cpu.h>
>  #include <linux/cpumask.h>
>  #include <linux/init.h>
> @@ -214,7 +215,19 @@ EXPORT_SYMBOL_GPL(cpu_topology);
>
>  const struct cpumask *cpu_coregroup_mask(int cpu)
>  {
> -	return &cpu_topology[cpu].core_sibling;
> +	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
> +
> +	/* Find the smaller of NUMA, core or LLC siblings */
> +	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
> +		/* not numa in package, lets use the package siblings */
> +		core_mask = &cpu_topology[cpu].core_sibling;
> +	}
> +	if (cpu_topology[cpu].llc_id != -1) {
> +		if (cpumask_subset(&cpu_topology[cpu].llc_siblings, core_mask))
> +			core_mask = &cpu_topology[cpu].llc_siblings;
> +	}
> +
> +	return core_mask;
>  }
>
>  static void update_siblings_masks(unsigned int cpuid)
> @@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
>  	for_each_possible_cpu(cpu) {
>  		cpu_topo = &cpu_topology[cpu];
>
> +		if (cpuid_topo->llc_id == cpu_topo->llc_id)
> +			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
> +

Would this not result in cpuid_topo->llc_siblings = cpu_possible_mask
on DT systems where llc_id is not set/defaults to -1 and still pass the
condition. Does it make sense to add additional -1 check ?

>  		if (cpuid_topo->package_id != cpu_topo->package_id)
>  			continue;
>
> @@ -291,6 +307,10 @@ static void __init reset_cpu_topology(void)
>  		cpu_topo->core_id = 0;
>  		cpu_topo->package_id = -1;
>
> +		cpu_topo->llc_id = -1;
> +		cpumask_clear(&cpu_topo->llc_siblings);
> +		cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
> +
>  		cpumask_clear(&cpu_topo->core_sibling);
>  		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
>  		cpumask_clear(&cpu_topo->thread_sibling);
> @@ -311,6 +331,8 @@ static int __init parse_acpi_topology(void)
>  	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
>
>  	for_each_possible_cpu(cpu) {
> +		int i;
> +
>  		topology_id = find_acpi_cpu_topology(cpu, 0);
>  		if (topology_id < 0)
>  			return topology_id;
> @@ -325,6 +347,14 @@ static int __init parse_acpi_topology(void)
>  		}
>  		topology_id = find_acpi_cpu_topology_package(cpu);
>  		cpu_topology[cpu].package_id = topology_id;
> +
> +		i = acpi_find_last_cache_level(cpu);
> +
> +		if (i > 0) {
> +			topology_id = find_acpi_cpu_cache_topology(cpu, i);
> +			if (topology_id > 0)
> +				cpu_topology[cpu].llc_id = topology_id;
> +		}

[nit] s/topology_id/cache_id/ or s/topology_id/cache_topology_id/ ?

Otherwise looks fine to me. You can add with above things fixed.

Acked-by: Sudeep Holla <sudeep.holla@arm.com>

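The concern raised in this review about update_siblings_masks() can be
checked with a small userspace model. This is only a sketch: uint32_t
bitmasks stand in for cpumask_t, and MODEL_NR_CPUS, struct model_topology,
the model_* helpers and the skip_unset flag are all hypothetical names
invented for illustration. The skip_unset path models the "additional -1
check" being asked about; it is not part of the posted patch.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8 /* hypothetical CPU count, not taken from the patch */

/* Hypothetical stand-in for the two fields the patch adds to struct cpu_topology. */
struct model_topology {
	int llc_id;
	uint32_t llc_siblings; /* bit N set means CPU N is an LLC sibling */
};

/* Mirrors reset_cpu_topology(): llc_id unset, mask holds only the CPU itself. */
static void model_reset(struct model_topology topo[MODEL_NR_CPUS])
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		topo[cpu].llc_id = -1;
		topo[cpu].llc_siblings = UINT32_C(1) << cpu;
	}
}

/*
 * Mirrors the llc_id comparison added to update_siblings_masks().
 * skip_unset models the extra -1 check from the review (an assumption,
 * not code from the patch).
 */
static void model_update_llc_siblings(struct model_topology topo[MODEL_NR_CPUS],
				      int cpuid, int skip_unset)
{
	struct model_topology *cpuid_topo;

	assert(cpuid >= 0 && cpuid < MODEL_NR_CPUS);
	cpuid_topo = &topo[cpuid];

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		if (skip_unset && cpuid_topo->llc_id == -1)
			continue;
		if (cpuid_topo->llc_id == topo[cpu].llc_id)
			cpuid_topo->llc_siblings |= UINT32_C(1) << cpu;
	}
}

/* LLC sibling mask of CPU 0 on a DT-like system where no llc_id was ever set. */
static uint32_t model_dt_llc_mask(int skip_unset)
{
	struct model_topology topo[MODEL_NR_CPUS];

	model_reset(topo);
	model_update_llc_siblings(topo, 0, skip_unset);
	return topo[0].llc_siblings;
}
```

In this model, model_dt_llc_mask(0) covers all eight CPUs because -1 == -1
matches for every CPU, while model_dt_llc_mask(1) keeps only CPU 0's own
bit from the reset.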
On Tue, May 01, 2018 at 03:33:33PM +0100, Sudeep Holla wrote:
> On 26/04/18 00:31, Jeremy Linton wrote:

[...]

> > @@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
> >  	for_each_possible_cpu(cpu) {
> >  		cpu_topo = &cpu_topology[cpu];
> >
> > +		if (cpuid_topo->llc_id == cpu_topo->llc_id)
> > +			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
> > +
>
> Would this not result in cpuid_topo->llc_siblings = cpu_possible_mask
> on DT systems where llc_id is not set/defaults to -1 and still pass the
> condition. Does it make sense to add additional -1 check ?

I don't think mask will be used by the current code if llc_id == -1 as
the user does the check. Is it better to have the mask empty than
default to cpu_possible_mask? If we require all users to implement a
check it shouldn't matter.

Hi,

On 05/02/2018 06:49 AM, Morten Rasmussen wrote:
> On Tue, May 01, 2018 at 03:33:33PM +0100, Sudeep Holla wrote:
>> On 26/04/18 00:31, Jeremy Linton wrote:

[...]

>>> @@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
>>>  	for_each_possible_cpu(cpu) {
>>>  		cpu_topo = &cpu_topology[cpu];
>>>
>>> +		if (cpuid_topo->llc_id == cpu_topo->llc_id)
>>> +			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
>>> +
>>
>> Would this not result in cpuid_topo->llc_siblings = cpu_possible_mask
>> on DT systems where llc_id is not set/defaults to -1 and still pass the
>> condition. Does it make sense to add additional -1 check ?
>
> I don't think mask will be used by the current code if llc_id == -1 as
> the user does the check. Is it better to have the mask empty than
> default to cpu_possible_mask? If we require all users to implement a
> check it shouldn't matter.
>

Right.

There is also the other way of thinking about it, which is if you remove the
if llc_id == -1 check in cpu_coregroup_mask() does it make more sense to
have llc_siblings default equal all the cores, or just the one being
requested?

Hi,

Thanks for taking a look at this.

On 05/01/2018 09:33 AM, Sudeep Holla wrote:
> On 26/04/18 00:31, Jeremy Linton wrote:

[...]

>> @@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
>>  	for_each_possible_cpu(cpu) {
>>  		cpu_topo = &cpu_topology[cpu];
>>
>> +		if (cpuid_topo->llc_id == cpu_topo->llc_id)
>> +			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
>> +
>
> Would this not result in cpuid_topo->llc_siblings = cpu_possible_mask
> on DT systems where llc_id is not set/defaults to -1 and still pass the
> condition. Does it make sense to add additional -1 check ?

(see comment in Morten's thread)

[...]

>> @@ -325,6 +347,14 @@ static int __init parse_acpi_topology(void)
>>  		}
>>  		topology_id = find_acpi_cpu_topology_package(cpu);
>>  		cpu_topology[cpu].package_id = topology_id;
>> +
>> +		i = acpi_find_last_cache_level(cpu);
>> +
>> +		if (i > 0) {
>> +			topology_id = find_acpi_cpu_cache_topology(cpu, i);
>> +			if (topology_id > 0)
>> +				cpu_topology[cpu].llc_id = topology_id;
>> +		}
>
> [nit] s/topology_id/cache_id/ or s/topology_id/cache_topology_id/ ?

Sure.

>
> Otherwise looks fine to me. You can add with above things fixed.
>
> Acked-by: Sudeep Holla <sudeep.holla@arm.com>
>

Thanks,

On Wed, May 02, 2018 at 05:32:54PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 05/02/2018 06:49 AM, Morten Rasmussen wrote:
> >On Tue, May 01, 2018 at 03:33:33PM +0100, Sudeep Holla wrote:
> >>On 26/04/18 00:31, Jeremy Linton wrote:

[...]

> >>>@@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
> >>> 	for_each_possible_cpu(cpu) {
> >>> 		cpu_topo = &cpu_topology[cpu];
> >>>+		if (cpuid_topo->llc_id == cpu_topo->llc_id)
> >>>+			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
> >>>+
> >>
> >>Would this not result in cpuid_topo->llc_siblings = cpu_possible_mask
> >>on DT systems where llc_id is not set/defaults to -1 and still pass the
> >>condition. Does it make sense to add additional -1 check ?
> >
> >I don't think mask will be used by the current code if llc_id == -1 as
> >the user does the check. Is it better to have the mask empty than
> >default to cpu_possible_mask? If we require all users to implement a
> >check it shouldn't matter.
> >
>
> Right.
>
> There is also the other way of thinking about it, which is if you remove the
> if llc_id == -1 check in cpu_coregroup_mask() does it make more sense to
> have llc_siblings default equal all the cores, or just the one being
> requested?

Since we define cpu_coregroup_mask() to be the smallest of LLC, package,
and NUMA node, letting it default to just one cpu would change/break the
topology on non-PPTT systems. Wouldn't it?

If we want to drop the check, llc_siblings should default to either
core_sibling or cpumask_of_node(). But I don't really see the point, as
any user of llc_siblings that really cares about where the LLC is would
have to check whether llc_siblings was just assigned a default value or
really represents the LLC.

I'm fine with just expecting the user to check llc_id to see if the
llc_siblings mask is valid or not.

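The argument about default values can be made concrete with a userspace
model of the selection in cpu_coregroup_mask(). This is a sketch under
stated assumptions: uint32_t bitmasks replace cpumask_t, model_subset()
and model_coregroup_mask() are hypothetical helpers, and the check_llc_id
flag exists only to model dropping the llc_id != -1 test, which the
posted patch does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for cpumask_subset(): true if every CPU in a is also in b. */
static int model_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;
}

/*
 * Mirrors the selection order of the patched cpu_coregroup_mask():
 * start from the NUMA node, shrink to the core siblings if they fit,
 * then shrink to the LLC siblings if they fit.
 * check_llc_id == 0 models the hypothetical variant without the
 * llc_id != -1 test.
 */
static uint32_t model_coregroup_mask(uint32_t node_mask, uint32_t core_sibling,
				     int llc_id, uint32_t llc_siblings,
				     int check_llc_id)
{
	uint32_t core_mask = node_mask;

	/* A CPU is always part of its own node and sibling masks. */
	assert(node_mask != 0 && core_sibling != 0);

	if (model_subset(core_sibling, core_mask))
		core_mask = core_sibling;
	if (!check_llc_id || llc_id != -1) {
		if (model_subset(llc_siblings, core_mask))
			core_mask = llc_siblings;
	}
	return core_mask;
}
```

For an assumed non-PPTT system with one 8-CPU node (0xFF), a 4-CPU
cluster as core_sibling (0x0F) and llc_id left at -1, the model keeps MC
at 0x0F with the check in place. Without the check, a single-CPU default
(0x01) shrinks MC to one CPU, whereas a default of core_sibling or the
node mask leaves MC at 0x0F.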
On Wed, Apr 25, 2018 at 06:31:21PM -0500, Jeremy Linton wrote:
> Now that we have an accurate view of the physical topology
> we need to represent it correctly to the scheduler. Generally MC
> should equal the LLC in the system, but there are a number of
> special cases that need to be dealt with.
>
> In the case of NUMA in socket, we need to assure that the sched
> domain we build for the MC layer isn't larger than the DIE above it.
> Similarly for LLC's that might exist in cross socket interconnect or
> directory hardware we need to assure that MC is shrunk to the socket
> or NUMA node.
>
> This patch builds a sibling mask for the LLC, and then picks the
> smallest of LLC, socket siblings, or NUMA node siblings, which
> gives us the behavior described above. This is ever so slightly
> different than the similar alternative where we look for a cache
> layer less than or equal to the socket/NUMA siblings.
>
> The logic to pick the MC layer affects all arm64 machines, but
> only changes the behavior for DT/MPIDR systems if the NUMA domain
> is smaller than the core siblings (generally set to the cluster).
> Potentially this fixes a possible bug in DT systems, but really
> it only affects ACPI systems where the core siblings is correctly
> set to the socket siblings. Thus all currently available ACPI
> systems should have MC equal to LLC, including the NUMA in socket
> machines where the LLC is partitioned between the NUMA nodes.
>
> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>

This patch looks good to me.

Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>

diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 6b10459e6905..df48212f767b 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -8,8 +8,10 @@ struct cpu_topology {
 	int thread_id;
 	int core_id;
 	int package_id;
+	int llc_id;
 	cpumask_t thread_sibling;
 	cpumask_t core_sibling;
+	cpumask_t llc_siblings;
 };
 
 extern struct cpu_topology cpu_topology[NR_CPUS];
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index bd1aae438a31..20b4341dc527 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -13,6 +13,7 @@
 
 #include <linux/acpi.h>
 #include <linux/arch_topology.h>
+#include <linux/cacheinfo.h>
 #include <linux/cpu.h>
 #include <linux/cpumask.h>
 #include <linux/init.h>
@@ -214,7 +215,19 @@ EXPORT_SYMBOL_GPL(cpu_topology);
 
 const struct cpumask *cpu_coregroup_mask(int cpu)
 {
-	return &cpu_topology[cpu].core_sibling;
+	const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));
+
+	/* Find the smaller of NUMA, core or LLC siblings */
+	if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
+		/* not numa in package, lets use the package siblings */
+		core_mask = &cpu_topology[cpu].core_sibling;
+	}
+	if (cpu_topology[cpu].llc_id != -1) {
+		if (cpumask_subset(&cpu_topology[cpu].llc_siblings, core_mask))
+			core_mask = &cpu_topology[cpu].llc_siblings;
+	}
+
+	return core_mask;
 }
 
 static void update_siblings_masks(unsigned int cpuid)
@@ -226,6 +239,9 @@ static void update_siblings_masks(unsigned int cpuid)
 	for_each_possible_cpu(cpu) {
 		cpu_topo = &cpu_topology[cpu];
 
+		if (cpuid_topo->llc_id == cpu_topo->llc_id)
+			cpumask_set_cpu(cpu, &cpuid_topo->llc_siblings);
+
 		if (cpuid_topo->package_id != cpu_topo->package_id)
 			continue;
 
@@ -291,6 +307,10 @@ static void __init reset_cpu_topology(void)
 		cpu_topo->core_id = 0;
 		cpu_topo->package_id = -1;
 
+		cpu_topo->llc_id = -1;
+		cpumask_clear(&cpu_topo->llc_siblings);
+		cpumask_set_cpu(cpu, &cpu_topo->llc_siblings);
+
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_set_cpu(cpu, &cpu_topo->core_sibling);
 		cpumask_clear(&cpu_topo->thread_sibling);
@@ -311,6 +331,8 @@ static int __init parse_acpi_topology(void)
 	is_threaded = read_cpuid_mpidr() & MPIDR_MT_BITMASK;
 
 	for_each_possible_cpu(cpu) {
+		int i;
+
 		topology_id = find_acpi_cpu_topology(cpu, 0);
 		if (topology_id < 0)
 			return topology_id;
@@ -325,6 +347,14 @@ static int __init parse_acpi_topology(void)
 		}
 		topology_id = find_acpi_cpu_topology_package(cpu);
 		cpu_topology[cpu].package_id = topology_id;
+
+		i = acpi_find_last_cache_level(cpu);
+
+		if (i > 0) {
+			topology_id = find_acpi_cpu_cache_topology(cpu, i);
+			if (topology_id > 0)
+				cpu_topology[cpu].llc_id = topology_id;
+		}
 	}
 
 	return 0;

Now that we have an accurate view of the physical topology
we need to represent it correctly to the scheduler. Generally MC
should equal the LLC in the system, but there are a number of
special cases that need to be dealt with.

In the case of NUMA in socket, we need to assure that the sched
domain we build for the MC layer isn't larger than the DIE above it.
Similarly for LLC's that might exist in cross socket interconnect or
directory hardware we need to assure that MC is shrunk to the socket
or NUMA node.

This patch builds a sibling mask for the LLC, and then picks the
smallest of LLC, socket siblings, or NUMA node siblings, which
gives us the behavior described above. This is ever so slightly
different than the similar alternative where we look for a cache
layer less than or equal to the socket/NUMA siblings.

The logic to pick the MC layer affects all arm64 machines, but
only changes the behavior for DT/MPIDR systems if the NUMA domain
is smaller than the core siblings (generally set to the cluster).
Potentially this fixes a possible bug in DT systems, but really
it only affects ACPI systems where the core siblings is correctly
set to the socket siblings. Thus all currently available ACPI
systems should have MC equal to LLC, including the NUMA in socket
machines where the LLC is partitioned between the NUMA nodes.

Signed-off-by: Jeremy Linton <jeremy.linton@arm.com>
---
 arch/arm64/include/asm/topology.h |  2 ++
 arch/arm64/kernel/topology.c      | 32 +++++++++++++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)
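
The commit message says that picking the smallest of LLC, socket and NUMA
node is "ever so slightly different" from looking for a cache layer that
fits inside the socket/NUMA siblings. The sketch below shows one way the
two could diverge. It is a userspace model, not kernel code: uint32_t
bitmasks replace cpumask_t, every model_* name is hypothetical, the LLC is
assumed to be known (llc_id != -1), and the "walk the cache levels from
the LLC downwards" function is only an assumed reading of the
alternative, which the patch does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cpumask_subset(): true if every CPU in a is also in b. */
static int model_subset(uint32_t a, uint32_t b)
{
	return (a & ~b) == 0;
}

/* Socket siblings if they fit inside the NUMA node, else the node itself. */
static uint32_t model_bound(uint32_t node_mask, uint32_t socket_mask)
{
	return model_subset(socket_mask, node_mask) ? socket_mask : node_mask;
}

/* Approach described for the patch: only the last cache level is considered. */
static uint32_t model_mc_from_llc(uint32_t node_mask, uint32_t socket_mask,
				  const uint32_t *cache_masks, size_t levels)
{
	uint32_t bound = model_bound(node_mask, socket_mask);
	uint32_t llc;

	assert(levels > 0);
	llc = cache_masks[levels - 1];
	return model_subset(llc, bound) ? llc : bound;
}

/*
 * Assumed reading of the alternative: walk from the LLC towards L1 and
 * take the first cache level whose sharing mask fits inside the bound.
 */
static uint32_t model_mc_from_fitting_cache(uint32_t node_mask,
					    uint32_t socket_mask,
					    const uint32_t *cache_masks,
					    size_t levels)
{
	uint32_t bound = model_bound(node_mask, socket_mask);

	for (size_t i = levels; i-- > 0; ) {
		if (model_subset(cache_masks[i], bound))
			return cache_masks[i];
	}
	return bound;
}

/* Hypothetical CPU 0 views, index 0 = L1, last index = LLC. */
static const uint32_t model_cross_socket_caches[] = { 0x0001, 0x000F, 0xFFFF };
static const uint32_t model_numa_in_socket_caches[] = { 0x0001, 0x000F, 0x00FF };
```

With the cross-socket LLC (L3 shared by 16 CPUs, 8-CPU socket and node),
the first function shrinks MC to the socket (0x00FF) while the second
falls through to the L2 mask (0x000F). With NUMA in socket (4-CPU node
inside an 8-CPU socket, L3 spanning the socket) both give 0x000F in this
model.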