| Message ID | 20230121042436.2661843-4-yury.norov@gmail.com |
|---|---|
| State | Not Applicable |
| Series | sched: cpumask: improve on cpumask_local_spread() locality |
On 2023-01-20 at 20:24:30 -0800, Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
>
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.
>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> Acked-by: Tariq Toukan <tariqt@nvidia.com>
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Reviewed-by: Peter Lafreniere <peter@n8pjl.ca>
> ---
>  include/linux/topology.h |  8 ++++++
>  kernel/sched/topology.c  | 57 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
[snip]
> + * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth next cpu
> + * closest to @cpu from @cpumask.

Just a minor question: the @cpu below is used as the index, right?
What does "closest to @cpu" mean above?

> + * cpumask: cpumask to find a cpu from
> + * cpu: Nth cpu to find

Maybe also add a description for @node?

thanks,
Chenyu
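As the cpumask_nth() fallback in the header (see the full diff at the end of this thread) suggests, @cpu is indeed a 0-based index N, not a CPU id, and @node is where the hop-by-hop search starts. A minimal, hypothetical caller illustrating that reading; the function name and the queue count are invented for illustration:

```c
#include <linux/cpumask.h>
#include <linux/printk.h>
#include <linux/topology.h>

/* Hypothetical: spread 4 queue IRQs over the CPUs closest to @node. */
static void example_pick_cpus(int node)
{
	int i, cpu;

	for (i = 0; i < 4; i++) {
		/* The @cpu argument is the index: "give me the i-th CPU". */
		cpu = sched_numa_find_nth_cpu(cpu_online_mask, i, node);
		if (cpu >= nr_cpu_ids)
			break;	/* the mask has fewer than i + 1 CPUs */
		pr_info("queue %d -> cpu %d\n", i, cpu);
	}
}
```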
On Fri, 20 Jan 2023 20:24:30 -0800 Yury Norov wrote:
> The function finds Nth set CPU in a given cpumask starting from a given
> node.
>
> Leveraging the fact that each hop in sched_domains_numa_masks includes the
> same or greater number of CPUs than the previous one, we can use binary
> search on hops instead of linear walk, which makes the overall complexity
> of O(log n) in terms of number of cpumask_weight() calls.

Valentin, would you be willing to give us a SoB or Review tag for
this one? We'd like to take the whole series via networking, if
that's okay.
On 06/02/23 21:09, Jakub Kicinski wrote:
> On Fri, 20 Jan 2023 20:24:30 -0800 Yury Norov wrote:
>> The function finds Nth set CPU in a given cpumask starting from a given
>> node.
>>
>> Leveraging the fact that each hop in sched_domains_numa_masks includes the
>> same or greater number of CPUs than the previous one, we can use binary
>> search on hops instead of linear walk, which makes the overall complexity
>> of O(log n) in terms of number of cpumask_weight() calls.
>
> Valentin, would you be willing to give us a SoB or Review tag for
> this one? We'd like to take the whole series via networking, if
> that's okay.

Sure, feel free to add

Reviewed-by: Valentin Schneider <vschneid@redhat.com>

to patches that don't already have it.
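The O(log n) claim in the commit message rests on the masks being cumulative: cpumask_weight_and(cpus, masks[hop][node]) is non-decreasing in hop, so the first hop whose cumulative weight exceeds N can be located with bsearch(). Here is a standalone toy model of that search, with a made-up cumulative-weight array standing in for sched_domains_numa_masks; it is a sketch of the idea, not kernel code:

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for the masks: CPUs covered by hops 0..hop, non-decreasing. */
static const int cum[] = { 4, 8, 16, 32 };

struct key {
	const int *base;	/* array start, to recognize the first hop */
	int n;			/* looking for the hop holding the Nth CPU */
};

/*
 * Mirrors the kernel hop_cmp() trichotomy: 1 = the Nth CPU lies in a
 * later hop, -1 = in an earlier hop, 0 = this is the first hop whose
 * cumulative weight exceeds n.
 */
static int hop_cmp(const void *a, const void *b)
{
	const struct key *k = a;
	const int *cur = b;
	/* The first hop has no predecessor; its "previous weight" is 0. */
	int prev_w = (cur == k->base) ? 0 : cur[-1];

	if (*cur <= k->n)
		return 1;
	return (prev_w <= k->n) ? 0 : -1;
}

int main(void)
{
	struct key k = { .base = cum, .n = 10 };
	const int *hop = bsearch(&k, cum, 4, sizeof(cum[0]), hop_cmp);

	/* N = 10 lands in hop 2: cum[1] = 8 <= 10 < cum[2] = 16 */
	printf("hop %td\n", hop - cum);
	return 0;
}
```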
Hi Jakub,

Can you please fold-in the following patch?

Thanks,
Yury

From: Yury Norov <yury.norov@gmail.com>
Date: Thu, 16 Feb 2023 17:03:30 -0800
Subject: [PATCH] sched/topology: fix KASAN warning in hop_cmp()

Despite that prev_hop is used conditionally on curr_hop is not the
first hop, it's initialized unconditionally.

Because initialization implies dereferencing, it might happen that
the code dereferences uninitialized memory, which has been spotted by
KASAN. Fix it by reorganizing hop_cmp() logic.

Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 kernel/sched/topology.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

```diff
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 48838a05c008..c21b8b1f3537 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2081,14 +2081,19 @@ struct __cmp_key {
 
 static int hop_cmp(const void *a, const void *b)
 {
-	struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
-	struct cpumask **cur_hop = *(struct cpumask ***)b;
+	struct cpumask **prev_hop, **cur_hop = *(struct cpumask ***)b;
 	struct __cmp_key *k = (struct __cmp_key *)a;
 
 	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
 		return 1;
 
-	k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+	if (b == k->masks) {
+		k->w = 0;
+		return 0;
+	}
+
+	prev_hop = *((struct cpumask ***)b - 1);
+	k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
 	if (k->w <= k->cpu)
 		return 0;
```
On Thu, Feb 16, 2023 at 05:39:08PM -0800, Yury Norov wrote:
> From: Yury Norov <yury.norov@gmail.com>
> Date: Thu, 16 Feb 2023 17:03:30 -0800
> Subject: [PATCH] sched/topology: fix KASAN warning in hop_cmp()
>
> Despite that prev_hop is used conditionally on curr_hop is not the

curr --> cur

> first hop, it's initialized unconditionally.
>
> Because initialization implies dereferencing, it might happen that
> the code dereferences uninitialized memory, which has been spotted by
> KASAN. Fix it by reorganizing hop_cmp() logic.

Nice catch! I guess it deserves a comment inside the code (IIRC I was
puzzled by the logic and it was changed due to lack of this knowledge.)
On Thu, 16 Feb 2023 17:39:08 -0800 Yury Norov wrote:
> Despite that prev_hop is used conditionally on curr_hop is not the
> first hop, it's initialized unconditionally.
>
> Because initialization implies dereferencing, it might happen that
> the code dereferences uninitialized memory, which has been spotted by
> KASAN. Fix it by reorganizing hop_cmp() logic.
>
> Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>

Fixed the spelling pointed out by Andy and applied, thanks!
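Regarding Andy's request for a comment inside the code: below is the post-fix hop_cmp() annotated with the kind of comments he seems to be asking for. The comments are an editor's gloss on the logic discussed above, not part of the applied patch:

```c
static int hop_cmp(const void *a, const void *b)
{
	struct cpumask **prev_hop, **cur_hop = *(struct cpumask ***)b;
	struct __cmp_key *k = (struct __cmp_key *)a;

	/*
	 * Hops are cumulative: if this hop still covers no more than
	 * k->cpu CPUs of the request mask, the Nth CPU must lie in a
	 * later hop -- tell bsearch() to move right.
	 */
	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
		return 1;

	/*
	 * First hop: there is no predecessor, and dereferencing
	 * (b - 1) here would be the out-of-bounds read KASAN flagged.
	 * The current hop covers more than k->cpu CPUs, so it's a hit.
	 */
	if (b == k->masks) {
		k->w = 0;
		return 0;
	}

	/*
	 * This hop is the answer iff the previous one was still too
	 * small. Stash the previous weight so the caller can index
	 * into the CPUs this hop adds (cpu - k->w).
	 */
	prev_hop = *((struct cpumask ***)b - 1);
	k->w = cpumask_weight_and(k->cpus, prev_hop[k->node]);
	if (k->w <= k->cpu)
		return 0;

	return -1;	/* already covered by an earlier hop: move left */
}
```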
```diff
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 4564faafd0e1..72f264575698 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -245,5 +245,13 @@ static inline const struct cpumask *cpu_cpu_mask(int cpu)
 	return cpumask_of_node(cpu_to_node(cpu));
 }
 
+#ifdef CONFIG_NUMA
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node);
+#else
+static __always_inline int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	return cpumask_nth(cpu, cpus);
+}
+#endif	/* CONFIG_NUMA */
 
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 8739c2a5a54e..2bf89186a10f 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -3,6 +3,8 @@
  * Scheduler topology setup/handling methods
  */
 
+#include <linux/bsearch.h>
+
 DEFINE_MUTEX(sched_domains_mutex);
 
 /* Protected by sched_domains_mutex: */
@@ -2067,6 +2069,61 @@ int sched_numa_find_closest(const struct cpumask *cpus, int cpu)
 	return found;
 }
 
+struct __cmp_key {
+	const struct cpumask *cpus;
+	struct cpumask ***masks;
+	int node;
+	int cpu;
+	int w;
+};
+
+static int hop_cmp(const void *a, const void *b)
+{
+	struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
+	struct cpumask **cur_hop = *(struct cpumask ***)b;
+	struct __cmp_key *k = (struct __cmp_key *)a;
+
+	if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
+		return 1;
+
+	k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
+	if (k->w <= k->cpu)
+		return 0;
+
+	return -1;
+}
+
+/*
+ * sched_numa_find_nth_cpu() - given the NUMA topology, find the Nth next cpu
+ * closest to @cpu from @cpumask.
+ * cpumask: cpumask to find a cpu from
+ * cpu: Nth cpu to find
+ *
+ * returns: cpu, or nr_cpu_ids when nothing found.
+ */
+int sched_numa_find_nth_cpu(const struct cpumask *cpus, int cpu, int node)
+{
+	struct __cmp_key k = { .cpus = cpus, .node = node, .cpu = cpu };
+	struct cpumask ***hop_masks;
+	int hop, ret = nr_cpu_ids;
+
+	rcu_read_lock();
+
+	k.masks = rcu_dereference(sched_domains_numa_masks);
+	if (!k.masks)
+		goto unlock;
+
+	hop_masks = bsearch(&k, k.masks, sched_domains_numa_levels, sizeof(k.masks[0]), hop_cmp);
+	hop = hop_masks - k.masks;
+
+	ret = hop ?
+		cpumask_nth_and_andnot(cpu - k.w, cpus, k.masks[hop][node], k.masks[hop-1][node]) :
+		cpumask_nth_and(cpu, cpus, k.masks[0][node]);
+unlock:
+	rcu_read_unlock();
+	return ret;
+}
+EXPORT_SYMBOL_GPL(sched_numa_find_nth_cpu);
 #endif /* CONFIG_NUMA */
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
```
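For context, this helper is what lets the rest of the series (titled above) make cpumask_local_spread() NUMA-aware: when a node is given, the i-th CPU is taken in hop order around that node instead of in plain mask order. A sketch of that caller, following the series' description; treat the exact body as illustrative rather than as the merged code:

```c
/* lib/cpumask.c (sketch): i-th CPU, preferring CPUs close to @node */
unsigned int cpumask_local_spread(unsigned int i, int node)
{
	unsigned int cpu;

	/* Wrap around so a CPU is always returned. */
	i %= num_online_cpus();

	cpu = (node == NUMA_NO_NODE) ?
		cpumask_nth(i, cpu_online_mask) :
		sched_numa_find_nth_cpu(cpu_online_mask, i, node);

	WARN_ON(cpu >= nr_cpu_ids);
	return cpu;
}
```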