[v4,6/7] sched/topology: Introduce for_each_numa_hop_cpu()

Message ID	20220923155542.1212814-5-vschneid@redhat.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-rdma-owner@kernel.org> From: Valentin Schneider <vschneid@redhat.com> To: netdev@vger.kernel.org, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Saeed Mahameed <saeedm@nvidia.com>, Leon Romanovsky <leon@kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Yury Norov <yury.norov@gmail.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, Rasmus Villemoes <linux@rasmusvillemoes.dk>, Ingo Molnar <mingo@kernel.org>, Peter Zijlstra <peterz@infradead.org>, Vincent Guittot <vincent.guittot@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Steven Rostedt <rostedt@goodmis.org>, Mel Gorman <mgorman@suse.de>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Heiko Carstens <hca@linux.ibm.com>, Tony Luck <tony.luck@intel.com>, Jonathan Cameron <Jonathan.Cameron@huawei.com>, Gal Pressman <gal@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>, Jesse Brandeburg <jesse.brandeburg@intel.com> Subject: [PATCH v4 6/7] sched/topology: Introduce for_each_numa_hop_cpu() Date: Fri, 23 Sep 2022 16:55:41 +0100 Message-Id: <20220923155542.1212814-5-vschneid@redhat.com> In-Reply-To: <20220923132527.1001870-1-vschneid@redhat.com> References: <20220923132527.1001870-1-vschneid@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	sched, net: NUMA-aware CPU spreading interface \| expand [v4,0/7] sched, net: NUMA-aware CPU spreading interface [v4,1/7] lib/find_bit: Introduce find_next_andnot_bit() [v4,2/7] cpumask: Introduce for_each_cpu_andnot() [v4,3/7] lib/test_cpumask: Add for_each_cpu_and(not) tests [v4,4/7] sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() [v4,5/7] sched/topology: Introduce sched_numa_hop_mask() [v4,6/7] sched/topology: Introduce for_each_numa_hop_cpu() [v4,7/7] net/mlx5e: Improve remote NUMA preferences used for the IRQ affinity hints

Message ID

20220923155542.1212814-5-vschneid@redhat.com (mailing list archive)

State

Superseded

Headers

From: Valentin Schneider <vschneid@redhat.com>
To: netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
        linux-kernel@vger.kernel.org
Cc: Saeed Mahameed <saeedm@nvidia.com>,
        Leon Romanovsky <leon@kernel.org>,
        "David S. Miller" <davem@davemloft.net>,
        Eric Dumazet <edumazet@google.com>,
        Jakub Kicinski <kuba@kernel.org>,
        Paolo Abeni <pabeni@redhat.com>,
        Yury Norov <yury.norov@gmail.com>,
        Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
        Rasmus Villemoes <linux@rasmusvillemoes.dk>,
        Ingo Molnar <mingo@kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Vincent Guittot <vincent.guittot@linaro.org>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Mel Gorman <mgorman@suse.de>,
        Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
        Heiko Carstens <hca@linux.ibm.com>,
        Tony Luck <tony.luck@intel.com>,
        Jonathan Cameron <Jonathan.Cameron@huawei.com>,
        Gal Pressman <gal@nvidia.com>,
        Tariq Toukan <tariqt@nvidia.com>,
        Jesse Brandeburg <jesse.brandeburg@intel.com>
Subject: [PATCH v4 6/7] sched/topology: Introduce for_each_numa_hop_cpu()
Date: Fri, 23 Sep 2022 16:55:41 +0100
Message-Id: <20220923155542.1212814-5-vschneid@redhat.com>
In-Reply-To: <20220923132527.1001870-1-vschneid@redhat.com>
References: <20220923132527.1001870-1-vschneid@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

sched, net: NUMA-aware CPU spreading interface | expand

Commit Message

Valentin Schneider Sept. 23, 2022, 3:55 p.m. UTC

The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
reachable within a given distance budget, but this means each successive
cpumask is a superset of the previous one.

Code wanting to allocate one item per CPU (e.g. IRQs) at increasing
distances would thus need to allocate a temporary cpumask to note which
CPUs have already been visited. This can be prevented by leveraging
for_each_cpu_andnot() - package all that logic into one ugl^D fancy macro.

Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 include/linux/topology.h | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

Comments

Yury Norov Sept. 25, 2022, 2:58 p.m. UTC | #1

On Fri, Sep 23, 2022 at 04:55:41PM +0100, Valentin Schneider wrote:
> The recently introduced sched_numa_hop_mask() exposes cpumasks of CPUs
> reachable within a given distance budget, but this means each successive
> cpumask is a superset of the previous one.
> 
> Code wanting to allocate one item per CPU (e.g. IRQs) at increasing
> distances would thus need to allocate a temporary cpumask to note which
> CPUs have already been visited. This can be prevented by leveraging
> for_each_cpu_andnot() - package all that logic into one ugl^D fancy macro.
> 
> Signed-off-by: Valentin Schneider <vschneid@redhat.com>
> ---
>  include/linux/topology.h | 37 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 3e91ae6d0ad5..7aa7e6a4c739 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -257,5 +257,42 @@ static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
>  }
>  #endif	/* CONFIG_NUMA */
>  
> +/**
> + * for_each_numa_hop_cpu - iterate over CPUs by increasing NUMA distance,
> + *                         starting from a given node.
> + * @cpu: the iteration variable.
> + * @node: the NUMA node to start the search from.
> + *
> + * Requires rcu_lock to be held.
> + * Careful: this is a double loop, 'break' won't work as expected.

This warning concerns me not only because new iteration loop hides
complexity and breaks 'break' (sic!), but also because it looks too
specific. Why don't you split it, so instead:

       for_each_numa_hop_cpu(cpu, dev->priv.numa_node) {
               cpus[i] = cpu;
               if (++i == ncomp_eqs)
                       goto spread_done;
       }

in the following patch you would have something like this:

       for_each_node_hop(hop, node) {
               struct cpumask hop_cpus = sched_numa_hop_mask(node, hop);

               for_each_cpu_andnot(cpu, hop_cpus, ...) {
                       cpus[i] = cpu;
                       if (++i == ncomp_eqs)
                               goto spread_done;
               }
       }

It looks more bulky, but I believe there will be more users for
for_each_node_hop() alone.

On top of that, if you really like it, you can implement
for_each_numa_hop_cpu() if you want.

> + * Implementation notes:
> + *
> + * Providing it is valid, the mask returned by
> + *  sched_numa_hop_mask(node, hops+1)
> + * is a superset of the one returned by
> + *   sched_numa_hop_mask(node, hops)
> + * which may not be that useful for drivers that try to spread things out and
> + * want to visit a CPU not more than once.
> + *
> + * To accommodate for that, we use for_each_cpu_andnot() to iterate over the cpus
> + * of sched_numa_hop_mask(node, hops+1) with the CPUs of
> + * sched_numa_hop_mask(node, hops) removed, IOW we only iterate over CPUs
> + * a given distance away (rather than *up to* a given distance).
> + *
> + * hops=0 forces us to play silly games: we pass cpu_none_mask to
> + * for_each_cpu_andnot(), which turns it into for_each_cpu().
> + */
> +#define for_each_numa_hop_cpu(cpu, node)				       \
> +	for (struct { const struct cpumask *curr, *prev; int hops; } __v =     \
> +		     { sched_numa_hop_mask(node, 0), NULL, 0 };		       \

This anonymous structure is never used as structure. What for you
define it? Why not just declare hops, prev and curr without packing
them?

Thanks,
Yury

> +	     !IS_ERR_OR_NULL(__v.curr);					       \
> +	     __v.hops++,                                                       \
> +	     __v.prev = __v.curr,					       \
> +	     __v.curr = sched_numa_hop_mask(node, __v.hops))                   \
> +		for_each_cpu_andnot(cpu,				       \
> +				    __v.curr,				       \
> +				    __v.hops ? __v.prev : cpu_none_mask)
>  
>  #endif /* _LINUX_TOPOLOGY_H */
> -- 
> 2.31.1

Valentin Schneider Sept. 27, 2022, 4:45 p.m. UTC | #2

On 25/09/22 07:58, Yury Norov wrote:
> On Fri, Sep 23, 2022 at 04:55:41PM +0100, Valentin Schneider wrote:
>> +/**
>> + * for_each_numa_hop_cpu - iterate over CPUs by increasing NUMA distance,
>> + *                         starting from a given node.
>> + * @cpu: the iteration variable.
>> + * @node: the NUMA node to start the search from.
>> + *
>> + * Requires rcu_lock to be held.
>> + * Careful: this is a double loop, 'break' won't work as expected.
>
> This warning concerns me not only because new iteration loop hides
> complexity and breaks 'break' (sic!), but also because it looks too
> specific. Why don't you split it, so instead:
>
>        for_each_numa_hop_cpu(cpu, dev->priv.numa_node) {
>                cpus[i] = cpu;
>                if (++i == ncomp_eqs)
>                        goto spread_done;
>        }
>
> in the following patch you would have something like this:
>
>        for_each_node_hop(hop, node) {
>                struct cpumask hop_cpus = sched_numa_hop_mask(node, hop);
>
>                for_each_cpu_andnot(cpu, hop_cpus, ...) {
>                        cpus[i] = cpu;
>                        if (++i == ncomp_eqs)
>                                goto spread_done;
>                }
>        }
>
> It looks more bulky, but I believe there will be more users for
> for_each_node_hop() alone.
>
> On top of that, if you really like it, you can implement
> for_each_numa_hop_cpu() if you want.
>

IIUC you're suggesting to introduce an iterator for the cpumasks first, and
then maybe add one on top for the individual cpus.

I'm happy to do that, though I have to say I'm keen to keep the CPU
iterator - IMO the complexity is justified if it is centralized in one
location and saves us from boring old boilerplate code.

>> + * Implementation notes:
>> + *
>> + * Providing it is valid, the mask returned by
>> + *  sched_numa_hop_mask(node, hops+1)
>> + * is a superset of the one returned by
>> + *   sched_numa_hop_mask(node, hops)
>> + * which may not be that useful for drivers that try to spread things out and
>> + * want to visit a CPU not more than once.
>> + *
>> + * To accommodate for that, we use for_each_cpu_andnot() to iterate over the cpus
>> + * of sched_numa_hop_mask(node, hops+1) with the CPUs of
>> + * sched_numa_hop_mask(node, hops) removed, IOW we only iterate over CPUs
>> + * a given distance away (rather than *up to* a given distance).
>> + *
>> + * hops=0 forces us to play silly games: we pass cpu_none_mask to
>> + * for_each_cpu_andnot(), which turns it into for_each_cpu().
>> + */
>> +#define for_each_numa_hop_cpu(cpu, node)				       \
>> +	for (struct { const struct cpumask *curr, *prev; int hops; } __v =     \
>> +		     { sched_numa_hop_mask(node, 0), NULL, 0 };		       \
>
> This anonymous structure is never used as structure. What for you
> define it? Why not just declare hops, prev and curr without packing
> them?
>

I haven't found a way to do this that doesn't involve a struct - apparently
you can't mix types in a for loop declaration clause.

> Thanks,
> Yury
>

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 3e91ae6d0ad5..7aa7e6a4c739 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -257,5 +257,42 @@  static inline const struct cpumask *sched_numa_hop_mask(int node, int hops)
 }
 #endif	/* CONFIG_NUMA */
 
+/**
+ * for_each_numa_hop_cpu - iterate over CPUs by increasing NUMA distance,
+ *                         starting from a given node.
+ * @cpu: the iteration variable.
+ * @node: the NUMA node to start the search from.
+ *
+ * Requires rcu_lock to be held.
+ * Careful: this is a double loop, 'break' won't work as expected.
+ *
+ *
+ * Implementation notes:
+ *
+ * Providing it is valid, the mask returned by
+ *  sched_numa_hop_mask(node, hops+1)
+ * is a superset of the one returned by
+ *   sched_numa_hop_mask(node, hops)
+ * which may not be that useful for drivers that try to spread things out and
+ * want to visit a CPU not more than once.
+ *
+ * To accommodate for that, we use for_each_cpu_andnot() to iterate over the cpus
+ * of sched_numa_hop_mask(node, hops+1) with the CPUs of
+ * sched_numa_hop_mask(node, hops) removed, IOW we only iterate over CPUs
+ * a given distance away (rather than *up to* a given distance).
+ *
+ * hops=0 forces us to play silly games: we pass cpu_none_mask to
+ * for_each_cpu_andnot(), which turns it into for_each_cpu().
+ */
+#define for_each_numa_hop_cpu(cpu, node)				       \
+	for (struct { const struct cpumask *curr, *prev; int hops; } __v =     \
+		     { sched_numa_hop_mask(node, 0), NULL, 0 };		       \
+	     !IS_ERR_OR_NULL(__v.curr);					       \
+	     __v.hops++,                                                       \
+	     __v.prev = __v.curr,					       \
+	     __v.curr = sched_numa_hop_mask(node, __v.hops))                   \
+		for_each_cpu_andnot(cpu,				       \
+				    __v.curr,				       \
+				    __v.hops ? __v.prev : cpu_none_mask)
 
 #endif /* _LINUX_TOPOLOGY_H */

[v4,6/7] sched/topology: Introduce for_each_numa_hop_cpu()

Commit Message

Comments

Patch