Message ID | 20230420051946.7463-8-yury.norov@gmail.com
---|---
State | Superseded
Series | sched/topology: add for_each_numa_cpu() macro
On 19/04/23 22:19, Yury Norov wrote:
> +	for (node = 0; node < sched_domains_numa_levels; node++) {
> +		unsigned int hop, c = 0;
> +
> +		rcu_read_lock();
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +		rcu_read_unlock();
> +	}

I'm not fond of the export of sched_domains_numa_levels, especially
considering it's just there for tests.

Furthermore, is there any value in testing parity with
cpumask_local_spread()? Rather, shouldn't we check that using this API
does yield CPUs of increasing NUMA distance?

Something like:

	for_each_node(node) {
		unsigned int prev_cpu, hop = 0;

		cpu = cpumask_first(cpumask_of_node(node));
		prev_cpu = cpu;

		rcu_read_lock();

		/* Assert distance is monotonically increasing */
		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
			prev_cpu = cpu;
		}

		rcu_read_unlock();
	}
Hi Valentin,

Thanks for the review!

On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
> On 19/04/23 22:19, Yury Norov wrote:
> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
> > +		unsigned int hop, c = 0;
> > +
> > +		rcu_read_lock();
> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> > +		rcu_read_unlock();
> > +	}
>
> I'm not fond of the export of sched_domains_numa_levels, especially
> considering it's just there for tests.
>
> Furthermore, is there any value in testing parity with
> cpumask_local_spread()?

I wanted to emphasize that the new NUMA-aware functions are coherent
with each other, just like find_nth_bit() is coherent with
find_next_bit().

But all that coherence looks important only in the non-NUMA case,
because client code may depend on the fact that the next CPU is never
less than the current one. This doesn't hold for NUMA iterators anyway...

> Rather, shouldn't we check that using this API does yield CPUs of
> increasing NUMA distance?
>
> Something like:
>
>	for_each_node(node) {
>		unsigned int prev_cpu, hop = 0;
>
>		cpu = cpumask_first(cpumask_of_node(node));
>		prev_cpu = cpu;
>
>		rcu_read_lock();
>
>		/* Assert distance is monotonically increasing */
>		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>			prev_cpu = cpu;
>		}
>
>		rcu_read_unlock();
>	}

Your version of the test looks more straightforward. I need to think
about it some more, but it looks like I can take it in v3.

Thanks,
Yury
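The ordering point above can be made concrete with a small sketch
(illustrative only, using the for_each_numa_cpu() signature from this
series; not part of any patch in the thread):

	unsigned int cpu, hop, prev = 0;

	/*
	 * Plain iterators walk bit positions in the mask, so CPU IDs come
	 * out in increasing order - callers may (and do) rely on that.
	 */
	for_each_cpu(cpu, cpu_online_mask) {
		WARN_ON(cpu < prev);	/* always holds */
		prev = cpu;
	}

	/*
	 * The NUMA-aware walk orders CPUs by hop, not by ID: a CPU visited
	 * in a later hop may well have a smaller ID than an earlier one.
	 */
	for_each_numa_cpu(cpu, hop, first_online_node, cpu_online_mask)
		pr_info("hop %u: cpu %u\n", hop, cpu);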
On 25/04/23 22:50, Yury Norov wrote:
> Hi Valentin,
>
> Thanks for the review!
>
> On Mon, Apr 24, 2023 at 06:09:52PM +0100, Valentin Schneider wrote:
>> On 19/04/23 22:19, Yury Norov wrote:
>> > +	for (node = 0; node < sched_domains_numa_levels; node++) {
>> > +		unsigned int hop, c = 0;
>> > +
>> > +		rcu_read_lock();
>> > +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> > +			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> > +		rcu_read_unlock();
>> > +	}
>>
>> I'm not fond of the export of sched_domains_numa_levels, especially
>> considering it's just there for tests.
>>
>> Furthermore, is there any value in testing parity with
>> cpumask_local_spread()?
>
> I wanted to emphasize that the new NUMA-aware functions are coherent
> with each other, just like find_nth_bit() is coherent with
> find_next_bit().
>
> But all that coherence looks important only in the non-NUMA case,
> because client code may depend on the fact that the next CPU is never
> less than the current one. This doesn't hold for NUMA iterators anyway...
>

Ah right, I see your point. But yes, distance-ordered walks break this
assumption.

>> Rather, shouldn't we check that using this API does yield CPUs of
>> increasing NUMA distance?
>>
>> Something like:
>>
>>	for_each_node(node) {
>>		unsigned int prev_cpu, hop = 0;
>>
>>		cpu = cpumask_first(cpumask_of_node(node));
>>		prev_cpu = cpu;
>>
>>		rcu_read_lock();
>>
>>		/* Assert distance is monotonically increasing */
>>		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>>			expect_ge_uint(cpu_to_node(cpu), cpu_to_node(prev_cpu));
>>			prev_cpu = cpu;
>>		}
>>
>>		rcu_read_unlock();
>>	}
>
> Your version of the test looks more straightforward. I need to think
> about it some more, but it looks like I can take it in v3.
>

I realized I only wrote half the relevant code - comparing node IDs is
meaningless, I meant to compare distances as we walk through the CPUs...
I tested the below against a few NUMA topologies and it seems to be sane:

diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index 6becb044a66f0..8f8512d139d58 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -174,11 +174,23 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	return eq;
 }
 
-#define __expect_eq(suffix, ...)					\
+static bool __init
+__check_ge_uint(const char *srcfile, unsigned int line,
+		const unsigned int a, unsigned int b)
+{
+	if (a < b) {
+		pr_err("[%s:%u] expected a(%u) >= b(%u)\n",
+			srcfile, line, a, b);
+		return false;
+	}
+	return true;
+}
+
+#define __expect_op(op, suffix, ...)					\
 ({									\
 	int result = 0;							\
 	total_tests++;							\
-	if (!__check_eq_ ## suffix(__FILE__, __LINE__,			\
+	if (!__check_## op ## _ ## suffix(__FILE__, __LINE__,		\
 				   ##__VA_ARGS__)) {			\
 		failed_tests++;						\
 		result = 1;						\
@@ -186,6 +198,9 @@ __check_eq_str(const char *srcfile, unsigned int line,
 	result;								\
 })
 
+#define __expect_eq(suffix, ...)	__expect_op(eq, suffix, ##__VA_ARGS__)
+#define __expect_ge(suffix, ...)	__expect_op(ge, suffix, ##__VA_ARGS__)
+
 #define expect_eq_uint(...)		__expect_eq(uint, ##__VA_ARGS__)
 #define expect_eq_bitmap(...)		__expect_eq(bitmap, ##__VA_ARGS__)
 #define expect_eq_pbl(...)		__expect_eq(pbl, ##__VA_ARGS__)
@@ -193,6 +208,8 @@ __check_eq_str(const char *srcfile, unsigned int line,
 #define expect_eq_clump8(...)		__expect_eq(clump8, ##__VA_ARGS__)
 #define expect_eq_str(...)		__expect_eq(str, ##__VA_ARGS__)
 
+#define expect_ge_uint(...)		__expect_ge(uint, ##__VA_ARGS__)
+
 static void __init test_zero_clear(void)
 {
 	DECLARE_BITMAP(bmap, 1024);
@@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
 {
 	unsigned int cpu, node;
 
-	for (node = 0; node < sched_domains_numa_levels; node++) {
-		unsigned int hop, c = 0;
+	for_each_node(node) {
+		unsigned int start_cpu, prev_dist, hop = 0;
+
+		cpu = cpumask_first(cpumask_of_node(node));
+		prev_dist = node_distance(node, node);
+		start_cpu = cpu;
 
 		rcu_read_lock();
-		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
-			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+
+		/* Assert distance is monotonically increasing */
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
+			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
+
+			expect_ge_uint(dist, prev_dist);
+			prev_dist = dist;
+		}
+
 		rcu_read_unlock();
 	}
 }
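A note on the macro plumbing in the first hunk, since the token-pasting
is easy to misread (expansion sketch only, not additional patch code):

	/* How one assertion reaches its checker, step by step: */
	expect_ge_uint(dist, prev_dist);
	/* -> __expect_ge(uint, dist, prev_dist)                         */
	/* -> __expect_op(ge, uint, dist, prev_dist)                     */
	/* -> if (!__check_ge_uint(__FILE__, __LINE__, dist, prev_dist)) */

So each new (op, suffix) pair only needs a matching
__check_<op>_<suffix>() helper, while the pass/fail bookkeeping in
__expect_op() stays shared.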
> I realized I only wrote half the relevant code - comparing node IDs is
> meaningless, I meant to compare distances as we walk through the CPUs...
> I tested the below against a few NUMA topologies and it seems to be sane:
>
> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>  {
>  	unsigned int cpu, node;
>
> -	for (node = 0; node < sched_domains_numa_levels; node++) {
> -		unsigned int hop, c = 0;
> +	for_each_node(node) {
> +		unsigned int start_cpu, prev_dist, hop = 0;
> +
> +		cpu = cpumask_first(cpumask_of_node(node));
> +		prev_dist = node_distance(node, node);
> +		start_cpu = cpu;
>
>  		rcu_read_lock();
> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
> +
> +		/* Assert distance is monotonically increasing */
> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));

Interestingly, node_distance() is an arch-specific function. The generic
implementation is quite useless:

	#define node_distance(from, to) \
		((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)

Particularly, arm64 takes the above. With node_distance() implemented
like that, we can barely test anything...

Taking that into account, I think it's better to test the iterator
against cpumask_local_spread(), like in v2. I'll add a comment about
that in v3.

> +
> +			expect_ge_uint(dist, prev_dist);
> +			prev_dist = dist;
> +		}
> +
>  		rcu_read_unlock();
>  	}
>  }
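To see why the generic fallback defeats the monotonicity check, consider
what the walk observes on a machine where that fallback is in effect (a
sketch under that assumption):

	unsigned int cpu, hop, node = 0;

	for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
		/*
		 * With the generic macro, node_distance() is LOCAL_DISTANCE
		 * (10) for CPUs of 'node' itself and REMOTE_DISTANCE (20)
		 * for everything else.  The sequence 10, 20, 20, ... is
		 * monotonic no matter how remote CPUs are ordered, so
		 * expect_ge_uint() is satisfied trivially and can't catch
		 * a mis-ordered walk between remote nodes.
		 */
		pr_info("cpu %u: dist %d\n",
			cpu, node_distance(node, cpu_to_node(cpu)));
	}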
On 26/04/23 13:51, Yury Norov wrote:
>> I realized I only wrote half the relevant code - comparing node IDs is
>> meaningless, I meant to compare distances as we walk through the CPUs...
>> I tested the below against a few NUMA topologies and it seems to be sane:
>>
>> @@ -756,12 +773,23 @@ static void __init test_for_each_numa(void)
>>  {
>>  	unsigned int cpu, node;
>>
>> -	for (node = 0; node < sched_domains_numa_levels; node++) {
>> -		unsigned int hop, c = 0;
>> +	for_each_node(node) {
>> +		unsigned int start_cpu, prev_dist, hop = 0;
>> +
>> +		cpu = cpumask_first(cpumask_of_node(node));
>> +		prev_dist = node_distance(node, node);
>> +		start_cpu = cpu;
>>
>>  		rcu_read_lock();
>> -		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
>> -			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
>> +
>> +		/* Assert distance is monotonically increasing */
>> +		for_each_numa_cpu(cpu, hop, node, cpu_online_mask) {
>> +			unsigned int dist = node_distance(cpu_to_node(cpu), cpu_to_node(start_cpu));
>
> Interestingly, node_distance() is an arch-specific function. The generic
> implementation is quite useless:
>
>	#define node_distance(from, to) \
>		((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>
> Particularly, arm64 takes the above. With node_distance() implemented
> like that, we can barely test anything...
>

riscv and arm64 rely on drivers/base/arch_numa.c to provide
__node_distance() (cf. CONFIG_GENERIC_ARCH_NUMA).

x86, sparc, powerpc and ia64 define __node_distance().

loongarch and mips define their own node_distance().

So all of those archs will have a usable node_distance(); the others
won't, and on those the scheduler can't do anything about it either -
the scheduler relies on node_distance() to understand the topology!
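For reference, the override pattern those architectures use looks
roughly like this (a simplified sketch modeled on the x86 flavor; the
real table is sized and filled at boot from firmware, e.g. the ACPI
SLIT, rather than being a fixed 2D array):

	/* Sketch: arch-provided distance table, filled during boot. */
	static u8 numa_distance[MAX_NUMNODES][MAX_NUMNODES];

	int __node_distance(int from, int to)
	{
		if (from >= MAX_NUMNODES || to >= MAX_NUMNODES)
			return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
		return numa_distance[from][to];
	}

	/* The arch's asm/topology.h then routes the generic macro here: */
	#define node_distance(from, to)	__node_distance((from), (to))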
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c
index a8005ad3bd58..1b5f805f6879 100644
--- a/lib/test_bitmap.c
+++ b/lib/test_bitmap.c
@@ -12,6 +12,7 @@
 #include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/topology.h>
 #include <linux/uaccess.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
@@ -751,6 +752,33 @@ static void __init test_for_each_set_bit_wrap(void)
 	}
 }
 
+static void __init test_for_each_numa(void)
+{
+	unsigned int cpu, node;
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		const struct cpumask *m, *p = cpu_none_mask;
+		unsigned int c = 0;
+
+		rcu_read_lock();
+		for_each_numa_hop_mask(m, node) {
+			for_each_cpu_andnot(cpu, m, p)
+				expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+			p = m;
+		}
+		rcu_read_unlock();
+	}
+
+	for (node = 0; node < sched_domains_numa_levels; node++) {
+		unsigned int hop, c = 0;
+
+		rcu_read_lock();
+		for_each_numa_cpu(cpu, hop, node, cpu_online_mask)
+			expect_eq_uint(cpumask_local_spread(c++, node), cpu);
+		rcu_read_unlock();
+	}
+}
+
 static void __init test_for_each_set_bit(void)
 {
 	DECLARE_BITMAP(orig, 500);
@@ -1237,6 +1265,7 @@ static void __init selftest(void)
 	test_for_each_clear_bitrange_from();
 	test_for_each_set_clump8();
 	test_for_each_set_bit_wrap();
+	test_for_each_numa();
 }
 
 KSTM_MODULE_LOADERS(test_bitmap);
The test ensures that the enumerators' output is consistent with
cpumask_local_spread().

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/test_bitmap.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)