diff mbox series

net: sort queues in xps maps

Message ID 1657679096-38572-1-git-send-email-liyonglong@chinatelecom.cn (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: sort queues in xps maps | expand

Checks

Context Check Description
netdev/tree_selection success Guessed tree name to be net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers warning 1 maintainers not CCed: petrm@nvidia.com
netdev/build_clang success Errors and warnings before: 6 this patch: 6
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

YonglongLi July 13, 2022, 2:24 a.m. UTC
in the following case that set xps of each tx-queue with same cpu mask,
packets in the same tcp stream may be hash to different tx queue. Because
the order of queues in each xps map is not the same.

first set each tx-queue with different cpu mask
echo 0 > /sys/class/net/eth0/queues/tx-0
echo 1 > /sys/class/net/eth0/queues/tx-1
echo 2 > /sys/class/net/eth0/queues/tx-2
echo 4 > /sys/class/net/eth0/queues/tx-3
and then set each tx-queue with same cpu mask
echo f > /sys/class/net/eth0/queues/tx-0
echo f > /sys/class/net/eth0/queues/tx-1
echo f > /sys/class/net/eth0/queues/tx-2
echo f > /sys/class/net/eth0/queues/tx-3

at this point the order of each map queues is differnet, It will cause
packets in the same stream be hashed to diffetent tx queue:
attr_map[0].queues = [0,1,2,3]
attr_map[1].queues = [1,0,2,3]
attr_map[2].queues = [2,0,1,3]
attr_map[3].queues = [3,0,1,2]

It is more reasonable that pacekts in the same stream be hashed to the same
tx queue when all tx queue bind with the same CPUs.

Fixes: 537c00de1c9b ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue")
Signed-off-by: Yonglong Li <liyonglong@chinatelecom.cn>
---
 net/core/dev.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Jakub Kicinski July 14, 2022, 2:07 a.m. UTC | #1
On Wed, 13 Jul 2022 10:24:56 +0800 Yonglong Li wrote:
> in the following case that set xps of each tx-queue with same cpu mask,
> packets in the same tcp stream may be hash to different tx queue. Because
> the order of queues in each xps map is not the same.
> 
> first set each tx-queue with different cpu mask
> echo 0 > /sys/class/net/eth0/queues/tx-0
> echo 1 > /sys/class/net/eth0/queues/tx-1
> echo 2 > /sys/class/net/eth0/queues/tx-2
> echo 4 > /sys/class/net/eth0/queues/tx-3
> and then set each tx-queue with same cpu mask
> echo f > /sys/class/net/eth0/queues/tx-0
> echo f > /sys/class/net/eth0/queues/tx-1
> echo f > /sys/class/net/eth0/queues/tx-2
> echo f > /sys/class/net/eth0/queues/tx-3

These commands look truncated.

> at this point the order of each map queues is differnet, It will cause
> packets in the same stream be hashed to diffetent tx queue:
> attr_map[0].queues = [0,1,2,3]
> attr_map[1].queues = [1,0,2,3]
> attr_map[2].queues = [2,0,1,3]
> attr_map[3].queues = [3,0,1,2]
> 
> It is more reasonable that pacekts in the same stream be hashed to the same
> tx queue when all tx queue bind with the same CPUs.
> 
> Fixes: 537c00de1c9b ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue")

I'd suggest treating this as a general improvement rather than fix,
the kernel always behaved this way - it seems logical that sorted is
better but whether it's a bug not to sort is not as clear cut.

> @@ -2654,6 +2660,13 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
>  					  skip_tc);
>  	}
>  
> +	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
> +	     j < nr_ids;) {
> +		tci = j * num_tc + tc;
> +		map = xmap_dereference(new_dev_maps->attr_map[tci]);
> +		sort(map->queues, map->len, sizeof(u16), cmp_u16, NULL);
> +	}
> +

Can we instead make sure that expand_xps_map() maintains order?
YonglongLi July 14, 2022, 3:24 a.m. UTC | #2
Hi Jakub,

Thanks for your feedback.

On 7/14/2022 10:07 AM, Jakub Kicinski wrote:
> On Wed, 13 Jul 2022 10:24:56 +0800 Yonglong Li wrote:
>> in the following case that set xps of each tx-queue with same cpu mask,
>> packets in the same tcp stream may be hash to different tx queue. Because
>> the order of queues in each xps map is not the same.
>>
>> first set each tx-queue with different cpu mask
>> echo 0 > /sys/class/net/eth0/queues/tx-0
>> echo 1 > /sys/class/net/eth0/queues/tx-1
>> echo 2 > /sys/class/net/eth0/queues/tx-2
>> echo 4 > /sys/class/net/eth0/queues/tx-3
>> and then set each tx-queue with same cpu mask
>> echo f > /sys/class/net/eth0/queues/tx-0
>> echo f > /sys/class/net/eth0/queues/tx-1
>> echo f > /sys/class/net/eth0/queues/tx-2
>> echo f > /sys/class/net/eth0/queues/tx-3
> 
> These commands look truncated.

I will refill the commands.

> 
>> at this point the order of each map queues is differnet, It will cause
>> packets in the same stream be hashed to diffetent tx queue:
>> attr_map[0].queues = [0,1,2,3]
>> attr_map[1].queues = [1,0,2,3]
>> attr_map[2].queues = [2,0,1,3]
>> attr_map[3].queues = [3,0,1,2]
>>
>> It is more reasonable that pacekts in the same stream be hashed to the same
>> tx queue when all tx queue bind with the same CPUs.
>>
>> Fixes: 537c00de1c9b ("net: Add functions netif_reset_xps_queue and netif_set_xps_queue")
> 
> I'd suggest treating this as a general improvement rather than fix,
> the kernel always behaved this way - it seems logical that sorted is
> better but whether it's a bug not to sort is not as clear cut.
> 
agree, will remove Fixes:tag in next version.

>> @@ -2654,6 +2660,13 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
>>  					  skip_tc);
>>  	}
>>  
>> +	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
>> +	     j < nr_ids;) {
>> +		tci = j * num_tc + tc;
>> +		map = xmap_dereference(new_dev_maps->attr_map[tci]);
>> +		sort(map->queues, map->len, sizeof(u16), cmp_u16, NULL);
>> +	}
>> +
> 
> Can we instead make sure that expand_xps_map() maintains order?
> 
expand_xps_map() only alloc new_map and copy old map's queue to new_map.
I think it is not suitable to do it in expand_xps_map().
WDYT?
Jakub Kicinski July 14, 2022, 3:32 a.m. UTC | #3
On Thu, 14 Jul 2022 11:24:31 +0800 Yonglong Li wrote:
> >> @@ -2654,6 +2660,13 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
> >>  					  skip_tc);
> >>  	}
> >>  
> >> +	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
> >> +	     j < nr_ids;) {
> >> +		tci = j * num_tc + tc;
> >> +		map = xmap_dereference(new_dev_maps->attr_map[tci]);
> >> +		sort(map->queues, map->len, sizeof(u16), cmp_u16, NULL);
> >> +	}
> >> +  
> > 
> > Can we instead make sure that expand_xps_map() maintains order?
> >   
> expand_xps_map() only alloc new_map and copy old map's queue to new_map.
> I think it is not suitable to do it in expand_xps_map().
> WDYT?

Oh, right, sorry for the confusion, I assumed since it reallocates that
it also fills the entry. It probably doesn't to make sure that all
allocations succeed before making any modifications.

Can we factor out the inside of the next loop - starting from the 
"add tx-queue to CPU/rx-queue maps" comment into a helper? My worry is
that __netif_set_xps_queue() is already pretty long and complicated we
should try to move some code out rather than make it longer.
YonglongLi July 14, 2022, 6:04 a.m. UTC | #4
On 7/14/2022 11:32 AM, Jakub Kicinski wrote:
> On Thu, 14 Jul 2022 11:24:31 +0800 Yonglong Li wrote:
>>>> @@ -2654,6 +2660,13 @@ int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
>>>>  					  skip_tc);
>>>>  	}
>>>>  
>>>> +	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
>>>> +	     j < nr_ids;) {
>>>> +		tci = j * num_tc + tc;
>>>> +		map = xmap_dereference(new_dev_maps->attr_map[tci]);
>>>> +		sort(map->queues, map->len, sizeof(u16), cmp_u16, NULL);
>>>> +	}
>>>> +  
>>>
>>> Can we instead make sure that expand_xps_map() maintains order?
>>>   
>> expand_xps_map() only alloc new_map and copy old map's queue to new_map.
>> I think it is not suitable to do it in expand_xps_map().
>> WDYT?
> 
> Oh, right, sorry for the confusion, I assumed since it reallocates that
> it also fills the entry. It probably doesn't to make sure that all
> allocations succeed before making any modifications.
> 
> Can we factor out the inside of the next loop - starting from the 
> "add tx-queue to CPU/rx-queue maps" comment into a helper? My worry is
> that __netif_set_xps_queue() is already pretty long and complicated we
> should try to move some code out rather than make it longer.
>
Ok, I will prepare a v2 patch as your suggestion.
diff mbox series

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index 978ed06..aab26b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -150,6 +150,7 @@ 
 #include <linux/pm_runtime.h>
 #include <linux/prandom.h>
 #include <linux/once_lite.h>
+#include <linux/sort.h>
 
 #include "dev.h"
 #include "net-sysfs.h"
@@ -199,6 +200,11 @@  static int call_netdevice_notifiers_extack(unsigned long val,
 
 static DECLARE_RWSEM(devnet_rename_sem);
 
+static int cmp_u16(const void *a, const void *b)
+{
+	return *(u16 *)a - *(u16 *)b;
+}
+
 static inline void dev_base_seq_inc(struct net *net)
 {
 	while (++net->dev_base_seq == 0)
@@ -2654,6 +2660,13 @@  int __netif_set_xps_queue(struct net_device *dev, const unsigned long *mask,
 					  skip_tc);
 	}
 
+	for (j = -1; j = netif_attrmask_next_and(j, online_mask, mask, nr_ids),
+	     j < nr_ids;) {
+		tci = j * num_tc + tc;
+		map = xmap_dereference(new_dev_maps->attr_map[tci]);
+		sort(map->queues, map->len, sizeof(u16), cmp_u16, NULL);
+	}
+
 	rcu_assign_pointer(dev->xps_maps[type], new_dev_maps);
 
 	/* Cleanup old maps */