
[3/3] net: mana: add a function to spread IRQs per CPUs

Message ID 20231217213214.1905481-4-yury.norov@gmail.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Series net: mana: add irq_spread()

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers           warning  1 maintainers not CCed: kotaranov@microsoft.com
netdev/build_clang              success  Errors and warnings before: 1142 this patch: 1142
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1142 this patch: 1142
netdev/checkpatch               warning  WARNING: Co-developed-by and Signed-off-by: name/email do not match
                                         WARNING: Missing a blank line after declarations
                                         WARNING: externs should be avoided in .c files
                                         WARNING: function definition argument 'free_cpumask_var' should also have an identifier name
                                         WARNING: line length of 83 exceeds 80 columns
                                         WARNING: line length of 90 exceeds 80 columns
                                         WARNING: line length of 98 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Yury Norov Dec. 17, 2023, 9:32 p.m. UTC
Souradeep found that the driver performs faster if IRQs are spread
across CPUs using the following heuristics:

1. No more than one IRQ per CPU, if possible;
2. NUMA locality is the second priority;
3. Sibling dislocality is the last priority.

Let's consider this topology:

Node            0               1
Core        0       1       2       3
CPU       0   1   2   3   4   5   6   7

The most performant IRQ distribution based on the above topology
and heuristics may look like this:

IRQ     Node    Core    CPUs
0       0       0       0-1
1       0       1       2-3
2       0       0       0-1
3       0       1       2-3
4       1       2       4-5
5       1       3       6-7
6       1       2       4-5
7       1       3       6-7

The irq_setup() routine introduced in this patch leverages the
for_each_numa_hop_mask() iterator and assigns IRQs to sibling groups
as described above.
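
The distribution above can be reproduced with a small userspace model.
This is only a sketch: the 2-node, 4-core, 8-CPU topology is hard-coded,
plain 32-bit masks stand in for struct cpumask, and sibling_mask(),
hop_mask(), weight32() and spread_irqs() are hypothetical helpers, not
kernel APIs. The loop uses the form with the weight decrement inside the
inner loop, as suggested during review of this patch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 8

/* Toy topology: CPUs 2k and 2k+1 are SMT siblings on the same core. */
static uint32_t sibling_mask(int cpu)
{
	return 3u << (cpu & ~1);
}

/* Hop 0 covers the local node only; hop 1 covers both nodes. */
static uint32_t hop_mask(int node, int hop)
{
	return hop == 0 ? 0xfu << (4 * node) : 0xffu;
}

/* Number of set bits in a mask. */
static int weight32(uint32_t mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/*
 * Model of the IRQ spreading loop: affinity[i] receives the sibling
 * mask chosen for IRQ i. Returns the number of IRQs placed.
 */
static unsigned int spread_irqs(uint32_t *affinity, unsigned int len, int node)
{
	uint32_t prev = 0, next, cpus;
	unsigned int placed = 0;
	int hop, cpu, weight;

	assert(node == 0 || node == 1);

	for (hop = 0; hop < 2; hop++) {
		next = hop_mask(node, hop);
		weight = weight32(next & ~prev);
		while (weight > 0) {
			/* Start a new pass over the CPUs added by this hop. */
			cpus = next & ~prev;
			for (cpu = 0; cpu < NR_CPUS; cpu++) {
				if (!(cpus & (1u << cpu)))
					continue;
				if (placed == len)
					return placed;
				affinity[placed++] = sibling_mask(cpu);
				/* Skip the rest of this core in this pass. */
				cpus &= ~sibling_mask(cpu);
				--weight;
			}
		}
		prev = next;
	}
	return placed;
}
```

Calling spread_irqs() with len = 8 and node = 0 yields the masks 0x03,
0x0c, 0x03, 0x0c, 0x30, 0xc0, 0x30, 0xc0, which correspond to the CPU
ranges in the table above.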

According to [1], for NUMA-aware but sibling-ignorant IRQ distribution
based on cpumask_local_spread(), the performance test results look like this:

./ntttcp -r -m 16
NTTTCP for Linux 1.4.0
---------------------------------------------------------
08:05:20 INFO: 17 threads created
08:05:28 INFO: Network activity progressing...
08:06:28 INFO: Test run completed.
08:06:28 INFO: Test cycle finished.
08:06:28 INFO: #####  Totals:  #####
08:06:28 INFO: test duration    :60.00 seconds
08:06:28 INFO: total bytes      :630292053310
08:06:28 INFO:   throughput     :84.04Gbps
08:06:28 INFO:   retrans segs   :4
08:06:28 INFO: cpu cores        :192
08:06:28 INFO:   cpu speed      :3799.725MHz
08:06:28 INFO:   user           :0.05%
08:06:28 INFO:   system         :1.60%
08:06:28 INFO:   idle           :96.41%
08:06:28 INFO:   iowait         :0.00%
08:06:28 INFO:   softirq        :1.94%
08:06:28 INFO:   cycles/byte    :2.50
08:06:28 INFO: cpu busy (all)   :534.41%

For NUMA- and sibling-aware IRQ distribution, the same test shows about
18% higher throughput (98.93 Gbps vs. 84.04 Gbps):

./ntttcp -r -m 16
NTTTCP for Linux 1.4.0
---------------------------------------------------------
08:08:51 INFO: 17 threads created
08:08:56 INFO: Network activity progressing...
08:09:56 INFO: Test run completed.
08:09:56 INFO: Test cycle finished.
08:09:56 INFO: #####  Totals:  #####
08:09:56 INFO: test duration    :60.00 seconds
08:09:56 INFO: total bytes      :741966608384
08:09:56 INFO:   throughput     :98.93Gbps
08:09:56 INFO:   retrans segs   :6
08:09:56 INFO: cpu cores        :192
08:09:56 INFO:   cpu speed      :3799.791MHz
08:09:56 INFO:   user           :0.06%
08:09:56 INFO:   system         :1.81%
08:09:56 INFO:   idle           :96.18%
08:09:56 INFO:   iowait         :0.00%
08:09:56 INFO:   softirq        :1.95%
08:09:56 INFO:   cycles/byte    :2.25
08:09:56 INFO: cpu busy (all)   :569.22%
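
The throughput figures in the two runs can be cross-checked from the
reported byte counts and test duration. The helpers below are purely
illustrative; gbps() and percent_gain() are names made up for this
sketch and are not part of ntttcp or of this patch.

```c
#include <assert.h>

/* Throughput in Gbps from a byte count and a duration in seconds. */
static double gbps(double total_bytes, double seconds)
{
	assert(seconds > 0.0);
	return total_bytes * 8.0 / seconds / 1e9;
}

/* Relative change of `after` with respect to `before`, in percent. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With the totals above, gbps(630292053310, 60) and gbps(741966608384, 60)
come out at roughly 84.04 and 98.93, and percent_gain(84.04, 98.93) is
roughly 17.7. Measured against the faster run instead, the difference is
about 15%.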

[1] https://lore.kernel.org/all/20231211063726.GA4977@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
---
 .../net/ethernet/microsoft/mana/gdma_main.c   | 28 +++++++++++++++++++
 1 file changed, 28 insertions(+)

Comments

Jacob Keller Dec. 18, 2023, 9:17 p.m. UTC | #1
On 12/17/2023 1:32 PM, Yury Norov wrote:
> +static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
> +{
> +	const struct cpumask *next, *prev = cpu_none_mask;
> +	cpumask_var_t cpus __free(free_cpumask_var);
> +	int cpu, weight;
> +
> +	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	rcu_read_lock();
> +	for_each_numa_hop_mask(next, node) {
> +		weight = cpumask_weight_andnot(next, prev);
> +		while (weight-- > 0) {
> +			cpumask_andnot(cpus, next, prev);
> +			for_each_cpu(cpu, cpus) {
> +				if (len-- == 0)
> +					goto done;
> +				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> +				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> +			}
> +		}
> +		prev = next;
> +	}
> +done:
> +	rcu_read_unlock();
> +	return 0;
> +}
> +

You're adding a function here but it's not called and even marked as
__maybe_unused?

>  static int mana_gd_setup_irqs(struct pci_dev *pdev)
>  {
>  	unsigned int max_queues_per_port = num_online_cpus();
Yury Norov Dec. 18, 2023, 9:42 p.m. UTC | #2
On Mon, Dec 18, 2023 at 01:17:53PM -0800, Jacob Keller wrote:
> 
> 
> On 12/17/2023 1:32 PM, Yury Norov wrote:
> > +static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
> > +{
> > +	const struct cpumask *next, *prev = cpu_none_mask;
> > +	cpumask_var_t cpus __free(free_cpumask_var);
> > +	int cpu, weight;
> > +
> > +	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> > +		return -ENOMEM;
> > +
> > +	rcu_read_lock();
> > +	for_each_numa_hop_mask(next, node) {
> > +		weight = cpumask_weight_andnot(next, prev);
> > +		while (weight-- > 0) {
> > +			cpumask_andnot(cpus, next, prev);
> > +			for_each_cpu(cpu, cpus) {
> > +				if (len-- == 0)
> > +					goto done;
> > +				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> > +				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> > +			}
> > +		}
> > +		prev = next;
> > +	}
> > +done:
> > +	rcu_read_unlock();
> > +	return 0;
> > +}
> > +
> 
> You're adding a function here but it's not called and even marked as
> __maybe_unused?

I expect Souradeep to build his driver improvement on top of this
function. The cpumask API is somewhat tricky to use properly here, so
this is an attempt to help him, instead of going back and forth in
review.

Sorry, I should have been more explicit.

Thanks,
Yury
Souradeep Chakrabarti Dec. 19, 2023, 7:14 a.m. UTC | #3
>Signed-off-by: Yury Norov <yury.norov@gmail.com>
>Co-developed-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Please also add Signed-off-by: Souradeep Chakrabarti <schakrabarti@linux.microsoft.com>
Souradeep Chakrabarti Dec. 19, 2023, 10:18 a.m. UTC | #4
>+static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
>+{
>+       const struct cpumask *next, *prev = cpu_none_mask;
>+       cpumask_var_t cpus __free(free_cpumask_var);
>+       int cpu, weight;
>+
>+       if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
>+               return -ENOMEM;
>+
>+       rcu_read_lock();
>+       for_each_numa_hop_mask(next, node) {
>+               weight = cpumask_weight_andnot(next, prev);
>+               while (weight-- > 0) {
Make it while (weight > 0) {
>+                       cpumask_andnot(cpus, next, prev);
>+                       for_each_cpu(cpu, cpus) {
>+                               if (len-- == 0)
>+                                       goto done;
>+                               irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
>+                               cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
Do --weight here, else this code will traverse the same node N^2 times, where each
node has N CPUs.
>+                       }
>+               }
>+               prev = next;
>+       }
>+done:
>+       rcu_read_unlock();
>+       return 0;
>+}
>+
> static int mana_gd_setup_irqs(struct pci_dev *pdev)  {
>        unsigned int max_queues_per_port = num_online_cpus();
>--
>2.40.1
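
The effect of the two loop forms discussed in this comment can be
compared with a small userspace model. This is a sketch under stated
assumptions: one NUMA hop with `cores` cores of `smt` threads each, an
unlimited supply of IRQs, and irqs_per_hop() is a hypothetical helper
that only counts assignments rather than touching real cpumasks.

```c
#include <assert.h>

/*
 * Count the IRQs one NUMA hop consumes. Each pass of the inner loop
 * assigns one IRQ per core, mirroring for_each_cpu() over a mask from
 * which sibling CPUs are cleared as they are visited.
 * fixed == 0: weight is decremented once per pass (as posted).
 * fixed != 0: weight is decremented once per assigned IRQ (as suggested).
 */
static unsigned int irqs_per_hop(unsigned int cores, unsigned int smt, int fixed)
{
	int weight = (int)(cores * smt);	/* CPUs added by this hop */
	unsigned int assigned = 0;
	unsigned int core;

	assert(cores > 0 && smt > 0);

	if (fixed) {
		while (weight > 0) {
			for (core = 0; core < cores; core++) {
				assigned++;
				--weight;
			}
		}
	} else {
		while (weight-- > 0) {
			for (core = 0; core < cores; core++)
				assigned++;
		}
	}
	return assigned;
}
```

For a hop with 2 cores and 2 threads per core, the posted form assigns 8
IRQs and the suggested form assigns 4, one per CPU. Without SMT (4 cores,
1 thread each) the posted form assigns 16, which is the N^2 behaviour
described above.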
Yury Norov Dec. 19, 2023, 2:03 p.m. UTC | #5
On Tue, Dec 19, 2023 at 10:18:49AM +0000, Souradeep Chakrabarti wrote:
> >+static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
> >+{
> >+       const struct cpumask *next, *prev = cpu_none_mask;
> >+       cpumask_var_t cpus __free(free_cpumask_var);
> >+       int cpu, weight;
> >+
> >+       if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
> >+               return -ENOMEM;
> >+
> >+       rcu_read_lock();
> >+       for_each_numa_hop_mask(next, node) {
> >+               weight = cpumask_weight_andnot(next, prev);
> >+               while (weight-- > 0) {
> Make it while (weight > 0) {
> >+                       cpumask_andnot(cpus, next, prev);
> >+                       for_each_cpu(cpu, cpus) {
> >+                               if (len-- == 0)
> >+                                       goto done;
> >+                               irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
> >+                               cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
> Do --weight here, else this code will traverse the same node N^2 times, where each
> node has N CPUs.

Sure.

When building your series on top of this, can you please fix it
in place?

Thanks,
Yury

> >+                       }
> >+               }
> >+               prev = next;
> >+       }
> >+done:
> >+       rcu_read_unlock();
> >+       return 0;
> >+}
> >+
> > static int mana_gd_setup_irqs(struct pci_dev *pdev)  {
> >        unsigned int max_queues_per_port = num_online_cpus();
> >--
> >2.40.1

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c b/drivers/net/ethernet/microsoft/mana/gdma_main.c
index 6367de0c2c2e..11e64e42e3b2 100644
--- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
+++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
@@ -1243,6 +1243,34 @@  void mana_gd_free_res_map(struct gdma_resource *r)
 	r->size = 0;
 }
 
+static __maybe_unused int irq_setup(unsigned int *irqs, unsigned int len, int node)
+{
+	const struct cpumask *next, *prev = cpu_none_mask;
+	cpumask_var_t cpus __free(free_cpumask_var);
+	int cpu, weight;
+
+	if (!alloc_cpumask_var(&cpus, GFP_KERNEL))
+		return -ENOMEM;
+
+	rcu_read_lock();
+	for_each_numa_hop_mask(next, node) {
+		weight = cpumask_weight_andnot(next, prev);
+		while (weight-- > 0) {
+			cpumask_andnot(cpus, next, prev);
+			for_each_cpu(cpu, cpus) {
+				if (len-- == 0)
+					goto done;
+				irq_set_affinity_and_hint(*irqs++, topology_sibling_cpumask(cpu));
+				cpumask_andnot(cpus, cpus, topology_sibling_cpumask(cpu));
+			}
+		}
+		prev = next;
+	}
+done:
+	rcu_read_unlock();
+	return 0;
+}
+
 static int mana_gd_setup_irqs(struct pci_dev *pdev)
 {
 	unsigned int max_queues_per_port = num_online_cpus();