
[net-next,v7,2/5] net: napi: add CPU affinity to napi_config

Message ID 20250204220622.156061-3-ahmed.zaki@intel.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Series net: napi: add CPU affinity to napi->config

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 40 this patch: 40
netdev/build_tools              success  Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers           success  CCed 6 of 6 maintainers
netdev/build_clang              success  Errors and warnings before: 7109 this patch: 7109
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 4116 this patch: 4116
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 181 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 95 this patch: 95
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-02-05--03-00 (tests: 886)

Commit Message

Ahmed Zaki Feb. 4, 2025, 10:06 p.m. UTC
A common task for most drivers is to remember the user-set CPU affinity
of their IRQs. On each netdev reset, the driver should re-apply the
user's settings to the IRQs.

Add CPU affinity mask to napi_config. To delegate the CPU affinity
management to the core, drivers must:
 1 - set the new netdev flag "irq_affinity_auto":
                                       netif_enable_irq_affinity(netdev)
 2 - create the napi with persistent config:
                                       netif_napi_add_config()
 3 - bind an IRQ to the napi instance: netif_napi_set_irq()

The core will then make sure to re-assign the affinity to the napi's
IRQ.

The default IRQ mask is set to one CPU, starting from the closest NUMA
node.
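
To make the flow above concrete, here is a minimal userspace sketch of the
save-and-restore behaviour. It is not kernel code: every model_* name, the
64-bit word standing in for cpumask_t, and the model_irq_affinity[] table are
assumptions made for illustration. It only models the idea that the affinity
notifier records the user's mask in the persistent config and that the
restore path re-applies it after a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only: one 64-bit word stands in for cpumask_t. */
typedef uint64_t model_cpumask_t;

struct model_napi_config {
	model_cpumask_t affinity_mask;	/* persists across resets */
};

struct model_dev {
	bool irq_affinity_auto;		/* step 1: driver opted in */
};

struct model_napi {
	struct model_dev *dev;
	struct model_napi_config *config;	/* step 2: persistent config */
	int irq;				/* step 3: bound IRQ */
};

/* Hypothetical stand-in for the affinity state held by the IRQ core. */
#define MODEL_NR_IRQS 64
static model_cpumask_t model_irq_affinity[MODEL_NR_IRQS];

/* Models the notifier: remember the mask the user just applied. */
static void model_irq_notify(struct model_napi *napi, model_cpumask_t mask)
{
	if (napi->config && napi->dev->irq_affinity_auto)
		napi->config->affinity_mask = mask;
}

/* Models a user writing a new affinity for the IRQ. */
static void model_user_set_affinity(struct model_napi *napi,
				    model_cpumask_t mask)
{
	assert(napi->irq > 0 && napi->irq < MODEL_NR_IRQS);
	model_irq_affinity[napi->irq] = mask;
	model_irq_notify(napi, mask);
}

/* Models the restore path: re-apply the saved mask to the IRQ. */
static void model_restore_config(struct model_napi *napi)
{
	if (napi->config && napi->irq > 0 && napi->dev->irq_affinity_auto)
		model_irq_affinity[napi->irq] = napi->config->affinity_mask;
}

/* Models a reset: the IRQ comes back with some fresh mask, then the
 * saved configuration is restored on top of it.
 */
static void model_reset(struct model_napi *napi, model_cpumask_t fresh_mask)
{
	assert(napi->irq > 0 && napi->irq < MODEL_NR_IRQS);
	model_irq_affinity[napi->irq] = fresh_mask;
	model_restore_config(napi);
}
```

In this model, a mask set with model_user_set_affinity() survives
model_reset() only while irq_affinity_auto is true; with the flag cleared,
the IRQ keeps whatever mask the reset left behind.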

Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
 include/linux/netdevice.h | 14 +++++++--
 net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 15 deletions(-)

Comments

Joe Damato Feb. 4, 2025, 10:43 p.m. UTC | #1
On Tue, Feb 04, 2025 at 03:06:19PM -0700, Ahmed Zaki wrote:
> A common task for most drivers is to remember the user-set CPU affinity
> of their IRQs. On each netdev reset, the driver should re-apply the
> user's settings to the IRQs.
> 
> Add CPU affinity mask to napi_config. To delegate the CPU affinity
> management to the core, drivers must:
>  1 - set the new netdev flag "irq_affinity_auto":
>                                        netif_enable_irq_affinity(netdev)
>  2 - create the napi with persistent config:
>                                        netif_napi_add_config()
>  3 - bind an IRQ to the napi instance: netif_napi_set_irq()
> 
> The core will then make sure to re-assign the affinity to the napi's
> IRQ.
> 
> The default IRQ mask is set to one CPU, starting from the closest NUMA
> node.

Not sure, but maybe the above should be documented somewhere like
Documentation/networking/napi.rst or similar?

Maybe that's too nit-picky, though, since the per-NAPI config stuff
never made it into the docs (I'll propose a patch to fix that).

> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
> ---
>  include/linux/netdevice.h | 14 +++++++--
>  net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
>  2 files changed, 61 insertions(+), 15 deletions(-)

[...]
 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 33e84477c9c2..4cde7ac31e74 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c

[...]

> @@ -6968,17 +6983,28 @@ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
>  {
>  	int rc;
>  
> -	/* Remove existing rmap entries */
> -	if (napi->dev->rx_cpu_rmap_auto &&
> +	/* Remove existing resources */
> +	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
>  	    napi->irq != irq && napi->irq > 0)
>  		irq_set_affinity_notifier(napi->irq, NULL);
>  
>  	napi->irq = irq;
> -	if (irq > 0) {
> +	if (irq < 0)
> +		return;
> +
> +	if (napi->dev->rx_cpu_rmap_auto) {
>  		rc = napi_irq_cpu_rmap_add(napi, irq);
>  		if (rc)
>  			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
>  				    rc);
> +	} else if (napi->config && napi->dev->irq_affinity_auto) {
> +		napi->notify.notify = netif_napi_irq_notify;
> +		napi->notify.release = netif_napi_affinity_release;
> +
> +		rc = irq_set_affinity_notifier(irq, &napi->notify);
> +		if (rc)
> +			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
> +				    rc);
>  	}

Should there be a WARN_ON or WARN_ON_ONCE in here somewhere if the
driver calls netif_napi_set_irq_locked but did not link NAPI config
with a call to netif_napi_add_config?

It seems like in that case the driver is buggy and a warning might
be helpful.

Ahmed Zaki Feb. 5, 2025, 3:20 p.m. UTC | #2
On 2025-02-04 3:43 p.m., Joe Damato wrote:
> On Tue, Feb 04, 2025 at 03:06:19PM -0700, Ahmed Zaki wrote:
>> A common task for most drivers is to remember the user-set CPU affinity
>> of their IRQs. On each netdev reset, the driver should re-apply the
>> user's settings to the IRQs.
>>
>> Add CPU affinity mask to napi_config. To delegate the CPU affinity
>> management to the core, drivers must:
>>   1 - set the new netdev flag "irq_affinity_auto":
>>                                         netif_enable_irq_affinity(netdev)
>>   2 - create the napi with persistent config:
>>                                         netif_napi_add_config()
>>   3 - bind an IRQ to the napi instance: netif_napi_set_irq()
>>
>> The core will then make sure to re-assign the affinity to the napi's
>> IRQ.
>>
>> The default IRQ mask is set to one CPU, starting from the closest NUMA
>> node.
> 
> Not sure, but maybe the above should be documented somewhere like
> Documentation/networking/napi.rst or similar?
> 
> Maybe that's too nit-picky, though, since the per-NAPI config stuff
> never made it into the docs (I'll propose a patch to fix that).


Yeah, and not all of the API is documented there (netif_napi_set_irq(),
for example).

> 
>> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
>> ---
>>   include/linux/netdevice.h | 14 +++++++--
>>   net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
>>   2 files changed, 61 insertions(+), 15 deletions(-)
> 
> [...]
>   
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 33e84477c9c2..4cde7ac31e74 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
> 
> [...]
> 
>> @@ -6968,17 +6983,28 @@ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
>>   {
>>   	int rc;
>>   
>> -	/* Remove existing rmap entries */
>> -	if (napi->dev->rx_cpu_rmap_auto &&
>> +	/* Remove existing resources */
>> +	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
>>   	    napi->irq != irq && napi->irq > 0)
>>   		irq_set_affinity_notifier(napi->irq, NULL);
>>   
>>   	napi->irq = irq;
>> -	if (irq > 0) {
>> +	if (irq < 0)
>> +		return;
>> +
>> +	if (napi->dev->rx_cpu_rmap_auto) {
>>   		rc = napi_irq_cpu_rmap_add(napi, irq);
>>   		if (rc)
>>   			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
>>   				    rc);
>> +	} else if (napi->config && napi->dev->irq_affinity_auto) {
>> +		napi->notify.notify = netif_napi_irq_notify;
>> +		napi->notify.release = netif_napi_affinity_release;
>> +
>> +		rc = irq_set_affinity_notifier(irq, &napi->notify);
>> +		if (rc)
>> +			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
>> +				    rc);
>>   	}
> 
> Should there be a WARN_ON or WARN_ON_ONCE in here somewhere if the
> driver calls netif_napi_set_irq_locked but did not link NAPI config
> with a call to netif_napi_add_config?
> 
> It seems like in that case the driver is buggy and a warning might
> be helpful.
> 

I think that is a good idea. If there is a new version, I can add this in
the second branch of the if:


if (WARN_ON_ONCE(!napi->config))
	return;
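
For context, here is one way the proposed check could sit in the branch
structure, written as a userspace sketch rather than kernel code. The chk_*
names and the chk_warn_on_once() helper are hypothetical stand-ins (the helper
only imitates WARN_ON_ONCE() by reporting the first hit and returning the
condition), and dropping the napi->config test from the else-if condition is
an assumption about how the check would be integrated, not something stated
in the thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many times the stand-in warning actually fired. */
static int chk_warn_count;

/* Hypothetical stand-in for WARN_ON_ONCE(): report the first hit only,
 * always return the condition so it can be used inside an if ().
 */
static bool chk_warn_on_once(bool cond, bool *warned)
{
	if (cond && !*warned) {
		*warned = true;
		chk_warn_count++;
		fprintf(stderr, "WARNING: napi has no persistent config\n");
	}
	return cond;
}

struct chk_dev {
	bool rx_cpu_rmap_auto;
	bool irq_affinity_auto;
};

struct chk_napi {
	struct chk_dev *dev;
	void *config;		/* NULL when no persistent config is linked */
	int irq;
	bool notifier_set;	/* models a registered affinity notifier */
};

/* Models only the branch structure of the set-IRQ path. */
static void chk_set_irq(struct chk_napi *napi, int irq)
{
	static bool warned;

	napi->irq = irq;
	if (irq < 0)
		return;

	if (napi->dev->rx_cpu_rmap_auto) {
		/* ARFS rmap path: not modelled in this sketch. */
	} else if (napi->dev->irq_affinity_auto) {
		/* Assumed placement of the proposed check. */
		if (chk_warn_on_once(!napi->config, &warned))
			return;
		napi->notifier_set = true;
	}
}
```

With this arrangement, a driver that enables irq_affinity_auto but never links
a config triggers a single warning and gets no notifier, while a correctly
configured napi behaves as before.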

Patch

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d19fa98b65e..0436605ee607 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,6 +352,7 @@  struct napi_config {
 	u64 gro_flush_timeout;
 	u64 irq_suspend_timeout;
 	u32 defer_hard_irqs;
+	cpumask_t affinity_mask;
 	unsigned int napi_id;
 };
 
@@ -394,10 +395,8 @@  struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	int			irq;
-#ifdef CONFIG_RFS_ACCEL
 	struct irq_affinity_notify notify;
 	int			napi_rmap_idx;
-#endif
 	int			index;
 	struct napi_config	*config;
 };
@@ -1992,6 +1991,11 @@  enum netdev_reg_state {
  *
  *	@threaded:	napi threaded mode is enabled
  *
+ *	@irq_affinity_auto: driver wants the core to manage the IRQ affinity.
+ *			    Set by netif_enable_irq_affinity(), then driver must
+ *			    create persistent napi by netif_napi_add_config()
+ *			    and finally bind napi to IRQ (netif_napi_set_irq).
+ *
  *	@rx_cpu_rmap_auto: driver wants the core to manage the ARFS rmap.
  *	                   Set by calling netif_enable_cpu_rmap().
  *
@@ -2402,6 +2406,7 @@  struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
 	bool			threaded;
+	bool			irq_affinity_auto;
 	bool			rx_cpu_rmap_auto;
 
 	/* priv_flags_slow, ungrouped to save space */
@@ -2662,6 +2667,11 @@  static inline void netdev_set_ml_priv(struct net_device *dev,
 	dev->ml_priv_type = type;
 }
 
+static inline void netif_enable_irq_affinity(struct net_device *dev)
+{
+	dev->irq_affinity_auto = true;
+}
+
 /*
  * Net namespace inlines
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index 33e84477c9c2..4cde7ac31e74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6866,28 +6866,39 @@  void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 EXPORT_SYMBOL(netif_queue_set_napi);
 
-#ifdef CONFIG_RFS_ACCEL
 static void
-netif_irq_cpu_rmap_notify(struct irq_affinity_notify *notify,
-			  const cpumask_t *mask)
+netif_napi_irq_notify(struct irq_affinity_notify *notify,
+		      const cpumask_t *mask)
 {
 	struct napi_struct *napi =
 		container_of(notify, struct napi_struct, notify);
+#ifdef CONFIG_RFS_ACCEL
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
 	int err;
+#endif
 
-	err = cpu_rmap_update(rmap, napi->napi_rmap_idx, mask);
-	if (err)
-		netdev_warn(napi->dev, "RMAP update failed (%d)\n",
-			    err);
+	if (napi->config && napi->dev->irq_affinity_auto)
+		cpumask_copy(&napi->config->affinity_mask, mask);
+
+#ifdef CONFIG_RFS_ACCEL
+	if (napi->dev->rx_cpu_rmap_auto) {
+		err = cpu_rmap_update(rmap, napi->napi_rmap_idx, mask);
+		if (err)
+			netdev_warn(napi->dev, "RMAP update failed (%d)\n",
+				    err);
+	}
+#endif
 }
 
+#ifdef CONFIG_RFS_ACCEL
 static void netif_napi_affinity_release(struct kref *ref)
 {
 	struct napi_struct *napi =
 		container_of(ref, struct napi_struct, notify.kref);
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
 
+	if (!napi->dev->rx_cpu_rmap_auto)
+		return;
 	rmap->obj[napi->napi_rmap_idx] = NULL;
 	napi->napi_rmap_idx = -1;
 	cpu_rmap_put(rmap);
@@ -6898,7 +6909,7 @@  static int napi_irq_cpu_rmap_add(struct napi_struct *napi, int irq)
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
 	int rc;
 
-	napi->notify.notify = netif_irq_cpu_rmap_notify;
+	napi->notify.notify = netif_napi_irq_notify;
 	napi->notify.release = netif_napi_affinity_release;
 	cpu_rmap_get(rmap);
 	rc = cpu_rmap_add(rmap, napi);
@@ -6948,6 +6959,10 @@  static void netif_del_cpu_rmap(struct net_device *dev)
 }
 
 #else
+static void netif_napi_affinity_release(struct kref *ref)
+{
+}
+
 static int napi_irq_cpu_rmap_add(struct napi_struct *napi, int irq)
 {
 	return 0;
@@ -6968,17 +6983,28 @@  void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
 {
 	int rc;
 
-	/* Remove existing rmap entries */
-	if (napi->dev->rx_cpu_rmap_auto &&
+	/* Remove existing resources */
+	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
 	    napi->irq != irq && napi->irq > 0)
 		irq_set_affinity_notifier(napi->irq, NULL);
 
 	napi->irq = irq;
-	if (irq > 0) {
+	if (irq < 0)
+		return;
+
+	if (napi->dev->rx_cpu_rmap_auto) {
 		rc = napi_irq_cpu_rmap_add(napi, irq);
 		if (rc)
 			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
 				    rc);
+	} else if (napi->config && napi->dev->irq_affinity_auto) {
+		napi->notify.notify = netif_napi_irq_notify;
+		napi->notify.release = netif_napi_affinity_release;
+
+		rc = irq_set_affinity_notifier(irq, &napi->notify);
+		if (rc)
+			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
+				    rc);
 	}
 }
 EXPORT_SYMBOL(netif_napi_set_irq_locked);
@@ -6988,6 +7014,10 @@  static void napi_restore_config(struct napi_struct *n)
 	n->defer_hard_irqs = n->config->defer_hard_irqs;
 	n->gro_flush_timeout = n->config->gro_flush_timeout;
 	n->irq_suspend_timeout = n->config->irq_suspend_timeout;
+
+	if (n->irq > 0 && n->dev->irq_affinity_auto)
+		irq_set_affinity(n->irq, &n->config->affinity_mask);
+
 	/* a NAPI ID might be stored in the config, if so use it. if not, use
 	 * napi_hash_add to generate one for us.
 	 */
@@ -7112,7 +7142,8 @@  void napi_disable_locked(struct napi_struct *n)
 	else
 		napi_hash_del(n);
 
-	if (n->irq > 0 && n->dev->rx_cpu_rmap_auto)
+	if (n->irq > 0 &&
+	    (n->dev->irq_affinity_auto || n->dev->rx_cpu_rmap_auto))
 		irq_set_affinity_notifier(n->irq, NULL);
 
 	clear_bit(NAPI_STATE_DISABLE, &n->state);
@@ -11550,9 +11581,9 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *),
 		unsigned int txqs, unsigned int rxqs)
 {
+	unsigned int maxqs, i, numa;
 	struct net_device *dev;
 	size_t napi_config_sz;
-	unsigned int maxqs;
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
@@ -11654,6 +11685,11 @@  struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->napi_config)
 		goto free_all;
 
+	numa = dev_to_node(&dev->dev);
+	for (i = 0; i < maxqs; i++)
+		cpumask_set_cpu(cpumask_local_spread(i, numa),
+				&dev->napi_config[i].affinity_mask);
+
 	strscpy(dev->name, name);
 	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
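
The alloc_netdev_mqs() hunk above gives each napi_config a default mask with
one CPU chosen by cpumask_local_spread(). The following userspace sketch
illustrates the resulting layout under a simplified assumption: CPUs on the
device's node are handed out first, then all remaining CPUs in ascending
order, wrapping when there are more queues than CPUs. The real helper may
order remote CPUs by NUMA distance, which this sketch does not attempt. The
spread_* names and the cpu_node[] topology table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model: return the i-th CPU, taking CPUs on `node` first and
 * every other CPU afterwards. cpu_node[c] is the NUMA node of CPU c.
 */
static int spread_local_cpu(unsigned int i, int node,
			    const int *cpu_node, size_t ncpus)
{
	size_t cpu;

	assert(ncpus > 0);
	i %= ncpus;	/* wrap so that a CPU is always returned */

	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu_node[cpu] == node && i-- == 0)
			return (int)cpu;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (cpu_node[cpu] != node && i-- == 0)
			return (int)cpu;

	return -1;	/* not reached: i < ncpus guarantees a match */
}

/* Fill one single-CPU default mask per queue, as a 64-bit word. */
static void spread_default_masks(uint64_t *masks, size_t nqueues, int node,
				 const int *cpu_node, size_t ncpus)
{
	size_t q;

	assert(ncpus <= 64);
	for (q = 0; q < nqueues; q++) {
		int cpu = spread_local_cpu((unsigned int)q, node,
					   cpu_node, ncpus);

		masks[q] = UINT64_C(1) << cpu;
	}
}
```

For a four-CPU system with CPUs 0-1 on node 0 and CPUs 2-3 on node 1, a device
on node 1 would get defaults on CPUs 2, 3, 0, 1 for its first four queues in
this model, and a fifth queue would wrap back to CPU 2.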