Message ID | 20250204220622.156061-3-ahmed.zaki@intel.com (mailing list archive) |
---|---|
State | New |
Delegated to: | Netdev Maintainers |
Series | net: napi: add CPU affinity to napi->config |
On Tue, Feb 04, 2025 at 03:06:19PM -0700, Ahmed Zaki wrote:
> A common task for most drivers is to remember the user-set CPU affinity
> to its IRQs. On each netdev reset, the driver should re-assign the
> user's settings to the IRQs.
>
> Add CPU affinity mask to napi_config. To delegate the CPU affinity
> management to the core, drivers must:
> 1 - set the new netdev flag "irq_affinity_auto":
>       netif_enable_irq_affinity(netdev)
> 2 - create the napi with persistent config:
>       netif_napi_add_config()
> 3 - bind an IRQ to the napi instance: netif_napi_set_irq()
>
> the core will then make sure to use re-assign affinity to the napi's
> IRQ.
>
> The default IRQ mask is set to one cpu starting from the closest NUMA.

Not sure, but maybe the above should be documented somewhere like
Documentation/networking/napi.rst or similar?

Maybe that's too nit-picky, though, since the per-NAPI config stuff
never made it into the docs (I'll propose a patch to fix that).

> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
> ---
>  include/linux/netdevice.h | 14 +++++++--
>  net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
>  2 files changed, 61 insertions(+), 15 deletions(-)

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index 33e84477c9c2..4cde7ac31e74 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c

[...]
> @@ -6968,17 +6983,28 @@ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
>  {
>  	int rc;
>
> -	/* Remove existing rmap entries */
> -	if (napi->dev->rx_cpu_rmap_auto &&
> +	/* Remove existing resources */
> +	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
>  	    napi->irq != irq && napi->irq > 0)
>  		irq_set_affinity_notifier(napi->irq, NULL);
>
>  	napi->irq = irq;
> -	if (irq > 0) {
> +	if (irq < 0)
> +		return;
> +
> +	if (napi->dev->rx_cpu_rmap_auto) {
>  		rc = napi_irq_cpu_rmap_add(napi, irq);
>  		if (rc)
>  			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
>  				    rc);
> +	} else if (napi->config && napi->dev->irq_affinity_auto) {
> +		napi->notify.notify = netif_napi_irq_notify;
> +		napi->notify.release = netif_napi_affinity_release;
> +
> +		rc = irq_set_affinity_notifier(irq, &napi->notify);
> +		if (rc)
> +			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
> +				    rc);
>  	}

Should there be a WARN_ON or WARN_ON_ONCE in here somewhere if the
driver calls netif_napi_set_irq_locked but did not link NAPI config
with a call to netif_napi_add_config?

It seems like in that case the driver is buggy and a warning might
be helpful.
On 2025-02-04 3:43 p.m., Joe Damato wrote:
> On Tue, Feb 04, 2025 at 03:06:19PM -0700, Ahmed Zaki wrote:
>> A common task for most drivers is to remember the user-set CPU affinity
>> to its IRQs. On each netdev reset, the driver should re-assign the
>> user's settings to the IRQs.
>>
>> Add CPU affinity mask to napi_config. To delegate the CPU affinity
>> management to the core, drivers must:
>> 1 - set the new netdev flag "irq_affinity_auto":
>>       netif_enable_irq_affinity(netdev)
>> 2 - create the napi with persistent config:
>>       netif_napi_add_config()
>> 3 - bind an IRQ to the napi instance: netif_napi_set_irq()
>>
>> the core will then make sure to use re-assign affinity to the napi's
>> IRQ.
>>
>> The default IRQ mask is set to one cpu starting from the closest NUMA.
>
> Not sure, but maybe the above should be documented somewhere like
> Documentation/networking/napi.rst or similar?
>
> Maybe that's too nit-picky, though, since the per-NAPI config stuff
> never made it into the docs (I'll propose a patch to fix that).

Yeah, and not all of the API is documented there (e.g. netif_napi_set_irq()).

>
>> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
>> ---
>>  include/linux/netdevice.h | 14 +++++++--
>>  net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
>>  2 files changed, 61 insertions(+), 15 deletions(-)
>
> [...]
>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 33e84477c9c2..4cde7ac31e74 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>
> [...]
>
>> @@ -6968,17 +6983,28 @@ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
>>  {
>>  	int rc;
>>
>> -	/* Remove existing rmap entries */
>> -	if (napi->dev->rx_cpu_rmap_auto &&
>> +	/* Remove existing resources */
>> +	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
>>  	    napi->irq != irq && napi->irq > 0)
>>  		irq_set_affinity_notifier(napi->irq, NULL);
>>
>>  	napi->irq = irq;
>> -	if (irq > 0) {
>> +	if (irq < 0)
>> +		return;
>> +
>> +	if (napi->dev->rx_cpu_rmap_auto) {
>>  		rc = napi_irq_cpu_rmap_add(napi, irq);
>>  		if (rc)
>>  			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
>>  				    rc);
>> +	} else if (napi->config && napi->dev->irq_affinity_auto) {
>> +		napi->notify.notify = netif_napi_irq_notify;
>> +		napi->notify.release = netif_napi_affinity_release;
>> +
>> +		rc = irq_set_affinity_notifier(irq, &napi->notify);
>> +		if (rc)
>> +			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
>> +				    rc);
>>  	}
>
> Should there be a WARN_ON or WARN_ON_ONCE in here somewhere if the
> driver calls netif_napi_set_irq_locked but did not link NAPI config
> with a call to netif_napi_add_config?
>
> It seems like in that case the driver is buggy and a warning might
> be helpful.
>

I think that is a good idea. If there is a new version, I can add this
to the second branch of the if (the "else if"):

	if (WARN_ON_ONCE(!napi->config))
		return;
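The proposed guard can be exercised with a small user-space model. This is a sketch, not kernel code: `struct model_napi`, `model_napi_set_irq()` and `model_warn_on_once()` are hypothetical stand-ins for `struct napi_struct`, `netif_napi_set_irq_locked()` and `WARN_ON_ONCE()`, and only the `irq_affinity_auto` branch is modelled (the ARFS/rmap branch is left out). A driver that sets `irq_affinity_auto` but skips `netif_napi_add_config()` gets a single warning, and no notifier is registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
struct model_napi_config {
	unsigned long affinity_mask;       /* stands in for cpumask_t */
};

struct model_napi {
	bool irq_affinity_auto;            /* mirrors dev->irq_affinity_auto */
	struct model_napi_config *config;  /* NULL unless added "with config" */
	int irq;
	bool notifier_registered;          /* models irq_set_affinity_notifier() */
};

static int warnings_printed;               /* counts reports actually emitted */

/* Stand-in for WARN_ON_ONCE(): report cond the first time it is true for
 * this *once flag, and return cond so it can guard an early return. */
static bool model_warn_on_once(bool cond, bool *once, const char *expr)
{
	if (cond && !*once) {
		*once = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", expr);
	}
	return cond;
}

/* Models only the irq_affinity_auto branch of netif_napi_set_irq_locked(),
 * with the proposed WARN_ON_ONCE(!napi->config) guard. */
static void model_napi_set_irq(struct model_napi *napi, int irq)
{
	static bool missing_config_warned;  /* one flag per call site */

	assert(napi != NULL);
	napi->irq = irq;
	if (irq <= 0)                       /* no IRQ to bind */
		return;

	if (napi->irq_affinity_auto) {
		/* Buggy driver: auto affinity without netif_napi_add_config(). */
		if (model_warn_on_once(!napi->config, &missing_config_warned,
				       "!napi->config"))
			return;
		napi->notifier_registered = true;
	}
}
```

As with the kernel macro, the static flag makes the warning fire once per call site, so a buggy driver with many queues produces one report rather than one per queue.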
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0d19fa98b65e..0436605ee607 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,6 +352,7 @@ struct napi_config {
 	u64 gro_flush_timeout;
 	u64 irq_suspend_timeout;
 	u32 defer_hard_irqs;
+	cpumask_t affinity_mask;
 	unsigned int napi_id;
 };

@@ -394,10 +395,8 @@ struct napi_struct {
 	struct list_head dev_list;
 	struct hlist_node napi_hash_node;
 	int irq;
-#ifdef CONFIG_RFS_ACCEL
 	struct irq_affinity_notify notify;
 	int napi_rmap_idx;
-#endif
 	int index;
 	struct napi_config *config;
 };
@@ -1992,6 +1991,11 @@ enum netdev_reg_state {
  *
  *	@threaded: napi threaded mode is enabled
  *
+ *	@irq_affinity_auto: driver wants the core to manage the IRQ affinity.
+ *			    Set by netif_enable_irq_affinity(), then driver must
+ *			    create persistent napi by netif_napi_add_config()
+ *			    and finally bind napi to IRQ (netif_napi_set_irq).
+ *
  *	@rx_cpu_rmap_auto: driver wants the core to manage the ARFS rmap.
  *			   Set by calling netif_enable_cpu_rmap().
  *
@@ -2402,6 +2406,7 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
 	bool			threaded;
+	bool			irq_affinity_auto;
 	bool			rx_cpu_rmap_auto;

 	/* priv_flags_slow, ungrouped to save space */
@@ -2662,6 +2667,11 @@ static inline void netdev_set_ml_priv(struct net_device *dev,
 	dev->ml_priv_type = type;
 }

+static inline void netif_enable_irq_affinity(struct net_device *dev)
+{
+	dev->irq_affinity_auto = true;
+}
+
 /*
  * Net namespace inlines
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index 33e84477c9c2..4cde7ac31e74 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6866,28 +6866,39 @@ void netif_queue_set_napi(struct net_device *dev, unsigned int queue_index,
 }
 EXPORT_SYMBOL(netif_queue_set_napi);

-#ifdef CONFIG_RFS_ACCEL
 static void
-netif_irq_cpu_rmap_notify(struct irq_affinity_notify *notify,
-			  const cpumask_t *mask)
+netif_napi_irq_notify(struct irq_affinity_notify *notify,
+		      const cpumask_t *mask)
 {
 	struct napi_struct *napi =
 		container_of(notify, struct napi_struct, notify);
+#ifdef CONFIG_RFS_ACCEL
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
 	int err;
+#endif

-	err = cpu_rmap_update(rmap, napi->napi_rmap_idx, mask);
-	if (err)
-		netdev_warn(napi->dev, "RMAP update failed (%d)\n",
-			    err);
+	if (napi->config && napi->dev->irq_affinity_auto)
+		cpumask_copy(&napi->config->affinity_mask, mask);
+
+#ifdef CONFIG_RFS_ACCEL
+	if (napi->dev->rx_cpu_rmap_auto) {
+		err = cpu_rmap_update(rmap, napi->napi_rmap_idx, mask);
+		if (err)
+			netdev_warn(napi->dev, "RMAP update failed (%d)\n",
+				    err);
+	}
+#endif
 }

+#ifdef CONFIG_RFS_ACCEL
 static void netif_napi_affinity_release(struct kref *ref)
 {
 	struct napi_struct *napi =
 		container_of(ref, struct napi_struct, notify.kref);
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;

+	if (!napi->dev->rx_cpu_rmap_auto)
+		return;
 	rmap->obj[napi->napi_rmap_idx] = NULL;
 	napi->napi_rmap_idx = -1;
 	cpu_rmap_put(rmap);
@@ -6898,7 +6909,7 @@ static int napi_irq_cpu_rmap_add(struct napi_struct *napi, int irq)
 	struct cpu_rmap *rmap = napi->dev->rx_cpu_rmap;
 	int rc;

-	napi->notify.notify = netif_irq_cpu_rmap_notify;
+	napi->notify.notify = netif_napi_irq_notify;
 	napi->notify.release = netif_napi_affinity_release;
 	cpu_rmap_get(rmap);
 	rc = cpu_rmap_add(rmap, napi);
@@ -6948,6 +6959,10 @@ static void netif_del_cpu_rmap(struct net_device *dev)
 }

 #else
+static void netif_napi_affinity_release(struct kref *ref)
+{
+}
+
 static int napi_irq_cpu_rmap_add(struct napi_struct *napi, int irq)
 {
 	return 0;
@@ -6968,17 +6983,28 @@ void netif_napi_set_irq_locked(struct napi_struct *napi, int irq)
 {
 	int rc;

-	/* Remove existing rmap entries */
-	if (napi->dev->rx_cpu_rmap_auto &&
+	/* Remove existing resources */
+	if ((napi->dev->rx_cpu_rmap_auto || napi->dev->irq_affinity_auto) &&
 	    napi->irq != irq && napi->irq > 0)
 		irq_set_affinity_notifier(napi->irq, NULL);

 	napi->irq = irq;
-	if (irq > 0) {
+	if (irq < 0)
+		return;
+
+	if (napi->dev->rx_cpu_rmap_auto) {
 		rc = napi_irq_cpu_rmap_add(napi, irq);
 		if (rc)
 			netdev_warn(napi->dev, "Unable to update ARFS map (%d)\n",
 				    rc);
+	} else if (napi->config && napi->dev->irq_affinity_auto) {
+		napi->notify.notify = netif_napi_irq_notify;
+		napi->notify.release = netif_napi_affinity_release;
+
+		rc = irq_set_affinity_notifier(irq, &napi->notify);
+		if (rc)
+			netdev_warn(napi->dev, "Unable to set IRQ notifier (%d)\n",
+				    rc);
 	}
 }
 EXPORT_SYMBOL(netif_napi_set_irq_locked);
@@ -6988,6 +7014,10 @@ static void napi_restore_config(struct napi_struct *n)
 	n->defer_hard_irqs = n->config->defer_hard_irqs;
 	n->gro_flush_timeout = n->config->gro_flush_timeout;
 	n->irq_suspend_timeout = n->config->irq_suspend_timeout;
+
+	if (n->irq > 0 && n->dev->irq_affinity_auto)
+		irq_set_affinity(n->irq, &n->config->affinity_mask);
+
 	/* a NAPI ID might be stored in the config, if so use it. if not, use
 	 * napi_hash_add to generate one for us.
 	 */
@@ -7112,7 +7142,8 @@ void napi_disable_locked(struct napi_struct *n)
 	else
 		napi_hash_del(n);

-	if (n->irq > 0 && n->dev->rx_cpu_rmap_auto)
+	if (n->irq > 0 &&
+	    (n->dev->irq_affinity_auto || n->dev->rx_cpu_rmap_auto))
 		irq_set_affinity_notifier(n->irq, NULL);

 	clear_bit(NAPI_STATE_DISABLE, &n->state);
@@ -11550,9 +11581,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		void (*setup)(struct net_device *),
 		unsigned int txqs, unsigned int rxqs)
 {
+	unsigned int maxqs, i, numa;
 	struct net_device *dev;
 	size_t napi_config_sz;
-	unsigned int maxqs;

 	BUG_ON(strlen(name) >= sizeof(dev->name));

@@ -11654,6 +11685,11 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (!dev->napi_config)
 		goto free_all;

+	numa = dev_to_node(&dev->dev);
+	for (i = 0; i < maxqs; i++)
+		cpumask_set_cpu(cpumask_local_spread(i, numa),
+				&dev->napi_config[i].affinity_mask);
+
 	strscpy(dev->name, name);
 	dev->name_assign_type = name_assign_type;
 	dev->group = INIT_NETDEV_GROUP;
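The last `alloc_netdev_mqs()` hunk gives NAPI config `i` a default mask containing the single CPU returned by `cpumask_local_spread(i, numa)`. The sketch below models that spreading in user space under stated assumptions: a hypothetical 8-CPU, two-node topology, 64-bit bitmaps instead of `cpumask_t`, and a simplified, hypothetical `model_local_spread()` that takes node-local CPUs first and then remote ones in CPU-ID order. The real helper only considers online CPUs and, in recent kernels, also takes NUMA distance into account.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
#define MODEL_NR_CPUS 8
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Simplified model of cpumask_local_spread(i, node): i (modulo the CPU
 * count) indexes node-local CPUs first, then remote CPUs, in ID order. */
static int model_local_spread(unsigned int i, int node)
{
	unsigned int n = i % MODEL_NR_CPUS;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)	/* local pass */
		if (model_cpu_node[cpu] == node && n-- == 0)
			return cpu;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)	/* remote pass */
		if (model_cpu_node[cpu] != node && n-- == 0)
			return cpu;
	return -1;					/* not reached */
}

/* Mirrors the loop added to alloc_netdev_mqs(): config i starts with an
 * affinity mask holding exactly one CPU. */
static void model_default_masks(uint64_t *masks, unsigned int maxqs, int numa)
{
	unsigned int i;

	assert(MODEL_NR_CPUS <= 64);	/* every CPU must fit in the bitmap */
	for (i = 0; i < maxqs; i++)
		masks[i] = UINT64_C(1) << model_local_spread(i, numa);
}
```

With `numa = 1`, configs 0 to 3 get CPUs 4 to 7, and configs 4 and 5 wrap to CPUs 0 and 1. Passing `-1` (standing in for `NUMA_NO_NODE`) makes every CPU remote, so the model falls back to plain CPU-ID order.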
A common task for most drivers is to remember the user-set CPU affinity
of their IRQs. On each netdev reset, the driver should re-assign the
user's settings to the IRQs.

Add CPU affinity mask to napi_config. To delegate the CPU affinity
management to the core, drivers must:
 1 - set the new netdev flag "irq_affinity_auto":
       netif_enable_irq_affinity(netdev)
 2 - create the napi with persistent config:
       netif_napi_add_config()
 3 - bind an IRQ to the napi instance: netif_napi_set_irq()

The core will then make sure to re-assign the affinity to the napi's
IRQ.

By default, each napi_config's IRQ affinity mask holds a single CPU,
assigned starting from the device's closest NUMA node.

Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com>
---
 include/linux/netdevice.h | 14 +++++++--
 net/core/dev.c            | 62 +++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 15 deletions(-)