Message ID | 20250102191227.2084046-4-skhawaja@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | Add support to do threaded napi busy poll |
On 1/2/2025 11:12 AM, Samiullah Khawaja wrote:
> Add a new state to napi state enum:
>
> - STATE_THREADED_BUSY_POLL
>   Threaded busy poll is enabled/running for this napi.
>
> Following changes are introduced in the napi scheduling and state logic:
>
> - When threaded busy poll is enabled through sysfs it also enables
>   NAPI_STATE_THREADED so a kthread is created per napi. It also sets
>   NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
>   supposed to busy poll for each napi.

Looks like this patch is changing the sysfs 'threaded' field from boolean
to an integer and value 2 is used to indicate threaded mode with busypoll.
So I think the above comment should reflect that instead of just saying
enabled for both threaded and busypoll.

>
> - When napi is scheduled with STATE_SCHED_THREADED and associated
>   kthread is woken up, the kthread owns the context. If
>   NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
>   then it means that we can busy poll.
>
> - To keep busy polling and to avoid scheduling of the interrupts, the
>   napi_complete_done returns false when both SCHED_THREADED and
>   THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
>   early to avoid the STATE_SCHED_THREADED being unset.
>
> - If at any point STATE_THREADED_BUSY_POLL is unset, the
>   napi_complete_done will run and unset the SCHED_THREADED bit also.
>   This will make the associated kthread go to sleep as per existing
>   logic.

When does STATE_THREADED_BUSY_POLL get unset?
Don't we need a timeout value to come out of busypoll mode if there is no
traffic?

>
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> ---
>  Documentation/ABI/testing/sysfs-class-net     |  3 +-
>  Documentation/netlink/specs/netdev.yaml       |  5 +-
>  .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
>  include/linux/netdevice.h                     | 24 +++++--
>  net/core/dev.c                                | 72 ++++++++++++++++---
>  net/core/net-sysfs.c                          |  2 +-
>  net/core/netdev-genl-gen.c                    |  2 +-
>  7 files changed, 89 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index ebf21beba846..15d7d36a8294 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -343,7 +343,7 @@ Date: Jan 2021
>  KernelVersion: 5.12
>  Contact: netdev@vger.kernel.org
>  Description:
> -		Boolean value to control the threaded mode per device. User could
> +		Integer value to control the threaded mode per device. User could
>  		set this value to enable/disable threaded mode for all napi
>  		belonging to this device, without the need to do device up/down.
>
> @@ -351,4 +351,5 @@ Description:
>  		== ==================================
>  		0  threaded mode disabled for this dev
>  		1  threaded mode enabled for this dev
> +		2  threaded mode enabled, and busy polling enabled.
>  		== ==================================
> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index aac343af7246..9c905243a1cc 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -272,10 +272,11 @@ attribute-sets:
>          name: threaded
>          doc: Whether the napi is configured to operate in threaded polling
>               mode. If this is set to `1` then the NAPI context operates
> -             in threaded polling mode.
> +             in threaded polling mode. If this is set to `2` then the NAPI
> +             kthread also does busypolling.
>          type: u32
>          checks:
> -          max: 1
> +          max: 2
>    -
>      name: queue
>      attributes:
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index c571614b1d50..a709cddcd292 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	adapter->mii.mdio_write = atl1c_mdio_write;
>  	adapter->mii.phy_id_mask = 0x1f;
>  	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> -	dev_set_threaded(netdev, true);
> +	dev_set_threaded(netdev, DEV_NAPI_THREADED);
>  	for (i = 0; i < adapter->rx_queue_count; ++i)
>  		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
>  			       atl1c_clean_rx);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8f531d528869..c384ffe0976e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -407,6 +407,8 @@ enum {
>  	NAPI_STATE_PREFER_BUSY_POLL, /* prefer busy-polling over softirq processing*/
>  	NAPI_STATE_THREADED, /* The poll is performed inside its own thread*/
>  	NAPI_STATE_SCHED_THREADED, /* Napi is currently scheduled in threaded mode */
> +	NAPI_STATE_THREADED_BUSY_POLL, /* The threaded napi poller will busy poll */
> +	NAPI_STATE_SCHED_THREADED_BUSY_POLL, /* The threaded napi poller is busy polling */
>  };
>
>  enum {
> @@ -420,8 +422,14 @@ enum {
>  	NAPIF_STATE_PREFER_BUSY_POLL = BIT(NAPI_STATE_PREFER_BUSY_POLL),
>  	NAPIF_STATE_THREADED = BIT(NAPI_STATE_THREADED),
>  	NAPIF_STATE_SCHED_THREADED = BIT(NAPI_STATE_SCHED_THREADED),
> +	NAPIF_STATE_THREADED_BUSY_POLL = BIT(NAPI_STATE_THREADED_BUSY_POLL),
> +	NAPIF_STATE_SCHED_THREADED_BUSY_POLL
> +		= BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
>  };
>
> +#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
> +	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
> +
>  enum gro_result {
>  	GRO_MERGED,
>  	GRO_MERGED_FREE,
> @@ -568,16 +576,24 @@ static inline bool napi_complete(struct napi_struct *n)
>  	return napi_complete_done(n, 0);
>  }
>
> -int dev_set_threaded(struct net_device *dev, bool threaded);
> +enum napi_threaded_state {
> +	NAPI_THREADED_OFF = 0,
> +	NAPI_THREADED = 1,
> +	NAPI_THREADED_BUSY_POLL = 2,
> +	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
> +};
> +
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded);
>
>  /*
>   * napi_set_threaded - set napi threaded state
>   * @napi: NAPI context
> - * @threaded: whether this napi does threaded polling
> + * @threaded: threading mode
>   *
>   * Return 0 on success and negative errno on failure.
>   */
> -int napi_set_threaded(struct napi_struct *napi, bool threaded);
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded);
>
>  /**
>   * napi_disable - prevent NAPI from scheduling
> @@ -2406,7 +2422,7 @@ struct net_device {
>  	struct sfp_bus *sfp_bus;
>  	struct lock_class_key *qdisc_tx_busylock;
>  	bool proto_down;
> -	bool threaded;
> +	u8 threaded;
>
>  	/* priv_flags_slow, ungrouped to save space */
>  	unsigned long see_all_hwtstamp_requests:1;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 762977a62da2..b6cd9474bdd3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -78,6 +78,7 @@
>  #include <linux/slab.h>
>  #include <linux/sched.h>
>  #include <linux/sched/isolation.h>
> +#include <linux/sched/types.h>
>  #include <linux/sched/mm.h>
>  #include <linux/smpboot.h>
>  #include <linux/mutex.h>
> @@ -6231,7 +6232,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>  	 * the guarantee we will be called later.
>  	 */
>  	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
> -				 NAPIF_STATE_IN_BUSY_POLL)))
> +				 NAPIF_STATE_IN_BUSY_POLL |
> +				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
>  		return false;
>
>  	if (work_done) {
> @@ -6633,8 +6635,10 @@ static void init_gro_hash(struct napi_struct *napi)
>  	napi->gro_bitmask = 0;
>  }
>
> -int napi_set_threaded(struct napi_struct *napi, bool threaded)
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded)
>  {
> +	unsigned long val;
>  	if (napi->dev->threaded)
>  		return -EINVAL;
>
> @@ -6649,30 +6653,41 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
>
>  	/* Make sure kthread is created before THREADED bit is set. */
>  	smp_mb__before_atomic();
> -	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +	val = 0;
> +	if (threaded == NAPI_THREADED_BUSY_POLL)
> +		val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +	if (threaded)
> +		val |= NAPIF_STATE_THREADED;
> +	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>
>  	return 0;
>  }
>
> -int dev_set_threaded(struct net_device *dev, bool threaded)
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded)
>  {
>  	struct napi_struct *napi;
> +	unsigned long val;
>  	int err = 0;
>
>  	if (dev->threaded == threaded)
>  		return 0;
>
> +	val = 0;
>  	if (threaded) {
>  		/* Check if threaded is set at napi level already */
>  		list_for_each_entry(napi, &dev->napi_list, dev_list)
>  			if (test_bit(NAPI_STATE_THREADED, &napi->state))
>  				return -EINVAL;
>
> +		val |= NAPIF_STATE_THREADED;
> +		if (threaded == NAPI_THREADED_BUSY_POLL)
> +			val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +
>  		list_for_each_entry(napi, &dev->napi_list, dev_list) {
>  			if (!napi->thread) {
>  				err = napi_kthread_create(napi);
>  				if (err) {
> -					threaded = false;
> +					threaded = NAPI_THREADED_OFF;
>  					break;
>  				}
>  			}
> @@ -6691,9 +6706,13 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
>  	 * polled. In this case, the switch between threaded mode and
>  	 * softirq mode will happen in the next round of napi_schedule().
>  	 * This should not cause hiccups/stalls to the live traffic.
> +	 *
> +	 * Switch to busy_poll threaded napi will occur after the threaded
> +	 * napi is scheduled.
>  	 */
>  	list_for_each_entry(napi, &dev->napi_list, dev_list)
> -		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +		set_mask_bits(&napi->state,
> +			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>
>  	return err;
>  }
> @@ -7007,7 +7026,7 @@ static int napi_thread_wait(struct napi_struct *napi)
>  	return -1;
>  }
>
> -static void napi_threaded_poll_loop(struct napi_struct *napi)
> +static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
>  {
>  	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
>  	struct softnet_data *sd;
> @@ -7036,22 +7055,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
>  		}
>  		skb_defer_free_flush(sd);
>  		bpf_net_ctx_clear(bpf_net_ctx);
> +
> +		/* Push the skbs up the stack if busy polling. */
> +		if (busy_poll)
> +			__napi_gro_flush_helper(napi);
>  		local_bh_enable();
>
> -		if (!repoll)
> +		/* If busy polling then do not break here because we need to
> +		 * call cond_resched and rcu_softirq_qs_periodic to prevent
> +		 * watchdog warnings.
> +		 */
> +		if (!repoll && !busy_poll)
>  			break;
>
>  		rcu_softirq_qs_periodic(last_qs);
>  		cond_resched();
> +
> +		if (!repoll)
> +			break;
>  	}
>  }
>
>  static int napi_threaded_poll(void *data)
>  {
>  	struct napi_struct *napi = data;
> +	bool busy_poll_sched;
> +	unsigned long val;
> +	bool busy_poll;
> +
> +	while (!napi_thread_wait(napi)) {
> +		/* Once woken up, this means that we are scheduled as threaded
> +		 * napi and this thread owns the napi context, if busy poll
> +		 * state is set then we busy poll this napi.
> +		 */
> +		val = READ_ONCE(napi->state);
> +		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
> +		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
> +
> +		/* Do not busy poll if napi is disabled. */
> +		if (unlikely(val & NAPIF_STATE_DISABLE))
> +			busy_poll = false;
> +
> +		if (busy_poll != busy_poll_sched)
> +			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
> +				   &napi->state, busy_poll);
>
> -	while (!napi_thread_wait(napi))
> -		napi_threaded_poll_loop(napi);
> +		napi_threaded_poll_loop(napi, busy_poll);
> +	}
>
>  	return 0;
>  }
> @@ -12205,7 +12255,7 @@ static void run_backlog_napi(unsigned int cpu)
>  {
>  	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
>
> -	napi_threaded_poll_loop(&sd->backlog);
> +	napi_threaded_poll_loop(&sd->backlog, false);
>  }
>
>  static void backlog_napi_setup(unsigned int cpu)
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 2d9afc6e2161..36d0a22e341c 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -626,7 +626,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
>  	if (list_empty(&dev->napi_list))
>  		return -EOPNOTSUPP;
>
> -	if (val != 0 && val != 1)
> +	if (val > NAPI_THREADED_MAX)
>  		return -EOPNOTSUPP;
>
>  	ret = dev_set_threaded(dev, val);
> diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
> index 93dc74dad6de..4086d2577dcc 100644
> --- a/net/core/netdev-genl-gen.c
> +++ b/net/core/netdev-genl-gen.c
> @@ -102,7 +102,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPE
>  /* NETDEV_CMD_NAPI_SET_THREADED - do */
>  static const struct nla_policy netdev_napi_set_threaded_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
>  	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
> -	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
> +	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
>  };
>
>  /* Ops table for netdev */
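For readers following the state handling in the quoted diff, here is a minimal userspace sketch of how the three `threaded` values map onto the two NAPI state bits. The bit positions and the helpers `set_mask_bits_plain()` and `apply_threaded_mode()` are assumptions made for illustration; the kernel's `set_mask_bits()` performs the same clear-then-set atomically, and nothing below is the patch's actual code.

```c
/* Bit numbers are illustrative; the kernel enum places these bits elsewhere. */
enum {
	NAPI_STATE_THREADED = 0,
	NAPI_STATE_THREADED_BUSY_POLL = 1,
};

#define BIT(n)				(1UL << (n))
#define NAPIF_STATE_THREADED		BIT(NAPI_STATE_THREADED)
#define NAPIF_STATE_THREADED_BUSY_POLL	BIT(NAPI_STATE_THREADED_BUSY_POLL)
#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)

enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
};

/* Non-atomic stand-in (hypothetical name): clear the mask, then set bits. */
static unsigned long set_mask_bits_plain(unsigned long state,
					 unsigned long mask,
					 unsigned long bits)
{
	return (state & ~mask) | bits;
}

/* Hypothetical helper: compute the new state word for a requested mode. */
static unsigned long apply_threaded_mode(unsigned long state,
					 enum napi_threaded_state mode)
{
	unsigned long val = 0;

	if (mode == NAPI_THREADED_BUSY_POLL)
		val |= NAPIF_STATE_THREADED_BUSY_POLL;
	if (mode != NAPI_THREADED_OFF)
		val |= NAPIF_STATE_THREADED;

	/* Bits outside the mask are left untouched. */
	return set_mask_bits_plain(state, NAPIF_STATE_THREADED_BUSY_POLL_MASK,
				   val);
}
```

Because both bits are written under one mask, moving from mode 2 back to mode 1 clears the busy-poll bit in the same update that keeps the threaded bit set.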
On 01/02, Samiullah Khawaja wrote:
> Add a new state to napi state enum:
>
> - STATE_THREADED_BUSY_POLL
>   Threaded busy poll is enabled/running for this napi.
>
> Following changes are introduced in the napi scheduling and state logic:
>
> - When threaded busy poll is enabled through sysfs it also enables
>   NAPI_STATE_THREADED so a kthread is created per napi. It also sets
>   NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
>   supposed to busy poll for each napi.
>
> - When napi is scheduled with STATE_SCHED_THREADED and associated
>   kthread is woken up, the kthread owns the context. If
>   NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
>   then it means that we can busy poll.
>
> - To keep busy polling and to avoid scheduling of the interrupts, the
>   napi_complete_done returns false when both SCHED_THREADED and
>   THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
>   early to avoid the STATE_SCHED_THREADED being unset.
>
> - If at any point STATE_THREADED_BUSY_POLL is unset, the
>   napi_complete_done will run and unset the SCHED_THREADED bit also.
>   This will make the associated kthread go to sleep as per existing
>   logic.
>
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> ---
>  Documentation/ABI/testing/sysfs-class-net     |  3 +-
>  Documentation/netlink/specs/netdev.yaml       |  5 +-
>  .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
>  include/linux/netdevice.h                     | 24 +++++--
>  net/core/dev.c                                | 72 ++++++++++++++++---
>  net/core/net-sysfs.c                          |  2 +-
>  net/core/netdev-genl-gen.c                    |  2 +-
>  7 files changed, 89 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index ebf21beba846..15d7d36a8294 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -343,7 +343,7 @@ Date: Jan 2021
>  KernelVersion: 5.12
>  Contact: netdev@vger.kernel.org
>  Description:
> -		Boolean value to control the threaded mode per device. User could
> +		Integer value to control the threaded mode per device. User could
>  		set this value to enable/disable threaded mode for all napi
>  		belonging to this device, without the need to do device up/down.
>
> @@ -351,4 +351,5 @@ Description:
>  		== ==================================
>  		0  threaded mode disabled for this dev
>  		1  threaded mode enabled for this dev
> +		2  threaded mode enabled, and busy polling enabled.
>  		== ==================================
> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index aac343af7246..9c905243a1cc 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -272,10 +272,11 @@ attribute-sets:
>          name: threaded
>          doc: Whether the napi is configured to operate in threaded polling
>               mode. If this is set to `1` then the NAPI context operates
> -             in threaded polling mode.
> +             in threaded polling mode. If this is set to `2` then the NAPI
> +             kthread also does busypolling.
>          type: u32
>          checks:
> -          max: 1
> +          max: 2
>    -

I'd vote for a separate threaded-busy-poll parameter (and separate doc)
instead of overloading 'threaded' bool. But if you prefer to have a single
argument, let's at least change it to enum with proper values for busy and
non-busy modes instead of magic numbers?
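To make the "enum instead of magic numbers" suggestion concrete, here is a rough userspace sketch. It reuses the `enum napi_threaded_state` values from the quoted patch; the helper names and the string labels ("disabled", "enabled", "busy-poll") are invented for this illustration and appear neither in the patch nor in any kernel interface.

```c
#include <errno.h>
#include <stddef.h>

enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Hypothetical labels, made up for this sketch only. */
static const char *const napi_threaded_names[] = {
	[NAPI_THREADED_OFF]       = "disabled",
	[NAPI_THREADED]           = "enabled",
	[NAPI_THREADED_BUSY_POLL] = "busy-poll",
};

/* Validate a raw user value and convert it to the enum in one place. */
static int napi_threaded_from_ulong(unsigned long val,
				    enum napi_threaded_state *mode)
{
	if (val > NAPI_THREADED_MAX)
		return -EOPNOTSUPP;

	*mode = (enum napi_threaded_state)val;
	return 0;
}

/* Return the label for a mode, or NULL if the value is out of range. */
static const char *napi_threaded_name(enum napi_threaded_state mode)
{
	if ((unsigned int)mode > (unsigned int)NAPI_THREADED_MAX)
		return NULL;

	return napi_threaded_names[mode];
}
```

Keeping the range check next to the enum means that adding a mode only requires touching the enum and the label table, rather than every caller that compares against a literal.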
Hi Samiullah,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base: net-next/main
patch link: https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: i386-buildonly-randconfig-006-20250103 (https://download.01.org/0day-ci/archive/20250103/202501030842.OdBE8ADq-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501030842.OdBE8ADq-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501030842.OdBE8ADq-lkp@intel.com/
All errors (new ones prefixed by >>):
   drivers/net/ethernet/atheros/atl1c/atl1c_main.c: In function 'atl1c_probe':
>> drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:34: error: 'DEV_NAPI_THREADED' undeclared (first use in this function); did you mean 'NAPI_THREADED'?
    2691 |         dev_set_threaded(netdev, DEV_NAPI_THREADED);
         |                                  ^~~~~~~~~~~~~~~~~
         |                                  NAPI_THREADED
   drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:34: note: each undeclared identifier is reported only once for each function it appears in
vim +2691 drivers/net/ethernet/atheros/atl1c/atl1c_main.c
  2600
  2601	/**
  2602	 * atl1c_probe - Device Initialization Routine
  2603	 * @pdev: PCI device information struct
  2604	 * @ent: entry in atl1c_pci_tbl
  2605	 *
  2606	 * Returns 0 on success, negative on failure
  2607	 *
  2608	 * atl1c_probe initializes an adapter identified by a pci_dev structure.
  2609	 * The OS initialization, configuring of the adapter private structure,
  2610	 * and a hardware reset occur.
  2611	 */
  2612	static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  2613	{
  2614		struct net_device *netdev;
  2615		struct atl1c_adapter *adapter;
  2616		static int cards_found;
  2617		u8 __iomem *hw_addr;
  2618		enum atl1c_nic_type nic_type;
  2619		u32 queue_count = 1;
  2620		int err = 0;
  2621		int i;
  2622
  2623		/* enable device (incl. PCI PM wakeup and hotplug setup) */
  2624		err = pci_enable_device_mem(pdev);
  2625		if (err)
  2626			return dev_err_probe(&pdev->dev, err, "cannot enable PCI device\n");
  2627
  2628		/*
  2629		 * The atl1c chip can DMA to 64-bit addresses, but it uses a single
  2630		 * shared register for the high 32 bits, so only a single, aligned,
  2631		 * 4 GB physical address range can be used at a time.
  2632		 *
  2633		 * Supporting 64-bit DMA on this hardware is more trouble than it's
  2634		 * worth. It is far easier to limit to 32-bit DMA than update
  2635		 * various kernel subsystems to support the mechanics required by a
  2636		 * fixed-high-32-bit system.
  2637		 */
  2638		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
  2639		if (err) {
  2640			dev_err(&pdev->dev, "No usable DMA configuration,aborting\n");
  2641			goto err_dma;
  2642		}
  2643
  2644		err = pci_request_regions(pdev, atl1c_driver_name);
  2645		if (err) {
  2646			dev_err(&pdev->dev, "cannot obtain PCI resources\n");
  2647			goto err_pci_reg;
  2648		}
  2649
  2650		pci_set_master(pdev);
  2651
  2652		hw_addr = pci_ioremap_bar(pdev, 0);
  2653		if (!hw_addr) {
  2654			err = -EIO;
  2655			dev_err(&pdev->dev, "cannot map device registers\n");
  2656			goto err_ioremap;
  2657		}
  2658
  2659		nic_type = atl1c_get_mac_type(pdev, hw_addr);
  2660		if (nic_type == athr_mt)
  2661			queue_count = 4;
  2662
  2663		netdev = alloc_etherdev_mq(sizeof(struct atl1c_adapter), queue_count);
  2664		if (netdev == NULL) {
  2665			err = -ENOMEM;
  2666			goto err_alloc_etherdev;
  2667		}
  2668
  2669		err = atl1c_init_netdev(netdev, pdev);
  2670		if (err) {
  2671			dev_err(&pdev->dev, "init netdevice failed\n");
  2672			goto err_init_netdev;
  2673		}
  2674		adapter = netdev_priv(netdev);
  2675		adapter->bd_number = cards_found;
  2676		adapter->netdev = netdev;
  2677		adapter->pdev = pdev;
  2678		adapter->hw.adapter = adapter;
  2679		adapter->hw.nic_type = nic_type;
  2680		adapter->msg_enable = netif_msg_init(-1, atl1c_default_msg);
  2681		adapter->hw.hw_addr = hw_addr;
  2682		adapter->tx_queue_count = queue_count;
  2683		adapter->rx_queue_count = queue_count;
  2684
  2685		/* init mii data */
  2686		adapter->mii.dev = netdev;
  2687		adapter->mii.mdio_read = atl1c_mdio_read;
  2688		adapter->mii.mdio_write = atl1c_mdio_write;
  2689		adapter->mii.phy_id_mask = 0x1f;
  2690		adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> 2691		dev_set_threaded(netdev, DEV_NAPI_THREADED);
  2692		for (i = 0; i < adapter->rx_queue_count; ++i)
  2693			netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
  2694				       atl1c_clean_rx);
  2695		for (i = 0; i < adapter->tx_queue_count; ++i)
  2696			netif_napi_add_tx(netdev, &adapter->tpd_ring[i].napi,
  2697					  atl1c_clean_tx);
  2698		timer_setup(&adapter->phy_config_timer, atl1c_phy_config, 0);
  2699		/* setup the private structure */
  2700		err = atl1c_sw_init(adapter);
  2701		if (err) {
  2702			dev_err(&pdev->dev, "net device private data init failed\n");
  2703			goto err_sw_init;
  2704		}
  2705		/* set max MTU */
  2706		atl1c_set_max_mtu(netdev);
  2707
  2708		atl1c_reset_pcie(&adapter->hw, ATL1C_PCIE_L0S_L1_DISABLE);
  2709
  2710		/* Init GPHY as early as possible due to power saving issue */
  2711		atl1c_phy_reset(&adapter->hw);
  2712
  2713		err = atl1c_reset_mac(&adapter->hw);
  2714		if (err) {
  2715			err = -EIO;
  2716			goto err_reset;
  2717		}
  2718
  2719		/* reset the controller to
  2720		 * put the device in a known good starting state */
  2721		err = atl1c_phy_init(&adapter->hw);
  2722		if (err) {
  2723			err = -EIO;
  2724			goto err_reset;
  2725		}
  2726		if (atl1c_read_mac_addr(&adapter->hw)) {
  2727			/* got a random MAC address, set NET_ADDR_RANDOM to netdev */
  2728			netdev->addr_assign_type = NET_ADDR_RANDOM;
  2729		}
  2730		eth_hw_addr_set(netdev, adapter->hw.mac_addr);
  2731		if (netif_msg_probe(adapter))
  2732			dev_dbg(&pdev->dev, "mac address : %pM\n",
  2733				adapter->hw.mac_addr);
  2734
  2735		atl1c_hw_set_mac_addr(&adapter->hw, adapter->hw.mac_addr);
  2736		INIT_WORK(&adapter->common_task, atl1c_common_task);
  2737		adapter->work_event = 0;
  2738		err = register_netdev(netdev);
  2739		if (err) {
  2740			dev_err(&pdev->dev, "register netdevice failed\n");
  2741			goto err_register;
  2742		}
  2743
  2744		cards_found++;
  2745		return 0;
  2746
  2747	err_reset:
  2748	err_register:
  2749	err_sw_init:
  2750	err_init_netdev:
  2751		free_netdev(netdev);
  2752	err_alloc_etherdev:
  2753		iounmap(hw_addr);
  2754	err_ioremap:
  2755		pci_release_regions(pdev);
  2756	err_pci_reg:
  2757	err_dma:
  2758		pci_disable_device(pdev);
  2759		return err;
  2760	}
  2761
Hi Samiullah,
kernel test robot noticed the following build errors:
[auto build test ERROR on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base: net-next/main
patch link: https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20250103/202501031537.QXSNLahs-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501031537.QXSNLahs-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501031537.QXSNLahs-lkp@intel.com/
All errors (new ones prefixed by >>):
   In file included from drivers/net/ethernet/atheros/atl1c/atl1c_main.c:9:
   In file included from drivers/net/ethernet/atheros/atl1c/atl1c.h:16:
   In file included from include/linux/pci.h:1658:
   In file included from include/linux/dmapool.h:14:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:27: error: use of undeclared identifier 'DEV_NAPI_THREADED'; did you mean 'NAPI_THREADED'?
    2691 |         dev_set_threaded(netdev, DEV_NAPI_THREADED);
         |                                  ^~~~~~~~~~~~~~~~~
         |                                  NAPI_THREADED
   include/linux/netdevice.h:581:2: note: 'NAPI_THREADED' declared here
     581 |         NAPI_THREADED = 1,
         |         ^
   4 warnings and 1 error generated.
vim +2691 drivers/net/ethernet/atheros/atl1c/atl1c_main.c
  2600
  2601	/**
  2602	 * atl1c_probe - Device Initialization Routine
  2603	 * @pdev: PCI device information struct
  2604	 * @ent: entry in atl1c_pci_tbl
  2605	 *
  2606	 * Returns 0 on success, negative on failure
  2607	 *
  2608	 * atl1c_probe initializes an adapter identified by a pci_dev structure.
  2609	 * The OS initialization, configuring of the adapter private structure,
  2610	 * and a hardware reset occur.
  2611	 */
  2612	static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  2613	{
  2614		struct net_device *netdev;
  2615		struct atl1c_adapter *adapter;
  2616		static int cards_found;
  2617		u8 __iomem *hw_addr;
  2618		enum atl1c_nic_type nic_type;
  2619		u32 queue_count = 1;
  2620		int err = 0;
  2621		int i;
  2622
  2623		/* enable device (incl. PCI PM wakeup and hotplug setup) */
  2624		err = pci_enable_device_mem(pdev);
  2625		if (err)
  2626			return dev_err_probe(&pdev->dev, err, "cannot enable PCI device\n");
  2627
  2628		/*
  2629		 * The atl1c chip can DMA to 64-bit addresses, but it uses a single
  2630		 * shared register for the high 32 bits, so only a single, aligned,
  2631		 * 4 GB physical address range can be used at a time.
  2632		 *
  2633		 * Supporting 64-bit DMA on this hardware is more trouble than it's
  2634		 * worth. It is far easier to limit to 32-bit DMA than update
  2635		 * various kernel subsystems to support the mechanics required by a
  2636		 * fixed-high-32-bit system.
  2637		 */
  2638		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
  2639		if (err) {
  2640			dev_err(&pdev->dev, "No usable DMA configuration,aborting\n");
  2641			goto err_dma;
  2642		}
  2643
  2644		err = pci_request_regions(pdev, atl1c_driver_name);
  2645		if (err) {
  2646			dev_err(&pdev->dev, "cannot obtain PCI resources\n");
  2647			goto err_pci_reg;
  2648		}
  2649
  2650		pci_set_master(pdev);
  2651
  2652		hw_addr = pci_ioremap_bar(pdev, 0);
  2653		if (!hw_addr) {
  2654			err = -EIO;
  2655			dev_err(&pdev->dev, "cannot map device registers\n");
  2656			goto err_ioremap;
  2657		}
  2658
  2659		nic_type = atl1c_get_mac_type(pdev, hw_addr);
  2660		if (nic_type == athr_mt)
  2661			queue_count = 4;
  2662
  2663		netdev = alloc_etherdev_mq(sizeof(struct atl1c_adapter), queue_count);
  2664		if (netdev == NULL) {
  2665			err = -ENOMEM;
  2666			goto err_alloc_etherdev;
  2667		}
  2668
  2669		err = atl1c_init_netdev(netdev, pdev);
  2670		if (err) {
  2671			dev_err(&pdev->dev, "init netdevice failed\n");
  2672			goto err_init_netdev;
  2673		}
  2674		adapter = netdev_priv(netdev);
  2675		adapter->bd_number = cards_found;
  2676		adapter->netdev = netdev;
  2677		adapter->pdev = pdev;
  2678		adapter->hw.adapter = adapter;
  2679		adapter->hw.nic_type = nic_type;
  2680		adapter->msg_enable = netif_msg_init(-1, atl1c_default_msg);
  2681		adapter->hw.hw_addr = hw_addr;
  2682		adapter->tx_queue_count = queue_count;
  2683		adapter->rx_queue_count = queue_count;
  2684
  2685		/* init mii data */
  2686		adapter->mii.dev = netdev;
  2687		adapter->mii.mdio_read = atl1c_mdio_read;
  2688		adapter->mii.mdio_write = atl1c_mdio_write;
  2689		adapter->mii.phy_id_mask = 0x1f;
  2690		adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> 2691		dev_set_threaded(netdev, DEV_NAPI_THREADED);
  2692		for (i = 0; i < adapter->rx_queue_count; ++i)
  2693			netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
  2694				       atl1c_clean_rx);
  2695		for (i = 0; i < adapter->tx_queue_count; ++i)
  2696			netif_napi_add_tx(netdev, &adapter->tpd_ring[i].napi,
  2697					  atl1c_clean_tx);
  2698		timer_setup(&adapter->phy_config_timer, atl1c_phy_config, 0);
  2699		/* setup the private structure */
  2700		err = atl1c_sw_init(adapter);
  2701		if (err) {
  2702			dev_err(&pdev->dev, "net device private data init failed\n");
  2703			goto err_sw_init;
  2704		}
  2705		/* set max MTU */
  2706		atl1c_set_max_mtu(netdev);
  2707
  2708		atl1c_reset_pcie(&adapter->hw, ATL1C_PCIE_L0S_L1_DISABLE);
  2709
  2710		/* Init GPHY as early as possible due to power saving issue */
  2711		atl1c_phy_reset(&adapter->hw);
  2712
  2713		err = atl1c_reset_mac(&adapter->hw);
  2714		if (err) {
  2715			err = -EIO;
  2716			goto err_reset;
  2717		}
  2718
  2719		/* reset the controller to
  2720		 * put the device in a known good starting state */
  2721		err = atl1c_phy_init(&adapter->hw);
  2722		if (err) {
  2723			err = -EIO;
  2724			goto err_reset;
  2725		}
  2726		if (atl1c_read_mac_addr(&adapter->hw)) {
  2727			/* got a random MAC address, set NET_ADDR_RANDOM to netdev */
  2728			netdev->addr_assign_type = NET_ADDR_RANDOM;
  2729		}
  2730		eth_hw_addr_set(netdev, adapter->hw.mac_addr);
  2731		if (netif_msg_probe(adapter))
  2732			dev_dbg(&pdev->dev, "mac address : %pM\n",
  2733				adapter->hw.mac_addr);
  2734
  2735		atl1c_hw_set_mac_addr(&adapter->hw, adapter->hw.mac_addr);
  2736		INIT_WORK(&adapter->common_task, atl1c_common_task);
  2737		adapter->work_event = 0;
  2738		err = register_netdev(netdev);
  2739		if (err) {
  2740			dev_err(&pdev->dev, "register netdevice failed\n");
  2741			goto err_register;
  2742		}
  2743
  2744		cards_found++;
  2745		return 0;
  2746
  2747	err_reset:
  2748	err_register:
  2749	err_sw_init:
  2750	err_init_netdev:
  2751		free_netdev(netdev);
  2752	err_alloc_etherdev:
  2753		iounmap(hw_addr);
  2754	err_ioremap:
  2755		pci_release_regions(pdev);
  2756	err_pci_reg:
  2757	err_dma:
  2758		pci_disable_device(pdev);
  2759		return err;
  2760	}
  2761
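Both build reports above point at the same line: the atl1c hunk passes `DEV_NAPI_THREADED`, while the enum added to netdevice.h in this patch declares `NAPI_THREADED`. The sketch below follows the compilers' "did you mean" hint in a standalone, userspace form. `struct net_device_stub`, `dev_set_threaded_stub()` and `probe_enable_threaded()` are stand-ins written for this example; the real `dev_set_threaded()` also creates kthreads and updates per-NAPI state.

```c
#include <stdint.h>

enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Reduced stand-in for struct net_device: only the retyped field. */
struct net_device_stub {
	uint8_t threaded;
};

/* Stub with the patched parameter type; it only records the mode. */
static int dev_set_threaded_stub(struct net_device_stub *dev,
				 enum napi_threaded_state threaded)
{
	dev->threaded = (uint8_t)threaded;
	return 0;
}

/* The probe-time call, spelled with the constant the header declares. */
static int probe_enable_threaded(struct net_device_stub *netdev)
{
	return dev_set_threaded_stub(netdev, NAPI_THREADED);
}
```

Whether the driver should keep requesting plain threaded mode (value 1) is the author's call; the sketch only shows a spelling that matches the enum as posted.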
Hi Samiullah,
kernel test robot noticed the following build warnings:
[auto build test WARNING on net-next/main]
url: https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base: net-next/main
patch link: https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: x86_64-randconfig-073-20250103 (https://download.01.org/0day-ci/archive/20250103/202501031530.ss0kvHke-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501031530.ss0kvHke-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501031530.ss0kvHke-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/net/ethernet/mellanox/mlxsw/pci.c: In function 'mlxsw_pci_napi_devs_init':
>> drivers/net/ethernet/mellanox/mlxsw/pci.c:158:50: warning: implicit conversion from 'enum <anonymous>' to 'enum napi_threaded_state' [-Wenum-conversion]
  158 |         dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
      |                                                  ^~~~
vim +158 drivers/net/ethernet/mellanox/mlxsw/pci.c
eda6500a987a02 Jiri Pirko 2015-07-29 140
5d01ed2e970812 Amit Cohen 2024-04-26 141 static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
5d01ed2e970812 Amit Cohen 2024-04-26 142 {
5d01ed2e970812 Amit Cohen 2024-04-26 143 int err;
5d01ed2e970812 Amit Cohen 2024-04-26 144
5d01ed2e970812 Amit Cohen 2024-04-26 145 mlxsw_pci->napi_dev_tx = alloc_netdev_dummy(0);
5d01ed2e970812 Amit Cohen 2024-04-26 146 if (!mlxsw_pci->napi_dev_tx)
5d01ed2e970812 Amit Cohen 2024-04-26 147 return -ENOMEM;
5d01ed2e970812 Amit Cohen 2024-04-26 148 strscpy(mlxsw_pci->napi_dev_tx->name, "mlxsw_tx",
5d01ed2e970812 Amit Cohen 2024-04-26 149 sizeof(mlxsw_pci->napi_dev_tx->name));
5d01ed2e970812 Amit Cohen 2024-04-26 150
5d01ed2e970812 Amit Cohen 2024-04-26 151 mlxsw_pci->napi_dev_rx = alloc_netdev_dummy(0);
5d01ed2e970812 Amit Cohen 2024-04-26 152 if (!mlxsw_pci->napi_dev_rx) {
5d01ed2e970812 Amit Cohen 2024-04-26 153 err = -ENOMEM;
5d01ed2e970812 Amit Cohen 2024-04-26 154 goto err_alloc_rx;
5d01ed2e970812 Amit Cohen 2024-04-26 155 }
5d01ed2e970812 Amit Cohen 2024-04-26 156 strscpy(mlxsw_pci->napi_dev_rx->name, "mlxsw_rx",
5d01ed2e970812 Amit Cohen 2024-04-26 157 sizeof(mlxsw_pci->napi_dev_rx->name));
5d01ed2e970812 Amit Cohen 2024-04-26 @158 dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
5d01ed2e970812 Amit Cohen 2024-04-26 159
5d01ed2e970812 Amit Cohen 2024-04-26 160 return 0;
5d01ed2e970812 Amit Cohen 2024-04-26 161
5d01ed2e970812 Amit Cohen 2024-04-26 162 err_alloc_rx:
5d01ed2e970812 Amit Cohen 2024-04-26 163 free_netdev(mlxsw_pci->napi_dev_tx);
5d01ed2e970812 Amit Cohen 2024-04-26 164 return err;
5d01ed2e970812 Amit Cohen 2024-04-26 165 }
5d01ed2e970812 Amit Cohen 2024-04-26 166
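The warning above fires because, in the kernel, `true` is an enumerator of an anonymous enum, while the patched prototype expects `enum napi_threaded_state`. The following is a minimal user-space sketch of the likely fix on the caller side, not the actual mlxsw change. `struct fake_net_device`, `fake_dev_set_threaded()` and `fake_napi_devs_init()` are hypothetical stand-ins invented for illustration; only the enum is copied from this patch.

```c
#include <assert.h>

/* Copied from the enum this patch adds to include/linux/netdevice.h. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Hypothetical stand-in for struct net_device; only the field used here. */
struct fake_net_device {
	unsigned char threaded;
};

/* Hypothetical stub with the same parameter type as the patched
 * dev_set_threaded(); it is not the kernel function. */
static int fake_dev_set_threaded(struct fake_net_device *dev,
				 enum napi_threaded_state threaded)
{
	if (threaded > NAPI_THREADED_MAX)
		return -1;
	dev->threaded = (unsigned char)threaded;
	return 0;
}

/* Caller passes the enumerator instead of `true`, so no implicit
 * conversion between two different enum types takes place. */
static int fake_napi_devs_init(struct fake_net_device *rx)
{
	return fake_dev_set_threaded(rx, NAPI_THREADED);
}
```

In user space `true` from `<stdbool.h>` is not an enumerator, so this sketch does not reproduce the diagnostic itself; it only shows the shape of a call that avoids it.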
diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index ebf21beba846..15d7d36a8294 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -343,7 +343,7 @@ Date:		Jan 2021
 KernelVersion:	5.12
 Contact:	netdev@vger.kernel.org
 Description:
-		Boolean value to control the threaded mode per device. User could
+		Integer value to control the threaded mode per device. User could
 		set this value to enable/disable threaded mode for all napi
 		belonging to this device, without the need to do device up/down.
 
@@ -351,4 +351,5 @@ Description:
 		== ==================================
 		0  threaded mode disabled for this dev
 		1  threaded mode enabled for this dev
+		2  threaded mode enabled, and busy polling enabled.
 		== ==================================
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index aac343af7246..9c905243a1cc 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -272,10 +272,11 @@ attribute-sets:
         name: threaded
         doc: Whether the napi is configured to operate in threaded polling
              mode. If this is set to `1` then the NAPI context operates
-             in threaded polling mode.
+             in threaded polling mode. If this is set to `2` then the NAPI
+             kthread also does busypolling.
         type: u32
         checks:
-          max: 1
+          max: 2
   -
     name: queue
     attributes:
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index c571614b1d50..a709cddcd292 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->mii.mdio_write = atl1c_mdio_write;
 	adapter->mii.phy_id_mask = 0x1f;
 	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
-	dev_set_threaded(netdev, true);
+	dev_set_threaded(netdev, DEV_NAPI_THREADED);
 	for (i = 0; i < adapter->rx_queue_count; ++i)
 		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
 			       atl1c_clean_rx);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8f531d528869..c384ffe0976e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -407,6 +407,8 @@ enum {
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
+	NAPI_STATE_THREADED_BUSY_POLL,	/* The threaded napi poller will busy poll */
+	NAPI_STATE_SCHED_THREADED_BUSY_POLL,	/* The threaded napi poller is busy polling */
 };
 
 enum {
@@ -420,8 +422,14 @@ enum {
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
+	NAPIF_STATE_THREADED_BUSY_POLL	= BIT(NAPI_STATE_THREADED_BUSY_POLL),
+	NAPIF_STATE_SCHED_THREADED_BUSY_POLL
+				= BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
 };
 
+#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
+	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
+
 enum gro_result {
 	GRO_MERGED,
 	GRO_MERGED_FREE,
@@ -568,16 +576,24 @@ static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
-int dev_set_threaded(struct net_device *dev, bool threaded);
+enum napi_threaded_state {
+	NAPI_THREADED_OFF = 0,
+	NAPI_THREADED = 1,
+	NAPI_THREADED_BUSY_POLL = 2,
+	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
+};
+
+int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded);
 
 /*
  * napi_set_threaded - set napi threaded state
  * @napi: NAPI context
- * @threaded: whether this napi does threaded polling
+ * @threaded: threading mode
  *
  * Return 0 on success and negative errno on failure.
  */
-int napi_set_threaded(struct napi_struct *napi, bool threaded);
+int napi_set_threaded(struct napi_struct *napi,
+		      enum napi_threaded_state threaded);
 
 /**
  * napi_disable - prevent NAPI from scheduling
@@ -2406,7 +2422,7 @@ struct net_device {
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
-	bool			threaded;
+	u8			threaded;
 
 	/* priv_flags_slow, ungrouped to save space */
 	unsigned long		see_all_hwtstamp_requests:1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 762977a62da2..b6cd9474bdd3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -78,6 +78,7 @@
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/isolation.h>
+#include <linux/sched/types.h>
 #include <linux/sched/mm.h>
 #include <linux/smpboot.h>
 #include <linux/mutex.h>
@@ -6231,7 +6232,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 	 * the guarantee we will be called later.
 	 */
 	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
-				 NAPIF_STATE_IN_BUSY_POLL)))
+				 NAPIF_STATE_IN_BUSY_POLL |
+				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
 		return false;
 
 	if (work_done) {
@@ -6633,8 +6635,10 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
-int napi_set_threaded(struct napi_struct *napi, bool threaded)
+int napi_set_threaded(struct napi_struct *napi,
+		      enum napi_threaded_state threaded)
 {
+	unsigned long val;
 	if (napi->dev->threaded)
 		return -EINVAL;
 
@@ -6649,30 +6653,41 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
 	/* Make sure kthread is created before THREADED bit is set. */
 	smp_mb__before_atomic();
 
-	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+	val = 0;
+	if (threaded == NAPI_THREADED_BUSY_POLL)
+		val |= NAPIF_STATE_THREADED_BUSY_POLL;
+	if (threaded)
+		val |= NAPIF_STATE_THREADED;
+	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
 
 	return 0;
 }
 
-int dev_set_threaded(struct net_device *dev, bool threaded)
+int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded)
 {
 	struct napi_struct *napi;
+	unsigned long val;
 	int err = 0;
 
 	if (dev->threaded == threaded)
 		return 0;
 
+	val = 0;
 	if (threaded) {
 		/* Check if threaded is set at napi level already */
 		list_for_each_entry(napi, &dev->napi_list, dev_list)
 			if (test_bit(NAPI_STATE_THREADED, &napi->state))
 				return -EINVAL;
 
+		val |= NAPIF_STATE_THREADED;
+		if (threaded == NAPI_THREADED_BUSY_POLL)
+			val |= NAPIF_STATE_THREADED_BUSY_POLL;
+
 		list_for_each_entry(napi, &dev->napi_list, dev_list) {
 			if (!napi->thread) {
 				err = napi_kthread_create(napi);
 				if (err) {
-					threaded = false;
+					threaded = NAPI_THREADED_OFF;
 					break;
 				}
 			}
@@ -6691,9 +6706,13 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
 	 * polled. In this case, the switch between threaded mode and
 	 * softirq mode will happen in the next round of napi_schedule().
 	 * This should not cause hiccups/stalls to the live traffic.
+	 *
+	 * Switch to busy_poll threaded napi will occur after the threaded
+	 * napi is scheduled.
 	 */
 	list_for_each_entry(napi, &dev->napi_list, dev_list)
-		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+		set_mask_bits(&napi->state,
+			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
 
 	return err;
 }
@@ -7007,7 +7026,7 @@ static int napi_thread_wait(struct napi_struct *napi)
 	return -1;
 }
 
-static void napi_threaded_poll_loop(struct napi_struct *napi)
+static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct softnet_data *sd;
@@ -7036,22 +7055,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
 		}
 		skb_defer_free_flush(sd);
 		bpf_net_ctx_clear(bpf_net_ctx);
+
+		/* Push the skbs up the stack if busy polling. */
+		if (busy_poll)
+			__napi_gro_flush_helper(napi);
 		local_bh_enable();
 
-		if (!repoll)
+		/* If busy polling then do not break here because we need to
+		 * call cond_resched and rcu_softirq_qs_periodic to prevent
+		 * watchdog warnings.
+		 */
+		if (!repoll && !busy_poll)
 			break;
 
 		rcu_softirq_qs_periodic(last_qs);
 		cond_resched();
+
+		if (!repoll)
+			break;
 	}
 }
 
 static int napi_threaded_poll(void *data)
 {
 	struct napi_struct *napi = data;
+	bool busy_poll_sched;
+	unsigned long val;
+	bool busy_poll;
+
+	while (!napi_thread_wait(napi)) {
+		/* Once woken up, this means that we are scheduled as threaded
+		 * napi and this thread owns the napi context, if busy poll
+		 * state is set then we busy poll this napi.
+		 */
+		val = READ_ONCE(napi->state);
+		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
+		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
+
+		/* Do not busy poll if napi is disabled. */
+		if (unlikely(val & NAPIF_STATE_DISABLE))
+			busy_poll = false;
+
+		if (busy_poll != busy_poll_sched)
+			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
+				   &napi->state, busy_poll);
 
-	while (!napi_thread_wait(napi))
-		napi_threaded_poll_loop(napi);
+		napi_threaded_poll_loop(napi, busy_poll);
+	}
 
 	return 0;
 }
@@ -12205,7 +12255,7 @@ static void run_backlog_napi(unsigned int cpu)
 {
 	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
 
-	napi_threaded_poll_loop(&sd->backlog);
+	napi_threaded_poll_loop(&sd->backlog, false);
 }
 
 static void backlog_napi_setup(unsigned int cpu)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 2d9afc6e2161..36d0a22e341c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -626,7 +626,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
 	if (list_empty(&dev->napi_list))
 		return -EOPNOTSUPP;
 
-	if (val != 0 && val != 1)
+	if (val > NAPI_THREADED_MAX)
 		return -EOPNOTSUPP;
 
 	ret = dev_set_threaded(dev, val);
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 93dc74dad6de..4086d2577dcc 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -102,7 +102,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPE
 /* NETDEV_CMD_NAPI_SET_THREADED - do */
 static const struct nla_policy netdev_napi_set_threaded_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
 	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
-	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
+	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
 };
 
 /* Ops table for netdev */
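For readers following the state handling in the patch above, here is a minimal user-space sketch of how the three-valued `threaded` setting maps onto the two state bits, and how a masked update leaves unrelated bits alone. It assumes illustrative bit positions (the `FAKE_*` macros are hypothetical, not the kernel's values), and `apply_mask_bits()` is a non-atomic stand-in for the kernel's `set_mask_bits()`, which performs the same `(old & ~mask) | bits` update atomically. `mode_is_valid()` mirrors the range check the patch puts in `modify_napi_threaded()`.

```c
#include <assert.h>

/* Illustrative bit positions only; the kernel derives the real ones
 * from its NAPI_STATE_* enum. */
#define FAKE_NAPIF_STATE_THREADED           (1UL << 0)
#define FAKE_NAPIF_STATE_THREADED_BUSY_POLL (1UL << 1)
#define FAKE_NAPIF_STATE_OTHER              (1UL << 2) /* hypothetical unrelated bit */
#define FAKE_THREADED_BUSY_POLL_MASK \
	(FAKE_NAPIF_STATE_THREADED | FAKE_NAPIF_STATE_THREADED_BUSY_POLL)

/* Copied from the enum this patch adds to include/linux/netdevice.h. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Translate the requested mode into the state bits to set, following
 * the logic the patch uses in napi_set_threaded(). */
static unsigned long threaded_mode_to_bits(enum napi_threaded_state mode)
{
	unsigned long val = 0;

	if (mode == NAPI_THREADED_BUSY_POLL)
		val |= FAKE_NAPIF_STATE_THREADED_BUSY_POLL;
	if (mode != NAPI_THREADED_OFF)
		val |= FAKE_NAPIF_STATE_THREADED;
	return val;
}

/* Non-atomic stand-in for set_mask_bits(): clear the masked bits,
 * then OR in the new ones. Bits outside the mask are preserved. */
static unsigned long apply_mask_bits(unsigned long state, unsigned long mask,
				     unsigned long bits)
{
	return (state & ~mask) | bits;
}

/* Range check in the style of the patched modify_napi_threaded(). */
static int mode_is_valid(unsigned long val)
{
	return val <= NAPI_THREADED_MAX;
}
```

Switching from mode 2 back to mode 1 therefore clears only the busy-poll bit, and switching to mode 0 clears both, without touching any other state bit.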