
[net-next,3/3] Extend napi threaded polling to allow kthread based busy polling

Message ID 20250102191227.2084046-4-skhawaja@google.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series Add support to do threaded napi busy poll

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 80 insertions(+);
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              fail     Errors and warnings before: 40 this patch: 40
netdev/build_tools              success  Errors and warnings before: 0 (+23) this patch: 0 (+23)
netdev/cc_maintainers           warning  5 maintainers not CCed: horms@kernel.org andrew+netdev@lunn.ch willemb@google.com chris.snook@gmail.com donald.hunter@gmail.com
netdev/build_clang              fail     Errors and warnings before: 6615 this patch: 4691
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  fail     Errors and warnings before: 4100 this patch: 3257
netdev/checkpatch               warning  CHECK: Assignment operator '=' should be on the previous line; WARNING: line length of 85 exceeds 80 columns; WARNING: line length of 92 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 109 this patch: 109
netdev/source_inline            success  Was 0 now: 0

Commit Message

Samiullah Khawaja Jan. 2, 2025, 7:12 p.m. UTC
Add a new state to the napi state enum:

- NAPI_STATE_THREADED_BUSY_POLL
  Threaded busy poll is enabled/running for this napi.

The following changes are introduced in the napi scheduling and state
logic:

- When threaded busy poll is enabled through sysfs, it also enables
  NAPI_STATE_THREADED so that a kthread is created per napi. It also
  sets the NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate
  that we are supposed to busy poll for each napi.

- When a napi is scheduled with NAPI_STATE_SCHED_THREADED and the
  associated kthread is woken up, the kthread owns the context. If
  NAPI_STATE_THREADED_BUSY_POLL and NAPI_STATE_SCHED_THREADED are both
  set, then we can busy poll.

- To keep busy polling and to avoid scheduling of the interrupts,
  napi_complete_done returns false when both the SCHED_THREADED and
  THREADED_BUSY_POLL flags are set. napi_complete_done also returns
  early so that NAPI_STATE_SCHED_THREADED is not unset.

- If at any point NAPI_STATE_THREADED_BUSY_POLL is unset,
  napi_complete_done will run and also unset the SCHED_THREADED bit.
  This makes the associated kthread go to sleep as per the existing
  logic.

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
---
 Documentation/ABI/testing/sysfs-class-net     |  3 +-
 Documentation/netlink/specs/netdev.yaml       |  5 +-
 .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
 include/linux/netdevice.h                     | 24 +++++--
 net/core/dev.c                                | 72 ++++++++++++++++---
 net/core/net-sysfs.c                          |  2 +-
 net/core/netdev-genl-gen.c                    |  2 +-
 7 files changed, 89 insertions(+), 21 deletions(-)
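
The state handling described in the commit message can be modelled in
user space. The following is only a sketch, not the kernel code: the bit
positions are arbitrary, the model_* function names are hypothetical, and
plain (non-atomic) bit operations stand in for the kernel's atomic
helpers. The flag names follow the ones added to include/linux/netdevice.h
by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model flags; names follow the patch, bit positions are arbitrary. */
#define NAPIF_STATE_SCHED_THREADED            (1UL << 0)
#define NAPIF_STATE_THREADED                  (1UL << 1)
#define NAPIF_STATE_THREADED_BUSY_POLL        (1UL << 2)
#define NAPIF_STATE_SCHED_THREADED_BUSY_POLL  (1UL << 3)
#define NAPIF_STATE_DISABLE                   (1UL << 4)

/*
 * Hypothetical model of the decision the kthread makes each time it is
 * woken: busy poll only if the THREADED_BUSY_POLL bit is set and the
 * napi is not being disabled, and record that decision in the
 * SCHED_THREADED_BUSY_POLL bit.
 */
static bool model_thread_wakeup(unsigned long *state)
{
	bool busy_poll;

	assert(state != NULL);
	busy_poll = *state & NAPIF_STATE_THREADED_BUSY_POLL;
	if (*state & NAPIF_STATE_DISABLE)
		busy_poll = false;

	if (busy_poll)
		*state |= NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
	else
		*state &= ~NAPIF_STATE_SCHED_THREADED_BUSY_POLL;

	return busy_poll;
}

/*
 * Hypothetical model of the early return in napi_complete_done(): while
 * the thread is busy polling, completion is refused and SCHED_THREADED
 * stays set; otherwise the bit is cleared so the thread can sleep.
 */
static bool model_complete_done(unsigned long *state)
{
	assert(state != NULL);
	if (*state & NAPIF_STATE_SCHED_THREADED_BUSY_POLL)
		return false;

	*state &= ~NAPIF_STATE_SCHED_THREADED;
	return true;
}
```

In this model, clearing NAPIF_STATE_THREADED_BUSY_POLL only takes effect
at the next model_thread_wakeup() call, after which
model_complete_done() clears the SCHED_THREADED bit.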

Comments

Samudrala, Sridhar Jan. 2, 2025, 9:16 p.m. UTC | #1
On 1/2/2025 11:12 AM, Samiullah Khawaja wrote:
> Add a new state to napi state enum:
> 
> - STATE_THREADED_BUSY_POLL
>    Threaded busy poll is enabled/running for this napi.
> 
> Following changes are introduced in the napi scheduling and state logic:
> 
> - When threaded busy poll is enabled through sysfs it also enables
>    NAPI_STATE_THREADED so a kthread is created per napi. It also sets
>    NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
>    supposed to busy poll for each napi.

Looks like this patch is changing the sysfs 'threaded' field from 
boolean to an integer and value 2 is used to indicate threaded mode with 
busypoll.
So I think the above comment should reflect that instead of just saying 
enabled for both threaded and busypoll.
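
For reference, the integer semantics of the 'threaded' value can be
sketched as a small user-space model. This is an illustration only: the
function name model_threaded_to_flags is hypothetical and the flag bit
positions are arbitrary. The range check and the flag mapping follow what
the quoted hunks in net/core/net-sysfs.c and net/core/dev.c appear to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Values as defined by the patch in include/linux/netdevice.h. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Model flags; bit positions are arbitrary in this sketch. */
#define NAPIF_STATE_THREADED            (1UL << 0)
#define NAPIF_STATE_THREADED_BUSY_POLL  (1UL << 1)

/*
 * Hypothetical helper: validate a value written to 'threaded' and
 * compute the napi state bits it selects. Returns 0 on success or
 * -EOPNOTSUPP for an out-of-range value.
 */
static int model_threaded_to_flags(unsigned long val, unsigned long *flags)
{
	assert(flags != NULL);
	if (val > NAPI_THREADED_MAX)
		return -EOPNOTSUPP;

	*flags = 0;
	if (val)
		*flags |= NAPIF_STATE_THREADED;
	if (val == NAPI_THREADED_BUSY_POLL)
		*flags |= NAPIF_STATE_THREADED_BUSY_POLL;

	return 0;
}
```

So 0 selects no bits, 1 selects THREADED only, and 2 selects both
THREADED and THREADED_BUSY_POLL; anything larger is rejected.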

> 
> - When napi is scheduled with STATE_SCHED_THREADED and associated
>    kthread is woken up, the kthread owns the context. If
>    NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
>    then it means that we can busy poll.
> 
> - To keep busy polling and to avoid scheduling of the interrupts, the
>    napi_complete_done returns false when both SCHED_THREADED and
>    THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
>    early to avoid the STATE_SCHED_THREADED being unset.
> 
> - If at any point STATE_THREADED_BUSY_POLL is unset, the
>    napi_complete_done will run and unset the SCHED_THREADED bit also.
>    This will make the associated kthread go to sleep as per existing
>    logic.

When does STATE_THREADED_BUSY_POLL get unset? Don't we need a timeout 
value to come out of busypoll mode if there is no traffic?
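
To make the question concrete, one possible shape of such an exit
condition is sketched below. Nothing like this exists in the patch as
posted: the struct, its fields and the function name are all
hypothetical, and counting consecutive empty polls is just one of several
ways a timeout could be expressed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical idle budget for a busy-polling thread. */
struct busy_poll_budget {
	unsigned int idle_polls;      /* consecutive polls with no work */
	unsigned int max_idle_polls;  /* leave busy poll at this count */
};

/*
 * Returns true if the poller should keep spinning. Any poll that did
 * work resets the idle counter; once max_idle_polls consecutive empty
 * polls have been seen, the caller would leave busy poll mode.
 */
static bool busy_poll_should_continue(struct busy_poll_budget *b,
				      int work_done)
{
	assert(b != NULL);
	if (work_done > 0) {
		b->idle_polls = 0;
		return true;
	}

	b->idle_polls++;
	return b->idle_polls < b->max_idle_polls;
}
```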

> 
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> ---
>   Documentation/ABI/testing/sysfs-class-net     |  3 +-
>   Documentation/netlink/specs/netdev.yaml       |  5 +-
>   .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
>   include/linux/netdevice.h                     | 24 +++++--
>   net/core/dev.c                                | 72 ++++++++++++++++---
>   net/core/net-sysfs.c                          |  2 +-
>   net/core/netdev-genl-gen.c                    |  2 +-
>   7 files changed, 89 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index ebf21beba846..15d7d36a8294 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -343,7 +343,7 @@ Date:		Jan 2021
>   KernelVersion:	5.12
>   Contact:	netdev@vger.kernel.org
>   Description:
> -		Boolean value to control the threaded mode per device. User could
> +		Integer value to control the threaded mode per device. User could
>   		set this value to enable/disable threaded mode for all napi
>   		belonging to this device, without the need to do device up/down.
>   
> @@ -351,4 +351,5 @@ Description:
>   		== ==================================
>   		0  threaded mode disabled for this dev
>   		1  threaded mode enabled for this dev
> +		2  threaded mode enabled, and busy polling enabled.
>   		== ==================================
> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index aac343af7246..9c905243a1cc 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -272,10 +272,11 @@ attribute-sets:
>           name: threaded
>           doc: Whether the napi is configured to operate in threaded polling
>                mode. If this is set to `1` then the NAPI context operates
> -             in threaded polling mode.
> +             in threaded polling mode. If this is set to `2` then the NAPI
> +             kthread also does busypolling.
>           type: u32
>           checks:
> -          max: 1
> +          max: 2
>     -
>       name: queue
>       attributes:
> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> index c571614b1d50..a709cddcd292 100644
> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> @@ -2688,7 +2688,7 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>   	adapter->mii.mdio_write = atl1c_mdio_write;
>   	adapter->mii.phy_id_mask = 0x1f;
>   	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> -	dev_set_threaded(netdev, true);
> +	dev_set_threaded(netdev, DEV_NAPI_THREADED);
>   	for (i = 0; i < adapter->rx_queue_count; ++i)
>   		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
>   			       atl1c_clean_rx);
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 8f531d528869..c384ffe0976e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -407,6 +407,8 @@ enum {
>   	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
>   	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
>   	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
> +	NAPI_STATE_THREADED_BUSY_POLL,	/* The threaded napi poller will busy poll */
> +	NAPI_STATE_SCHED_THREADED_BUSY_POLL,  /* The threaded napi poller is busy polling */
>   };
>   
>   enum {
> @@ -420,8 +422,14 @@ enum {
>   	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
>   	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
>   	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
> +	NAPIF_STATE_THREADED_BUSY_POLL	= BIT(NAPI_STATE_THREADED_BUSY_POLL),
> +	NAPIF_STATE_SCHED_THREADED_BUSY_POLL
> +				= BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
>   };
>   
> +#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
> +	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
> +
>   enum gro_result {
>   	GRO_MERGED,
>   	GRO_MERGED_FREE,
> @@ -568,16 +576,24 @@ static inline bool napi_complete(struct napi_struct *n)
>   	return napi_complete_done(n, 0);
>   }
>   
> -int dev_set_threaded(struct net_device *dev, bool threaded);
> +enum napi_threaded_state {
> +	NAPI_THREADED_OFF = 0,
> +	NAPI_THREADED = 1,
> +	NAPI_THREADED_BUSY_POLL = 2,
> +	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
> +};
> +
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded);
>   
>   /*
>    * napi_set_threaded - set napi threaded state
>    * @napi: NAPI context
> - * @threaded: whether this napi does threaded polling
> + * @threaded: threading mode
>    *
>    * Return 0 on success and negative errno on failure.
>    */
> -int napi_set_threaded(struct napi_struct *napi, bool threaded);
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded);
>   
>   /**
>    *	napi_disable - prevent NAPI from scheduling
> @@ -2406,7 +2422,7 @@ struct net_device {
>   	struct sfp_bus		*sfp_bus;
>   	struct lock_class_key	*qdisc_tx_busylock;
>   	bool			proto_down;
> -	bool			threaded;
> +	u8			threaded;
>   
>   	/* priv_flags_slow, ungrouped to save space */
>   	unsigned long		see_all_hwtstamp_requests:1;
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 762977a62da2..b6cd9474bdd3 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -78,6 +78,7 @@
>   #include <linux/slab.h>
>   #include <linux/sched.h>
>   #include <linux/sched/isolation.h>
> +#include <linux/sched/types.h>
>   #include <linux/sched/mm.h>
>   #include <linux/smpboot.h>
>   #include <linux/mutex.h>
> @@ -6231,7 +6232,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
>   	 *    the guarantee we will be called later.
>   	 */
>   	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
> -				 NAPIF_STATE_IN_BUSY_POLL)))
> +				 NAPIF_STATE_IN_BUSY_POLL |
> +				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
>   		return false;
>   
>   	if (work_done) {
> @@ -6633,8 +6635,10 @@ static void init_gro_hash(struct napi_struct *napi)
>   	napi->gro_bitmask = 0;
>   }
>   
> -int napi_set_threaded(struct napi_struct *napi, bool threaded)
> +int napi_set_threaded(struct napi_struct *napi,
> +		      enum napi_threaded_state threaded)
>   {
> +	unsigned long val;
>   	if (napi->dev->threaded)
>   		return -EINVAL;
>   
> @@ -6649,30 +6653,41 @@ int napi_set_threaded(struct napi_struct *napi, bool threaded)
>   
>   	/* Make sure kthread is created before THREADED bit is set. */
>   	smp_mb__before_atomic();
> -	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +	val = 0;
> +	if (threaded == NAPI_THREADED_BUSY_POLL)
> +		val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +	if (threaded)
> +		val |= NAPIF_STATE_THREADED;
> +	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>   
>   	return 0;
>   }
>   
> -int dev_set_threaded(struct net_device *dev, bool threaded)
> +int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded)
>   {
>   	struct napi_struct *napi;
> +	unsigned long val;
>   	int err = 0;
>   
>   	if (dev->threaded == threaded)
>   		return 0;
>   
> +	val = 0;
>   	if (threaded) {
>   		/* Check if threaded is set at napi level already */
>   		list_for_each_entry(napi, &dev->napi_list, dev_list)
>   			if (test_bit(NAPI_STATE_THREADED, &napi->state))
>   				return -EINVAL;
>   
> +		val |= NAPIF_STATE_THREADED;
> +		if (threaded == NAPI_THREADED_BUSY_POLL)
> +			val |= NAPIF_STATE_THREADED_BUSY_POLL;
> +
>   		list_for_each_entry(napi, &dev->napi_list, dev_list) {
>   			if (!napi->thread) {
>   				err = napi_kthread_create(napi);
>   				if (err) {
> -					threaded = false;
> +					threaded = NAPI_THREADED_OFF;
>   					break;
>   				}
>   			}
> @@ -6691,9 +6706,13 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
>   	 * polled. In this case, the switch between threaded mode and
>   	 * softirq mode will happen in the next round of napi_schedule().
>   	 * This should not cause hiccups/stalls to the live traffic.
> +	 *
> +	 * Switch to busy_poll threaded napi will occur after the threaded
> +	 * napi is scheduled.
>   	 */
>   	list_for_each_entry(napi, &dev->napi_list, dev_list)
> -		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +		set_mask_bits(&napi->state,
> +			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
>   
>   	return err;
>   }
> @@ -7007,7 +7026,7 @@ static int napi_thread_wait(struct napi_struct *napi)
>   	return -1;
>   }
>   
> -static void napi_threaded_poll_loop(struct napi_struct *napi)
> +static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
>   {
>   	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
>   	struct softnet_data *sd;
> @@ -7036,22 +7055,53 @@ static void napi_threaded_poll_loop(struct napi_struct *napi)
>   		}
>   		skb_defer_free_flush(sd);
>   		bpf_net_ctx_clear(bpf_net_ctx);
> +
> +		/* Push the skbs up the stack if busy polling. */
> +		if (busy_poll)
> +			__napi_gro_flush_helper(napi);
>   		local_bh_enable();
>   
> -		if (!repoll)
> +		/* If busy polling then do not break here because we need to
> +		 * call cond_resched and rcu_softirq_qs_periodic to prevent
> +		 * watchdog warnings.
> +		 */
> +		if (!repoll && !busy_poll)
>   			break;
>   
>   		rcu_softirq_qs_periodic(last_qs);
>   		cond_resched();
> +
> +		if (!repoll)
> +			break;
>   	}
>   }
>   
>   static int napi_threaded_poll(void *data)
>   {
>   	struct napi_struct *napi = data;
> +	bool busy_poll_sched;
> +	unsigned long val;
> +	bool busy_poll;
> +
> +	while (!napi_thread_wait(napi)) {
> +		/* Once woken up, this means that we are scheduled as threaded
> +		 * napi and this thread owns the napi context, if busy poll
> +		 * state is set then we busy poll this napi.
> +		 */
> +		val = READ_ONCE(napi->state);
> +		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
> +		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
> +
> +		/* Do not busy poll if napi is disabled. */
> +		if (unlikely(val & NAPIF_STATE_DISABLE))
> +			busy_poll = false;
> +
> +		if (busy_poll != busy_poll_sched)
> +			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
> +				   &napi->state, busy_poll);
>   
> -	while (!napi_thread_wait(napi))
> -		napi_threaded_poll_loop(napi);
> +		napi_threaded_poll_loop(napi, busy_poll);
> +	}
>   
>   	return 0;
>   }
> @@ -12205,7 +12255,7 @@ static void run_backlog_napi(unsigned int cpu)
>   {
>   	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
>   
> -	napi_threaded_poll_loop(&sd->backlog);
> +	napi_threaded_poll_loop(&sd->backlog, false);
>   }
>   
>   static void backlog_napi_setup(unsigned int cpu)
> diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> index 2d9afc6e2161..36d0a22e341c 100644
> --- a/net/core/net-sysfs.c
> +++ b/net/core/net-sysfs.c
> @@ -626,7 +626,7 @@ static int modify_napi_threaded(struct net_device *dev, unsigned long val)
>   	if (list_empty(&dev->napi_list))
>   		return -EOPNOTSUPP;
>   
> -	if (val != 0 && val != 1)
> +	if (val > NAPI_THREADED_MAX)
>   		return -EOPNOTSUPP;
>   
>   	ret = dev_set_threaded(dev, val);
> diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
> index 93dc74dad6de..4086d2577dcc 100644
> --- a/net/core/netdev-genl-gen.c
> +++ b/net/core/netdev-genl-gen.c
> @@ -102,7 +102,7 @@ static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPE
>   /* NETDEV_CMD_NAPI_SET_THREADED - do */
>   static const struct nla_policy netdev_napi_set_threaded_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
>   	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
> -	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
> +	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
>   };
>   
>   /* Ops table for netdev */

Stanislav Fomichev Jan. 2, 2025, 9:28 p.m. UTC | #2
On 01/02, Samiullah Khawaja wrote:
> Add a new state to napi state enum:
> 
> - STATE_THREADED_BUSY_POLL
>   Threaded busy poll is enabled/running for this napi.
> 
> Following changes are introduced in the napi scheduling and state logic:
> 
> - When threaded busy poll is enabled through sysfs it also enables
>   NAPI_STATE_THREADED so a kthread is created per napi. It also sets
>   NAPI_STATE_THREADED_BUSY_POLL bit on each napi to indicate that we are
>   supposed to busy poll for each napi.
> 
> - When napi is scheduled with STATE_SCHED_THREADED and associated
>   kthread is woken up, the kthread owns the context. If
>   NAPI_STATE_THREADED_BUSY_POLL and NAPI_SCHED_THREADED both are set
>   then it means that we can busy poll.
> 
> - To keep busy polling and to avoid scheduling of the interrupts, the
>   napi_complete_done returns false when both SCHED_THREADED and
>   THREADED_BUSY_POLL flags are set. Also napi_complete_done returns
>   early to avoid the STATE_SCHED_THREADED being unset.
> 
> - If at any point STATE_THREADED_BUSY_POLL is unset, the
>   napi_complete_done will run and unset the SCHED_THREADED bit also.
>   This will make the associated kthread go to sleep as per existing
>   logic.
> 
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> ---
>  Documentation/ABI/testing/sysfs-class-net     |  3 +-
>  Documentation/netlink/specs/netdev.yaml       |  5 +-
>  .../net/ethernet/atheros/atl1c/atl1c_main.c   |  2 +-
>  include/linux/netdevice.h                     | 24 +++++--
>  net/core/dev.c                                | 72 ++++++++++++++++---
>  net/core/net-sysfs.c                          |  2 +-
>  net/core/netdev-genl-gen.c                    |  2 +-
>  7 files changed, 89 insertions(+), 21 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> index ebf21beba846..15d7d36a8294 100644
> --- a/Documentation/ABI/testing/sysfs-class-net
> +++ b/Documentation/ABI/testing/sysfs-class-net
> @@ -343,7 +343,7 @@ Date:		Jan 2021
>  KernelVersion:	5.12
>  Contact:	netdev@vger.kernel.org
>  Description:
> -		Boolean value to control the threaded mode per device. User could
> +		Integer value to control the threaded mode per device. User could
>  		set this value to enable/disable threaded mode for all napi
>  		belonging to this device, without the need to do device up/down.
>  
> @@ -351,4 +351,5 @@ Description:
>  		== ==================================
>  		0  threaded mode disabled for this dev
>  		1  threaded mode enabled for this dev
> +		2  threaded mode enabled, and busy polling enabled.
>  		== ==================================
> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index aac343af7246..9c905243a1cc 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -272,10 +272,11 @@ attribute-sets:
>          name: threaded
>          doc: Whether the napi is configured to operate in threaded polling
>               mode. If this is set to `1` then the NAPI context operates
> -             in threaded polling mode.
> +             in threaded polling mode. If this is set to `2` then the NAPI
> +             kthread also does busypolling.
>          type: u32
>          checks:
> -          max: 1
> +          max: 2
>    -

I'd vote for a separate threaded-busy-poll parameter (and separate doc)
instead of overloading 'threaded' bool. But if you prefer to
have a single argument, let's at least change it to enum with proper
values for busy and non-busy modes instead of magic numbers?
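
A rough sketch of what named values could look like, as opposed to the
bare 0/1/2, is given below in user-space C. The enum is the one the patch
already adds; the string names ("disabled", "enabled",
"busy-poll-enabled") and the lookup function are hypothetical and only
illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values as defined by the patch in include/linux/netdevice.h. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Hypothetical user-visible names, indexed by enum value. */
static const char *const napi_threaded_names[] = {
	[NAPI_THREADED_OFF]       = "disabled",
	[NAPI_THREADED]           = "enabled",
	[NAPI_THREADED_BUSY_POLL] = "busy-poll-enabled",
};

/* Returns the enum value for a name, or -1 if the name is unknown. */
static int napi_threaded_from_name(const char *name)
{
	size_t n = sizeof(napi_threaded_names) / sizeof(napi_threaded_names[0]);
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(name, napi_threaded_names[i]) == 0)
			return (int)i;

	return -1;
}
```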

kernel test robot Jan. 3, 2025, 1:05 a.m. UTC | #3
Hi Samiullah,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: i386-buildonly-randconfig-006-20250103 (https://download.01.org/0day-ci/archive/20250103/202501030842.OdBE8ADq-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501030842.OdBE8ADq-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501030842.OdBE8ADq-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/net/ethernet/atheros/atl1c/atl1c_main.c: In function 'atl1c_probe':
>> drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:34: error: 'DEV_NAPI_THREADED' undeclared (first use in this function); did you mean 'NAPI_THREADED'?
    2691 |         dev_set_threaded(netdev, DEV_NAPI_THREADED);
         |                                  ^~~~~~~~~~~~~~~~~
         |                                  NAPI_THREADED
   drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:34: note: each undeclared identifier is reported only once for each function it appears in
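
The mismatch behind this error can be reproduced outside the kernel. The
sketch below uses stand-in definitions: struct net_device and
dev_set_threaded() here are simplified stubs, and probe_model() is a
hypothetical name. It assumes the compiler's suggestion is the intended
identifier, since the enum added by the patch declares NAPI_THREADED and
no DEV_NAPI_THREADED.

```c
#include <assert.h>
#include <stddef.h>

/* Values as defined by the patch in include/linux/netdevice.h. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Stub standing in for the real struct net_device. */
struct net_device {
	unsigned char threaded;
};

/* Stub standing in for the real dev_set_threaded(). */
static int dev_set_threaded(struct net_device *dev,
			    enum napi_threaded_state threaded)
{
	assert(dev != NULL);
	dev->threaded = (unsigned char)threaded;
	return 0;
}

/* Hypothetical stand-in for the call made in atl1c_probe(). */
static int probe_model(struct net_device *netdev)
{
	/*
	 * Passing DEV_NAPI_THREADED here fails to compile because no such
	 * identifier is declared; NAPI_THREADED is the declared enumerator.
	 */
	return dev_set_threaded(netdev, NAPI_THREADED);
}
```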


vim +2691 drivers/net/ethernet/atheros/atl1c/atl1c_main.c

  2600	
  2601	/**
  2602	 * atl1c_probe - Device Initialization Routine
  2603	 * @pdev: PCI device information struct
  2604	 * @ent: entry in atl1c_pci_tbl
  2605	 *
  2606	 * Returns 0 on success, negative on failure
  2607	 *
  2608	 * atl1c_probe initializes an adapter identified by a pci_dev structure.
  2609	 * The OS initialization, configuring of the adapter private structure,
  2610	 * and a hardware reset occur.
  2611	 */
  2612	static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  2613	{
  2614		struct net_device *netdev;
  2615		struct atl1c_adapter *adapter;
  2616		static int cards_found;
  2617		u8 __iomem *hw_addr;
  2618		enum atl1c_nic_type nic_type;
  2619		u32 queue_count = 1;
  2620		int err = 0;
  2621		int i;
  2622	
  2623		/* enable device (incl. PCI PM wakeup and hotplug setup) */
  2624		err = pci_enable_device_mem(pdev);
  2625		if (err)
  2626			return dev_err_probe(&pdev->dev, err, "cannot enable PCI device\n");
  2627	
  2628		/*
  2629		 * The atl1c chip can DMA to 64-bit addresses, but it uses a single
  2630		 * shared register for the high 32 bits, so only a single, aligned,
  2631		 * 4 GB physical address range can be used at a time.
  2632		 *
  2633		 * Supporting 64-bit DMA on this hardware is more trouble than it's
  2634		 * worth.  It is far easier to limit to 32-bit DMA than update
  2635		 * various kernel subsystems to support the mechanics required by a
  2636		 * fixed-high-32-bit system.
  2637		 */
  2638		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
  2639		if (err) {
  2640			dev_err(&pdev->dev, "No usable DMA configuration,aborting\n");
  2641			goto err_dma;
  2642		}
  2643	
  2644		err = pci_request_regions(pdev, atl1c_driver_name);
  2645		if (err) {
  2646			dev_err(&pdev->dev, "cannot obtain PCI resources\n");
  2647			goto err_pci_reg;
  2648		}
  2649	
  2650		pci_set_master(pdev);
  2651	
  2652		hw_addr = pci_ioremap_bar(pdev, 0);
  2653		if (!hw_addr) {
  2654			err = -EIO;
  2655			dev_err(&pdev->dev, "cannot map device registers\n");
  2656			goto err_ioremap;
  2657		}
  2658	
  2659		nic_type = atl1c_get_mac_type(pdev, hw_addr);
  2660		if (nic_type == athr_mt)
  2661			queue_count = 4;
  2662	
  2663		netdev = alloc_etherdev_mq(sizeof(struct atl1c_adapter), queue_count);
  2664		if (netdev == NULL) {
  2665			err = -ENOMEM;
  2666			goto err_alloc_etherdev;
  2667		}
  2668	
  2669		err = atl1c_init_netdev(netdev, pdev);
  2670		if (err) {
  2671			dev_err(&pdev->dev, "init netdevice failed\n");
  2672			goto err_init_netdev;
  2673		}
  2674		adapter = netdev_priv(netdev);
  2675		adapter->bd_number = cards_found;
  2676		adapter->netdev = netdev;
  2677		adapter->pdev = pdev;
  2678		adapter->hw.adapter = adapter;
  2679		adapter->hw.nic_type = nic_type;
  2680		adapter->msg_enable = netif_msg_init(-1, atl1c_default_msg);
  2681		adapter->hw.hw_addr = hw_addr;
  2682		adapter->tx_queue_count = queue_count;
  2683		adapter->rx_queue_count = queue_count;
  2684	
  2685		/* init mii data */
  2686		adapter->mii.dev = netdev;
  2687		adapter->mii.mdio_read  = atl1c_mdio_read;
  2688		adapter->mii.mdio_write = atl1c_mdio_write;
  2689		adapter->mii.phy_id_mask = 0x1f;
  2690		adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> 2691		dev_set_threaded(netdev, DEV_NAPI_THREADED);
  2692		for (i = 0; i < adapter->rx_queue_count; ++i)
  2693			netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
  2694				       atl1c_clean_rx);
  2695		for (i = 0; i < adapter->tx_queue_count; ++i)
  2696			netif_napi_add_tx(netdev, &adapter->tpd_ring[i].napi,
  2697					  atl1c_clean_tx);
  2698		timer_setup(&adapter->phy_config_timer, atl1c_phy_config, 0);
  2699		/* setup the private structure */
  2700		err = atl1c_sw_init(adapter);
  2701		if (err) {
  2702			dev_err(&pdev->dev, "net device private data init failed\n");
  2703			goto err_sw_init;
  2704		}
  2705		/* set max MTU */
  2706		atl1c_set_max_mtu(netdev);
  2707	
  2708		atl1c_reset_pcie(&adapter->hw, ATL1C_PCIE_L0S_L1_DISABLE);
  2709	
  2710		/* Init GPHY as early as possible due to power saving issue  */
  2711		atl1c_phy_reset(&adapter->hw);
  2712	
  2713		err = atl1c_reset_mac(&adapter->hw);
  2714		if (err) {
  2715			err = -EIO;
  2716			goto err_reset;
  2717		}
  2718	
  2719		/* reset the controller to
  2720		 * put the device in a known good starting state */
  2721		err = atl1c_phy_init(&adapter->hw);
  2722		if (err) {
  2723			err = -EIO;
  2724			goto err_reset;
  2725		}
  2726		if (atl1c_read_mac_addr(&adapter->hw)) {
  2727			/* got a random MAC address, set NET_ADDR_RANDOM to netdev */
  2728			netdev->addr_assign_type = NET_ADDR_RANDOM;
  2729		}
  2730		eth_hw_addr_set(netdev, adapter->hw.mac_addr);
  2731		if (netif_msg_probe(adapter))
  2732			dev_dbg(&pdev->dev, "mac address : %pM\n",
  2733				adapter->hw.mac_addr);
  2734	
  2735		atl1c_hw_set_mac_addr(&adapter->hw, adapter->hw.mac_addr);
  2736		INIT_WORK(&adapter->common_task, atl1c_common_task);
  2737		adapter->work_event = 0;
  2738		err = register_netdev(netdev);
  2739		if (err) {
  2740			dev_err(&pdev->dev, "register netdevice failed\n");
  2741			goto err_register;
  2742		}
  2743	
  2744		cards_found++;
  2745		return 0;
  2746	
  2747	err_reset:
  2748	err_register:
  2749	err_sw_init:
  2750	err_init_netdev:
  2751		free_netdev(netdev);
  2752	err_alloc_etherdev:
  2753		iounmap(hw_addr);
  2754	err_ioremap:
  2755		pci_release_regions(pdev);
  2756	err_pci_reg:
  2757	err_dma:
  2758		pci_disable_device(pdev);
  2759		return err;
  2760	}
  2761

kernel test robot Jan. 3, 2025, 8:11 a.m. UTC | #4
Hi Samiullah,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20250103/202501031537.QXSNLahs-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501031537.QXSNLahs-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501031537.QXSNLahs-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from drivers/net/ethernet/atheros/atl1c/atl1c_main.c:9:
   In file included from drivers/net/ethernet/atheros/atl1c/atl1c.h:16:
   In file included from include/linux/pci.h:1658:
   In file included from include/linux/dmapool.h:14:
   In file included from include/linux/scatterlist.h:8:
   In file included from include/linux/mm.h:2223:
   include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     504 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     505 |                            item];
         |                            ~~~~
   include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     511 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     512 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
   include/linux/vmstat.h:518:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     518 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   include/linux/vmstat.h:524:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
     524 |         return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~ ^
     525 |                            NR_VM_NUMA_EVENT_ITEMS +
         |                            ~~~~~~~~~~~~~~~~~~~~~~
>> drivers/net/ethernet/atheros/atl1c/atl1c_main.c:2691:27: error: use of undeclared identifier 'DEV_NAPI_THREADED'; did you mean 'NAPI_THREADED'?
    2691 |         dev_set_threaded(netdev, DEV_NAPI_THREADED);
         |                                  ^~~~~~~~~~~~~~~~~
         |                                  NAPI_THREADED
   include/linux/netdevice.h:581:2: note: 'NAPI_THREADED' declared here
     581 |         NAPI_THREADED = 1,
         |         ^
   4 warnings and 1 error generated.


vim +2691 drivers/net/ethernet/atheros/atl1c/atl1c_main.c

  2600	
  2601	/**
  2602	 * atl1c_probe - Device Initialization Routine
  2603	 * @pdev: PCI device information struct
  2604	 * @ent: entry in atl1c_pci_tbl
  2605	 *
  2606	 * Returns 0 on success, negative on failure
  2607	 *
  2608	 * atl1c_probe initializes an adapter identified by a pci_dev structure.
  2609	 * The OS initialization, configuring of the adapter private structure,
  2610	 * and a hardware reset occur.
  2611	 */
  2612	static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
  2613	{
  2614		struct net_device *netdev;
  2615		struct atl1c_adapter *adapter;
  2616		static int cards_found;
  2617		u8 __iomem *hw_addr;
  2618		enum atl1c_nic_type nic_type;
  2619		u32 queue_count = 1;
  2620		int err = 0;
  2621		int i;
  2622	
  2623		/* enable device (incl. PCI PM wakeup and hotplug setup) */
  2624		err = pci_enable_device_mem(pdev);
  2625		if (err)
  2626			return dev_err_probe(&pdev->dev, err, "cannot enable PCI device\n");
  2627	
  2628		/*
  2629		 * The atl1c chip can DMA to 64-bit addresses, but it uses a single
  2630		 * shared register for the high 32 bits, so only a single, aligned,
  2631		 * 4 GB physical address range can be used at a time.
  2632		 *
  2633		 * Supporting 64-bit DMA on this hardware is more trouble than it's
  2634		 * worth.  It is far easier to limit to 32-bit DMA than update
  2635		 * various kernel subsystems to support the mechanics required by a
  2636		 * fixed-high-32-bit system.
  2637		 */
  2638		err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
  2639		if (err) {
  2640			dev_err(&pdev->dev, "No usable DMA configuration,aborting\n");
  2641			goto err_dma;
  2642		}
  2643	
  2644		err = pci_request_regions(pdev, atl1c_driver_name);
  2645		if (err) {
  2646			dev_err(&pdev->dev, "cannot obtain PCI resources\n");
  2647			goto err_pci_reg;
  2648		}
  2649	
  2650		pci_set_master(pdev);
  2651	
  2652		hw_addr = pci_ioremap_bar(pdev, 0);
  2653		if (!hw_addr) {
  2654			err = -EIO;
  2655			dev_err(&pdev->dev, "cannot map device registers\n");
  2656			goto err_ioremap;
  2657		}
  2658	
  2659		nic_type = atl1c_get_mac_type(pdev, hw_addr);
  2660		if (nic_type == athr_mt)
  2661			queue_count = 4;
  2662	
  2663		netdev = alloc_etherdev_mq(sizeof(struct atl1c_adapter), queue_count);
  2664		if (netdev == NULL) {
  2665			err = -ENOMEM;
  2666			goto err_alloc_etherdev;
  2667		}
  2668	
  2669		err = atl1c_init_netdev(netdev, pdev);
  2670		if (err) {
  2671			dev_err(&pdev->dev, "init netdevice failed\n");
  2672			goto err_init_netdev;
  2673		}
  2674		adapter = netdev_priv(netdev);
  2675		adapter->bd_number = cards_found;
  2676		adapter->netdev = netdev;
  2677		adapter->pdev = pdev;
  2678		adapter->hw.adapter = adapter;
  2679		adapter->hw.nic_type = nic_type;
  2680		adapter->msg_enable = netif_msg_init(-1, atl1c_default_msg);
  2681		adapter->hw.hw_addr = hw_addr;
  2682		adapter->tx_queue_count = queue_count;
  2683		adapter->rx_queue_count = queue_count;
  2684	
  2685		/* init mii data */
  2686		adapter->mii.dev = netdev;
  2687		adapter->mii.mdio_read  = atl1c_mdio_read;
  2688		adapter->mii.mdio_write = atl1c_mdio_write;
  2689		adapter->mii.phy_id_mask = 0x1f;
  2690		adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
> 2691		dev_set_threaded(netdev, DEV_NAPI_THREADED);
  2692		for (i = 0; i < adapter->rx_queue_count; ++i)
  2693			netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
  2694				       atl1c_clean_rx);
  2695		for (i = 0; i < adapter->tx_queue_count; ++i)
  2696			netif_napi_add_tx(netdev, &adapter->tpd_ring[i].napi,
  2697					  atl1c_clean_tx);
  2698		timer_setup(&adapter->phy_config_timer, atl1c_phy_config, 0);
  2699		/* setup the private structure */
  2700		err = atl1c_sw_init(adapter);
  2701		if (err) {
  2702			dev_err(&pdev->dev, "net device private data init failed\n");
  2703			goto err_sw_init;
  2704		}
  2705		/* set max MTU */
  2706		atl1c_set_max_mtu(netdev);
  2707	
  2708		atl1c_reset_pcie(&adapter->hw, ATL1C_PCIE_L0S_L1_DISABLE);
  2709	
  2710		/* Init GPHY as early as possible due to power saving issue  */
  2711		atl1c_phy_reset(&adapter->hw);
  2712	
  2713		err = atl1c_reset_mac(&adapter->hw);
  2714		if (err) {
  2715			err = -EIO;
  2716			goto err_reset;
  2717		}
  2718	
  2719		/* reset the controller to
  2720		 * put the device in a known good starting state */
  2721		err = atl1c_phy_init(&adapter->hw);
  2722		if (err) {
  2723			err = -EIO;
  2724			goto err_reset;
  2725		}
  2726		if (atl1c_read_mac_addr(&adapter->hw)) {
  2727			/* got a random MAC address, set NET_ADDR_RANDOM to netdev */
  2728			netdev->addr_assign_type = NET_ADDR_RANDOM;
  2729		}
  2730		eth_hw_addr_set(netdev, adapter->hw.mac_addr);
  2731		if (netif_msg_probe(adapter))
  2732			dev_dbg(&pdev->dev, "mac address : %pM\n",
  2733				adapter->hw.mac_addr);
  2734	
  2735		atl1c_hw_set_mac_addr(&adapter->hw, adapter->hw.mac_addr);
  2736		INIT_WORK(&adapter->common_task, atl1c_common_task);
  2737		adapter->work_event = 0;
  2738		err = register_netdev(netdev);
  2739		if (err) {
  2740			dev_err(&pdev->dev, "register netdevice failed\n");
  2741			goto err_register;
  2742		}
  2743	
  2744		cards_found++;
  2745		return 0;
  2746	
  2747	err_reset:
  2748	err_register:
  2749	err_sw_init:
  2750	err_init_netdev:
  2751		free_netdev(netdev);
  2752	err_alloc_etherdev:
  2753		iounmap(hw_addr);
  2754	err_ioremap:
  2755		pci_release_regions(pdev);
  2756	err_pci_reg:
  2757	err_dma:
  2758		pci_disable_device(pdev);
  2759		return err;
  2760	}
  2761
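
The error above is a name mismatch: the atl1c hunk passes DEV_NAPI_THREADED, while the include/linux/netdevice.h hunk of this patch declares NAPI_THREADED. Below is a minimal userspace sketch of the call with the declared name, assuming the enumerators from the patch are kept as posted. struct net_device_stub, dev_set_threaded_stub() and atl1c_enable_threaded_stub() are hypothetical stand-ins for illustration, not kernel API.

```c
/* Enumerators copied from the include/linux/netdevice.h hunk of this patch. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Hypothetical stand-in for struct net_device; only the field used here. */
struct net_device_stub {
	unsigned char threaded;
};

/* Hypothetical stand-in for dev_set_threaded(): records the requested mode. */
static int dev_set_threaded_stub(struct net_device_stub *dev,
				 enum napi_threaded_state threaded)
{
	if (threaded > NAPI_THREADED_MAX)
		return -1;
	dev->threaded = (unsigned char)threaded;
	return 0;
}

/* The atl1c_probe() call site, spelled with the enumerator the patch declares. */
static int atl1c_enable_threaded_stub(struct net_device_stub *netdev)
{
	return dev_set_threaded_stub(netdev, NAPI_THREADED);
}
```

Alternatively, the enumerators could be renamed to match the call site; either way the two hunks need to agree on one spelling.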
kernel test robot Jan. 3, 2025, 8:12 a.m. UTC | #5
Hi Samiullah,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Samiullah-Khawaja/Add-support-to-set-napi-threaded-for-individual-napi/20250103-031428
base:   net-next/main
patch link:    https://lore.kernel.org/r/20250102191227.2084046-4-skhawaja%40google.com
patch subject: [PATCH net-next 3/3] Extend napi threaded polling to allow kthread based busy polling
config: x86_64-randconfig-073-20250103 (https://download.01.org/0day-ci/archive/20250103/202501031530.ss0kvHke-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250103/202501031530.ss0kvHke-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501031530.ss0kvHke-lkp@intel.com/

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/mellanox/mlxsw/pci.c: In function 'mlxsw_pci_napi_devs_init':
>> drivers/net/ethernet/mellanox/mlxsw/pci.c:158:50: warning: implicit conversion from 'enum <anonymous>' to 'enum napi_threaded_state' [-Wenum-conversion]
     158 |         dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
         |                                                  ^~~~


vim +158 drivers/net/ethernet/mellanox/mlxsw/pci.c

eda6500a987a02 Jiri Pirko 2015-07-29  140  
5d01ed2e970812 Amit Cohen 2024-04-26  141  static int mlxsw_pci_napi_devs_init(struct mlxsw_pci *mlxsw_pci)
5d01ed2e970812 Amit Cohen 2024-04-26  142  {
5d01ed2e970812 Amit Cohen 2024-04-26  143  	int err;
5d01ed2e970812 Amit Cohen 2024-04-26  144  
5d01ed2e970812 Amit Cohen 2024-04-26  145  	mlxsw_pci->napi_dev_tx = alloc_netdev_dummy(0);
5d01ed2e970812 Amit Cohen 2024-04-26  146  	if (!mlxsw_pci->napi_dev_tx)
5d01ed2e970812 Amit Cohen 2024-04-26  147  		return -ENOMEM;
5d01ed2e970812 Amit Cohen 2024-04-26  148  	strscpy(mlxsw_pci->napi_dev_tx->name, "mlxsw_tx",
5d01ed2e970812 Amit Cohen 2024-04-26  149  		sizeof(mlxsw_pci->napi_dev_tx->name));
5d01ed2e970812 Amit Cohen 2024-04-26  150  
5d01ed2e970812 Amit Cohen 2024-04-26  151  	mlxsw_pci->napi_dev_rx = alloc_netdev_dummy(0);
5d01ed2e970812 Amit Cohen 2024-04-26  152  	if (!mlxsw_pci->napi_dev_rx) {
5d01ed2e970812 Amit Cohen 2024-04-26  153  		err = -ENOMEM;
5d01ed2e970812 Amit Cohen 2024-04-26  154  		goto err_alloc_rx;
5d01ed2e970812 Amit Cohen 2024-04-26  155  	}
5d01ed2e970812 Amit Cohen 2024-04-26  156  	strscpy(mlxsw_pci->napi_dev_rx->name, "mlxsw_rx",
5d01ed2e970812 Amit Cohen 2024-04-26  157  		sizeof(mlxsw_pci->napi_dev_rx->name));
5d01ed2e970812 Amit Cohen 2024-04-26 @158  	dev_set_threaded(mlxsw_pci->napi_dev_rx, true);
5d01ed2e970812 Amit Cohen 2024-04-26  159  
5d01ed2e970812 Amit Cohen 2024-04-26  160  	return 0;
5d01ed2e970812 Amit Cohen 2024-04-26  161  
5d01ed2e970812 Amit Cohen 2024-04-26  162  err_alloc_rx:
5d01ed2e970812 Amit Cohen 2024-04-26  163  	free_netdev(mlxsw_pci->napi_dev_tx);
5d01ed2e970812 Amit Cohen 2024-04-26  164  	return err;
5d01ed2e970812 Amit Cohen 2024-04-26  165  }
5d01ed2e970812 Amit Cohen 2024-04-26  166
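
The warning says 'enum <anonymous>' because the kernel defines true and false as members of an anonymous enum (include/linux/stddef.h), so passing true to a parameter of type enum napi_threaded_state is an enum-to-enum conversion. The value passed is still 1, so behavior does not change; only the spelling at the call site does. The following is a small userspace sketch, assuming the enumerators from this patch. k_false, k_true and napi_threaded_from_bool() are hypothetical names used only for illustration and are not part of the patch.

```c
/* Enumerators copied from the include/linux/netdevice.h hunk of this patch. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Hypothetical userspace mirror of the kernel's anonymous bool enum. */
enum { k_false = 0, k_true = 1 };

/* The old boolean 'true' and the new enumerator carry the same value. */
_Static_assert(k_true == NAPI_THREADED, "true maps to NAPI_THREADED");

/*
 * Hypothetical helper: callers that still hold a boolean convert it
 * explicitly instead of relying on an implicit enum conversion.
 */
static enum napi_threaded_state napi_threaded_from_bool(int enabled)
{
	return enabled ? NAPI_THREADED : NAPI_THREADED_OFF;
}
```

For a call site with a constant argument, such as the mlxsw one reported here, writing NAPI_THREADED directly in place of true would be enough to silence the warning.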

Patch

diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
index ebf21beba846..15d7d36a8294 100644
--- a/Documentation/ABI/testing/sysfs-class-net
+++ b/Documentation/ABI/testing/sysfs-class-net
@@ -343,7 +343,7 @@  Date:		Jan 2021
 KernelVersion:	5.12
 Contact:	netdev@vger.kernel.org
 Description:
-		Boolean value to control the threaded mode per device. User could
+		Integer value to control the threaded mode per device. User could
 		set this value to enable/disable threaded mode for all napi
 		belonging to this device, without the need to do device up/down.
 
@@ -351,4 +351,5 @@  Description:
 		== ==================================
 		0  threaded mode disabled for this dev
 		1  threaded mode enabled for this dev
+		2  threaded mode enabled, and busy polling enabled.
 		== ==================================
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index aac343af7246..9c905243a1cc 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -272,10 +272,11 @@  attribute-sets:
         name: threaded
         doc: Whether the napi is configured to operate in threaded polling
              mode. If this is set to `1` then the NAPI context operates
-             in threaded polling mode.
+             in threaded polling mode. If this is set to `2` then the NAPI
+             kthread also does busy polling.
         type: u32
         checks:
-          max: 1
+          max: 2
   -
     name: queue
     attributes:
diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
index c571614b1d50..a709cddcd292 100644
--- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
+++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
@@ -2688,7 +2688,7 @@  static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	adapter->mii.mdio_write = atl1c_mdio_write;
 	adapter->mii.phy_id_mask = 0x1f;
 	adapter->mii.reg_num_mask = MDIO_CTRL_REG_MASK;
-	dev_set_threaded(netdev, true);
+	dev_set_threaded(netdev, DEV_NAPI_THREADED);
 	for (i = 0; i < adapter->rx_queue_count; ++i)
 		netif_napi_add(netdev, &adapter->rrd_ring[i].napi,
 			       atl1c_clean_rx);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8f531d528869..c384ffe0976e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -407,6 +407,8 @@  enum {
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
+	NAPI_STATE_THREADED_BUSY_POLL,	/* The threaded napi poller will busy poll */
+	NAPI_STATE_SCHED_THREADED_BUSY_POLL,  /* The threaded napi poller is busy polling */
 };
 
 enum {
@@ -420,8 +422,14 @@  enum {
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
+	NAPIF_STATE_THREADED_BUSY_POLL	= BIT(NAPI_STATE_THREADED_BUSY_POLL),
+	NAPIF_STATE_SCHED_THREADED_BUSY_POLL
+				= BIT(NAPI_STATE_SCHED_THREADED_BUSY_POLL),
 };
 
+#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
+	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)
+
 enum gro_result {
 	GRO_MERGED,
 	GRO_MERGED_FREE,
@@ -568,16 +576,24 @@  static inline bool napi_complete(struct napi_struct *n)
 	return napi_complete_done(n, 0);
 }
 
-int dev_set_threaded(struct net_device *dev, bool threaded);
+enum napi_threaded_state {
+	NAPI_THREADED_OFF = 0,
+	NAPI_THREADED = 1,
+	NAPI_THREADED_BUSY_POLL = 2,
+	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
+};
+
+int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded);
 
 /*
  * napi_set_threaded - set napi threaded state
  * @napi: NAPI context
- * @threaded: whether this napi does threaded polling
+ * @threaded: threading mode
  *
  * Return 0 on success and negative errno on failure.
  */
-int napi_set_threaded(struct napi_struct *napi, bool threaded);
+int napi_set_threaded(struct napi_struct *napi,
+		      enum napi_threaded_state threaded);
 
 /**
  *	napi_disable - prevent NAPI from scheduling
@@ -2406,7 +2422,7 @@  struct net_device {
 	struct sfp_bus		*sfp_bus;
 	struct lock_class_key	*qdisc_tx_busylock;
 	bool			proto_down;
-	bool			threaded;
+	u8			threaded;
 
 	/* priv_flags_slow, ungrouped to save space */
 	unsigned long		see_all_hwtstamp_requests:1;
diff --git a/net/core/dev.c b/net/core/dev.c
index 762977a62da2..b6cd9474bdd3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -78,6 +78,7 @@ 
 #include <linux/slab.h>
 #include <linux/sched.h>
 #include <linux/sched/isolation.h>
+#include <linux/sched/types.h>
 #include <linux/sched/mm.h>
 #include <linux/smpboot.h>
 #include <linux/mutex.h>
@@ -6231,7 +6232,8 @@  bool napi_complete_done(struct napi_struct *n, int work_done)
 	 *    the guarantee we will be called later.
 	 */
 	if (unlikely(n->state & (NAPIF_STATE_NPSVC |
-				 NAPIF_STATE_IN_BUSY_POLL)))
+				 NAPIF_STATE_IN_BUSY_POLL |
+				 NAPIF_STATE_SCHED_THREADED_BUSY_POLL)))
 		return false;
 
 	if (work_done) {
@@ -6633,8 +6635,10 @@  static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
-int napi_set_threaded(struct napi_struct *napi, bool threaded)
+int napi_set_threaded(struct napi_struct *napi,
+		      enum napi_threaded_state threaded)
 {
+	unsigned long val;
 	if (napi->dev->threaded)
 		return -EINVAL;
 
@@ -6649,30 +6653,41 @@  int napi_set_threaded(struct napi_struct *napi, bool threaded)
 
 	/* Make sure kthread is created before THREADED bit is set. */
 	smp_mb__before_atomic();
-	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+	val = 0;
+	if (threaded == NAPI_THREADED_BUSY_POLL)
+		val |= NAPIF_STATE_THREADED_BUSY_POLL;
+	if (threaded)
+		val |= NAPIF_STATE_THREADED;
+	set_mask_bits(&napi->state, NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
 
 	return 0;
 }
 
-int dev_set_threaded(struct net_device *dev, bool threaded)
+int dev_set_threaded(struct net_device *dev, enum napi_threaded_state threaded)
 {
 	struct napi_struct *napi;
+	unsigned long val;
 	int err = 0;
 
 	if (dev->threaded == threaded)
 		return 0;
 
+	val = 0;
 	if (threaded) {
 		/* Check if threaded is set at napi level already */
 		list_for_each_entry(napi, &dev->napi_list, dev_list)
 			if (test_bit(NAPI_STATE_THREADED, &napi->state))
 				return -EINVAL;
 
+		val |= NAPIF_STATE_THREADED;
+		if (threaded == NAPI_THREADED_BUSY_POLL)
+			val |= NAPIF_STATE_THREADED_BUSY_POLL;
+
 		list_for_each_entry(napi, &dev->napi_list, dev_list) {
 			if (!napi->thread) {
 				err = napi_kthread_create(napi);
 				if (err) {
-					threaded = false;
+					threaded = NAPI_THREADED_OFF;
 					break;
 				}
 			}
@@ -6691,9 +6706,13 @@  int dev_set_threaded(struct net_device *dev, bool threaded)
 	 * polled. In this case, the switch between threaded mode and
 	 * softirq mode will happen in the next round of napi_schedule().
 	 * This should not cause hiccups/stalls to the live traffic.
+	 *
+	 * Switch to busy_poll threaded napi will occur after the threaded
+	 * napi is scheduled.
 	 */
 	list_for_each_entry(napi, &dev->napi_list, dev_list)
-		assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+		set_mask_bits(&napi->state,
+			      NAPIF_STATE_THREADED_BUSY_POLL_MASK, val);
 
 	return err;
 }
@@ -7007,7 +7026,7 @@  static int napi_thread_wait(struct napi_struct *napi)
 	return -1;
 }
 
-static void napi_threaded_poll_loop(struct napi_struct *napi)
+static void napi_threaded_poll_loop(struct napi_struct *napi, bool busy_poll)
 {
 	struct bpf_net_context __bpf_net_ctx, *bpf_net_ctx;
 	struct softnet_data *sd;
@@ -7036,22 +7055,53 @@  static void napi_threaded_poll_loop(struct napi_struct *napi)
 		}
 		skb_defer_free_flush(sd);
 		bpf_net_ctx_clear(bpf_net_ctx);
+
+		/* Push the skbs up the stack if busy polling. */
+		if (busy_poll)
+			__napi_gro_flush_helper(napi);
 		local_bh_enable();
 
-		if (!repoll)
+		/* If busy polling then do not break here because we need to
+		 * call cond_resched and rcu_softirq_qs_periodic to prevent
+		 * watchdog warnings.
+		 */
+		if (!repoll && !busy_poll)
 			break;
 
 		rcu_softirq_qs_periodic(last_qs);
 		cond_resched();
+
+		if (!repoll)
+			break;
 	}
 }
 
 static int napi_threaded_poll(void *data)
 {
 	struct napi_struct *napi = data;
+	bool busy_poll_sched;
+	unsigned long val;
+	bool busy_poll;
+
+	while (!napi_thread_wait(napi)) {
+		/* Once woken up, this means that we are scheduled as threaded
+		 * napi and this thread owns the napi context, if busy poll
+		 * state is set then we busy poll this napi.
+		 */
+		val = READ_ONCE(napi->state);
+		busy_poll = val & NAPIF_STATE_THREADED_BUSY_POLL;
+		busy_poll_sched = val & NAPIF_STATE_SCHED_THREADED_BUSY_POLL;
+
+		/* Do not busy poll if napi is disabled. */
+		if (unlikely(val & NAPIF_STATE_DISABLE))
+			busy_poll = false;
+
+		if (busy_poll != busy_poll_sched)
+			assign_bit(NAPI_STATE_SCHED_THREADED_BUSY_POLL,
+				   &napi->state, busy_poll);
 
-	while (!napi_thread_wait(napi))
-		napi_threaded_poll_loop(napi);
+		napi_threaded_poll_loop(napi, busy_poll);
+	}
 
 	return 0;
 }
@@ -12205,7 +12255,7 @@  static void run_backlog_napi(unsigned int cpu)
 {
 	struct softnet_data *sd = per_cpu_ptr(&softnet_data, cpu);
 
-	napi_threaded_poll_loop(&sd->backlog);
+	napi_threaded_poll_loop(&sd->backlog, false);
 }
 
 static void backlog_napi_setup(unsigned int cpu)
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 2d9afc6e2161..36d0a22e341c 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -626,7 +626,7 @@  static int modify_napi_threaded(struct net_device *dev, unsigned long val)
 	if (list_empty(&dev->napi_list))
 		return -EOPNOTSUPP;
 
-	if (val != 0 && val != 1)
+	if (val > NAPI_THREADED_MAX)
 		return -EOPNOTSUPP;
 
 	ret = dev_set_threaded(dev, val);
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 93dc74dad6de..4086d2577dcc 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -102,7 +102,7 @@  static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPE
 /* NETDEV_CMD_NAPI_SET_THREADED - do */
 static const struct nla_policy netdev_napi_set_threaded_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
 	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
-	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 2),
 };
 
 /* Ops table for netdev */
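
The mapping from the requested mode to NAPI state bits in the patched napi_set_threaded() and dev_set_threaded() can be sketched in userspace as follows. This is an illustration under stated assumptions: the bit positions are arbitrary (the kernel derives them from the NAPI_STATE_* enum), threaded_state_to_bits() and apply_threaded_bits() are hypothetical helper names, and apply_threaded_bits() is a non-atomic stand-in for set_mask_bits(), which performs the same clear-then-set update atomically.

```c
/* Enumerators copied from the include/linux/netdevice.h hunk of this patch. */
enum napi_threaded_state {
	NAPI_THREADED_OFF = 0,
	NAPI_THREADED = 1,
	NAPI_THREADED_BUSY_POLL = 2,
	NAPI_THREADED_MAX = NAPI_THREADED_BUSY_POLL,
};

/* Illustrative bit positions only; not the kernel's actual values. */
#define NAPIF_STATE_THREADED		(1UL << 0)
#define NAPIF_STATE_THREADED_BUSY_POLL	(1UL << 1)
#define NAPIF_STATE_THREADED_BUSY_POLL_MASK \
	(NAPIF_STATE_THREADED | NAPIF_STATE_THREADED_BUSY_POLL)

/* Mode 1 sets THREADED; mode 2 sets THREADED and THREADED_BUSY_POLL. */
static unsigned long threaded_state_to_bits(enum napi_threaded_state threaded)
{
	unsigned long val = 0;

	if (threaded == NAPI_THREADED_BUSY_POLL)
		val |= NAPIF_STATE_THREADED_BUSY_POLL;
	if (threaded)
		val |= NAPIF_STATE_THREADED;
	return val;
}

/* Non-atomic stand-in for set_mask_bits(): clear the mask, then set val. */
static unsigned long apply_threaded_bits(unsigned long state,
					 enum napi_threaded_state threaded)
{
	return (state & ~NAPIF_STATE_THREADED_BUSY_POLL_MASK) |
	       threaded_state_to_bits(threaded);
}
```

Updating both bits through one mask means a switch from mode 2 back to mode 1 clears the busy-poll bit in the same step that keeps the threaded bit set, and leaves unrelated state bits untouched.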