diff mbox series

[net-next,v3,1/4] Add support to set napi threaded for individual napi

Message ID 20250205001052.2590140-2-skhawaja@google.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series Add support to do threaded napi busy poll | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; GEN HAS DIFF 2 files changed, 33 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 40 this patch: 40
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 6 maintainers not CCed: donald.hunter@gmail.com linux-doc@vger.kernel.org andrew+netdev@lunn.ch jdamato@fastly.com horms@kernel.org corbet@lwn.net
netdev/build_clang success Errors and warnings before: 7110 this patch: 7110
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4116 this patch: 4116
netdev/checkpatch warning WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 107 this patch: 107
netdev/source_inline success Was 0 now: 0

Commit Message

Samiullah Khawaja Feb. 5, 2025, 12:10 a.m. UTC
A net device has a threaded sysctl that can be used to enable threaded
napi polling on all of the NAPI contexts under that device. Allow
enabling threaded napi polling at individual napi level using netlink.

Extend the netlink operation `napi-set` and allow setting the threaded
attribute of a NAPI. This will enable the threaded polling on a napi
context.

Tested using following command in qemu/virtio-net:
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
  --do napi-set       --json '{"id": 66, "threaded": 1}'

Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
---
 Documentation/netlink/specs/netdev.yaml | 10 ++++++++
 Documentation/networking/napi.rst       | 13 ++++++++++-
 include/linux/netdevice.h               | 10 ++++++++
 include/uapi/linux/netdev.h             |  1 +
 net/core/dev.c                          | 31 +++++++++++++++++++++++++
 net/core/netdev-genl-gen.c              |  5 ++--
 net/core/netdev-genl.c                  |  9 +++++++
 tools/include/uapi/linux/netdev.h       |  1 +
 8 files changed, 77 insertions(+), 3 deletions(-)

Comments

Joe Damato Feb. 5, 2025, 2:50 a.m. UTC | #1
On Wed, Feb 05, 2025 at 12:10:49AM +0000, Samiullah Khawaja wrote:
> A net device has a threaded sysctl that can be used to enable threaded
> napi polling on all of the NAPI contexts under that device. Allow
> enabling threaded napi polling at individual napi level using netlink.
> 
> Extend the netlink operation `napi-set` and allow setting the threaded
> attribute of a NAPI. This will enable the threaded polling on a napi
> context.
> 
> Tested using following command in qemu/virtio-net:
> ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
>   --do napi-set       --json '{"id": 66, "threaded": 1}'


At a high level, I think this patch could probably be sent on its
own; IMHO it makes sense to make threaded NAPI per-NAPI via
netdev-genl.

I think its fine if you want to include it in your overall series,
but I think a change like this could probably go in on its own.

[...]

> diff --git a/net/core/dev.c b/net/core/dev.c
> index c0021cbd28fc..50fb234dd7a0 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6787,6 +6787,30 @@ static void init_gro_hash(struct napi_struct *napi)
>  	napi->gro_bitmask = 0;
>  }
>  
> +int napi_set_threaded(struct napi_struct *napi, bool threaded)
> +{
> +	if (napi->dev->threaded)
> +		return -EINVAL;

I feel this is probably a WARN_ON_ONCE situation? Not sure, but in
any case, see below.

> +
> +	if (threaded) {
> +		if (!napi->thread) {
> +			int err = napi_kthread_create(napi);
> +
> +			if (err)
> +				return err;
> +		}
> +	}
> +
> +	if (napi->config)
> +		napi->config->threaded = threaded;
> +
> +	/* Make sure kthread is created before THREADED bit is set. */
> +	smp_mb__before_atomic();
> +	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +
> +	return 0;
> +}
> +
>  int dev_set_threaded(struct net_device *dev, bool threaded)
>  {
>  	struct napi_struct *napi;
> @@ -6798,6 +6822,11 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
>  		return 0;
>  
>  	if (threaded) {
> +		/* Check if threaded is set at napi level already */
> +		list_for_each_entry(napi, &dev->napi_list, dev_list)
> +			if (test_bit(NAPI_STATE_THREADED, &napi->state))
> +				return -EINVAL;
> +
>  		list_for_each_entry(napi, &dev->napi_list, dev_list) {
>  			if (!napi->thread) {
>  				err = napi_kthread_create(napi);
> @@ -6880,6 +6909,8 @@ static void napi_restore_config(struct napi_struct *n)
>  		napi_hash_add(n);
>  		n->config->napi_id = n->napi_id;
>  	}
> +
> +	napi_set_threaded(n, n->config->threaded);

The return value isn't checked so even if napi_set_threaded returned
EINVAL it seems like there's no way for that error to "bubble" back
up?
diff mbox series

Patch

diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..785240d60df6 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -268,6 +268,14 @@  attribute-sets:
         doc: The timeout, in nanoseconds, of how long to suspend irq
              processing, if event polling finds events
         type: uint
+      -
+        name: threaded
+        doc: Whether the napi is configured to operate in threaded polling
+             mode. If this is set to `1` then the NAPI context operates
+             in threaded polling mode.
+        type: u32
+        checks:
+          max: 1
   -
     name: queue
     attributes:
@@ -659,6 +667,7 @@  operations:
             - defer-hard-irqs
             - gro-flush-timeout
             - irq-suspend-timeout
+            - threaded
       dump:
         request:
           attributes:
@@ -711,6 +720,7 @@  operations:
             - defer-hard-irqs
             - gro-flush-timeout
             - irq-suspend-timeout
+            - threaded
 
 kernel-family:
   headers: [ "linux/list.h"]
diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
index f970a2be271a..73c83b4533dc 100644
--- a/Documentation/networking/napi.rst
+++ b/Documentation/networking/napi.rst
@@ -413,7 +413,18 @@  dependent). The NAPI instance IDs will be assigned in the opposite
 order than the process IDs of the kernel threads.
 
 Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
-netdev's sysfs directory.
+netdev's sysfs directory. It can also be enabled for a specific napi using
+netlink interface.
+
+For example, using the script:
+
+.. code-block:: bash
+
+  $ kernel-source/tools/net/ynl/pyynl/cli.py \
+            --spec Documentation/netlink/specs/netdev.yaml \
+            --do napi-set \
+            --json='{"id": 66,
+                     "threaded": 1}'
 
 .. rubric:: Footnotes
 
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2a59034a5fa2..a0e485722ed9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -352,6 +352,7 @@  struct napi_config {
 	u64 gro_flush_timeout;
 	u64 irq_suspend_timeout;
 	u32 defer_hard_irqs;
+	bool threaded;
 	unsigned int napi_id;
 };
 
@@ -572,6 +573,15 @@  static inline bool napi_complete(struct napi_struct *n)
 
 int dev_set_threaded(struct net_device *dev, bool threaded);
 
+/*
+ * napi_set_threaded - set napi threaded state
+ * @napi: NAPI context
+ * @threaded: whether this napi does threaded polling
+ *
+ * Return 0 on success and negative errno on failure.
+ */
+int napi_set_threaded(struct napi_struct *napi, bool threaded);
+
 void napi_disable(struct napi_struct *n);
 void napi_disable_locked(struct napi_struct *n);
 
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..829648b2ef65 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -125,6 +125,7 @@  enum {
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 	NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+	NETDEV_A_NAPI_THREADED,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
diff --git a/net/core/dev.c b/net/core/dev.c
index c0021cbd28fc..50fb234dd7a0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6787,6 +6787,30 @@  static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+int napi_set_threaded(struct napi_struct *napi, bool threaded)
+{
+	if (napi->dev->threaded)
+		return -EINVAL;
+
+	if (threaded) {
+		if (!napi->thread) {
+			int err = napi_kthread_create(napi);
+
+			if (err)
+				return err;
+		}
+	}
+
+	if (napi->config)
+		napi->config->threaded = threaded;
+
+	/* Make sure kthread is created before THREADED bit is set. */
+	smp_mb__before_atomic();
+	assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
+
+	return 0;
+}
+
 int dev_set_threaded(struct net_device *dev, bool threaded)
 {
 	struct napi_struct *napi;
@@ -6798,6 +6822,11 @@  int dev_set_threaded(struct net_device *dev, bool threaded)
 		return 0;
 
 	if (threaded) {
+		/* Check if threaded is set at napi level already */
+		list_for_each_entry(napi, &dev->napi_list, dev_list)
+			if (test_bit(NAPI_STATE_THREADED, &napi->state))
+				return -EINVAL;
+
 		list_for_each_entry(napi, &dev->napi_list, dev_list) {
 			if (!napi->thread) {
 				err = napi_kthread_create(napi);
@@ -6880,6 +6909,8 @@  static void napi_restore_config(struct napi_struct *n)
 		napi_hash_add(n);
 		n->config->napi_id = n->napi_id;
 	}
+
+	napi_set_threaded(n, n->config->threaded);
 }
 
 static void napi_save_config(struct napi_struct *n)
diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
index 996ac6a449eb..a1f80e687f53 100644
--- a/net/core/netdev-genl-gen.c
+++ b/net/core/netdev-genl-gen.c
@@ -92,11 +92,12 @@  static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1]
 };
 
 /* NETDEV_CMD_NAPI_SET - do */
-static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT + 1] = {
+static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
 	[NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
 	[NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
 	[NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
 	[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
+	[NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
 };
 
 /* Ops table for netdev */
@@ -187,7 +188,7 @@  static const struct genl_split_ops netdev_nl_ops[] = {
 		.cmd		= NETDEV_CMD_NAPI_SET,
 		.doit		= netdev_nl_napi_set_doit,
 		.policy		= netdev_napi_set_nl_policy,
-		.maxattr	= NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+		.maxattr	= NETDEV_A_NAPI_THREADED,
 		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 };
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 715f85c6b62e..208c3dd768ec 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -183,6 +183,9 @@  netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
 	if (napi->irq >= 0 && nla_put_u32(rsp, NETDEV_A_NAPI_IRQ, napi->irq))
 		goto nla_put_failure;
 
+	if (nla_put_u32(rsp, NETDEV_A_NAPI_THREADED, !!napi->thread))
+		goto nla_put_failure;
+
 	if (napi->thread) {
 		pid = task_pid_nr(napi->thread);
 		if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
@@ -321,8 +324,14 @@  netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
 {
 	u64 irq_suspend_timeout = 0;
 	u64 gro_flush_timeout = 0;
+	u32 threaded = 0;
 	u32 defer = 0;
 
+	if (info->attrs[NETDEV_A_NAPI_THREADED]) {
+		threaded = nla_get_u32(info->attrs[NETDEV_A_NAPI_THREADED]);
+		napi_set_threaded(napi, !!threaded);
+	}
+
 	if (info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]) {
 		defer = nla_get_u32(info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]);
 		napi_set_defer_hard_irqs(napi, defer);
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..829648b2ef65 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -125,6 +125,7 @@  enum {
 	NETDEV_A_NAPI_DEFER_HARD_IRQS,
 	NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
 	NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
+	NETDEV_A_NAPI_THREADED,
 
 	__NETDEV_A_NAPI_MAX,
 	NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)