| Message ID | 20250321021521.849856-2-skhawaja@google.com (mailing list archive) |
|---|---|
| State | New |
| Delegated to: | Netdev Maintainers |
| Series | Add support to do threaded napi busy poll |
On Fri, Mar 21, 2025 at 02:15:18AM +0000, Samiullah Khawaja wrote:
> A net device has a threaded sysctl that can be used to enable threaded
> napi polling on all of the NAPI contexts under that device. Allow
> enabling threaded napi polling at individual napi level using netlink.
>
> Extend the netlink operation `napi-set` and allow setting the threaded
> attribute of a NAPI. This will enable the threaded polling on a napi
> context.
>
> Tested using following command in qemu/virtio-net:
> ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
> --do napi-set --json '{"id": 66, "threaded": 1}'
>
> Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
> ---
>  Documentation/netlink/specs/netdev.yaml | 10 ++++++++
>  Documentation/networking/napi.rst       | 13 ++++++++++-
>  include/linux/netdevice.h               | 10 ++++++++
>  include/uapi/linux/netdev.h             |  1 +
>  net/core/dev.c                          | 31 +++++++++++++++++++++++++
>  net/core/netdev-genl-gen.c              |  5 ++--
>  net/core/netdev-genl.c                  |  9 +++++++
>  tools/include/uapi/linux/netdev.h       |  1 +
>  8 files changed, 77 insertions(+), 3 deletions(-)

I still think that this patch could be submitted on its own, separate
from this series, with its own selftest.

It seems generally useful to be able to enable threaded NAPI on a
per-NAPI basis, unrelated to the rest of the work in the series.

> diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
> index f5e0750ab71d..92f98f2a6bd7 100644
> --- a/Documentation/netlink/specs/netdev.yaml
> +++ b/Documentation/netlink/specs/netdev.yaml
> @@ -280,6 +280,14 @@ attribute-sets:
>          doc: The timeout, in nanoseconds, of how long to suspend irq
>               processing, if event polling finds events
>          type: uint
> +      -
> +        name: threaded
> +        doc: Whether the napi is configured to operate in threaded polling
> +             mode. If this is set to `1` then the NAPI context operates
> +             in threaded polling mode.
> +        type: u32

I think the type should be uint instead of u32?
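If the spec used uint here, I'd expect the generated policy entry in
netdev-genl-gen.c to come out as something like this (just a guess, I
haven't actually re-run ynl-gen to confirm):

        [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_UINT, 1),

with the doit handler then reading the attribute via nla_get_uint()
instead of nla_get_u32().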
> +        checks:
> +          max: 1
>      -
>        name: xsk-info
>        attributes: []
> @@ -691,6 +699,7 @@ operations:
>              - defer-hard-irqs
>              - gro-flush-timeout
>              - irq-suspend-timeout
> +            - threaded
>     dump:
>       request:
>         attributes:
> @@ -743,6 +752,7 @@ operations:
>              - defer-hard-irqs
>              - gro-flush-timeout
>              - irq-suspend-timeout
> +            - threaded
>
>  kernel-family:
>    headers: [ "net/netdev_netlink.h"]
> diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
> index d0e3953cae6a..63f98c05860f 100644
> --- a/Documentation/networking/napi.rst
> +++ b/Documentation/networking/napi.rst
> @@ -444,7 +444,18 @@ dependent). The NAPI instance IDs will be assigned in the opposite
>  order than the process IDs of the kernel threads.
>
>  Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
> -netdev's sysfs directory.
> +netdev's sysfs directory. It can also be enabled for a specific napi using
> +netlink interface.
> +
> +For example, using the script:
> +
> +.. code-block:: bash
> +
> +  $ kernel-source/tools/net/ynl/pyynl/cli.py \
> +      --spec Documentation/netlink/specs/netdev.yaml \
> +      --do napi-set \
> +      --json='{"id": 66,
> +               "threaded": 1}'
>
>  .. rubric:: Footnotes
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 0c5b1f7f8f3a..3c244fd9ae6d 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -369,6 +369,7 @@ struct napi_config {
>          u64 irq_suspend_timeout;
>          u32 defer_hard_irqs;
>          cpumask_t affinity_mask;
> +        bool threaded;
>          unsigned int napi_id;
>  };
>
> @@ -590,6 +591,15 @@ static inline bool napi_complete(struct napi_struct *n)
>
>  int dev_set_threaded(struct net_device *dev, bool threaded);
>
> +/*
> + * napi_set_threaded - set napi threaded state
> + * @napi: NAPI context
> + * @threaded: whether this napi does threaded polling
> + *
> + * Return 0 on success and negative errno on failure.
> + */
> +int napi_set_threaded(struct napi_struct *napi, bool threaded);
> +
>  void napi_disable(struct napi_struct *n);
>  void napi_disable_locked(struct napi_struct *n);
>
> diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
> index 7600bf62dbdf..fac1b8ffeb55 100644
> --- a/include/uapi/linux/netdev.h
> +++ b/include/uapi/linux/netdev.h
> @@ -134,6 +134,7 @@ enum {
>          NETDEV_A_NAPI_DEFER_HARD_IRQS,
>          NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
>          NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
> +        NETDEV_A_NAPI_THREADED,
>
>          __NETDEV_A_NAPI_MAX,
>          NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 235560341765..b92e4e8890d1 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6806,6 +6806,30 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
>          return HRTIMER_NORESTART;
>  }
>
> +int napi_set_threaded(struct napi_struct *napi, bool threaded)
> +{
> +        if (napi->dev->threaded)
> +                return -EINVAL;

This works differently than the existing per-NAPI defer_hard_irqs /
gro_flush_timeout which are also interface wide. In that implementation:

  - the per-NAPI value is set when requested by the user
  - when the sysfs value is written, all NAPIs have their values
    overwritten to the sysfs value

I think either:

  - This implementation should work like the existing ones, or
  - The existing ones should be changed to work like this

I am opposed to having two different behaviors when setting per-NAPI vs
system/nic-wide sysfs values.

I don't have a preference on which behavior is chosen, but the behavior
should be the same for all of the things that are system/nic-wide and
moving to per-NAPI.

> +        if (threaded) {
> +                if (!napi->thread) {
> +                        int err = napi_kthread_create(napi);
> +
> +                        if (err)
> +                                return err;
> +                }
> +        }
> +
> +        if (napi->config)
> +                napi->config->threaded = threaded;
> +
> +        /* Make sure kthread is created before THREADED bit is set. */
> +        smp_mb__before_atomic();
> +        assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> +
> +        return 0;
> +}
> +
>  int dev_set_threaded(struct net_device *dev, bool threaded)
>  {
>          struct napi_struct *napi;
> @@ -6817,6 +6841,11 @@ int dev_set_threaded(struct net_device *dev, bool threaded)
>                  return 0;
>
>          if (threaded) {
> +                /* Check if threaded is set at napi level already */
> +                list_for_each_entry(napi, &dev->napi_list, dev_list)
> +                        if (test_bit(NAPI_STATE_THREADED, &napi->state))
> +                                return -EINVAL;
> +

Same comment as above.
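To make the first option concrete, here is a completely untested sketch
of what I mean by "work like the existing ones": the device-wide write
simply overwrites the per-NAPI values (and napi->config), the same way
the device-wide defer_hard_irqs / gro_flush_timeout writes push the new
value down to every NAPI, instead of returning -EINVAL. It assumes the
napi->dev->threaded check in napi_set_threaded() above is dropped, and
error handling is elided:

int dev_set_threaded(struct net_device *dev, bool threaded)
{
        struct napi_struct *napi;
        int err;

        if (dev->threaded == threaded)
                return 0;

        /* overwrite any per-NAPI setting with the device-wide value */
        list_for_each_entry(napi, &dev->napi_list, dev_list) {
                err = napi_set_threaded(napi, threaded);
                if (err)
                        return err;
        }

        dev->threaded = threaded;
        return 0;
}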
>                  list_for_each_entry(napi, &dev->napi_list, dev_list) {
>                          if (!napi->thread) {
>                                  err = napi_kthread_create(napi);
> @@ -7063,6 +7092,8 @@ static void napi_restore_config(struct napi_struct *n)
>                  napi_hash_add(n);
>                  n->config->napi_id = n->napi_id;
>          }
> +
> +        napi_set_threaded(n, n->config->threaded);

I feel like if this fails, it should only be because napi_kthread_create
fails (as described above), which I think should probably generate a
warning, perhaps via WARN_ON_ONCE? (There's a rough sketch of what I
mean at the bottom of this mail.)

>  }
>
>  static void napi_save_config(struct napi_struct *n)
> diff --git a/net/core/netdev-genl-gen.c b/net/core/netdev-genl-gen.c
> index 739f7b6506a6..c2e5cee857d2 100644
> --- a/net/core/netdev-genl-gen.c
> +++ b/net/core/netdev-genl-gen.c
> @@ -92,11 +92,12 @@ static const struct nla_policy netdev_bind_rx_nl_policy[NETDEV_A_DMABUF_FD + 1]
>  };
>
>  /* NETDEV_CMD_NAPI_SET - do */
> -static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT + 1] = {
> +static const struct nla_policy netdev_napi_set_nl_policy[NETDEV_A_NAPI_THREADED + 1] = {
>          [NETDEV_A_NAPI_ID] = { .type = NLA_U32, },
>          [NETDEV_A_NAPI_DEFER_HARD_IRQS] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_napi_defer_hard_irqs_range),
>          [NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT] = { .type = NLA_UINT, },
>          [NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT] = { .type = NLA_UINT, },
> +        [NETDEV_A_NAPI_THREADED] = NLA_POLICY_MAX(NLA_U32, 1),
>  };
>
>  /* Ops table for netdev */
> @@ -187,7 +188,7 @@ static const struct genl_split_ops netdev_nl_ops[] = {
>                  .cmd = NETDEV_CMD_NAPI_SET,
>                  .doit = netdev_nl_napi_set_doit,
>                  .policy = netdev_napi_set_nl_policy,
> -                .maxattr = NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
> +                .maxattr = NETDEV_A_NAPI_THREADED,
>                  .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
>          },
>  };
> diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> index a186fea63c09..057001c3bbba 100644
> --- a/net/core/netdev-genl.c
> +++ b/net/core/netdev-genl.c
> @@ -186,6 +186,9 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
>          if (napi->irq >= 0 && nla_put_u32(rsp, NETDEV_A_NAPI_IRQ, napi->irq))
>                  goto nla_put_failure;
>
> +        if (nla_put_u32(rsp, NETDEV_A_NAPI_THREADED, !!napi->thread))
> +                goto nla_put_failure;
> +
>          if (napi->thread) {
>                  pid = task_pid_nr(napi->thread);
>                  if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
> @@ -324,8 +327,14 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
>  {
>          u64 irq_suspend_timeout = 0;
>          u64 gro_flush_timeout = 0;
> +        u32 threaded = 0;
>          u32 defer = 0;
>
> +        if (info->attrs[NETDEV_A_NAPI_THREADED]) {
> +                threaded = nla_get_u32(info->attrs[NETDEV_A_NAPI_THREADED]);
> +                napi_set_threaded(napi, !!threaded);
> +        }
> +
>          if (info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]) {
>                  defer = nla_get_u32(info->attrs[NETDEV_A_NAPI_DEFER_HARD_IRQS]);
>                  napi_set_defer_hard_irqs(napi, defer);
> diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
> index 7600bf62dbdf..fac1b8ffeb55 100644
> --- a/tools/include/uapi/linux/netdev.h
> +++ b/tools/include/uapi/linux/netdev.h
> @@ -134,6 +134,7 @@ enum {
>          NETDEV_A_NAPI_DEFER_HARD_IRQS,
>          NETDEV_A_NAPI_GRO_FLUSH_TIMEOUT,
>          NETDEV_A_NAPI_IRQ_SUSPEND_TIMEOUT,
> +        NETDEV_A_NAPI_THREADED,
>
>          __NETDEV_A_NAPI_MAX,
>          NETDEV_A_NAPI_MAX = (__NETDEV_A_NAPI_MAX - 1)
> --
> 2.49.0.395.g12beb8f557-goog
>
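For the napi_restore_config() comment above, this (untested) is roughly
what I had in mind:

@@ ... @@ static void napi_restore_config(struct napi_struct *n)
 		n->config->napi_id = n->napi_id;
 	}
 
+	/* should only fail if napi_kthread_create() fails */
+	WARN_ON_ONCE(napi_set_threaded(n, n->config->threaded));
 }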