
[net-next,v2,0/4] Add support to do threaded napi busy poll

Message ID 20250123231236.2657321-1-skhawaja@google.com (mailing list archive)

Message

Samiullah Khawaja Jan. 23, 2025, 11:12 p.m. UTC
Extend the existing support for threaded napi poll to do continuous
busy polling.

This is used for continuous polling of a napi to fetch descriptors from
the backing RX/TX queues in low-latency applications. Allow enabling
threaded busy polling through netlink so it can be turned on for a set
of dedicated napis used by low-latency applications.

This allows enabling NAPI busy polling for any userspace application,
independent of the userspace API used for packet and event processing
(epoll, io_uring, raw socket APIs). Once enabled, the user can fetch
the PID of the kthread doing the NAPI polling and set its affinity,
priority, and scheduler according to the low-latency requirements.
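
For illustration, once a napi is polled by a kthread it can be managed
with standard tooling; a minimal sketch, assuming the
`napi/<dev>-<napi-id>` kthread naming used by threaded NAPI, with the
core and priority values purely illustrative:
```
# Threaded NAPI kthreads are named napi/<dev>-<napi-id>; find eth0's.
ps -e -o pid,comm | grep 'napi/eth0'

# Pin the polling kthread to a dedicated core and give it RT priority
# (core 5 and FIFO priority 50 are illustrative values).
sudo taskset -pc 5 <pid>
sudo chrt -f -p 50 <pid>
```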

Currently threaded napi can only be enabled at the device level using
sysfs. Add support to enable/disable threaded mode for a napi
individually. This can be done using the netlink interface. Extend the
`napi-set` op in the netlink spec to allow setting the `threaded`
attribute of a napi.
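
With the extended spec, a per-napi threaded setting could be driven
from the ynl CLI roughly as follows (a sketch; the napi id and the
exact `threaded` value are illustrative and depend on the enum this
series defines in `netdev.yaml`):
```
./tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/netdev.yaml \
        --do napi-set --json '{"id": 8193, "threaded": 1}'
```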

Extend the threaded attribute in the napi struct with an option to
enable continuous busy polling. Extend the netlink and sysfs interfaces
to allow enabling/disabling threaded busy polling at the device or
individual napi level.
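
At the device level this would look like the existing threaded sysfs
knob with an extra mode; a sketch, assuming the new value 2 selects
busy polling as in this series:
```
# 0 = disabled, 1 = threaded, 2 = threaded busy poll (per this series)
echo 2 | sudo tee /sys/class/net/eth0/threaded
```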

We use this for our AF_XDP based hard low-latency use case built on the
onload stack (https://github.com/Xilinx-CNS/onload), which runs in
userspace. Our use case is fixed-frequency RPC-style traffic with a
fixed request/response size. We simulated this using neper by starting
the next transaction only when the last one has completed. The
experiment results are listed below,

Setup:

- Running on Google C3 VMs with the idpf driver and the following
  configuration.
- IRQ affinity and coalescing settings are identical for both
  experiments.
- Only 1 RX/TX queue is configured.
- The first experiment enables busy polling using sysctl for both epoll
  and socket APIs.
- The second experiment enables NAPI threaded busy poll for the full
  device using sysfs.

Non-threaded NAPI busy poll enabled using sysctl and sysfs:
```
echo 400 | sudo tee /proc/sys/net/core/busy_poll
echo 400 | sudo tee /proc/sys/net/core/busy_read
echo 2 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
echo 15000  | sudo tee /sys/class/net/eth0/gro_flush_timeout
```

Results using the following command:
```
sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
		--profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
		-p 50,90,99,999 -H <IP> -l 10

...
...

num_transactions=2835
latency_min=0.000018976
latency_max=0.049642100
latency_mean=0.003243618
latency_stddev=0.010636847
latency_p50=0.000025270
latency_p90=0.005406710
latency_p99=0.049807350
latency_p99.9=0.049807350
```

Results with NAPI threaded busy poll, using the following command:
```
sudo EF_NO_FAIL=0 EF_POLL_USEC=100000 taskset -c 3-10 onload -v \
                --profile=latency ./neper/tcp_rr -Q 200 -R 400 -T 1 -F 50 \
                -p 50,90,99,999 -H <IP> -l 10

...
...

num_transactions=460163
latency_min=0.000015707
latency_max=0.200182942
latency_mean=0.000019453
latency_stddev=0.000720727
latency_p50=0.000016950
latency_p90=0.000017270
latency_p99=0.000018710
latency_p99.9=0.000020150
```

Here, with NAPI threaded busy poll running on a separate core, we are
able to poll the NAPI consistently and keep latency to an absolute
minimum. We are also able to do this without any major changes to the
onload stack or its threading model.

v2:
 - Add documentation in napi.rst.
 - Provide experiment data and usecase details.
 - Update busy_poller selftest to include napi threaded poll testcase.
 - Define threaded mode enum in netlink interface.
 - Included NAPI threaded state in napi config to save/restore.

Samiullah Khawaja (4):
  Add support to set napi threaded for individual napi
  net: Create separate gro_flush helper function
  Extend napi threaded polling to allow kthread based busy polling
  selftests: Add napi threaded busy poll test in `busy_poller`

 Documentation/ABI/testing/sysfs-class-net     |   3 +-
 Documentation/netlink/specs/netdev.yaml       |  14 ++
 Documentation/networking/napi.rst             |  80 ++++++++++-
 .../net/ethernet/atheros/atl1c/atl1c_main.c   |   2 +-
 include/linux/netdevice.h                     |  24 +++-
 include/uapi/linux/netdev.h                   |   7 +
 net/core/dev.c                                | 127 ++++++++++++++----
 net/core/net-sysfs.c                          |   2 +-
 net/core/netdev-genl-gen.c                    |   5 +-
 net/core/netdev-genl.c                        |   9 ++
 tools/include/uapi/linux/netdev.h             |   7 +
 tools/testing/selftests/net/busy_poll_test.sh |  25 +++-
 tools/testing/selftests/net/busy_poller.c     |  14 +-
 13 files changed, 282 insertions(+), 37 deletions(-)

Comments

Jakub Kicinski Jan. 24, 2025, 1:24 a.m. UTC | #1
On Thu, 23 Jan 2025 23:12:32 +0000 Samiullah Khawaja wrote:
> Extend the existing support for threaded napi poll to do continuous
> busy polling.

## Form letter - net-next-closed

The merge window for v6.14 has begun and we have already posted our pull
request. Therefore net-next is closed for new drivers, features, code
refactoring and optimizations. We are currently accepting bug fixes only.

Please repost when net-next reopens after Feb 3rd.

RFC patches sent for review only are obviously welcome at any time.

See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
Joe Damato Jan. 27, 2025, 5:06 p.m. UTC | #2
On Thu, Jan 23, 2025 at 11:12:32PM +0000, Samiullah Khawaja wrote:
> Extend the existing support for threaded napi poll to do continuous
> busy polling.
> 
> This is used for continuous polling of a napi to fetch descriptors from
> the backing RX/TX queues in low-latency applications. Allow enabling
> threaded busy polling through netlink so it can be turned on for a set
> of dedicated napis used by low-latency applications.
> 
> This allows enabling NAPI busy polling for any userspace application,
> independent of the userspace API used for packet and event processing
> (epoll, io_uring, raw socket APIs). Once enabled, the user can fetch
> the PID of the kthread doing the NAPI polling and set its affinity,
> priority, and scheduler according to the low-latency requirements.

When you resubmit this after the merge window (or if you resubmit it
as an RFC), would you mind CCing both me (jdamato@fastly.com) and
Martin (mkarsten@uwaterloo.ca) ?

We almost missed this revision after commenting on the previous
version, since we weren't included in the CC list.

Both Martin and I read through the cover letter and proposed changes
and have several questions/comments, but given that the thread is
marked as deferred/closed due to the merge window, we'll hold off
on digging in until the next revision is posted.