
[v1,0/3] io_uring: add option to not set in_iowait

Message ID 20240816180145.14561-1-dw@davidwei.uk

Message

David Wei Aug. 16, 2024, 6:01 p.m. UTC
io_uring sets current->in_iowait when waiting for completions, which
achieves two things:

1. Accounting the wait time as iowait time
2. Enabling cpufreq optimisations by setting SCHED_CPUFREQ_IOWAIT on the rq

For block IO this makes sense, as high iowait can indicate issues. But
for network IO, especially recv, we do not control when completions
happen. When doing network IO with the new min-wait feature, which lets
io_uring wait for a certain number of completions before returning,
this shows up as high iowait time.

Some user tooling counts iowait time as 'CPU utilisation' time, so high
iowait time looks like high CPU utilisation even though the task is not
scheduled and the CPU is free to run other tasks.

This patchset adds an IORING_ENTER_NO_IOWAIT flag that can be passed to
io_uring_enter(). If it is set, current->in_iowait is not set. The flag
is unset by default to maintain existing behaviour, i.e. in_iowait is
always set.

Not setting in_iowait does mean that we also lose the cpufreq
optimisations above, because the in_iowait semantics couple 1 and 2
together. Eventually we will untangle the two so that the optimisations
can be enabled independently of the accounting.

David Wei (3):
  io_uring: add IORING_ENTER_NO_IOWAIT flag
  io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag

 include/uapi/linux/io_uring.h | 2 ++
 io_uring/io_uring.c           | 8 +++++---
 io_uring/io_uring.h           | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)