Message ID | 20240816180145.14561-3-dw@davidwei.uk (mailing list archive)
---|---
State | New
Series | io_uring: add option to not set in_iowait
On 8/16/24 12:01 PM, David Wei wrote:
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index 9935819f12b7..e35fecca4445 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -41,6 +41,7 @@ struct io_wait_queue {
> 	unsigned cq_tail;
> 	unsigned nr_timeouts;
> 	ktime_t timeout;
> +	bool no_iowait;
>
> #ifdef CONFIG_NET_RX_BUSY_POLL
> 	ktime_t napi_busy_poll_dt;

I'd put that bool below the NAPI section, then it'll pack in with
napi_prefer_busy_poll rather than waste 7 bytes as it does here.
On 8/16/24 12:01 PM, David Wei wrote:
> @@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
> 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
> 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
> 	iowq.timeout = KTIME_MAX;
> +	if (flags & IORING_ENTER_NO_IOWAIT)
> +		iowq.no_iowait = true;

Oh, and this should be:

	iowq.no_iowait = flags & IORING_ENTER_NO_IOWAIT;

to avoid leaving this field uninitialized by default if the flag isn't
set. The struct isn't initialized to zero.
On 2024-08-16 11:38, Jens Axboe wrote:
> On 8/16/24 12:01 PM, David Wei wrote:
>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>> index 9935819f12b7..e35fecca4445 100644
>> --- a/io_uring/io_uring.h
>> +++ b/io_uring/io_uring.h
>> @@ -41,6 +41,7 @@ struct io_wait_queue {
>> 	unsigned cq_tail;
>> 	unsigned nr_timeouts;
>> 	ktime_t timeout;
>> +	bool no_iowait;
>>
>> #ifdef CONFIG_NET_RX_BUSY_POLL
>> 	ktime_t napi_busy_poll_dt;
>
> I'd put that bool below the NAPI section, then it'll pack in with
> napi_prefer_busy_poll rather than waste 7 bytes as it does here.

Thanks, I will remember to always check with pahole next time!
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4cc905b228a5..9438875e43ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2372,7 +2372,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	 * can take into account that the task is waiting for IO - turns out
 	 * to be important for low QD IO.
 	 */
-	if (current_pending_io())
+	if (!iowq->no_iowait && current_pending_io())
 		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
@@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 	iowq.timeout = KTIME_MAX;
+	if (flags & IORING_ENTER_NO_IOWAIT)
+		iowq.no_iowait = true;
 	if (uts) {
 		struct timespec64 ts;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9935819f12b7..e35fecca4445 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -41,6 +41,7 @@ struct io_wait_queue {
 	unsigned cq_tail;
 	unsigned nr_timeouts;
 	ktime_t timeout;
+	bool no_iowait;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	ktime_t napi_busy_poll_dt;
Check for IORING_ENTER_NO_IOWAIT and do not set current->in_iowait if it
is set. To maintain existing behaviour, this flag is not set by default.

This prevents waiting for completions from being accounted as iowait
time. Some userspace tools consider iowait time to be CPU 'utilisation'
time, which is misleading since the task is not scheduled and the CPU is
free to run other tasks. High iowait time might be indicative of issues
for block IO, but not for network IO e.g. recv(), where we do not
control when IO happens.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/io_uring.c | 4 +++-
 io_uring/io_uring.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)