diff mbox series

[v1,2/3] io_uring: do not set in_iowait if IORING_ENTER_NO_IOWAIT

Message ID 20240816180145.14561-3-dw@davidwei.uk (mailing list archive)
State New
Series io_uring: add option to not set in_iowait

Commit Message

David Wei Aug. 16, 2024, 6:01 p.m. UTC
Check for IORING_ENTER_NO_IOWAIT and do not set current->in_iowait if it
is set. To preserve existing behaviour, this flag is not set by default.

This prevents waiting for completions from being accounted as iowait
time. Some userspace tools treat iowait time as 'utilisation' time,
which is misleading since the task is not scheduled and the CPU is free
to run other tasks.

High iowait time might be indicative of issues for block IO, but not for
network IO, e.g. recv(), where we do not control when IO happens.

Signed-off-by: David Wei <dw@davidwei.uk>
---
 io_uring/io_uring.c | 4 +++-
 io_uring/io_uring.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

Comments

Jens Axboe Aug. 16, 2024, 6:38 p.m. UTC | #1
On 8/16/24 12:01 PM, David Wei wrote:
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index 9935819f12b7..e35fecca4445 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -41,6 +41,7 @@ struct io_wait_queue {
>  	unsigned cq_tail;
>  	unsigned nr_timeouts;
>  	ktime_t timeout;
> +	bool no_iowait;
>  
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	ktime_t napi_busy_poll_dt;

I'd put that bool below the NAPI section, then it'll pack in with
napi_prefer_busy_poll rather than waste 7 bytes as it does here.
Jens Axboe Aug. 16, 2024, 6:49 p.m. UTC | #2
On 8/16/24 12:01 PM, David Wei wrote:
> @@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
>  	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
>  	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
>  	iowq.timeout = KTIME_MAX;
> +	if (flags & IORING_ENTER_NO_IOWAIT)
> +		iowq.no_iowait = true;

Oh, and this should be:

	iowq.no_iowait = flags & IORING_ENTER_NO_IOWAIT;

to avoid leaving this field uninitialized by default if the flag isn't
set. The struct isn't initialized to zero.
David Wei Aug. 16, 2024, 10:23 p.m. UTC | #3
On 2024-08-16 11:38, Jens Axboe wrote:
> On 8/16/24 12:01 PM, David Wei wrote:
>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>> index 9935819f12b7..e35fecca4445 100644
>> --- a/io_uring/io_uring.h
>> +++ b/io_uring/io_uring.h
>> @@ -41,6 +41,7 @@ struct io_wait_queue {
>>  	unsigned cq_tail;
>>  	unsigned nr_timeouts;
>>  	ktime_t timeout;
>> +	bool no_iowait;
>>  
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>>  	ktime_t napi_busy_poll_dt;
> 
> I'd put that bool below the NAPI section, then it'll pack in with
> napi_prefer_busy_poll rather than waste 7 bytes as it does here.
> 

Thanks, I will remember to always check with pahole next time!

Patch

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4cc905b228a5..9438875e43ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2372,7 +2372,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	 * can take into account that the task is waiting for IO - turns out
 	 * to be important for low QD IO.
 	 */
-	if (current_pending_io())
+	if (!iowq->no_iowait && current_pending_io())
 		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
@@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 	iowq.timeout = KTIME_MAX;
+	if (flags & IORING_ENTER_NO_IOWAIT)
+		iowq.no_iowait = true;
 
 	if (uts) {
 		struct timespec64 ts;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9935819f12b7..e35fecca4445 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -41,6 +41,7 @@ struct io_wait_queue {
 	unsigned cq_tail;
 	unsigned nr_timeouts;
 	ktime_t timeout;
+	bool no_iowait;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	ktime_t napi_busy_poll_dt;