From patchwork Tue Feb 13 19:03:38 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13555497
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/4] io_uring: move schedule wait logic into helper
Date: Tue, 13 Feb 2024 12:03:38 -0700
Message-ID: <20240213191352.2452160-2-axboe@kernel.dk>
In-Reply-To: <20240213191352.2452160-1-axboe@kernel.dk>
References: <20240213191352.2452160-1-axboe@kernel.dk>

In preparation for expanding how we handle waits, move the actual
schedule and schedule_timeout() handling into a helper.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 479f610e314f..67cc7003b5bd 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2523,22 +2523,10 @@ static bool current_pending_io(void)
         return percpu_counter_read_positive(&tctx->inflight);
 }
 
-/* when returns >0, the caller should retry */
-static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
-                                          struct io_wait_queue *iowq)
+static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
+                                     struct io_wait_queue *iowq)
 {
-        int io_wait, ret;
-
-        if (unlikely(READ_ONCE(ctx->check_cq)))
-                return 1;
-        if (unlikely(!llist_empty(&ctx->work_llist)))
-                return 1;
-        if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
-                return 1;
-        if (unlikely(task_sigpending(current)))
-                return -EINTR;
-        if (unlikely(io_should_wake(iowq)))
-                return 0;
+        int io_wait, ret = 0;
 
         /*
          * Mark us as being in io_wait if we have pending requests, so cpufreq
@@ -2548,7 +2536,6 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
         io_wait = current->in_iowait;
         if (current_pending_io())
                 current->in_iowait = 1;
-        ret = 0;
         if (iowq->timeout == KTIME_MAX)
                 schedule();
         else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
@@ -2557,6 +2544,24 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
         return ret;
 }
 
+/* when returns >0, the caller should retry */
+static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
+                                          struct io_wait_queue *iowq)
+{
+        if (unlikely(READ_ONCE(ctx->check_cq)))
+                return 1;
+        if (unlikely(!llist_empty(&ctx->work_llist)))
+                return 1;
+        if (unlikely(test_thread_flag(TIF_NOTIFY_SIGNAL)))
+                return 1;
+        if (unlikely(task_sigpending(current)))
+                return -EINTR;
+        if (unlikely(io_should_wake(iowq)))
+                return 0;
+
+        return __io_cqring_wait_schedule(ctx, iowq);
+}
+
 /*
  * Wait until events become available, if we don't already have some. The
  * application must reap them itself, as they reside on the shared cq ring.

From patchwork Tue Feb 13 19:03:39 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13555498
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 2/4] io_uring: implement our own schedule timeout handling
Date: Tue, 13 Feb 2024 12:03:39 -0700
Message-ID: <20240213191352.2452160-3-axboe@kernel.dk>
In-Reply-To: <20240213191352.2452160-1-axboe@kernel.dk>
References: <20240213191352.2452160-1-axboe@kernel.dk>

In preparation for having two distinct timeouts, and to avoid waking the
task if we don't need to.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 39 +++++++++++++++++++++++++++++++++++----
 io_uring/io_uring.h |  2 ++
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 67cc7003b5bd..f2d3f39d6106 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2495,7 +2495,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
          * Cannot safely flush overflowed CQEs from here, ensure we wake up
          * the task, and the next invocation will do it.
          */
-        if (io_should_wake(iowq) || io_has_work(iowq->ctx))
+        if (io_should_wake(iowq) || io_has_work(iowq->ctx) || iowq->hit_timeout)
                 return autoremove_wake_function(curr, mode, wake_flags, key);
         return -1;
 }
@@ -2523,6 +2523,37 @@ static bool current_pending_io(void)
         return percpu_counter_read_positive(&tctx->inflight);
 }
 
+static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer)
+{
+        struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t);
+        struct io_ring_ctx *ctx = iowq->ctx;
+
+        WRITE_ONCE(iowq->hit_timeout, 1);
+        if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+                wake_up_process(ctx->submitter_task);
+        else
+                io_cqring_wake(ctx);
+        return HRTIMER_NORESTART;
+}
+
+static int io_cqring_schedule_timeout(struct io_wait_queue *iowq)
+{
+        iowq->hit_timeout = 0;
+        hrtimer_init_on_stack(&iowq->t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+        iowq->t.function = io_cqring_timer_wakeup;
+        hrtimer_set_expires_range_ns(&iowq->t, iowq->timeout, 0);
+        hrtimer_start_expires(&iowq->t, HRTIMER_MODE_ABS);
+
+        if (!READ_ONCE(iowq->hit_timeout))
+                schedule();
+
+        hrtimer_cancel(&iowq->t);
+        destroy_hrtimer_on_stack(&iowq->t);
+        __set_current_state(TASK_RUNNING);
+
+        return READ_ONCE(iowq->hit_timeout) ? -ETIME : 0;
+}
+
 static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
                                      struct io_wait_queue *iowq)
 {
@@ -2536,10 +2567,10 @@ static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
         io_wait = current->in_iowait;
         if (current_pending_io())
                 current->in_iowait = 1;
-        if (iowq->timeout == KTIME_MAX)
+        if (iowq->timeout != KTIME_MAX)
+                ret = io_cqring_schedule_timeout(iowq);
+        else
                 schedule();
-        else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
-                ret = -ETIME;
         current->in_iowait = io_wait;
         return ret;
 }
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 1ca99522811b..d7295ae2c8a6 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -40,7 +40,9 @@ struct io_wait_queue {
         struct io_ring_ctx *ctx;
         unsigned cq_tail;
         unsigned nr_timeouts;
+        int hit_timeout;
         ktime_t timeout;
+        struct hrtimer t;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
         unsigned int napi_busy_poll_to;

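[A distilled sketch of the on-stack hrtimer wait pattern the patch above uses in
io_cqring_schedule_timeout(). This is illustrative only, not the patch's code:
the names my_waiter, my_timer_fn and my_schedule_timeout are hypothetical, and
the sketch simply wakes a stored task pointer, whereas the patch wakes either
the submitter task or the CQ wait queue depending on IORING_SETUP_DEFER_TASKRUN.]

#include <linux/errno.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>
#include <linux/sched.h>

struct my_waiter {
        struct hrtimer t;               /* lives on the waiter's stack */
        struct task_struct *task;       /* task to wake on expiry */
        int hit_timeout;                /* set by the timer callback */
};

static enum hrtimer_restart my_timer_fn(struct hrtimer *timer)
{
        struct my_waiter *w = container_of(timer, struct my_waiter, t);

        /* publish expiry before the wakeup so the sleeper sees it */
        WRITE_ONCE(w->hit_timeout, 1);
        wake_up_process(w->task);
        return HRTIMER_NORESTART;
}

/*
 * Caller must have set its task state (e.g. TASK_INTERRUPTIBLE) before
 * calling, just like io_cqring_wait() does before io_cqring_wait_schedule().
 */
static int my_schedule_timeout(struct my_waiter *w, ktime_t abs_expiry)
{
        w->task = current;
        w->hit_timeout = 0;
        hrtimer_init_on_stack(&w->t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
        w->t.function = my_timer_fn;
        hrtimer_set_expires(&w->t, abs_expiry);
        hrtimer_start_expires(&w->t, HRTIMER_MODE_ABS);

        /* if the timer already fired, skip the sleep instead of hanging */
        if (!READ_ONCE(w->hit_timeout))
                schedule();

        /* on-stack timers must be cancelled and destroyed before returning */
        hrtimer_cancel(&w->t);
        destroy_hrtimer_on_stack(&w->t);
        __set_current_state(TASK_RUNNING);
        return READ_ONCE(w->hit_timeout) ? -ETIME : 0;
}
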
From patchwork Tue Feb 13 19:03:40 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13555499
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/4] io_uring: add support for batch wait timeout
Date: Tue, 13 Feb 2024 12:03:40 -0700
Message-ID: <20240213191352.2452160-4-axboe@kernel.dk>
In-Reply-To: <20240213191352.2452160-1-axboe@kernel.dk>
References: <20240213191352.2452160-1-axboe@kernel.dk>

Waiting for events with io_uring has two knobs that can be set:

1) The number of events to wake for
2) The timeout associated with the event

Waiting will abort when either of those conditions is met, as expected.

This adds support for a third condition, which is associated with the
number of events to wait for. Applications generally like to handle
batches of completions, and right now they'd set a number of events to
wait for and the timeout for that. If no events have been received but
the timeout triggers, control is returned to the application and it can
wait again. However, if the application doesn't have anything to do
until events are reaped, then it's possible to make this waiting more
efficient.

For example, the application may have a latency budget of 50 usecs and
want to handle a batch of 8 requests at a time. If it uses 50 usecs as
the timeout, then it'll be doing 20K context switches per second even
if nothing is happening.

This introduces the notion of min batch wait time. If the min batch
wait time expires, then we'll return to userspace if we have any events
at all. If none are available, the general wait time is applied. Any
request arriving after the min batch wait time will cause waiting to
stop and return control to the application.
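
[To make the intended semantics concrete, a rough timeline for the example
above; the numbers are illustrative only and not part of the patch. Assume
min batch wait time = 50 usecs, overall timeout = 1000 usecs, batch size = 8:

  t = 0            task starts waiting, min batch timer is armed
  t < 50us         completions only accumulate; the task is woken early
                   only if the full batch of 8 becomes available (or for
                   signals/task_work, as before)
  t = 50us         min batch wait time expires: if at least one completion
                   is pending, return to userspace with what is there;
                   otherwise switch over to the remaining normal timeout
  50us < t < 1ms   the first completion to arrive wakes the task
  t = 1ms          normal timeout expires and the wait returns; the overall
                   timeout is still measured from t = 0, as the next patch
                   in the series notes.]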
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 78 +++++++++++++++++++++++++++++++++++++++------
 io_uring/io_uring.h |  2 ++
 2 files changed, 70 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f2d3f39d6106..34f7884be932 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2536,12 +2536,64 @@ static enum hrtimer_restart io_cqring_timer_wakeup(struct hrtimer *timer)
         return HRTIMER_NORESTART;
 }
 
-static int io_cqring_schedule_timeout(struct io_wait_queue *iowq)
+/*
+ * Doing min_timeout portion. If we saw any timeouts, events, or have work,
+ * wake up. If not, and we have a normal timeout, switch to that and keep
+ * sleeping.
+ */
+static enum hrtimer_restart io_cqring_min_timer_wakeup(struct hrtimer *timer)
+{
+        struct io_wait_queue *iowq = container_of(timer, struct io_wait_queue, t);
+        struct io_ring_ctx *ctx = iowq->ctx;
+
+        /* no general timeout, or shorter, we are done */
+        if (iowq->timeout == KTIME_MAX ||
+            ktime_after(iowq->min_timeout, iowq->timeout))
+                goto out_wake;
+        /* work we may need to run, wake function will see if we need to wake */
+        if (io_has_work(ctx))
+                goto out_wake;
+        /* got events since we started waiting, min timeout is done */
+        if (iowq->cq_min_tail != READ_ONCE(ctx->rings->cq.tail))
+                goto out_wake;
+        /* if we have any events and min timeout expired, we're done */
+        if (io_cqring_events(ctx))
+                goto out_wake;
+
+        /*
+         * If using deferred task_work running and application is waiting on
+         * more than one request, ensure we reset it now where we are switching
+         * to normal sleeps. Any request completion post min_wait should wake
+         * the task and return.
+         */
+        if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+                atomic_set(&ctx->cq_wait_nr, 1);
+
+        iowq->timeout = ktime_add_ns(ktime_sub_ns(iowq->timeout, iowq->min_timeout), ktime_get_ns());
+        iowq->t.function = io_cqring_timer_wakeup;
+        hrtimer_set_expires(timer, iowq->timeout);
+        return HRTIMER_RESTART;
+out_wake:
+        return io_cqring_timer_wakeup(timer);
+}
+
+static int io_cqring_schedule_timeout(struct io_wait_queue *iowq,
+                                      ktime_t start_time)
 {
+        ktime_t timeout;
+
         iowq->hit_timeout = 0;
         hrtimer_init_on_stack(&iowq->t, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
-        iowq->t.function = io_cqring_timer_wakeup;
-        hrtimer_set_expires_range_ns(&iowq->t, iowq->timeout, 0);
+
+        if (iowq->min_timeout != KTIME_MAX) {
+                timeout = ktime_add_ns(iowq->min_timeout, start_time);
+                iowq->t.function = io_cqring_min_timer_wakeup;
+        } else {
+                timeout = ktime_add_ns(iowq->timeout, start_time);
+                iowq->t.function = io_cqring_timer_wakeup;
+        }
+
+        hrtimer_set_expires_range_ns(&iowq->t, timeout, 0);
         hrtimer_start_expires(&iowq->t, HRTIMER_MODE_ABS);
 
         if (!READ_ONCE(iowq->hit_timeout))
@@ -2555,7 +2607,8 @@ static int io_cqring_schedule_timeout(struct io_wait_queue *iowq)
 }
 
 static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
-                                     struct io_wait_queue *iowq)
+                                     struct io_wait_queue *iowq,
+                                     ktime_t start_time)
 {
         int io_wait, ret = 0;
 
@@ -2567,8 +2620,8 @@ static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
         io_wait = current->in_iowait;
         if (current_pending_io())
                 current->in_iowait = 1;
-        if (iowq->timeout != KTIME_MAX)
-                ret = io_cqring_schedule_timeout(iowq);
+        if (iowq->timeout != KTIME_MAX || iowq->min_timeout != KTIME_MAX)
+                ret = io_cqring_schedule_timeout(iowq, start_time);
         else
                 schedule();
         current->in_iowait = io_wait;
@@ -2577,7 +2630,8 @@ static int __io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
-                                          struct io_wait_queue *iowq)
+                                          struct io_wait_queue *iowq,
+                                          ktime_t start_time)
 {
         if (unlikely(READ_ONCE(ctx->check_cq)))
                 return 1;
@@ -2590,7 +2644,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
         if (unlikely(io_should_wake(iowq)))
                 return 0;
 
-        return __io_cqring_wait_schedule(ctx, iowq);
+        return __io_cqring_wait_schedule(ctx, iowq, start_time);
 }
 
 /*
@@ -2603,6 +2657,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 {
         struct io_wait_queue iowq;
         struct io_rings *rings = ctx->rings;
+        ktime_t start_time;
         int ret;
 
         if (!io_allowed_run_tw(ctx))
@@ -2633,8 +2688,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
         INIT_LIST_HEAD(&iowq.wq.entry);
         iowq.ctx = ctx;
         iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
+        iowq.cq_min_tail = READ_ONCE(ctx->rings->cq.tail);
         iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
+        iowq.min_timeout = KTIME_MAX;
         iowq.timeout = KTIME_MAX;
+        start_time = ktime_get_ns();
 
         if (uts) {
                 struct timespec64 ts;
@@ -2642,7 +2700,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
                 if (get_timespec64(&ts, uts))
                         return -EFAULT;
 
-                iowq.timeout = ktime_add_ns(timespec64_to_ktime(ts), ktime_get_ns());
+                iowq.timeout = timespec64_to_ktime(ts);
                 io_napi_adjust_timeout(ctx, &iowq, &ts);
         }
 
@@ -2661,7 +2719,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
                                         TASK_INTERRUPTIBLE);
                 }
 
-                ret = io_cqring_wait_schedule(ctx, &iowq);
+                ret = io_cqring_wait_schedule(ctx, &iowq, start_time);
                 __set_current_state(TASK_RUNNING);
                 atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d7295ae2c8a6..56b1672dbeb7 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -39,8 +39,10 @@ struct io_wait_queue {
         struct wait_queue_entry wq;
         struct io_ring_ctx *ctx;
         unsigned cq_tail;
+        unsigned cq_min_tail;
         unsigned nr_timeouts;
         int hit_timeout;
+        ktime_t min_timeout;
         ktime_t timeout;
         struct hrtimer t;
 

From patchwork Tue Feb 13 19:03:41 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13555500
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 4/4] io_uring: wire up min batch wake timeout
Date: Tue, 13 Feb 2024 12:03:41 -0700
Message-ID: <20240213191352.2452160-5-axboe@kernel.dk>
In-Reply-To: <20240213191352.2452160-1-axboe@kernel.dk>
References: <20240213191352.2452160-1-axboe@kernel.dk>

Expose min_wait_usec in io_uring_getevents_arg, replacing the pad member
that is currently in there. The value is in usecs, as the name indicates.

Note that if min_wait_usec and a normal timeout are used in conjunction,
the normal timeout is still relative to the base time. For example, if
min_wait_usec is set to 100 and the normal timeout is 1000, the max
total time waited is still 1000. This also means that if the normal
timeout is shorter than min_wait_usec, then only the min_wait_usec will
take effect. See the previous commit for an explanation of how this
works.

IORING_FEAT_MIN_TIMEOUT is added as a feature flag for this, as
applications doing submit_and_wait_timeout() style operations will
generally not see the -EINVAL from the wait side, since they return the
number of IOs submitted. Only if no IOs are submitted will the -EINVAL
bubble back up to the application.

Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |  3 ++-
 io_uring/io_uring.c           | 21 +++++++++++++--------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..dbefda14d087 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -522,6 +522,7 @@ struct io_uring_params {
 #define IORING_FEAT_CQE_SKIP            (1U << 11)
 #define IORING_FEAT_LINKED_FILE         (1U << 12)
 #define IORING_FEAT_REG_REG_RING        (1U << 13)
+#define IORING_FEAT_MIN_TIMEOUT         (1U << 14)
 
 /*
  * io_uring_register(2) opcodes and arguments
@@ -738,7 +739,7 @@ enum {
 struct io_uring_getevents_arg {
         __u64   sigmask;
         __u32   sigmask_sz;
-        __u32   pad;
+        __u32   min_wait_usec;
         __u64   ts;
 };
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 34f7884be932..2f9aaa3d8273 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2653,7 +2653,8 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
  */
 static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
                           const sigset_t __user *sig, size_t sigsz,
-                          struct __kernel_timespec __user *uts)
+                          struct __kernel_timespec __user *uts,
+                          ktime_t min_time)
 {
         struct io_wait_queue iowq;
         struct io_rings *rings = ctx->rings;
@@ -2690,7 +2691,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
         iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
         iowq.cq_min_tail = READ_ONCE(ctx->rings->cq.tail);
         iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
-        iowq.min_timeout = KTIME_MAX;
+        iowq.min_timeout = min_time;
         iowq.timeout = KTIME_MAX;
         start_time = ktime_get_ns();
 
@@ -3640,10 +3641,12 @@ static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t a
 
 static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
                           struct __kernel_timespec __user **ts,
-                          const sigset_t __user **sig)
+                          const sigset_t __user **sig, ktime_t *min_time)
 {
         struct io_uring_getevents_arg arg;
 
+        *min_time = KTIME_MAX;
+
         /*
          * If EXT_ARG isn't set, then we have no timespec and the argp pointer
          * is just a pointer to the sigset_t.
@@ -3662,8 +3665,8 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
                 return -EINVAL;
         if (copy_from_user(&arg, argp, sizeof(arg)))
                 return -EFAULT;
-        if (arg.pad)
-                return -EINVAL;
+        if (arg.min_wait_usec)
+                *min_time = arg.min_wait_usec * NSEC_PER_USEC;
         *sig = u64_to_user_ptr(arg.sigmask);
         *argsz = arg.sigmask_sz;
         *ts = u64_to_user_ptr(arg.ts);
@@ -3775,13 +3778,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
         } else {
                 const sigset_t __user *sig;
                 struct __kernel_timespec __user *ts;
+                ktime_t min_time;
 
-                ret2 = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
+                ret2 = io_get_ext_arg(flags, argp, &argsz, &ts, &sig, &min_time);
                 if (likely(!ret2)) {
                         min_complete = min(min_complete,
                                            ctx->cq_entries);
                         ret2 = io_cqring_wait(ctx, min_complete, sig,
-                                              argsz, ts);
+                                              argsz, ts, min_time);
                 }
         }
 
@@ -4064,7 +4068,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
                         IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
                         IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
                         IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
-                        IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
+                        IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
+                        IORING_FEAT_MIN_TIMEOUT;
 
         if (copy_to_user(params, p, sizeof(*p))) {
                 ret = -EFAULT;
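
[For illustration, a sketch of how an application might use the new field from
userspace. This is not part of the series: the wrapper name wait_batch() and
the local struct mirror are hypothetical, the raw io_uring_enter(2) syscall is
used instead of liburing, and a kernel with this series plus matching uapi
headers is assumed.]

#include <linux/io_uring.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* mirrors the patched io_uring_getevents_arg layout shown above */
struct getevents_arg {
        uint64_t sigmask;
        uint32_t sigmask_sz;
        uint32_t min_wait_usec;         /* was 'pad' before this series */
        uint64_t ts;
};

/*
 * Wait for up to 'want' completions, but return as soon as 'min_usec' has
 * elapsed and at least one completion is available. No overall timeout is
 * set in this sketch (arg.ts stays 0).
 */
static long wait_batch(int ring_fd, unsigned int want, unsigned int min_usec)
{
        struct getevents_arg arg;

        memset(&arg, 0, sizeof(arg));
        arg.min_wait_usec = min_usec;   /* e.g. 50 usecs min batch wait */

        return syscall(__NR_io_uring_enter, ring_fd, 0, want,
                       IORING_ENTER_GETEVENTS | IORING_ENTER_EXT_ARG,
                       &arg, sizeof(arg));
}

[Before setting min_wait_usec, an application should check for
IORING_FEAT_MIN_TIMEOUT in io_uring_params.features: older kernels treat a
non-zero value in this position as a non-zero pad and fail with -EINVAL,
subject to the submit-and-wait caveat described in the commit message above.]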