[5/5] iomap: support IOCB_DIO_DEFER

Message ID	20230711203325.208957-6-axboe@kernel.dk (mailing list archive)
State	Superseded, archived
Series	Improve async iomap DIO performance
[PATCHSET,0/5] Improve async iomap DIO performance
[1/5] iomap: complete polled writes inline
[2/5] fs: add IOCB flags related to passing back dio completions
[3/5] io_uring/rw: add write support for IOCB_DIO_DEFER
[4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io()
[5/5] iomap: support IOCB_DIO_DEFER

Headers

From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 5/5] iomap: support IOCB_DIO_DEFER
Date: Tue, 11 Jul 2023 14:33:25 -0600
Message-Id: <20230711203325.208957-6-axboe@kernel.dk>
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Commit Message

Jens Axboe July 11, 2023, 8:33 p.m. UTC

If IOCB_DIO_DEFER is set, use it to set the kiocb->dio_complete handler
and the data for that callback. Rather than punt the completion to a
workqueue, we pass the handler and data back to the issuer, which then
calls the handler from a safe task context.
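
As a rough illustration of the handoff described above, here is a small
userspace model in C. It is only a sketch, not kernel code: the struct
layouts, the flag value, and every name ending in _model or starting with
issuer_ or run_ are hypothetical. Only IOCB_DIO_DEFER, ki_flags, private,
dio_complete and ki_complete are names taken from this series. The
IOMAP_DIO_NEED_SYNC check is left out, and the non-deferred branch completes
inline as a stand-in for the workqueue punt.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

#define IOCB_DIO_DEFER (1u << 0)	/* hypothetical value, for the model only */

/* Stand-in for struct iomap_dio: just carries the completion value. */
struct dio_model {
	ssize_t size;
};

/* Stand-in for struct kiocb, plus two issuer-side bookkeeping fields. */
struct kiocb_model {
	unsigned int ki_flags;
	void *private;
	ssize_t (*dio_complete)(void *data);
	void (*ki_complete)(struct kiocb_model *iocb, long res);
	int deferred;		/* issuer: completion still pending */
	ssize_t result;		/* issuer: final completion value */
};

/* Plays the role of iomap_dio_deferred_complete(). */
static ssize_t dio_model_complete(void *data)
{
	struct dio_model *dio = data;

	return dio->size;
}

/* Plays the role of the new branch in iomap_dio_bio_end_io(). */
static void model_bio_end_io(struct kiocb_model *iocb, struct dio_model *dio)
{
	if (iocb->ki_flags & IOCB_DIO_DEFER) {
		/* Hand the handler and its data back to the issuer. */
		iocb->private = dio;
		iocb->dio_complete = dio_model_complete;
		/* 'res' is ignored here; the real value comes later. */
		iocb->ki_complete(iocb, 0);
	} else {
		/* Stand-in for the workqueue path: complete right away. */
		iocb->ki_complete(iocb, (long)dio_model_complete(dio));
	}
}

/* Issuer's ki_complete: notices ->dio_complete and defers. */
static void issuer_ki_complete(struct kiocb_model *iocb, long res)
{
	if (iocb->dio_complete) {
		iocb->deferred = 1;
		return;
	}
	iocb->result = res;
}

/* Issuer, later, from its own (safe) context: run the handler. */
static void issuer_task_work(struct kiocb_model *iocb)
{
	if (!iocb->deferred)
		return;
	assert(iocb->dio_complete != NULL);
	iocb->result = iocb->dio_complete(iocb->private);
	iocb->deferred = 0;
}

/* Drive one request through the model and report what happened. */
static ssize_t run_model(unsigned int flags, ssize_t size, int *was_deferred)
{
	struct dio_model dio = { .size = size };
	struct kiocb_model iocb = {
		.ki_flags = flags,
		.ki_complete = issuer_ki_complete,
		.result = -1,
	};

	model_bio_end_io(&iocb, &dio);
	*was_deferred = iocb.deferred;
	issuer_task_work(&iocb);
	return iocb.result;
}
```

Calling run_model(IOCB_DIO_DEFER, 4096, &d) in this model sets d to 1 and
returns 4096 only after issuer_task_work() has run; with flags of 0 the
result is delivered directly through ki_complete and d stays 0.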

The following fio job, doing random 4k dio writes at queue depths of
1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=16

shows the following results before and after this patch:

	Stock	Patched		Diff
=======================================
QD1	155K	162K		+ 4.5%
QD2	290K	313K		+ 7.9%
QD4	533K	597K		+12.0%
QD8	604K	827K		+36.9%
QD16	615K	845K		+37.4%

which shows nice wins all around. If we factor in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/iomap/direct-io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 94ef78b25b76..bd7b948a29a7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,11 @@  ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -167,6 +172,25 @@  void iomap_dio_bio_end_io(struct bio *bio)
 			   !(dio->flags & IOMAP_DIO_WRITE)) {
 			WRITE_ONCE(iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
+		} else if ((iocb->ki_flags & IOCB_DIO_DEFER) &&
+			   !(dio->flags & IOMAP_DIO_NEED_SYNC)) {
+			/* only polled IO cares about private cleared */
+			iocb->private = dio;
+			iocb->dio_complete = iomap_dio_deferred_complete;
+			/*
+			 * Invoke ->ki_complete() directly. We've assigned
+			 * our dio_complete callback handler, and since the
+			 * issuer set IOCB_DIO_DEFER, we know their
+			 * ki_complete handler will notice ->dio_complete
+			 * being set and will defer calling that handler
+			 * until it can be done from a safe task context.
+			 *
+			 * Note that the 'res' being passed in here is
+			 * not important for this case. The actual completion
+			 * value of the request will be gotten from dio_complete
+			 * when that is run by the issuer.
+			 */
+			iocb->ki_complete(iocb, 0);
 		} else {
 			struct inode *inode = file_inode(iocb->ki_filp);