From patchwork Tue Jul 11 20:33:21 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309364
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 1/5] iomap: complete polled writes inline
Date: Tue, 11 Jul 2023 14:33:21 -0600
Message-Id: <20230711203325.208957-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

Polled IO is always reaped in the context of the process itself, so it
does not need to be punted to a workqueue for the completion. This is
different than IRQ driven IO, where iomap_dio_bio_end_io() will be
invoked from hard/soft IRQ context.
For those cases we currently need to punt to a workqueue for further
processing. For the polled case, since it's the task itself reaping
completions, we're already in task context. That makes it identical
to the sync completion case.

Testing a basic QD 1..8 dio random write with polled IO with the
following fio job:

fio --name=polled-dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth --hipri=1

yields:

        Stock   Patched         Diff
=======================================
QD1     180K    201K            +11%
QD2     356K    394K            +10%
QD4     608K    650K            +7%
QD8     827K    831K            +0.5%

which shows a nice win, particularly for lower queue depth writes.
This is expected, as higher queue depths will be busy polling
completions while the offloaded workqueue completions can happen in
parallel.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..343bde5d50d3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -161,15 +161,16 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
+		} else if ((bio->bi_opf & REQ_POLLED) ||
+			   !(dio->flags & IOMAP_DIO_WRITE)) {
+			WRITE_ONCE(dio->iocb->private, NULL);
+			iomap_dio_complete_work(&dio->aio.work);
+		} else {
 			struct inode *inode = file_inode(dio->iocb->ki_filp);
 
 			WRITE_ONCE(dio->iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
 		}
 	}

From patchwork Tue Jul 11 20:33:22 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309365
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 2/5] fs: add IOCB flags related to passing back dio completions
Date: Tue, 11 Jul 2023 14:33:22 -0600
Message-Id: <20230711203325.208957-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue. This is less efficient than having the original
issuer handle it, like we do for sync IO, and it adds latency to the
completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context.
If the dio handler is aware of this flag, assign a callback handler in
kiocb->dio_complete and associated data in kiocb->private. The issuer
will then call this handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 include/linux/fs.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..115382f66d79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,16 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler.
+ */
+#define IOCB_DIO_DEFER		(1 << 22)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +361,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +371,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Tue Jul 11 20:33:23 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309363
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 3/5] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Tue, 11 Jul 2023 14:33:23 -0600
Message-Id: <20230711203325.208957-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

If the
filesystem dio handler understands IOCB_DIO_DEFER, we'll get a
kiocb->ki_complete() callback with kiocb->dio_complete set. In that
case, rather than complete the IO directly through task_work, queue
up an intermediate task_work handler that first processes this
callback and then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT
writes.

Signed-off-by: Jens Axboe
---
 io_uring/rw.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..4ed378c70249 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
 
 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);
 
 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);
 
-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -914,7 +924,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
+
+	/*
+	 * Set IOCB_DIO_DEFER, stating that our handler groks deferring the
+	 * completion to task context.
+	 */
+	kiocb->ki_flags |= IOCB_WRITE | IOCB_DIO_DEFER;
+	kiocb->dio_complete = NULL;
 
 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);

From patchwork Tue Jul 11 20:33:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309366
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io()
Date: Tue, 11 Jul 2023 14:33:24 -0600
Message-Id: <20230711203325.208957-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

We use this multiple times, add a local variable for the kiocb.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 343bde5d50d3..94ef78b25b76 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -157,18 +157,20 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
 
 	if (atomic_dec_and_test(&dio->ref)) {
+		struct kiocb *iocb = dio->iocb;
+
 		if (dio->wait_for_completion) {
 			struct task_struct *waiter = dio->submit.waiter;
 			WRITE_ONCE(dio->submit.waiter, NULL);
 			blk_wake_io_task(waiter);
 		} else if ((bio->bi_opf & REQ_POLLED) ||
 			   !(dio->flags & IOMAP_DIO_WRITE)) {
-			WRITE_ONCE(dio->iocb->private, NULL);
+			WRITE_ONCE(iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
 		} else {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
+			struct inode *inode = file_inode(iocb->ki_filp);
 
-			WRITE_ONCE(dio->iocb->private, NULL);
+			WRITE_ONCE(iocb->private, NULL);
 			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
 			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
 		}

From patchwork Tue Jul 11 20:33:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13309367
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, Jens Axboe
Subject: [PATCH 5/5] iomap: support IOCB_DIO_DEFER
Date: Tue, 11 Jul 2023 14:33:25 -0600
Message-Id: <20230711203325.208957-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20230711203325.208957-1-axboe@kernel.dk>
References: <20230711203325.208957-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

If IOCB_DIO_DEFER is set, utilize that to set the kiocb->dio_complete
handler and data for that callback. Rather than punt the completion to a
workqueue, we pass back the handler and data to the issuer and will get
a callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=16

shows the following results before and after this patch:

        Stock   Patched         Diff
=======================================
QD1     155K    162K            + 4.5%
QD2     290K    313K            + 7.9%
QD4     533K    597K            +12.0%
QD8     604K    827K            +36.9%
QD16    615K    845K            +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions run out of steam.
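
For illustration only (not part of the patch): the handoff described at
the top of this message can be mimicked in plain userspace C. This is a
rough sketch under stated assumptions, not kernel code. Every sk_ name is
hypothetical, as are the "result" and "deferred" fields; only the
dio_complete, private and ki_complete field names and the (1 << 22) bit
value are borrowed from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same bit value as IOCB_DIO_DEFER in the series; the name is hypothetical. */
#define SK_IOCB_DIO_DEFER	(1 << 22)

struct sk_kiocb {
	void *private;
	int ki_flags;
	void (*ki_complete)(struct sk_kiocb *iocb, long res);
	long (*dio_complete)(void *data);
	long result;	/* hypothetical: final result recorded by the issuer */
	int deferred;	/* hypothetical: stands in for queued task_work */
};

struct sk_dio {
	struct sk_kiocb *iocb;
	long size;
};

/* Stand-in for iomap_dio_complete(): yields the real completion value. */
static long sk_dio_complete(void *data)
{
	struct sk_dio *dio = data;

	return dio->size;
}

/* Stand-in for the bio end_io path, which may run in IRQ context. */
static void sk_end_io(struct sk_dio *dio)
{
	struct sk_kiocb *iocb = dio->iocb;

	if (iocb->ki_flags & SK_IOCB_DIO_DEFER) {
		/* Hand the handler and its argument back to the issuer. */
		iocb->private = dio;
		iocb->dio_complete = sk_dio_complete;
		iocb->ki_complete(iocb, 0);	/* 'res' is ignored here */
	} else {
		/* The kernel would punt this to a workqueue instead. */
		iocb->ki_complete(iocb, sk_dio_complete(dio));
	}
}

/* Issuer's ki_complete: keep 'res' only if no handler was passed back. */
static void sk_issuer_complete(struct sk_kiocb *iocb, long res)
{
	if (!iocb->dio_complete)
		iocb->result = res;
	iocb->deferred = 1;
}

/* Issuer's task-context work: run the deferred handler, if any. */
static void sk_issuer_task_work(struct sk_kiocb *iocb)
{
	assert(iocb->deferred);
	if (iocb->dio_complete)
		iocb->result = iocb->dio_complete(iocb->private);
	iocb->deferred = 0;
}

/* Drive one simulated write through end_io and the issuer's task work. */
long sk_run_write(int flags, long size)
{
	struct sk_kiocb iocb = {
		.ki_flags = flags,
		.ki_complete = sk_issuer_complete,
	};
	struct sk_dio dio = { .iocb = &iocb, .size = size };

	sk_end_io(&dio);
	sk_issuer_task_work(&iocb);
	return iocb.result;
}

#ifdef SK_STANDALONE
int main(void)
{
	printf("%ld\n", sk_run_write(SK_IOCB_DIO_DEFER, 4096));
	return 0;
}
#endif
```

Building with -DSK_STANDALONE gives a small program that exercises the
deferred path. The point of the sketch is only that the value passed to
ki_complete is ignored when a handler was handed back, and that the real
result comes from running that handler later in the issuer's own context.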

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 94ef78b25b76..bd7b948a29a7 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -130,6 +130,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -167,6 +172,25 @@ void iomap_dio_bio_end_io(struct bio *bio)
 			   !(dio->flags & IOMAP_DIO_WRITE)) {
 			WRITE_ONCE(iocb->private, NULL);
 			iomap_dio_complete_work(&dio->aio.work);
+		} else if ((iocb->ki_flags & IOCB_DIO_DEFER) &&
+			   !(dio->flags & IOMAP_DIO_NEED_SYNC)) {
+			/* only polled IO cares about private cleared */
+			iocb->private = dio;
+			iocb->dio_complete = iomap_dio_deferred_complete;
+			/*
+			 * Invoke ->ki_complete() directly. We've assigned
+			 * our dio_complete callback handler, and since the
+			 * issuer set IOCB_DIO_DEFER, we know their
+			 * ki_complete handler will notice ->dio_complete
+			 * being set and will defer calling that handler
+			 * until it can be done from a safe task context.
+			 *
+			 * Note that the 'res' being passed in here is
+			 * not important for this case. The actual completion
+			 * value of the request will be gotten from dio_complete
+			 * when that is run by the issuer.
+			 */
+			iocb->ki_complete(iocb, 0);
 		} else {
 			struct inode *inode = file_inode(iocb->ki_filp);