From patchwork Wed Jul 19 19:54:12 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319436
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 1/6] iomap: cleanup up iomap_dio_bio_end_io()
Date: Wed, 19 Jul 2023 13:54:12 -0600
Message-Id: <20230719195417.1704513-2-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

Make the logic a bit easier to follow:

1) Add a release_bio out path, as everybody needs to touch that, and
   have our bio ref check jump there if it's non-zero.
2) Add a kiocb local variable.
3) Add comments for each of the three conditions (sync, inline, or
   async workqueue punt).

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..1c32f734c767 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -152,27 +152,40 @@ void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	struct kiocb *iocb = dio->iocb;

 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
+	if (!atomic_dec_and_test(&dio->ref))
+		goto release_bio;

-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
-
-			WRITE_ONCE(dio->iocb->private, NULL);
-			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
-		}
+	/*
+	 * Synchronous dio, task itself will handle any completion work
+	 * that is needed after IO. All we need to do is wake the task.
+	 */
+	if (dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+		goto release_bio;
+	}
+
+	/*
+	 * If this dio is an async write, queue completion work for async
+	 * handling. Reads can always complete inline.
+	 */
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		struct inode *inode = file_inode(iocb->ki_filp);
+
+		WRITE_ONCE(iocb->private, NULL);
+		INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+		queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
+	} else {
+		WRITE_ONCE(iocb->private, NULL);
+		iomap_dio_complete_work(&dio->aio.work);
 	}

+release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {

From patchwork Wed Jul 19 19:54:13 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319437
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 2/6] iomap: add IOMAP_DIO_INLINE_COMP
Date: Wed, 19 Jul 2023 13:54:13 -0600
Message-Id: <20230719195417.1704513-3-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

Rather than implicitly gate the workqueue punt of a dio completion on
the IOMAP_DIO_WRITE check, add an explicit flag for it.

For now we treat them the same, reads always set the flag and async
writes do not.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1c32f734c767..6b302bf8790b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
 #define IOMAP_DIO_WRITE		(1 << 30)
@@ -171,20 +172,25 @@ void iomap_dio_bio_end_io(struct bio *bio)
 	}

 	/*
-	 * If this dio is an async write, queue completion work for async
-	 * handling. Reads can always complete inline.
+	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
 	 */
-	if (dio->flags & IOMAP_DIO_WRITE) {
-		struct inode *inode = file_inode(iocb->ki_filp);
-
-		WRITE_ONCE(iocb->private, NULL);
-		INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-		queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-	} else {
+	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
 		WRITE_ONCE(iocb->private, NULL);
 		iomap_dio_complete_work(&dio->aio.work);
+		goto release_bio;
 	}

+	/*
+	 * Async DIO completion that requires filesystem level completion work
+	 * gets punted to a work queue to complete as the operation may require
+	 * more IO to be issued to finalise filesystem metadata changes or
+	 * guarantee data integrity.
+	 */
+	WRITE_ONCE(iocb->private, NULL);
+	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
+			&dio->aio.work);
+
 release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
@@ -524,6 +530,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_NOWAIT;

 	if (iov_iter_rw(iter) == READ) {
+		/* reads can always complete inline */
+		dio->flags |= IOMAP_DIO_INLINE_COMP;
+
 		if (iomi.pos >= dio->i_size)
 			goto out_free_dio;


From patchwork Wed Jul 19 19:54:14 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319438
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 3/6] iomap: treat a write through cache the same as FUA
Date: Wed, 19 Jul 2023 13:54:14 -0600
Message-Id: <20230719195417.1704513-4-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

Whether we have a write back cache and are using FUA or don't have
a write back cache at all is the same situation. Treat them the same.

Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6b302bf8790b..b30c3edf2ef3 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -280,7 +280,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * cache flushes on IO completion.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_FUA) && bdev_fua(iomap->bdev))
+		    (dio->flags & IOMAP_DIO_WRITE_FUA) &&
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
 	}


From patchwork Wed Jul 19 19:54:15 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319439
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 4/6] fs: add IOCB flags related to passing back dio completions
Date: Wed, 19 Jul 2023 13:54:15 -0600
Message-Id: <20230719195417.1704513-5-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue. This is less efficient than having the original
issuer handle it, like we do for sync IO, and it adds latency to the
completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context. If the dio handler is aware
of this flag, assign a callback handler in kiocb->dio_complete and
associated data in kiocb->private. The issuer will then call this
handler with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe
---
 include/linux/fs.h | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..115382f66d79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,16 @@ enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler.
+ */
+#define IOCB_DIO_DEFER		(1 << 22)

 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +361,8 @@ enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }

 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +371,22 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };

 static inline bool is_sync_kiocb(struct kiocb *kiocb)

From patchwork Wed Jul 19 19:54:16 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319440
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe
Subject: [PATCH 5/6] io_uring/rw: add write support for IOCB_DIO_DEFER
Date: Wed, 19 Jul 2023 13:54:16 -0600
Message-Id: <20230719195417.1704513-6-axboe@kernel.dk>
In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk>
References: <20230719195417.1704513-1-axboe@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

If the filesystem dio handler understands IOCB_DIO_DEFER, we'll get
a kiocb->ki_complete() callback with kiocb->dio_complete set. In that
case, rather than complete the IO directly through task_work, queue
up an intermediate task_work handler that first processes this
callback and then immediately completes the request.

For XFS, this avoids a punt through a workqueue, which is a lot less
efficient and adds latency to lower queue depth (or sync) O_DIRECT
writes.

Signed-off-by: Jens Axboe
---
 io_uring/rw.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1bce2208b65c..4657e11acf02 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -285,6 +285,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res)

 void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	if (rw->kiocb.dio_complete) {
+		long res = rw->kiocb.dio_complete(rw->kiocb.private);
+
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
+
 	io_req_io_end(req);

 	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
@@ -300,9 +308,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);

-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	if (!rw->kiocb.dio_complete) {
+		if (__io_complete_rw_common(req, res))
+			return;
+		io_req_set_res(req, io_fixup_rw_res(req, res), 0);
+	}
 	req->io_task_work.func = io_req_rw_complete;
 	__io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE);
 }
@@ -312,6 +322,9 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
 	struct io_kiocb *req = cmd_to_io_kiocb(rw);

+	if (rw->kiocb.dio_complete)
+		res = rw->kiocb.dio_complete(rw->kiocb.private);
+
 	if (kiocb->ki_flags & IOCB_WRITE)
 		kiocb_end_write(req);
 	if (unlikely(res != req->cqe.res)) {
@@ -914,7 +927,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		__sb_writers_release(file_inode(req->file)->i_sb,
 					SB_FREEZE_WRITE);
 	}
-	kiocb->ki_flags |= IOCB_WRITE;
+
+	/*
+	 * Set IOCB_DIO_DEFER, stating that our handler groks deferring the
+	 * completion to task context.
+	 */
+	kiocb->ki_flags |= IOCB_WRITE | IOCB_DIO_DEFER;
+	kiocb->dio_complete = NULL;

 	if (likely(req->file->f_op->write_iter))
 		ret2 = call_write_iter(req->file, kiocb, &s->iter);

From patchwork Wed Jul 19 19:54:17 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13319441
kBPZ2hoO6+JtUocaEoPTyx+AnaVilw9JCpUvrbfOISFu8KlGjVmsTul36huDeAfZFZom famDHX0bTR7bVniAyiKg1W1WvspZAHb9qmUv5yZ40X+abb0PeHmq7e/9tRRPBZwXJzSF LX/bCOiBKfF7z0Og5QDpXU5O+x4rltwbjzmkW9YYP0nZISYPG2ZRZQWdKEDHgHFwV8lA cITkp69rHLOflHD7M2nh/eraJ8QIJhB9FK8g6SRJ2PIyDH4CPvm+UjZbjZi0UCHoDGfE yhKw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689796471; x=1690401271; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kTvYQlDOhP6EOVmXA2umB2ZKrh0BWM6v9ydQBvMCtEo=; b=LABpYxXAXzh0sIMm30O47Znj2vIb+w10bBwnS+dx7Gc+k+cZii1zRmygqO6LgYbaNE 62ky7rHJLs52krVl2mF6zRB9/ABUlm/deJ57coVO9ilHf7rEtQhGkZvGbG6Bi2/899/9 nnI0iOWf+VMYZdgmchHwUyVqeQTBacCGDmPNBZH2F1TGn4/L4EdT8nFObJ//be6igYEk xs+AkJa2nDuI6xG+Zburdl7GiIwC+JS22FF3ru4315+wX5ARKu2CVALcKgKF8HYXRNN0 KtN8gRb9BZIMH77f93PAKSQ3plKObXj+Nbbu9PSN5gqZWFlgNWiCK8CNvzoPcNfCPtoh HA2A== X-Gm-Message-State: ABy/qLYbURSeTAiyMEEjp8LnzNuLXacRQnNDVc8YFeeyW4sEmbhY/0Dt CPNH/vCKu4TF6tIyvyyZIfAjLV3Z7AW4xP/clh0= X-Google-Smtp-Source: APBJJlFnEcVMxJ5LOd6PR/a75IhjhuRWzWbS5J5uUVT/bik2+nthisdHTN19bYSQbo/z7VRMMUf64Q== X-Received: by 2002:a05:6602:3423:b0:780:d65c:d78f with SMTP id n35-20020a056602342300b00780d65cd78fmr586047ioz.2.1689796471049; Wed, 19 Jul 2023 12:54:31 -0700 (PDT) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id j21-20020a02a695000000b0042bb13cb80fsm1471893jam.120.2023.07.19.12.54.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 19 Jul 2023 12:54:30 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, Jens Axboe Subject: [PATCH 6/6] iomap: support IOCB_DIO_DEFER Date: Wed, 19 Jul 2023 13:54:17 -0600 Message-Id: <20230719195417.1704513-7-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230719195417.1704513-1-axboe@kernel.dk> References: 
<20230719195417.1704513-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete handler and data for that callback. Rather than punt the completion to a workqueue, we pass back the handler and data to the issuer and will get a callback from a safe task context. Using the following fio job to randomly dio write 4k blocks at queue depths of 1..16: fio --name=dio-write --filename=/data1/file --time_based=1 \ --runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \ --cpus_allowed=4 --ioengine=io_uring --iodepth=$depth shows the following results before and after this patch: Stock Patched Diff ======================================= QD1 155K 162K + 4.5% QD2 290K 313K + 7.9% QD4 533K 597K +12.0% QD8 604K 827K +36.9% QD16 615K 845K +37.4% which shows nice wins all around. If we factored in per-IOP efficiency, the wins look even nicer. This becomes apparent as queue depth rises, as the offloaded workqueue completions runs out of steam. 
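For readers who want to see the handshake described above in isolation, here
is a minimal user-space sketch of it in C. It is a toy model, not kernel code:
every toy_* name, the TOY_IOCB_DIO_DEFER flag, the "deferred" and "result"
fields, and toy_run_task_work() are hypothetical stand-ins made up for
illustration. Only the overall shape (the I/O layer hands back a dio_complete
handler plus data, the issuer runs it later from task context) follows this
series. The non-deferred branch is simplified to an inline completion; the
real code may punt to a workqueue instead.

```c
#include <assert.h>

/* Hypothetical flag standing in for IOCB_DIO_DEFER. */
#define TOY_IOCB_DIO_DEFER (1u << 0)

/* Toy stand-in for struct kiocb; only the fields the handshake needs. */
struct toy_kiocb {
	unsigned int flags;
	void *private;				/* data for dio_complete */
	long (*dio_complete)(void *data);	/* set by the I/O layer */
	void (*ki_complete)(struct toy_kiocb *iocb, long res);
	int deferred;				/* issuer side: completion pending */
	long result;				/* issuer side: final result */
};

/* Toy stand-in for struct iomap_dio. */
struct toy_dio {
	long size;	/* bytes transferred */
	int error;	/* negative errno, or 0 */
};

/* Plays the role of iomap_dio_deferred_complete(): yields the real result. */
static long toy_dio_deferred_complete(void *data)
{
	struct toy_dio *dio = data;

	return dio->error ? dio->error : dio->size;
}

/*
 * Issuer's ->ki_complete(). If ->dio_complete is set, 'res' is ignored and
 * the request is only marked as needing completion from task context.
 */
static void toy_issuer_complete(struct toy_kiocb *iocb, long res)
{
	if (iocb->dio_complete) {
		iocb->deferred = 1;
		return;
	}
	iocb->result = res;
}

/* Plays the role of the bio end_io handler (think: interrupt context). */
static void toy_bio_end_io(struct toy_kiocb *iocb, struct toy_dio *dio)
{
	if (iocb->flags & TOY_IOCB_DIO_DEFER) {
		iocb->private = dio;
		iocb->dio_complete = toy_dio_deferred_complete;
		iocb->ki_complete(iocb, 0);	/* 'res' not important here */
		return;
	}
	/* Simplification: complete inline rather than via a workqueue. */
	iocb->ki_complete(iocb, toy_dio_deferred_complete(dio));
}

/* Issuer running in task context: now it is safe to finish the dio. */
static void toy_run_task_work(struct toy_kiocb *iocb)
{
	if (iocb->deferred) {
		iocb->result = iocb->dio_complete(iocb->private);
		iocb->deferred = 0;
	}
}

/* Drive one deferred 4k write through the model and return its result. */
long toy_demo(void)
{
	struct toy_dio dio = { .size = 4096, .error = 0 };
	struct toy_kiocb iocb = {
		.flags = TOY_IOCB_DIO_DEFER,
		.ki_complete = toy_issuer_complete,
		.result = -1,
	};

	toy_bio_end_io(&iocb, &dio);
	/* Nothing has been completed yet, only flagged as deferred. */
	assert(iocb.deferred == 1 && iocb.result == -1);

	toy_run_task_work(&iocb);
	return iocb.result;
}
```

Calling toy_demo() from a small main() is enough to exercise it. The point of
the split is that the end_io side never computes the final result itself; it
only publishes the handler and data, and the issuer decides when it is safe to
run them.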
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 47 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b30c3edf2ef3..b7055d50dd99 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
+#define IOMAP_DIO_DEFER_COMP	(1 << 26)
 #define IOMAP_DIO_INLINE_COMP	(1 << 27)
 #define IOMAP_DIO_WRITE_FUA	(1 << 28)
 #define IOMAP_DIO_NEED_SYNC	(1 << 29)
@@ -131,6 +132,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 }
 EXPORT_SYMBOL_GPL(iomap_dio_complete);
 
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+	return iomap_dio_complete(data);
+}
+
 static void iomap_dio_complete_work(struct work_struct *work)
 {
 	struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -180,6 +186,31 @@ void iomap_dio_bio_end_io(struct bio *bio)
 		goto release_bio;
 	}
 
+	/*
+	 * If this dio is flagged with IOMAP_DIO_DEFER_COMP, then schedule
+	 * our completion that way to avoid an async punt to a workqueue.
+	 */
+	if (dio->flags & IOMAP_DIO_DEFER_COMP) {
+		/* only polled IO cares about private cleared */
+		iocb->private = dio;
+		iocb->dio_complete = iomap_dio_deferred_complete;
+
+		/*
+		 * Invoke ->ki_complete() directly. We've assigned our
+		 * dio_complete callback handler, and since the issuer set
+		 * IOCB_DIO_DEFER, we know their ki_complete handler will
+		 * notice ->dio_complete being set and will defer calling that
+		 * handler until it can be done from a safe task context.
+		 *
+		 * Note that the 'res' being passed in here is not important
+		 * for this case. The actual completion value of the request
+		 * will be gotten from dio_complete when that is run by the
+		 * issuer.
+		 */
+		iocb->ki_complete(iocb, 0);
+		goto release_bio;
+	}
+
 	/*
 	 * Async DIO completion that requires filesystem level completion work
 	 * gets punted to a work queue to complete as the operation may require
@@ -277,12 +308,15 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * data IO that doesn't require any metadata updates (including
 		 * after IO completion such as unwritten extent conversion) and
 		 * the underlying device supports FUA. This allows us to avoid
-		 * cache flushes on IO completion.
+		 * cache flushes on IO completion. If we can't use FUA and
+		 * need to sync, disable in-task completions.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_WRITE_FUA) &&
 		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
+		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+			dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -308,6 +342,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		pad = pos & (fs_block_size - 1);
 		if (pad)
 			iomap_dio_zero(iter, dio, pos - pad, pad);
+
+		dio->flags &= ~IOMAP_DIO_DEFER_COMP;
 	}
 
 	/*
@@ -547,6 +583,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_WRITE;
 		dio->flags |= IOMAP_DIO_WRITE;
 
+		/*
+		 * Flag as supporting deferred completions, if the issuer
+		 * groks it. This can avoid a workqueue punt for writes.
+		 * We may later clear this flag if we need to do other IO
+		 * as part of this IO completion.
+		 */
+		if (iocb->ki_flags & IOCB_DIO_DEFER)
+			dio->flags |= IOMAP_DIO_DEFER_COMP;
+
 		if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
 			ret = -EAGAIN;
 			if (iomi.pos >= dio->i_size ||