From patchwork Mon Jul 24 22:55:04 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13325413
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe
Subject: [PATCH 1/8] iomap: cleanup up iomap_dio_bio_end_io()
Date: Mon, 24 Jul 2023 16:55:04 -0600
Message-Id: <20230724225511.599870-2-axboe@kernel.dk>
In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk>
References: <20230724225511.599870-1-axboe@kernel.dk>

Make the logic a bit easier to follow:

1) Add a release_bio out path, as everybody needs to touch that, and
   have our bio ref check jump there if it's non-zero.
2) Add a kiocb local variable.
3) Add comments for each of the three conditions (sync, inline, or
   async workqueue punt).

No functional changes in this patch.

Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 46 +++++++++++++++++++++++++++++---------------
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index ea3b868c8355..0ce60e80c901 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -152,27 +152,43 @@ void iomap_dio_bio_end_io(struct bio *bio)
 {
 	struct iomap_dio *dio = bio->bi_private;
 	bool should_dirty = (dio->flags & IOMAP_DIO_DIRTY);
+	struct kiocb *iocb = dio->iocb;
 
 	if (bio->bi_status)
 		iomap_dio_set_error(dio, blk_status_to_errno(bio->bi_status));
+	if (!atomic_dec_and_test(&dio->ref))
+		goto release_bio;
 
-	if (atomic_dec_and_test(&dio->ref)) {
-		if (dio->wait_for_completion) {
-			struct task_struct *waiter = dio->submit.waiter;
-			WRITE_ONCE(dio->submit.waiter, NULL);
-			blk_wake_io_task(waiter);
-		} else if (dio->flags & IOMAP_DIO_WRITE) {
-			struct inode *inode = file_inode(dio->iocb->ki_filp);
-
-			WRITE_ONCE(dio->iocb->private, NULL);
-			INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
-			queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
-		} else {
-			WRITE_ONCE(dio->iocb->private, NULL);
-			iomap_dio_complete_work(&dio->aio.work);
-		}
+	/*
+	 * Synchronous dio, task itself will handle any completion work
+	 * that needs after IO. All we need to do is wake the task.
+	 */
+	if (dio->wait_for_completion) {
+		struct task_struct *waiter = dio->submit.waiter;
+
+		WRITE_ONCE(dio->submit.waiter, NULL);
+		blk_wake_io_task(waiter);
+		goto release_bio;
+	}
+
+	/* Read completion can always complete inline. */
+	if (!(dio->flags & IOMAP_DIO_WRITE)) {
+		WRITE_ONCE(iocb->private, NULL);
+		iomap_dio_complete_work(&dio->aio.work);
+		goto release_bio;
 	}
 
+	/*
+	 * Async DIO completion that requires filesystem level completion work
+	 * gets punted to a work queue to complete as the operation may require
+	 * more IO to be issued to finalise filesystem metadata changes or
+	 * guarantee data integrity.
+	 */
+	WRITE_ONCE(iocb->private, NULL);
+	INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
+	queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq,
+			&dio->aio.work);
+release_bio:
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {

From patchwork Mon Jul 24 22:55:05 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13325412
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe
Subject: [PATCH 2/8] iomap: use an unsigned type for IOMAP_DIO_* defines
Date: Mon, 24 Jul 2023 16:55:05 -0600
Message-Id: <20230724225511.599870-3-axboe@kernel.dk>
In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk>
References: <20230724225511.599870-1-axboe@kernel.dk>

IOMAP_DIO_DIRTY shifts by 31 bits, which makes UBSAN unhappy. Clean up
all the defines by making the shifted value an unsigned value.

Reviewed-by: Darrick J. Wong
Reported-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 0ce60e80c901..7d627d43d10b 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,10 +20,10 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
-#define IOMAP_DIO_WRITE_FUA	(1 << 28)
-#define IOMAP_DIO_NEED_SYNC	(1 << 29)
-#define IOMAP_DIO_WRITE		(1 << 30)
-#define IOMAP_DIO_DIRTY		(1 << 31)
+#define IOMAP_DIO_WRITE_FUA	(1U << 28)
+#define IOMAP_DIO_NEED_SYNC	(1U << 29)
+#define IOMAP_DIO_WRITE		(1U << 30)
+#define IOMAP_DIO_DIRTY		(1U << 31)
 
 struct iomap_dio {
 	struct kiocb		*iocb;

From patchwork Mon Jul 24 22:55:06 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13325414
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe
Subject: [PATCH 3/8] iomap: treat a write through cache the same as FUA
Date: Mon, 24 Jul 2023 16:55:06 -0600
Message-Id: <20230724225511.599870-4-axboe@kernel.dk>
In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk>
References: <20230724225511.599870-1-axboe@kernel.dk>

Whether we have a write back cache and are using FUA or don't have
a write back cache at all is the same situation. Treat them the same.

This makes the IOMAP_DIO_WRITE_FUA name a bit misleading, as we have
two cases that provide stable writes:

1) Volatile write cache with FUA writes
2) Normal write without a volatile write cache

Rename that flag to IOMAP_DIO_WRITE_THROUGH to make that clearer,
and update some of the FUA comments as well.

Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 34 ++++++++++++++++++++--------------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 7d627d43d10b..6b690fc22365 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,7 +20,7 @@
  * Private flags for iomap_dio, must not overlap with the public ones in
  * iomap.h:
  */
-#define IOMAP_DIO_WRITE_FUA	(1U << 28)
+#define IOMAP_DIO_WRITE_THROUGH	(1U << 28)
 #define IOMAP_DIO_NEED_SYNC	(1U << 29)
 #define IOMAP_DIO_WRITE		(1U << 30)
 #define IOMAP_DIO_DIRTY		(1U << 31)
@@ -219,7 +219,7 @@ static void iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 /*
  * Figure out the bio's operation flags from the dio request, the
  * mapping, and whether or not we want FUA. Note that we can end up
- * clearing the WRITE_FUA flag in the dio request.
+ * clearing the WRITE_THROUGH flag in the dio request.
  */
 static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 		const struct iomap *iomap, bool use_fua)
@@ -233,7 +233,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 	if (use_fua)
 		opflags |= REQ_FUA;
 	else
-		dio->flags &= ~IOMAP_DIO_WRITE_FUA;
+		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
 
 	return opflags;
 }
@@ -273,11 +273,13 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
 		 * Use a FUA write if we need datasync semantics, this is a pure
 		 * data IO that doesn't require any metadata updates (including
 		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device supports FUA. This allows us to avoid
-		 * cache flushes on IO completion.
+		 * the underlying device either supports FUA or doesn't have
+		 * a volatile write cache. This allows us to avoid cache flushes
+		 * on IO completion.
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_FUA) && bdev_fua(iomap->bdev))
+		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
 			use_fua = true;
 	}
 
@@ -553,13 +555,16 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			dio->flags |= IOMAP_DIO_NEED_SYNC;
 
 			/*
-			 * For datasync only writes, we optimistically try
-			 * using FUA for this IO. Any non-FUA write that
-			 * occurs will clear this flag, hence we know before
-			 * completion whether a cache flush is necessary.
+			 * For datasync only writes, we optimistically try using
+			 * WRITE_THROUGH for this IO. This flag requires either
+			 * FUA writes through the device's write cache, or a
+			 * normal write to a device without a volatile write
+			 * cache. For the former, Any non-FUA write that occurs
+			 * will clear this flag, hence we know before completion
+			 * whether a cache flush is necessary.
 			 */
 			if (!(iocb->ki_flags & IOCB_SYNC))
-				dio->flags |= IOMAP_DIO_WRITE_FUA;
+				dio->flags |= IOMAP_DIO_WRITE_THROUGH;
 		}
 
 		/*
@@ -621,10 +626,11 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomap_dio_set_error(dio, ret);
 
 	/*
-	 * If all the writes we issued were FUA, we don't need to flush the
-	 * cache on IO completion. Clear the sync flag for this case.
+	 * If all the writes we issued were already written through to the
+	 * media, we don't need to flush the cache on IO completion. Clear the
+	 * sync flag for this case.
 	 */
-	if (dio->flags & IOMAP_DIO_WRITE_FUA)
+	if (dio->flags & IOMAP_DIO_WRITE_THROUGH)
 		dio->flags &= ~IOMAP_DIO_NEED_SYNC;
 
 	WRITE_ONCE(iocb->private, dio->submit.poll_bio);

From patchwork Mon Jul 24 22:55:07 2023
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13325417
From: Jens Axboe
To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org
Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe
Subject: [PATCH 4/8] iomap: only set iocb->private for polled bio
Date: Mon, 24 Jul 2023 16:55:07 -0600
Message-Id: <20230724225511.599870-5-axboe@kernel.dk>
In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk>
References: <20230724225511.599870-1-axboe@kernel.dk>

iocb->private is only used for polled IO, where the completer will
find the bio to poll through that field.

Assign it when we're submitting a polled bio, and get rid of the
dio->poll_bio indirection.

Reviewed-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe
---
 fs/iomap/direct-io.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 6b690fc22365..e4b9d9123b75 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -41,7 +41,6 @@ struct iomap_dio {
 		struct {
 			struct iov_iter		*iter;
 			struct task_struct	*waiter;
-			struct bio		*poll_bio;
 		} submit;
 
 		/* used for aio completion: */
@@ -63,12 +62,14 @@ static struct bio *iomap_dio_alloc_bio(const struct iomap_iter *iter,
 static void iomap_dio_submit_bio(const struct iomap_iter *iter,
 		struct iomap_dio *dio, struct bio *bio, loff_t pos)
 {
+	struct kiocb *iocb = dio->iocb;
+
 	atomic_inc(&dio->ref);
 
 	/* Sync dio can't be polled reliably */
-	if ((dio->iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(dio->iocb)) {
-		bio_set_polled(bio, dio->iocb);
-		dio->submit.poll_bio = bio;
+	if ((iocb->ki_flags & IOCB_HIPRI) && !is_sync_kiocb(iocb)) {
+		bio_set_polled(bio, iocb);
+		WRITE_ONCE(iocb->private, bio);
 	}
 
 	if (dio->dops && dio->dops->submit_io)
@@ -184,7 +185,6 @@ void iomap_dio_bio_end_io(struct bio *bio)
 	 * more IO to be issued to finalise filesystem metadata changes or
 	 * guarantee data integrity.
*/ - WRITE_ONCE(iocb->private, NULL); INIT_WORK(&dio->aio.work, iomap_dio_complete_work); queue_work(file_inode(iocb->ki_filp)->i_sb->s_dio_done_wq, &dio->aio.work); @@ -523,7 +523,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, dio->submit.iter = iter; dio->submit.waiter = current; - dio->submit.poll_bio = NULL; if (iocb->ki_flags & IOCB_NOWAIT) iomi.flags |= IOMAP_NOWAIT; @@ -633,8 +632,6 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, if (dio->flags & IOMAP_DIO_WRITE_THROUGH) dio->flags &= ~IOMAP_DIO_NEED_SYNC; - WRITE_ONCE(iocb->private, dio->submit.poll_bio); - /* * We are about to drop our additional submission reference, which * might be the last reference to the dio. There are three different From patchwork Mon Jul 24 22:55:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13325415 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB5C4C04E69 for ; Mon, 24 Jul 2023 22:55:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230298AbjGXWzX (ORCPT ); Mon, 24 Jul 2023 18:55:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33496 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230207AbjGXWzW (ORCPT ); Mon, 24 Jul 2023 18:55:22 -0400 Received: from mail-pl1-x62e.google.com (mail-pl1-x62e.google.com [IPv6:2607:f8b0:4864:20::62e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 49D2D10FA for ; Mon, 24 Jul 2023 15:55:21 -0700 (PDT) Received: by mail-pl1-x62e.google.com with SMTP id d9443c01a7336-1bba9539a23so1825965ad.1 for ; Mon, 24 Jul 2023 15:55:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20221208.gappssmtp.com; s=20221208; 
t=1690239320; x=1690844120; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=UkiYXAkH9GVD15ideYRW3eg5TFVpCBJh44wXJ9Qv8IA=; b=uMRNz4lYhIHghddKBkDuUHf0sS4xU5ctdO8KePyUfVQHcgs3wI++XVI2MyZ3fcz0ok GO8j6DrkoPJy8E0LBMD7NWiH5Be6+/grUqqthwMNBx1iyWW1q8g4L0oNULyvp8E20asX fnjUCaUqLN+ivFtIinQDYENNz/pQdPdufuuL3FOBlCJ65cWpI/QGU6W1Hq5ZJhBWBUk+ zapQYXyROrczWGS/MU/lvkk6Zdz2i5SjWZkMiR/WTQsI9izC2DULRCFQDweG4xDZEJVX 1ePCWR59SPbppPg9TK3asjiY1soPE7RXVMjq17RyT1H5R5LMNcOND1gb6potLHvmTkll 5NuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690239320; x=1690844120; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=UkiYXAkH9GVD15ideYRW3eg5TFVpCBJh44wXJ9Qv8IA=; b=LqigATWqwcgy7PaJpvSSqygOFaDz0NtS2ujOI6zvcLV6/UmnKtG6jqJNtJoCAbrl07 h+1LDF8ucErDZTv0aV9rxQ5mm7kLC5G95Wc8laFb/fBB94N4uvZZruNpAbCsK1L3YZdl 0YNarxg7TXn562yd/aX/A/e8JcpI6iHPd0tONohVI6CjFWQAnJW9qOk9vWsX3eQqAcMy ZQ0g6HF61mXo8DlvjCjGV44YB9yvOs0AufPhDfw8obsnvYOsOTlFPhe6cXWZJ+w9+RQd mZBgIIGOL4RU/Vcql4LlqWhpDjr43r7vhqcCnk25vzWAYjiAicgAPEaYuchf+1XXe1gZ sxXg== X-Gm-Message-State: ABy/qLb3YuEbzkvNdMYhzQTE7RLq6m/Yo7hVVpKmHRibhkF+OCLzO6UV XuW3XgL1m0VT0BlvdnsGOQLgo1I+14ZZbpGr9V8= X-Google-Smtp-Source: APBJJlE75QLPlazs431saOk4q704bVzGQ9hXd/HdIVqUe6ICO+GIg7JpBbH53tBMy7aw60lbIeereg== X-Received: by 2002:a17:903:2305:b0:1b8:b0c4:2e3d with SMTP id d5-20020a170903230500b001b8b0c42e3dmr14474787plh.4.1690239320604; Mon, 24 Jul 2023 15:55:20 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id p7-20020a1709026b8700b001acae9734c0sm9424733plk.266.2023.07.24.15.55.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Jul 2023 15:55:20 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, 
andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe Subject: [PATCH 5/8] iomap: add IOMAP_DIO_INLINE_COMP Date: Mon, 24 Jul 2023 16:55:08 -0600 Message-Id: <20230724225511.599870-6-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk> References: <20230724225511.599870-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Rather than gate whether or not we need to punt a dio completion to a workqueue on whether the IO is a write or not, add an explicit flag for it. For now we treat them the same, reads always set the flags and async writes do not. No functional changes in this patch. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/iomap/direct-io.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index e4b9d9123b75..b943bc5c7b18 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -20,6 +20,7 @@ * Private flags for iomap_dio, must not overlap with the public ones in * iomap.h: */ +#define IOMAP_DIO_INLINE_COMP (1U << 27) #define IOMAP_DIO_WRITE_THROUGH (1U << 28) #define IOMAP_DIO_NEED_SYNC (1U << 29) #define IOMAP_DIO_WRITE (1U << 30) @@ -172,8 +173,10 @@ void iomap_dio_bio_end_io(struct bio *bio) goto release_bio; } - /* Read completion can always complete inline. 
*/ - if (!(dio->flags & IOMAP_DIO_WRITE)) { + /* + * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline + */ + if (dio->flags & IOMAP_DIO_INLINE_COMP) { WRITE_ONCE(iocb->private, NULL); iomap_dio_complete_work(&dio->aio.work); goto release_bio; @@ -528,6 +531,9 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, iomi.flags |= IOMAP_NOWAIT; if (iov_iter_rw(iter) == READ) { + /* reads can always complete inline */ + dio->flags |= IOMAP_DIO_INLINE_COMP; + if (iomi.pos >= dio->i_size) goto out_free_dio; From patchwork Mon Jul 24 22:55:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13325416 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3618BC04FE0 for ; Mon, 24 Jul 2023 22:55:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230317AbjGXWzY (ORCPT ); Mon, 24 Jul 2023 18:55:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230288AbjGXWzX (ORCPT ); Mon, 24 Jul 2023 18:55:23 -0400 Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6954D100 for ; Mon, 24 Jul 2023 15:55:22 -0700 (PDT) Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1bbadf9ed37so1478815ad.0 for ; Mon, 24 Jul 2023 15:55:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20221208.gappssmtp.com; s=20221208; t=1690239321; x=1690844121; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=4m73hDbmfHkB6vBAc3X7AmWd4GXS2D47NYfUBsYjTeE=; b=VG7ninWMl0JZlPdtIdNJcgkEXQh8MU+lBObbQkV5neVUmOxeTsurIAurlrNjadlfEk DbgsRETI6qP4mK3+D6R6o7o+SsqRLBHSuqTmTKPD18i6b/MNyPETV3U+eenLXgF7ot9d bviEE8LIdTNW+55BzbCtmr4SdSJmLpeCtdWBuA7RKR+K6glPgb7W5iE1csdiKxcQVqON O1Aelyg5+JUb5Yz3qcWbAqMoHKehauwOhIamUhoUOFI09lapOejJkVANZvSOrL60CYlj mOs713G+KiVtz022nbct3jHESv7OWvwynKBqfPuzycunxnsQdS2mFLpirHxtZKCx+SNn dVqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690239321; x=1690844121; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=4m73hDbmfHkB6vBAc3X7AmWd4GXS2D47NYfUBsYjTeE=; b=HFU729/QCTGXmuXLwBQd2SIGio7b+LtjZzhEwx6ETSogaFen5Ia0d9ZfLm74kGOGA6 o7elN3KKfHCsTahXgizi622ETBJml7wKAgu3kUp7r5UZsG3zn9ICD1wi1xKWUBM3nvwF XXfSMFuhNwJfNCjXiLXwJtmrh2H9oRnmEY4dnEwjHCrQExEIc4y6Tue+z5E258b6LX+J nffTJK3c2zCTsNnhuCZy1s64U0A44F7XNNWIJM6aWL9cpXN5BBmA+ilOjvKU0+13wkI2 8w9V/t6FeCRK3xSj4b6ZzRyA+EZ0/7Vn7Zx8M7Et/W6UitYpl1aDGdr36fXIP92czD6I Sbgg== X-Gm-Message-State: ABy/qLZQi8Bt44eGOUzPzGAgMrNnqC/UOFoB9LSN/WRBKvKzrQvI4wqV xsvCgFd5hkG4gXHUql6MCH2gTt5CTO6qDM3bDjs= X-Google-Smtp-Source: APBJJlGVStg2IqBph0zTI/BetjM8/WiT7C7LPmmDKrWD9iKfcw4MKCaPlznHf9Vaj0ssvFl3I+fUsg== X-Received: by 2002:a17:902:d4c6:b0:1b8:85c4:48f5 with SMTP id o6-20020a170902d4c600b001b885c448f5mr15164004plg.2.1690239321659; Mon, 24 Jul 2023 15:55:21 -0700 (PDT) Received: from localhost.localdomain ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id p7-20020a1709026b8700b001acae9734c0sm9424733plk.266.2023.07.24.15.55.20 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Jul 2023 15:55:21 -0700 (PDT) From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe Subject: [PATCH 6/8] fs: add IOCB flags related to passing back dio completions Date: Mon, 24 Jul 2023 
16:55:09 -0600 Message-Id: <20230724225511.599870-7-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk> References: <20230724225511.599870-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org Async dio completions generally happen from hard/soft IRQ context, which means that users like iomap may need to defer some of the completion handling to a workqueue. This is less efficient than having the original issuer handle it, like we do for sync IO, and it adds latency to the completions. Add IOCB_DIO_CALLER_COMP, which the issuer can set if it is able to safely punt these completions to a safe context. If the dio handler is aware of this flag, assign a callback handler in kiocb->dio_complete and associated data in kiocb->private. The issuer will then call this handler with that data from task context. No functional changes in this patch. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- include/linux/fs.h | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..1e6dbe309d52 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -338,6 +338,20 @@ enum rw_hint { #define IOCB_NOIO (1 << 20) /* can use bio alloc cache */ #define IOCB_ALLOC_CACHE (1 << 21) +/* + * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the + * iocb completion can be passed back to the owner for execution from a safe + * context rather than needing to be punted through a workqueue. If this + * flag is set, the bio completion handling may set iocb->dio_complete to a + * handler function and iocb->private to context information for that handler. + * The issuer should call the handler with that context information from task + * context to complete the processing of the iocb.
Note that while this + * provides a task context for the dio_complete() callback, it should only be + * used on the completion side for non-IO generating completions. It's fine to + * call blocking functions from this callback, but they should not wait for + * unrelated IO (like cache flushing, new IO generation, etc). + */ +#define IOCB_DIO_CALLER_COMP (1 << 22) /* for use in trace events */ #define TRACE_IOCB_STRINGS \ @@ -351,7 +365,8 @@ enum rw_hint { { IOCB_WRITE, "WRITE" }, \ { IOCB_WAITQ, "WAITQ" }, \ { IOCB_NOIO, "NOIO" }, \ - { IOCB_ALLOC_CACHE, "ALLOC_CACHE" } + { IOCB_ALLOC_CACHE, "ALLOC_CACHE" }, \ + { IOCB_DIO_CALLER_COMP, "CALLER_COMP" } struct kiocb { struct file *ki_filp; @@ -360,7 +375,23 @@ struct kiocb { void *private; int ki_flags; u16 ki_ioprio; /* See linux/ioprio.h */ - struct wait_page_queue *ki_waitq; /* for async buffered IO */ + union { + /* + * Only used for async buffered reads, where it denotes the + * page waitqueue associated with completing the read. Valid + * IFF IOCB_WAITQ is set. + */ + struct wait_page_queue *ki_waitq; + /* + * Can be used for O_DIRECT IO, where the completion handling + * is punted back to the issuer of the IO. May only be set + * if IOCB_DIO_CALLER_COMP is set by the issuer, and the issuer + * must then check for presence of this handler when ki_complete + * is invoked. The data passed in to this handler must be + * assigned to ->private when dio_complete is assigned. 
+ */ + ssize_t (*dio_complete)(void *data); + }; }; static inline bool is_sync_kiocb(struct kiocb *kiocb) From patchwork Mon Jul 24 22:55:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13325418
From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe Subject: [PATCH 7/8] io_uring/rw: add write support for IOCB_DIO_CALLER_COMP Date: Mon, 24 Jul 2023 16:55:10 -0600 Message-Id: <20230724225511.599870-8-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk> References: <20230724225511.599870-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org If the filesystem dio handler understands IOCB_DIO_CALLER_COMP, we'll get a kiocb->ki_complete() callback with kiocb->dio_complete set.
In that case, rather than complete the IO directly through task_work, queue up an intermediate task_work handler that first processes this callback and then immediately completes the request. For XFS, this avoids a punt through a workqueue, which is a lot less efficient and adds latency to lower queue depth (or sync) O_DIRECT writes. Only do this for non-polled IO, as polled IO doesn't need this kind of deferral as it always completes within the task itself. This then avoids a check for deferral in the polled IO completion handler. Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- io_uring/rw.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/io_uring/rw.c b/io_uring/rw.c index 1bce2208b65c..f19f65b3f0ee 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -105,6 +105,7 @@ int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe) } else { rw->kiocb.ki_ioprio = get_current_ioprio(); } + rw->kiocb.dio_complete = NULL; rw->addr = READ_ONCE(sqe->addr); rw->len = READ_ONCE(sqe->len); @@ -285,6 +286,14 @@ static inline int io_fixup_rw_res(struct io_kiocb *req, long res) void io_req_rw_complete(struct io_kiocb *req, struct io_tw_state *ts) { + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); + + if (rw->kiocb.dio_complete) { + long res = rw->kiocb.dio_complete(rw->kiocb.private); + + io_req_set_res(req, io_fixup_rw_res(req, res), 0); + } + io_req_io_end(req); if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) { @@ -300,9 +309,11 @@ static void io_complete_rw(struct kiocb *kiocb, long res) struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb); struct io_kiocb *req = cmd_to_io_kiocb(rw); - if (__io_complete_rw_common(req, res)) - return; - io_req_set_res(req, io_fixup_rw_res(req, res), 0); + if (!rw->kiocb.dio_complete) { + if (__io_complete_rw_common(req, res)) + return; + io_req_set_res(req, io_fixup_rw_res(req, res), 0); + } req->io_task_work.func = 
io_req_rw_complete; __io_req_task_work_add(req, IOU_F_TWQ_LAZY_WAKE); } @@ -916,6 +927,15 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags) } kiocb->ki_flags |= IOCB_WRITE; + /* + * For non-polled IO, set IOCB_DIO_CALLER_COMP, stating that our handler + * groks deferring the completion to task context. This isn't + * necessary and useful for polled IO as that can always complete + * directly. + */ + if (!(kiocb->ki_flags & IOCB_HIPRI)) + kiocb->ki_flags |= IOCB_DIO_CALLER_COMP; + if (likely(req->file->f_op->write_iter)) ret2 = call_write_iter(req->file, kiocb, &s->iter); else if (req->file->f_op->write) From patchwork Mon Jul 24 22:55:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 13325419
From: Jens Axboe To: io-uring@vger.kernel.org, linux-xfs@vger.kernel.org Cc: hch@lst.de, andres@anarazel.de, david@fromorbit.com, djwong@kernel.org, Jens Axboe Subject: [PATCH 8/8] iomap:
support IOCB_DIO_CALLER_COMP Date: Mon, 24 Jul 2023 16:55:11 -0600 Message-Id: <20230724225511.599870-9-axboe@kernel.dk> X-Mailer: git-send-email 2.40.1 In-Reply-To: <20230724225511.599870-1-axboe@kernel.dk> References: <20230724225511.599870-1-axboe@kernel.dk> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org

If IOCB_DIO_CALLER_COMP is set, use it to set the kiocb->dio_complete handler and the data for that callback. Rather than punt the completion to a workqueue, we pass back the handler and data to the issuer and will get a callback from a safe task context.

Using the following fio job to randomly dio write 4k blocks at queue depths of 1..16:

fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth

shows the following results before and after this patch:

        Stock   Patched         Diff
=======================================
QD1     155K    162K            + 4.5%
QD2     290K    313K            + 7.9%
QD4     533K    597K            +12.0%
QD8     604K    827K            +36.9%
QD16    615K    845K            +37.4%

which shows nice wins all around. If we factored in per-IOP efficiency, the wins look even nicer. This becomes apparent as queue depth rises, as the offloaded workqueue completions run out of steam.

Reviewed-by: Darrick J.
Wong Reviewed-by: Christoph Hellwig Signed-off-by: Jens Axboe --- fs/iomap/direct-io.c | 62 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index b943bc5c7b18..bcd3f8cf5ea4 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -20,6 +20,7 @@ * Private flags for iomap_dio, must not overlap with the public ones in * iomap.h: */ +#define IOMAP_DIO_CALLER_COMP (1U << 26) #define IOMAP_DIO_INLINE_COMP (1U << 27) #define IOMAP_DIO_WRITE_THROUGH (1U << 28) #define IOMAP_DIO_NEED_SYNC (1U << 29) @@ -132,6 +133,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio) } EXPORT_SYMBOL_GPL(iomap_dio_complete); +static ssize_t iomap_dio_deferred_complete(void *data) +{ + return iomap_dio_complete(data); +} + static void iomap_dio_complete_work(struct work_struct *work) { struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work); @@ -182,6 +188,31 @@ void iomap_dio_bio_end_io(struct bio *bio) goto release_bio; } + /* + * If this dio is flagged with IOMAP_DIO_CALLER_COMP, then schedule + * our completion that way to avoid an async punt to a workqueue. + */ + if (dio->flags & IOMAP_DIO_CALLER_COMP) { + /* only polled IO cares about private cleared */ + iocb->private = dio; + iocb->dio_complete = iomap_dio_deferred_complete; + + /* + * Invoke ->ki_complete() directly. We've assigned our + * dio_complete callback handler, and since the issuer set + * IOCB_DIO_CALLER_COMP, we know their ki_complete handler will + * notice ->dio_complete being set and will defer calling that + * handler until it can be done from a safe task context. + * + * Note that the 'res' being passed in here is not important + * for this case. The actual completion value of the request + * will be gotten from dio_complete when that is run by the + * issuer. 
+ */ + iocb->ki_complete(iocb, 0); + goto release_bio; + } + /* * Async DIO completion that requires filesystem level completion work * gets punted to a work queue to complete as the operation may require @@ -278,12 +309,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, * after IO completion such as unwritten extent conversion) and * the underlying device either supports FUA or doesn't have * a volatile write cache. This allows us to avoid cache flushes - * on IO completion. + * on IO completion. If we can't use writethrough and need to + * sync, disable in-task completions as dio completion will + * need to call generic_write_sync() which will do a blocking + * fsync / cache flush call. */ if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) && (dio->flags & IOMAP_DIO_WRITE_THROUGH) && (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) use_fua = true; + else if (dio->flags & IOMAP_DIO_NEED_SYNC) + dio->flags &= ~IOMAP_DIO_CALLER_COMP; } /* @@ -298,10 +334,23 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, goto out; /* - * We can only poll for single bio I/Os. + * We can only do deferred completion for pure overwrites that + * don't require additional IO at completion. This rules out + * writes that need zeroing or extent conversion, extend + * the file size, or issue journal IO or cache flushes + * during completion processing. */ if (need_zeroout || + ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) || ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) + dio->flags &= ~IOMAP_DIO_CALLER_COMP; + + /* + * The rules for polled IO completions follow the guidelines as the + * ones we set for inline and deferred completions. If none of those + * are available for this IO, clear the polled flag. 
+ */ + if (!(dio->flags & (IOMAP_DIO_INLINE_COMP|IOMAP_DIO_CALLER_COMP))) dio->iocb->ki_flags &= ~IOCB_HIPRI; if (need_zeroout) { @@ -547,6 +596,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, iomi.flags |= IOMAP_WRITE; dio->flags |= IOMAP_DIO_WRITE; + /* + * Flag as supporting deferred completions, if the issuer + * groks it. This can avoid a workqueue punt for writes. + * We may later clear this flag if we need to do other IO + * as part of this IO completion. + */ + if (iocb->ki_flags & IOCB_DIO_CALLER_COMP) + dio->flags |= IOMAP_DIO_CALLER_COMP; + if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) { ret = -EAGAIN; if (iomi.pos >= dio->i_size ||