From patchwork Wed May 3 07:06:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13229555
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0DAF6C77B78
	for ; Wed, 3 May 2023 07:06:30 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229569AbjECHG2 (ORCPT ); Wed, 3 May 2023 03:06:28 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50916 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229491AbjECHGZ (ORCPT ); Wed, 3 May 2023 03:06:25 -0400
Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0CCC33C11
	for ; Wed, 3 May 2023 00:06:25 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding:
	MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender
	:Reply-To:Content-Type:Content-ID:Content-Description;
	bh=jj35ohtDs9SXoCNYZ3zaN3XBSzFhEjYYggLE2AF1Vdo=; b=XaK2LXJap3qdQP9QxbpOKe66uA
	ByLgyhrDD0AVUfdmqSo3RXDg52t41lyvNO/8FH9eyPysMla3HuI8Owu4Hoy4Y8pHgZ97qduyjN0ij
	LDq6pRRbfneeNXkyAGdD+MoeVnKAj+iobOvZHmA/LAssGrGsZ5nUE0nsV9Dky1mFrho1LuDm51jM9
	auDirtDGT9EOCx6XIOAiLc/SiQI6PwPB6zh/IVUKUBzw0sApd21mX23ODK29Ma1jhbH2AZhlYBiHt
	AiQemln02Y/jYwRS2m6PDP3+IULVHNSzB9WYcmncU31KMJLICUByX8r4dVYUpzBgrjXI0UrLEsKV2
	roCPqnfA==;
Received: from 213-225-6-169.nat.highway.a1.net ([213.225.6.169] helo=localhost)
	by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux))
	id 1pu6ZC-003aCq-2Z; Wed, 03 May 2023 07:06:23 +0000
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 1/3] btrfs: never defer I/O submission for fast CRC implementations
Date: Wed, 3 May 2023 09:06:13 +0200
Message-Id: <20230503070615.1029820-2-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230503070615.1029820-1-hch@lst.de>
References: <20230503070615.1029820-1-hch@lst.de>
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org.
	See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

Most modern hardware supports very fast accelerated crc32c calculation.
If that is supported, the CPU overhead of the checksum calculation is
very limited, and offloading the calculation to special worker threads
has a lot of overhead for no gain.

E.g. on an Intel Optane device it actually very much slows down even
1M buffered writes with fio:

Unpatched:

write: IOPS=3316, BW=3316MiB/s (3477MB/s)(200GiB/61757msec); 0 zone resets

With synchronous CRCs:

write: IOPS=4882, BW=4882MiB/s (5119MB/s)(200GiB/41948msec); 0 zone resets

With a lot of variation during the unpatched run going down as low as
1100MB/s, while the synchronous CRC version has about the same peak write
speed but much lower dips, and fewer kworkers churning around.
Both tests had fio saturated at 100% CPU.

(thanks to Jens Axboe via Chris Mason for the benchmarking)

Signed-off-by: Christoph Hellwig
Reviewed-by: Chris Mason
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/bio.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 4b5220509186a8..e8a55605ce22fa 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -574,6 +574,12 @@ static void run_one_async_free(struct btrfs_work *work)
 
 static bool should_async_write(struct btrfs_bio *bbio)
 {
+	/*
+	 * Submit synchronously if the checksum implementation is fast.
+	 */
+	if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &bbio->fs_info->flags))
+		return false;
+
 	/*
 	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
 	 * then try to defer the submission to a workqueue to parallelize the
@@ -583,18 +589,10 @@ static bool should_async_write(struct btrfs_bio *bbio)
 		return false;
 
 	/*
-	 * Submit metadata writes synchronously if the checksum implementation
-	 * is fast, or we are on a zoned device that wants I/O to be submitted
-	 * in order.
+	 * Zoned devices require I/O to be submitted in order.
 	 */
-	if (bbio->bio.bi_opf & REQ_META) {
-		struct btrfs_fs_info *fs_info = bbio->fs_info;
-
-		if (btrfs_is_zoned(fs_info))
-			return false;
-		if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags))
-			return false;
-	}
+	if ((bbio->bio.bi_opf & REQ_META) && btrfs_is_zoned(bbio->fs_info))
+		return false;
 
 	return true;
 }

From patchwork Wed May 3 07:06:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13229556
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4B0B4C77B7F
	for ; Wed, 3 May 2023 07:06:32 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229609AbjECHGa (ORCPT ); Wed, 3 May 2023 03:06:30 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50964 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229574AbjECHG3 (ORCPT ); Wed, 3 May 2023 03:06:29 -0400
Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BEBA40FD
	for ; Wed, 3 May 2023 00:06:28 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=bombadil.20210309;
	h=Content-Transfer-Encoding:
	MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender
	:Reply-To:Content-Type:Content-ID:Content-Description;
	bh=Us70f+i1D33yBJSW8u7dYt/c0GCYYuIcO/ig5UHXzIQ=; b=cty5EFEWsFbSgTwD/9Rm4elhQM
	SHnnb9ajkksaJAIfTA2pnNigjetRj05jS78hQxGhp6K8JuWLcGtfjst1vLoHmjxSOeM72HwT8E1b8
	MlU3SJYnUnljZNihUX2MLdD0ilLmAP5IM514YT3Zvj2Eqxh11re1Y0qOaZ46Rh/UenE9Ob0JFl4hM
	KHSojvc5IHb0vA2XFpHWb9tXy+3KFnAzenwGidACW2MLkFiJbTCDU/ltPS+SQSBUMu8KcOhIyu/K4
	2t6lglwybMTJSWstI1QEM5C6BPJnt8NdOZa+iUfdqObYli/rPFNWbF8FEXaU1+Cm4vU+5A9eh80aT
	aIlR8+sA==;
Received: from 213-225-6-169.nat.highway.a1.net ([213.225.6.169] helo=localhost)
	by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux))
	id 1pu6ZF-003aDt-2m; Wed, 03 May 2023 07:06:26 +0000
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 2/3] btrfs: don't reinvent synchronous writer detection logic
Date: Wed, 3 May 2023 09:06:14 +0200
Message-Id: <20230503070615.1029820-3-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230503070615.1029820-1-hch@lst.de>
References: <20230503070615.1029820-1-hch@lst.de>
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org.
	See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

The writeback_control structure already passes down the information about
a writeback being synchronous from the core VM code, and this information
is propagated into the bio REQ_SYNC flag through the wbc_to_write_flags
helper.

Use that information to decide if CRC calculation is offloaded to a
workqueue instead of the sync_writers field in the btrfs_inode that not
only bloats the inode but also has too wide a scope, being inode wide
instead of limited to the actual writeback request.

Signed-off-by: Christoph Hellwig
Reviewed-by: Chris Mason
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/bio.c         | 7 +++----
 fs/btrfs/btrfs_inode.h | 3 ---
 fs/btrfs/file.c        | 9 ---------
 fs/btrfs/inode.c       | 1 -
 fs/btrfs/transaction.c | 2 --
 5 files changed, 3 insertions(+), 19 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index e8a55605ce22fa..49324388499cf1 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -581,11 +581,10 @@ static bool should_async_write(struct btrfs_bio *bbio)
 		return false;
 
 	/*
-	 * If the I/O is not issued by fsync and friends, (->sync_writers != 0),
-	 * then try to defer the submission to a workqueue to parallelize the
-	 * checksum calculation.
+	 * Try to defer the submission to a workqueue to parallelize the
+	 * checksum calculation unless the I/O is issued synchronously.
 	 */
-	if (atomic_read(&bbio->inode->sync_writers))
+	if (op_is_sync(bbio->bio.bi_opf))
 		return false;
 
 	/*
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index bb498448066981..3808aa5af21676 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -116,9 +116,6 @@ struct btrfs_inode {
 
 	unsigned long runtime_flags;
 
-	/* Keep track of who's O_SYNC/fsyncing currently */
-	atomic_t sync_writers;
-
 	/* full 64 bit generation number, struct vfs_inode doesn't have a big
 	 * enough field for this.
 	 */
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 5cc5a1faaef5b5..4b9433b96f4b8e 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1651,7 +1651,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	struct file *file = iocb->ki_filp;
 	struct btrfs_inode *inode = BTRFS_I(file_inode(file));
 	ssize_t num_written, num_sync;
-	const bool sync = iocb_is_dsync(iocb);
 
 	/*
 	 * If the fs flips readonly due to some impossible error, although we
@@ -1664,9 +1663,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 	if (encoded && (iocb->ki_flags & IOCB_NOWAIT))
 		return -EOPNOTSUPP;
 
-	if (sync)
-		atomic_inc(&inode->sync_writers);
-
 	if (encoded) {
 		num_written = btrfs_encoded_write(iocb, from, encoded);
 		num_sync = encoded->len;
@@ -1686,9 +1682,6 @@ ssize_t btrfs_do_write_iter(struct kiocb *iocb, struct iov_iter *from,
 			num_written = num_sync;
 	}
 
-	if (sync)
-		atomic_dec(&inode->sync_writers);
-
 	current->backing_dev_info = NULL;
 	return num_written;
 }
@@ -1733,9 +1726,7 @@ static int start_ordered_ops(struct inode *inode, loff_t start, loff_t end)
 	 * several segments of stripe length (currently 64K).
 	 */
 	blk_start_plug(&plug);
-	atomic_inc(&BTRFS_I(inode)->sync_writers);
 	ret = btrfs_fdatawrite_range(inode, start, end);
-	atomic_dec(&BTRFS_I(inode)->sync_writers);
 	blk_finish_plug(&plug);
 
 	return ret;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b9f88309bd4c43..5b0c827bb4c951 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8459,7 +8459,6 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
 	ei->io_tree.inode = ei;
 	extent_io_tree_init(fs_info, &ei->file_extent_tree,
 			    IO_TREE_INODE_FILE_EXTENT);
-	atomic_set(&ei->sync_writers, 0);
 	mutex_init(&ei->log_mutex);
 	btrfs_ordered_inode_tree_init(&ei->ordered_tree);
 	INIT_LIST_HEAD(&ei->delalloc_inodes);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8b6a99b8d7f6d3..27c616fdfae274 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1056,7 +1056,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 	u64 start = 0;
 	u64 end;
 
-	atomic_inc(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	while (!find_first_extent_bit(dirty_pages, start, &start, &end,
 				      mark, &cached_state)) {
 		bool wait_writeback = false;
@@ -1092,7 +1091,6 @@ int btrfs_write_marked_extents(struct btrfs_fs_info *fs_info,
 		cond_resched();
 		start = end + 1;
 	}
-	atomic_dec(&BTRFS_I(fs_info->btree_inode)->sync_writers);
 	return werr;
 }
 

From patchwork Wed May 3 07:06:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 13229557
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E5E03C77B78
	for ; Wed, 3 May 2023 07:06:33 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229620AbjECHGc (ORCPT ); Wed, 3 May 2023 03:06:32 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50992 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S229574AbjECHGb (ORCPT ); Wed, 3 May 2023 03:06:31 -0400
Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A2EE4422D
	for ; Wed, 3 May 2023 00:06:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding:
	Content-Type:MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:
	To:From:Sender:Reply-To:Content-ID:Content-Description;
	bh=aoQaVeD3u5IhR7hSITYV12FCpghxrrSA5CiiNQNeL04=; b=UzFrXjYP3uXdmbcTqylXRmbVne
	XYCT92L/XH5BiobH+d4bfxMLSGhH/JzyAJTGtbxYOkk5eOKLyoWIFvR9VhDiKSkwmSgo4fQsi+Ihb
	WjRjBtVoxxrtMtIEwBH1XXX+0WBOUka4d7M7EmzbjKGCs8St+GhhDF59Y1TTcrrAQgs3BwXtG3ns9
	ZJeUf8W/MnttApxZt0G8BIZBhNsV4JFy6XbJSrOo9EZt5PvGQboLUCS9zkXASG2G3Sci71QT5V5kr
	X1rGNVEYBsobDXJnpN5HyhogA2g9BQGLmnjRL69iqBv9TD5J6kho01qXIE4Ec94yZFYzQoZ8JI3tp
	/LpHZMFQ==;
Received: from 213-225-6-169.nat.highway.a1.net ([213.225.6.169] helo=localhost)
	by bombadil.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux))
	id 1pu6ZI-003aEn-1b; Wed, 03 May 2023 07:06:28 +0000
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: linux-btrfs@vger.kernel.org
Subject: [PATCH 3/3] btrfs: remove hipri_workers workqueue
Date: Wed, 3 May 2023 09:06:15 +0200
Message-Id: <20230503070615.1029820-4-hch@lst.de>
X-Mailer: git-send-email 2.39.2
In-Reply-To: <20230503070615.1029820-1-hch@lst.de>
References: <20230503070615.1029820-1-hch@lst.de>
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org.
	See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

Now that btrfs_wq_submit_bio is never called for synchronous I/O, the
hipri_workers workqueue is dead code and can be removed.

Signed-off-by: Christoph Hellwig
Reviewed-by: Chris Mason
Reviewed-by: Johannes Thumshirn
---
 fs/btrfs/bio.c     | 5 +----
 fs/btrfs/disk-io.c | 6 +-----
 fs/btrfs/fs.h      | 1 -
 fs/btrfs/super.c   | 1 -
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index 49324388499cf1..fa07e9fbb7c395 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -619,10 +619,7 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio,
 
 	btrfs_init_work(&async->work, run_one_async_start, run_one_async_done,
 			run_one_async_free);
-	if (op_is_sync(bbio->bio.bi_opf))
-		btrfs_queue_work(fs_info->hipri_workers, &async->work);
-	else
-		btrfs_queue_work(fs_info->workers, &async->work);
+	btrfs_queue_work(fs_info->workers, &async->work);
 	return true;
 }
 
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 59ea049fe7ee0d..96f144094af687 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1985,7 +1985,6 @@ static void btrfs_stop_all_workers(struct btrfs_fs_info *fs_info)
 {
 	btrfs_destroy_workqueue(fs_info->fixup_workers);
 	btrfs_destroy_workqueue(fs_info->delalloc_workers);
-	btrfs_destroy_workqueue(fs_info->hipri_workers);
 	btrfs_destroy_workqueue(fs_info->workers);
 	if (fs_info->endio_workers)
 		destroy_workqueue(fs_info->endio_workers);
@@ -2180,9 +2179,6 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 
 	fs_info->workers =
 		btrfs_alloc_workqueue(fs_info, "worker", flags, max_active, 16);
-	fs_info->hipri_workers =
-		btrfs_alloc_workqueue(fs_info, "worker-high",
-				      flags | WQ_HIGHPRI, max_active, 16);
 
 	fs_info->delalloc_workers =
 		btrfs_alloc_workqueue(fs_info, "delalloc",
@@ -2219,7 +2215,7 @@ static int btrfs_init_workqueues(struct btrfs_fs_info *fs_info)
 	fs_info->discard_ctl.discard_workers =
 		alloc_workqueue("btrfs_discard", WQ_UNBOUND | WQ_FREEZABLE, 1);
 
-	if (!(fs_info->workers && fs_info->hipri_workers &&
+	if (!(fs_info->workers &&
 	      fs_info->delalloc_workers && fs_info->flush_workers &&
 	      fs_info->endio_workers && fs_info->endio_meta_workers &&
 	      fs_info->compressed_write_workers &&
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index 0d98fc5f6f44f7..840e4def18b519 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -543,7 +543,6 @@ struct btrfs_fs_info {
 	 * A third pool does submit_bio to avoid deadlocking with the other two.
 	 */
 	struct btrfs_workqueue *workers;
-	struct btrfs_workqueue *hipri_workers;
 	struct btrfs_workqueue *delalloc_workers;
 	struct btrfs_workqueue *flush_workers;
 	struct workqueue_struct *endio_workers;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 0f2f915e42b06a..0ec440cd756671 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1632,7 +1632,6 @@ static void btrfs_resize_thread_pool(struct btrfs_fs_info *fs_info,
 	       old_pool_size, new_pool_size);
 
 	btrfs_workqueue_set_max(fs_info->workers, new_pool_size);
-	btrfs_workqueue_set_max(fs_info->hipri_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->delalloc_workers, new_pool_size);
 	btrfs_workqueue_set_max(fs_info->caching_workers, new_pool_size);
 	workqueue_set_max_active(fs_info->endio_workers, new_pool_size);
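
Editorial note, not part of the series: read together, the three patches leave should_async_write() with three early-outs (fast checksum implementation, synchronous I/O, metadata on a zoned device) before it agrees to defer to the workqueue. The following is a minimal userspace sketch of that decision, not kernel code. Every model_* name and every flag value is a hypothetical stand-in invented for illustration, and model_op_is_sync() looks at a single flag only, whereas the kernel's op_is_sync() checks more than that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the request flags; the values are arbitrary. */
#define MODEL_REQ_SYNC (1u << 0)
#define MODEL_REQ_META (1u << 1)

/* Hypothetical stand-in for the writeback_control sync mode. */
enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

/* Hypothetical stand-in for the two filesystem properties consulted. */
struct model_fs_info {
	bool csum_impl_fast;	/* models the BTRFS_FS_CSUM_IMPL_FAST bit */
	bool zoned;		/* models btrfs_is_zoned() */
};

/* Hypothetical stand-in for a btrfs_bio: just the flags and the fs_info. */
struct model_bio {
	uint32_t opf;
	const struct model_fs_info *fs_info;
};

/* Models the idea of wbc_to_write_flags(): sync writeback marks the bio. */
static uint32_t model_wbc_to_write_flags(enum model_sync_mode mode)
{
	return mode == MODEL_WB_SYNC_ALL ? MODEL_REQ_SYNC : 0;
}

/* Simplified model of op_is_sync(): only the sync flag is considered. */
static bool model_op_is_sync(uint32_t opf)
{
	return (opf & MODEL_REQ_SYNC) != 0;
}

/* Mirrors the order of checks in should_async_write() after patch 2/3. */
static bool model_should_async_write(const struct model_bio *bio)
{
	/* Fast checksums: offloading only adds overhead. */
	if (bio->fs_info->csum_impl_fast)
		return false;

	/* Synchronous I/O is detected per request, not per inode. */
	if (model_op_is_sync(bio->opf))
		return false;

	/* Zoned devices need metadata I/O submitted in order. */
	if ((bio->opf & MODEL_REQ_META) && bio->fs_info->zoned)
		return false;

	return true;
}
```

In this model a bio whose flags come from model_wbc_to_write_flags(MODEL_WB_SYNC_ALL) is never deferred, which is the property patch 3/3 relies on when it drops the high-priority queue.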