From patchwork Thu Sep  1 07:42:00 2022
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 12962031
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
    Jens Axboe, "Darrick J. Wong", linux-block@vger.kernel.org,
    linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 01/17] block: export bio_split_rw
Date: Thu, 1 Sep 2022 10:42:00 +0300
Message-Id: <20220901074216.1849941-2-hch@lst.de>
In-Reply-To: <20220901074216.1849941-1-hch@lst.de>
References: <20220901074216.1849941-1-hch@lst.de>

bio_split_rw can be used by file systems to split an incoming write bio
into multiple bios fitting the hardware limits for use as ZONE_APPEND
bios. Export it for initial use in btrfs.

Signed-off-by: Christoph Hellwig
Reviewed-by: Johannes Thumshirn
Reviewed-by: Josef Bacik
---
 block/blk-merge.c   | 3 ++-
 include/linux/bio.h | 4 ++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index ff04e9290715a..e68295462977b 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -267,7 +267,7 @@ static bool bvec_split_segs(struct queue_limits *lim, const struct bio_vec *bv,
  * responsible for ensuring that @bs is only destroyed after processing of the
  * split bio has finished.
  */
-static struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
+struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
 		unsigned *segs, struct bio_set *bs, unsigned max_bytes)
 {
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
@@ -317,6 +317,7 @@ static struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
 	bio_clear_polled(bio);
 	return bio_split(bio, bytes >> SECTOR_SHIFT, GFP_NOIO, bs);
 }
+EXPORT_SYMBOL_GPL(bio_split_rw);
 
 /**
  * __bio_split_to_limits - split a bio to fit the queue limits
diff --git a/include/linux/bio.h b/include/linux/bio.h
index ca22b06700a94..46890f8235401 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -12,6 +12,8 @@
 
 #define BIO_MAX_VECS		256U
 
+struct queue_limits;
+
 static inline unsigned int bio_max_segs(unsigned int nr_segs)
 {
 	return min(nr_segs, BIO_MAX_VECS);
@@ -375,6 +377,8 @@ static inline void bip_set_seed(struct bio_integrity_payload *bip,
 void bio_trim(struct bio *bio, sector_t offset, sector_t size);
 extern struct bio *bio_split(struct bio *bio, int sectors, gfp_t gfp,
 			     struct bio_set *bs);
+struct bio *bio_split_rw(struct bio *bio, struct queue_limits *lim,
+		unsigned *segs, struct bio_set *bs, unsigned max_bytes);
 
 /**
  * bio_next_split - get next @sectors from a bio, splitting if necessary
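[Editor's note: the sketch below is not part of the series. It illustrates how a
file system caller might use the newly exported bio_split_rw() to chop a write
bio into pieces that each fit a byte limit such as the device's zone append
limit. Only the bio_split_rw() signature comes from the patch; the wrapper
function, the submit_bio() policy and the error handling are assumptions made
for illustration.]

static int example_split_and_submit(struct bio *bio, struct queue_limits *lim,
				    struct bio_set *bs, unsigned int max_bytes)
{
	unsigned int nr_segs;
	struct bio *split;

	/*
	 * bio_split_rw() hands back the front piece that fits within
	 * @max_bytes and @lim and advances @bio to the remainder; a
	 * NULL return means the remaining bio already fits.
	 */
	while ((split = bio_split_rw(bio, lim, &nr_segs, bs, max_bytes))) {
		if (IS_ERR(split))
			return PTR_ERR(split);
		submit_bio(split);
	}
	submit_bio(bio);
	return 0;
}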
Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 02/17] btrfs: stop tracking failed reads in the I/O tree Date: Thu, 1 Sep 2022 10:42:01 +0300 Message-Id: <20220901074216.1849941-3-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org There is a separate I/O failure tree to track the fail reads, so remove the extra EXTENT_DAMAGED bit in the I/O tree. Signed-off-by: Christoph Hellwig Reviewed-by: Qu Wenruo Reviewed-by: Josef Baacik --- fs/btrfs/extent-io-tree.h | 1 - fs/btrfs/extent_io.c | 16 +--------------- fs/btrfs/tests/extent-io-tests.c | 1 - include/trace/events/btrfs.h | 1 - 4 files changed, 1 insertion(+), 18 deletions(-) diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index ec2f8b8e6faa7..e218bb56d86ac 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -17,7 +17,6 @@ struct io_failure_record; #define EXTENT_NODATASUM (1U << 7) #define EXTENT_CLEAR_META_RESV (1U << 8) #define EXTENT_NEED_WAIT (1U << 9) -#define EXTENT_DAMAGED (1U << 10) #define EXTENT_NORESERVE (1U << 11) #define EXTENT_QGROUP_RESERVED (1U << 12) #define EXTENT_CLEAR_DATA_RESV (1U << 13) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 591c191a58bc9..6ac76534d2c9e 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2280,23 +2280,13 @@ int free_io_failure(struct extent_io_tree *failure_tree, struct io_failure_record *rec) { int ret; - int err = 0; set_state_failrec(failure_tree, rec->start, NULL); ret = clear_extent_bits(failure_tree, rec->start, rec->start + rec->len - 1, EXTENT_LOCKED | EXTENT_DIRTY); - if (ret) - err = ret; - - ret = clear_extent_bits(io_tree, rec->start, - rec->start + rec->len - 1, - EXTENT_DAMAGED); - if (ret && !err) - err = ret; - kfree(rec); - return err; + return ret; } /* @@ -2521,7 +2511,6 @@ static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode u64 start = bbio->file_offset + bio_offset; struct io_failure_record *failrec; struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; const u32 sectorsize = fs_info->sectorsize; int ret; @@ -2573,9 +2562,6 @@ static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode EXTENT_LOCKED | EXTENT_DIRTY); if (ret >= 0) { ret = set_state_failrec(failure_tree, start, failrec); - /* Set the bits in the inode's tree */ - ret = set_extent_bits(tree, start, start + sectorsize - 1, - EXTENT_DAMAGED); } else if (ret < 0) { kfree(failrec); return ERR_PTR(ret); diff --git a/fs/btrfs/tests/extent-io-tests.c b/fs/btrfs/tests/extent-io-tests.c index a232b15b8021f..ba4b7601e8c0a 100644 --- a/fs/btrfs/tests/extent-io-tests.c +++ b/fs/btrfs/tests/extent-io-tests.c @@ -80,7 +80,6 @@ static void extent_flag_to_str(const struct extent_state *state, char *dest) PRINT_ONE_FLAG(state, dest, cur, NODATASUM); PRINT_ONE_FLAG(state, dest, cur, CLEAR_META_RESV); PRINT_ONE_FLAG(state, dest, cur, NEED_WAIT); - PRINT_ONE_FLAG(state, dest, cur, DAMAGED); PRINT_ONE_FLAG(state, dest, cur, NORESERVE); PRINT_ONE_FLAG(state, dest, cur, QGROUP_RESERVED); PRINT_ONE_FLAG(state, dest, cur, CLEAR_DATA_RESV); diff --git a/include/trace/events/btrfs.h 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 73df80d462dc8..f8a4118b16574 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -154,7 +154,6 @@ FLUSH_STATES
 	{ EXTENT_NODATASUM,		"NODATASUM"},		\
 	{ EXTENT_CLEAR_META_RESV,	"CLEAR_META_RESV"},	\
 	{ EXTENT_NEED_WAIT,		"NEED_WAIT"},		\
-	{ EXTENT_DAMAGED,		"DAMAGED"},		\
 	{ EXTENT_NORESERVE,		"NORESERVE"},		\
 	{ EXTENT_QGROUP_RESERVED,	"QGROUP_RESERVED"},	\
 	{ EXTENT_CLEAR_DATA_RESV,	"CLEAR_DATA_RESV"},	\
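[Editor's note: an illustrative sketch, not code from the series, of why
EXTENT_DAMAGED is redundant: the private failure tree records a failed sector
entirely on its own, mirroring the calls this patch keeps in
btrfs_get_io_failure_record() and free_io_failure(). The wrapper function name
is hypothetical.]

static int example_record_failed_sector(struct extent_io_tree *failure_tree,
					struct io_failure_record *failrec)
{
	int ret;

	/* Mark the failed range in the private failure tree only ... */
	ret = set_extent_bits(failure_tree, failrec->start,
			      failrec->start + failrec->len - 1,
			      EXTENT_LOCKED | EXTENT_DIRTY);
	if (ret < 0)
		return ret;
	/* ... and attach the record; the inode's io_tree is not involved. */
	return set_state_failrec(failure_tree, failrec->start, failrec);
}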
From patchwork Thu Sep  1 07:42:02 2022
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 12962032
From: Christoph Hellwig
To: Chris Mason, Josef Bacik, David Sterba
Cc: Damien Le Moal, Naohiro Aota, Johannes Thumshirn, Qu Wenruo,
    Jens Axboe, "Darrick J. Wong", linux-block@vger.kernel.org,
    linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH 03/17] btrfs: move repair_io_failure to volumes.c
Date: Thu, 1 Sep 2022 10:42:02 +0300
Message-Id: <20220901074216.1849941-4-hch@lst.de>
In-Reply-To: <20220901074216.1849941-1-hch@lst.de>
References: <20220901074216.1849941-1-hch@lst.de>

repair_io_failure ties directly into all the gory low-level details of
mapping a bio with a logical address to the actual physical location.
Move it right below btrfs_submit_bio to keep all the related logic
together.

Also move btrfs_repair_eb_io_failure to its caller in disk-io.c now
that repair_io_failure is available in a header.

Signed-off-by: Christoph Hellwig
Reviewed-by: Josef Bacik
---
 fs/btrfs/disk-io.c   |  24 +++++
 fs/btrfs/extent_io.c | 118 +------------------------------------------
 fs/btrfs/extent_io.h |   1 -
 fs/btrfs/volumes.c   |  91 +++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h   |   3 ++
 5 files changed, 120 insertions(+), 117 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 912e0b2bd0c5f..a88d6c3b59042 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -249,6 +249,30 @@ int btrfs_verify_level_key(struct extent_buffer *eb, int level,
 	return ret;
 }
 
+static int btrfs_repair_eb_io_failure(const struct extent_buffer *eb,
+				      int mirror_num)
+{
+	struct btrfs_fs_info *fs_info = eb->fs_info;
+	u64 start = eb->start;
+	int i, num_pages = num_extent_pages(eb);
+	int ret = 0;
+
+	if (sb_rdonly(fs_info->sb))
+		return -EROFS;
+
+	for (i = 0; i < num_pages; i++) {
+		struct page *p = eb->pages[i];
+
+		ret = btrfs_repair_io_failure(fs_info, 0, start, PAGE_SIZE,
+				start, p, start - page_offset(p), mirror_num);
+		if (ret)
+			break;
+		start += PAGE_SIZE;
+	}
+
+	return ret;
+}
+
 /*
  * helper to read a given tree block, doing retries as required when
  * the checksums don't match and we have alternate mirrors to try.
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 6ac76534d2c9e..c83cc5677a08a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2289,120 +2289,6 @@ int free_io_failure(struct extent_io_tree *failure_tree,
 	return ret;
 }
 
-/*
- * this bypasses the standard btrfs submit functions deliberately, as
- * the standard behavior is to write all copies in a raid setup. here we only
- * want to write the one bad copy. so we do the mapping for ourselves and issue
- * submit_bio directly.
- * to avoid any synchronization issues, wait for the data after writing, which
- * actually prevents the read that triggered the error from finishing.
- * currently, there can be no more than two copies of every data bit. thus,
- * exactly one rewrite is required.
- */
-static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			     u64 length, u64 logical, struct page *page,
-			     unsigned int pg_offset, int mirror_num)
-{
-	struct btrfs_device *dev;
-	struct bio_vec bvec;
-	struct bio bio;
-	u64 map_length = 0;
-	u64 sector;
-	struct btrfs_io_context *bioc = NULL;
-	int ret = 0;
-
-	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
-	BUG_ON(!mirror_num);
-
-	if (btrfs_repair_one_zone(fs_info, logical))
-		return 0;
-
-	map_length = length;
-
-	/*
-	 * Avoid races with device replace and make sure our bioc has devices
-	 * associated to its stripes that don't go away while we are doing the
-	 * read repair operation.
-	 */
-	btrfs_bio_counter_inc_blocked(fs_info);
-	if (btrfs_is_parity_mirror(fs_info, logical, length)) {
-		/*
-		 * Note that we don't use BTRFS_MAP_WRITE because it's supposed
-		 * to update all raid stripes, but here we just want to correct
-		 * bad stripe, thus BTRFS_MAP_READ is abused to only get the bad
-		 * stripe's dev and sector.
-		 */
-		ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
-				      &map_length, &bioc, 0);
-		if (ret)
-			goto out_counter_dec;
-		ASSERT(bioc->mirror_num == 1);
-	} else {
-		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
-				      &map_length, &bioc, mirror_num);
-		if (ret)
-			goto out_counter_dec;
-		BUG_ON(mirror_num != bioc->mirror_num);
-	}
-
-	sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9;
-	dev = bioc->stripes[bioc->mirror_num - 1].dev;
-	btrfs_put_bioc(bioc);
-
-	if (!dev || !dev->bdev ||
-	    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) {
-		ret = -EIO;
-		goto out_counter_dec;
-	}
-
-	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
-	bio.bi_iter.bi_sector = sector;
-	__bio_add_page(&bio, page, length, pg_offset);
-
-	btrfsic_check_bio(&bio);
-	ret = submit_bio_wait(&bio);
-	if (ret) {
-		/* try to remap that extent elsewhere? */
-		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
-		goto out_bio_uninit;
-	}
-
-	btrfs_info_rl_in_rcu(fs_info,
-		"read error corrected: ino %llu off %llu (dev %s sector %llu)",
-			     ino, start,
-			     rcu_str_deref(dev->name), sector);
-	ret = 0;
-
-out_bio_uninit:
-	bio_uninit(&bio);
-out_counter_dec:
-	btrfs_bio_counter_dec(fs_info);
-	return ret;
-}
-
-int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
-{
-	struct btrfs_fs_info *fs_info = eb->fs_info;
-	u64 start = eb->start;
-	int i, num_pages = num_extent_pages(eb);
-	int ret = 0;
-
-	if (sb_rdonly(fs_info->sb))
-		return -EROFS;
-
-	for (i = 0; i < num_pages; i++) {
-		struct page *p = eb->pages[i];
-
-		ret = repair_io_failure(fs_info, 0, start, PAGE_SIZE, start, p,
-					start - page_offset(p), mirror_num);
-		if (ret)
-			break;
-		start += PAGE_SIZE;
-	}
-
-	return ret;
-}
-
 static int next_mirror(const struct io_failure_record *failrec, int cur_mirror)
 {
 	if (cur_mirror == failrec->num_copies)
@@ -2460,7 +2346,7 @@ int clean_io_failure(struct btrfs_fs_info *fs_info,
 	mirror = failrec->this_mirror;
 	do {
 		mirror = prev_mirror(failrec, mirror);
-		repair_io_failure(fs_info, ino, start, failrec->len,
+		btrfs_repair_io_failure(fs_info, ino, start, failrec->len,
 				  failrec->logical, page, pg_offset, mirror);
 	} while (mirror != failrec->failed_mirror);
 
@@ -2600,7 +2486,7 @@ int btrfs_repair_one_sector(struct inode *inode, struct btrfs_bio *failed_bbio,
 	 *
 	 * Since we're only doing repair for one sector, we only need to get
 	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for repair_io_failure to do the rest for us.
+	 * everything for btrfs_repair_io_failure to do the rest for us.
 	 */
 	failrec->this_mirror = next_mirror(failrec, failrec->this_mirror);
 	if (failrec->this_mirror == failrec->failed_mirror) {
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 69a86ae6fd508..e653e64598bf7 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -243,7 +243,6 @@ void extent_clear_unlock_delalloc(struct btrfs_inode *inode, u64 start, u64 end,
 int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
-int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num);
 
 /*
  * When IO fails, either with EIO or csum verification fails, we
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19f7858aa2b91..dff735e36da96 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6902,6 +6902,97 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror
 	}
 }
 
+/*
+ * Submit a repair write.
+ *
+ * This bypasses btrfs_submit_bio deliberately, as that writes all copies in a
+ * RAID setup. Here we only want to write the one bad copy, so we do the
+ * mapping ourselves and submit the bio directly.
+ *
+ * The I/O is issued synchronously to block the repair read completion from
+ * freeing the bio.
+ */
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num)
+{
+	struct btrfs_device *dev;
+	struct bio_vec bvec;
+	struct bio bio;
+	u64 map_length = 0;
+	u64 sector;
+	struct btrfs_io_context *bioc = NULL;
+	int ret = 0;
+
+	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
+	BUG_ON(!mirror_num);
+
+	if (btrfs_repair_one_zone(fs_info, logical))
+		return 0;
+
+	map_length = length;
+
+	/*
+	 * Avoid races with device replace and make sure our bioc has devices
+	 * associated to its stripes that don't go away while we are doing the
+	 * read repair operation.
+	 */
+	btrfs_bio_counter_inc_blocked(fs_info);
+	if (btrfs_is_parity_mirror(fs_info, logical, length)) {
+		/*
+		 * Note that we don't use BTRFS_MAP_WRITE because it's supposed
+		 * to update all raid stripes, but here we just want to correct
+		 * bad stripe, thus BTRFS_MAP_READ is abused to only get the bad
+		 * stripe's dev and sector.
+		 */
+		ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
+				      &map_length, &bioc, 0);
+		if (ret)
+			goto out_counter_dec;
+		ASSERT(bioc->mirror_num == 1);
+	} else {
+		ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical,
+				      &map_length, &bioc, mirror_num);
+		if (ret)
+			goto out_counter_dec;
+		BUG_ON(mirror_num != bioc->mirror_num);
+	}
+
+	sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9;
+	dev = bioc->stripes[bioc->mirror_num - 1].dev;
+	btrfs_put_bioc(bioc);
+
+	if (!dev || !dev->bdev ||
+	    !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) {
+		ret = -EIO;
+		goto out_counter_dec;
+	}
+
+	bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
+	bio.bi_iter.bi_sector = sector;
+	__bio_add_page(&bio, page, length, pg_offset);
+
+	btrfsic_check_bio(&bio);
+	ret = submit_bio_wait(&bio);
+	if (ret) {
+		/* try to remap that extent elsewhere? */
+		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
+		goto out_bio_uninit;
+	}
+
+	btrfs_info_rl_in_rcu(fs_info,
+		"read error corrected: ino %llu off %llu (dev %s sector %llu)",
+			     ino, start,
+			     rcu_str_deref(dev->name), sector);
+	ret = 0;
+
+out_bio_uninit:
+	bio_uninit(&bio);
+out_counter_dec:
+	btrfs_bio_counter_dec(fs_info);
+	return ret;
+}
+
 static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args,
				       const struct btrfs_fs_devices *fs_devices)
 {
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f19a1cd1bfcf2..b368356fa78a1 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -598,6 +598,9 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
					    u64 type);
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num);
+int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
+			    u64 length, u64 logical, struct page *page,
+			    unsigned int pg_offset, int mirror_num);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
		       fmode_t flags, void *holder);
 struct btrfs_device *btrfs_scan_one_device(const char *path,
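[Editor's note: a hypothetical caller of the now exported
btrfs_repair_io_failure(), rewriting one known-good sector over a bad mirror;
compare btrfs_repair_eb_io_failure() above, which does the same per page. Only
the btrfs_repair_io_failure() signature is taken from the patch; the wrapper
and its parameters are illustrative.]

static int example_rewrite_bad_copy(struct btrfs_fs_info *fs_info,
				    struct btrfs_inode *inode, u64 file_offset,
				    u64 logical, struct page *good_page,
				    int bad_mirror)
{
	/*
	 * @ino and @start only feed the "read error corrected" log
	 * message; @logical is what gets mapped to the physical stripe
	 * of @bad_mirror that the good data is written back to.
	 */
	return btrfs_repair_io_failure(fs_info, btrfs_ino(inode), file_offset,
				       fs_info->sectorsize, logical, good_page,
				       0, bad_mirror);
}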
Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 04/17] btrfs: handle checksum validation and repair at the storage layer Date: Thu, 1 Sep 2022 10:42:03 +0300 Message-Id: <20220901074216.1849941-5-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Currently btrfs handles checksum validation and repair in the end I/O handler for the btrfs_bio. This leads to a lot of duplicate code plus issues with variying semantics or bugs, e.g. - the until recently completetly broken repair for compressed extents - the fact that encoded reads validate the checksums but do not kick of read repair - the inconsistent checking of the BTRFS_FS_STATE_NO_CSUMS flag This commit revamps the checksum validation and repair code to instead work below the btrfs_submit_bio interfaces. For this to work we need to make sure an inode is available, so that is added as a parameter to btrfs_bio_alloc. With that btrfs_submit_bio can preload btrfs_bio.csum from the csum tree without help from the upper layers, and the low-level I/O completion can iterate over the bio and verify the checksums. In case of a checksum failure (or a plain old I/O error), the repair is now kicked off before the upper level ->end_io handler is invoked. Tracking of the repair status is massively simplified by just keeping a small failed_bio structure per bio with failed sectors and otherwise using the information in the repair bio. The per-inode I/O failure tree can be entirely removed. The saved bvec_iter in the btrfs_bio is now competely managed by btrfs_submit_bio and must not be accessed by the callers. There is one significant behavior change here: If repair fails or is impossible to start with, the whole bio will be failed to the upper layer. This is the behavior that all I/O submitters execept for buffered I/O already emulated in their end_io handler. For buffered I/O this now means that a large readahead request can fail due to a single bad sector, but as readahead errors are igored the following readpage if the sector is actually accessed will still be able to read. This also matches the I/O failure handling in other file systems. 
Signed-off-by: Christoph Hellwig
---
 fs/btrfs/btrfs_inode.h       |   5 -
 fs/btrfs/compression.c       |  54 +----
 fs/btrfs/ctree.h             |  13 +-
 fs/btrfs/extent-io-tree.h    |  18 --
 fs/btrfs/extent_io.c         | 451 +----------------------------------
 fs/btrfs/extent_io.h         |  28 ---
 fs/btrfs/file-item.c         |  42 ++--
 fs/btrfs/inode.c             | 320 ++++---------------------
 fs/btrfs/volumes.c           | 238 ++++++++++++++++--
 fs/btrfs/volumes.h           |  49 ++--
 include/trace/events/btrfs.h |   1 -
 11 files changed, 320 insertions(+), 899 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index b160b8e124e01..4cb9898869019 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -91,11 +91,6 @@ struct btrfs_inode {
 	/* the io_tree does range state (DIRTY, LOCKED etc) */
 	struct extent_io_tree io_tree;
 
-	/* special utility tree used to record which mirrors have already been
-	 * tried when checksums fail for a given block
-	 */
-	struct extent_io_tree io_failure_tree;
-
 	/*
 	 * Keep track of where the inode has extent items mapped in order to
 	 * make sure the i_size adjustments are accurate
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 1c77de3239bc4..f932415a4f1df 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -159,53 +159,15 @@ static void finish_compressed_bio_read(struct compressed_bio *cb)
 	kfree(cb);
 }
 
-/*
- * Verify the checksums and kick off repair if needed on the uncompressed data
- * before decompressing it into the original bio and freeing the uncompressed
- * pages.
- */
 static void end_compressed_bio_read(struct btrfs_bio *bbio)
 {
 	struct compressed_bio *cb = bbio->private;
-	struct inode *inode = cb->inode;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct btrfs_inode *bi = BTRFS_I(inode);
-	bool csum = !(bi->flags & BTRFS_INODE_NODATASUM) &&
-		    !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state);
-	blk_status_t status = bbio->bio.bi_status;
-	struct bvec_iter iter;
-	struct bio_vec bv;
-	u32 offset;
-
-	btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) {
-		u64 start = bbio->file_offset + offset;
-
-		if (!status &&
-		    (!csum || !btrfs_check_data_csum(inode, bbio, offset,
-						     bv.bv_page, bv.bv_offset))) {
-			clean_io_failure(fs_info, &bi->io_failure_tree,
-					 &bi->io_tree, start, bv.bv_page,
-					 btrfs_ino(bi), bv.bv_offset);
-		} else {
-			int ret;
-
-			refcount_inc(&cb->pending_ios);
-			ret = btrfs_repair_one_sector(inode, bbio, offset,
-						      bv.bv_page, bv.bv_offset,
-						      btrfs_submit_data_read_bio);
-			if (ret) {
-				refcount_dec(&cb->pending_ios);
-				status = errno_to_blk_status(ret);
-			}
-		}
-	}
-
-	if (status)
-		cb->status = status;
+	if (bbio->bio.bi_status)
+		cb->status = bbio->bio.bi_status;
 
 	if (refcount_dec_and_test(&cb->pending_ios))
 		finish_compressed_bio_read(cb);
-	btrfs_bio_free_csum(bbio);
 	bio_put(&bbio->bio);
 }
 
@@ -342,7 +304,7 @@ static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_byte
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, endio_func, cb);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, cb->inode, endio_func, cb);
 	bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT;
 
 	em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize);
@@ -778,10 +740,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			submit = true;
 
 		if (submit) {
-			/* Save the original iter for read repair */
-			if (bio_op(comp_bio) == REQ_OP_READ)
-				btrfs_bio(comp_bio)->iter = comp_bio->bi_iter;
-
 			/*
 			 * Save the initial offset of this chunk, as there
 			 * is no direct correlation between compressed pages and
@@ -790,12 +748,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio,
 			 */
 			btrfs_bio(comp_bio)->file_offset = file_offset;
 
-			ret = btrfs_lookup_bio_sums(inode, comp_bio, NULL);
-			if (ret) {
-				btrfs_bio_end_io(btrfs_bio(comp_bio), ret);
-				break;
-			}
-
 			ASSERT(comp_bio->bi_iter.bi_size);
 			btrfs_submit_bio(fs_info, comp_bio, mirror_num);
 			comp_bio = NULL;
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0069bc86c04f1..3dcb0d5f8faa0 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3344,7 +3344,7 @@ int btrfs_find_orphan_item(struct btrfs_root *root, u64 offset);
 /* file-item.c */
 int btrfs_del_csums(struct btrfs_trans_handle *trans, struct btrfs_root *root,
 		    u64 bytenr, u64 len);
-blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst);
+int btrfs_lookup_bio_sums(struct btrfs_bio *bbio);
 int btrfs_insert_hole_extent(struct btrfs_trans_handle *trans,
 			     struct btrfs_root *root, u64 objectid, u64 pos,
 			     u64 num_bytes);
@@ -3375,15 +3375,8 @@ u64 btrfs_file_extent_end(const struct btrfs_path *path);
 void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirror_num);
 void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type);
-int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page,
-			    u32 pgoff, u8 *csum, const u8 * const csum_expected);
-int btrfs_check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
-			  u32 bio_offset, struct page *page, u32 pgoff);
-unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio,
-				    u32 bio_offset, struct page *page,
-				    u64 start, u64 end);
-int btrfs_check_data_csum(struct inode *inode, struct btrfs_bio *bbio,
-			  u32 bio_offset, struct page *page, u32 pgoff);
+bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev,
+			u32 bio_offset, struct bio_vec *bv);
 struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode,
 					   u64 start, u64 len);
 noinline int can_nocow_extent(struct inode *inode, u64 offset, u64 *len,
diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h
index e218bb56d86ac..a1afe6e15943e 100644
--- a/fs/btrfs/extent-io-tree.h
+++ b/fs/btrfs/extent-io-tree.h
@@ -4,7 +4,6 @@
 #define BTRFS_EXTENT_IO_TREE_H
 
 struct extent_changeset;
-struct io_failure_record;
 
 /* Bits for the extent state */
 #define EXTENT_DIRTY		(1U << 0)
@@ -55,7 +54,6 @@ enum {
 	IO_TREE_FS_EXCLUDED_EXTENTS,
 	IO_TREE_BTREE_INODE_IO,
 	IO_TREE_INODE_IO,
-	IO_TREE_INODE_IO_FAILURE,
 	IO_TREE_RELOC_BLOCKS,
 	IO_TREE_TRANS_DIRTY_PAGES,
 	IO_TREE_ROOT_DIRTY_LOG_PAGES,
@@ -88,8 +86,6 @@ struct extent_state {
 	refcount_t refs;
 	u32 state;
 
-	struct io_failure_record *failrec;
-
 #ifdef CONFIG_BTRFS_DEBUG
 	struct list_head leak_list;
 #endif
@@ -246,18 +242,4 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start,
 			       u64 *end, u64 max_bytes,
 			       struct extent_state **cached_state);
 
-/* This should be reworked in the future and put elsewhere. */
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start);
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec);
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start,
-				  u64 end);
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec);
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset);
-
 #endif /* BTRFS_EXTENT_IO_TREE_H */
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c83cc5677a08a..d8c43e2111a99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -326,7 +326,6 @@ static struct extent_state *alloc_extent_state(gfp_t mask)
 	if (!state)
 		return state;
 	state->state = 0;
-	state->failrec = NULL;
 	RB_CLEAR_NODE(&state->rb_node);
 	btrfs_leak_debug_add(&leak_lock, &state->leak_list, &states);
 	refcount_set(&state->refs, 1);
@@ -2159,66 +2158,6 @@ u64 count_range_bits(struct extent_io_tree *tree,
 	return total_bytes;
 }
 
-/*
- * set the private field for a given byte offset in the tree. If there isn't
- * an extent_state there already, this does nothing.
- */
-int set_state_failrec(struct extent_io_tree *tree, u64 start,
-		      struct io_failure_record *failrec)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	int ret = 0;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		ret = -ENOENT;
-		goto out;
-	}
-	state->failrec = failrec;
-out:
-	spin_unlock(&tree->lock);
-	return ret;
-}
-
-struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start)
-{
-	struct rb_node *node;
-	struct extent_state *state;
-	struct io_failure_record *failrec;
-
-	spin_lock(&tree->lock);
-	/*
-	 * this search will find all the extents that end after
-	 * our range starts.
-	 */
-	node = tree_search(tree, start);
-	if (!node) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-	state = rb_entry(node, struct extent_state, rb_node);
-	if (state->start != start) {
-		failrec = ERR_PTR(-ENOENT);
-		goto out;
-	}
-
-	failrec = state->failrec;
-out:
-	spin_unlock(&tree->lock);
-	return failrec;
-}
-
 /*
  * searches a range in the state tree for a given mask.
  * If 'filled' == 1, this returns 1 only if every extent in the tree
@@ -2275,258 +2214,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	return bitset;
 }
 
-int free_io_failure(struct extent_io_tree *failure_tree,
-		    struct extent_io_tree *io_tree,
-		    struct io_failure_record *rec)
-{
-	int ret;
-
-	set_state_failrec(failure_tree, rec->start, NULL);
-	ret = clear_extent_bits(failure_tree, rec->start,
-				rec->start + rec->len - 1,
-				EXTENT_LOCKED | EXTENT_DIRTY);
-	kfree(rec);
-	return ret;
-}
-
-static int next_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
-	if (cur_mirror == failrec->num_copies)
-		return cur_mirror + 1 - failrec->num_copies;
-	return cur_mirror + 1;
-}
-
-static int prev_mirror(const struct io_failure_record *failrec, int cur_mirror)
-{
-	if (cur_mirror == 1)
-		return failrec->num_copies;
-	return cur_mirror - 1;
-}
-
-/*
- * each time an IO finishes, we do a fast check in the IO failure tree
- * to see if we need to process or clean up an io_failure_record
- */
-int clean_io_failure(struct btrfs_fs_info *fs_info,
-		     struct extent_io_tree *failure_tree,
-		     struct extent_io_tree *io_tree, u64 start,
-		     struct page *page, u64 ino, unsigned int pg_offset)
-{
-	u64 private;
-	struct io_failure_record *failrec;
-	struct extent_state *state;
-	int mirror;
-	int ret;
-
-	private = 0;
-	ret = count_range_bits(failure_tree, &private, (u64)-1, 1,
-			       EXTENT_DIRTY, 0);
-	if (!ret)
-		return 0;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (IS_ERR(failrec))
-		return 0;
-
-	BUG_ON(!failrec->this_mirror);
-
-	if (sb_rdonly(fs_info->sb))
-		goto out;
-
-	spin_lock(&io_tree->lock);
-	state = find_first_extent_bit_state(io_tree,
-					    failrec->start,
-					    EXTENT_LOCKED);
-	spin_unlock(&io_tree->lock);
-
-	if (!state || state->start > failrec->start ||
-	    state->end < failrec->start + failrec->len - 1)
-		goto out;
-
-	mirror = failrec->this_mirror;
-	do {
-		mirror = prev_mirror(failrec, mirror);
-		btrfs_repair_io_failure(fs_info, ino, start, failrec->len,
-				  failrec->logical, page, pg_offset, mirror);
-	} while (mirror != failrec->failed_mirror);
-
-out:
-	free_io_failure(failure_tree, io_tree, failrec);
-	return 0;
-}
-
-/*
- * Can be called when
- * - hold extent lock
- * - under ordered extent
- * - the inode is freeing
- */
-void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end)
-{
-	struct extent_io_tree *failure_tree = &inode->io_failure_tree;
-	struct io_failure_record *failrec;
-	struct extent_state *state, *next;
-
-	if (RB_EMPTY_ROOT(&failure_tree->state))
-		return;
-
-	spin_lock(&failure_tree->lock);
-	state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY);
-	while (state) {
-		if (state->start > end)
-			break;
-
-		ASSERT(state->end <= end);
-
-		next = next_state(state);
-
-		failrec = state->failrec;
-		free_extent_state(state);
-		kfree(failrec);
-
-		state = next;
-	}
-	spin_unlock(&failure_tree->lock);
-}
-
-static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode,
-							      struct btrfs_bio *bbio,
-							      unsigned int bio_offset)
-{
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	u64 start = bbio->file_offset + bio_offset;
-	struct io_failure_record *failrec;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	const u32 sectorsize = fs_info->sectorsize;
-	int ret;
-
-	failrec = get_state_failrec(failure_tree, start);
-	if (!IS_ERR(failrec)) {
-		btrfs_debug(fs_info,
-	"Get IO Failure Record: (found) logical=%llu, start=%llu, len=%llu",
-			    failrec->logical, failrec->start, failrec->len);
-		/*
-		 * when data can be on disk more than twice, add to failrec here
-		 * (e.g. with a list for failed_mirror) to make
-		 * clean_io_failure() clean all those errors at once.
-		 */
-		ASSERT(failrec->this_mirror == bbio->mirror_num);
-		ASSERT(failrec->len == fs_info->sectorsize);
-		return failrec;
-	}
-
-	failrec = kzalloc(sizeof(*failrec), GFP_NOFS);
-	if (!failrec)
-		return ERR_PTR(-ENOMEM);
-
-	failrec->start = start;
-	failrec->len = sectorsize;
-	failrec->failed_mirror = bbio->mirror_num;
-	failrec->this_mirror = bbio->mirror_num;
-	failrec->logical = (bbio->iter.bi_sector << SECTOR_SHIFT) + bio_offset;
-
-	btrfs_debug(fs_info,
-		    "new io failure record logical %llu start %llu",
-		    failrec->logical, start);
-
-	failrec->num_copies = btrfs_num_copies(fs_info, failrec->logical, sectorsize);
-	if (failrec->num_copies == 1) {
-		/*
-		 * We only have a single copy of the data, so don't bother with
-		 * all the retry and error correction code that follows. No
-		 * matter what the error is, it is very likely to persist.
-		 */
-		btrfs_debug(fs_info,
-			    "cannot repair logical %llu num_copies %d",
-			    failrec->logical, failrec->num_copies);
-		kfree(failrec);
-		return ERR_PTR(-EIO);
-	}
-
-	/* Set the bits in the private failure tree */
-	ret = set_extent_bits(failure_tree, start, start + sectorsize - 1,
-			      EXTENT_LOCKED | EXTENT_DIRTY);
-	if (ret >= 0) {
-		ret = set_state_failrec(failure_tree, start, failrec);
-	} else if (ret < 0) {
-		kfree(failrec);
-		return ERR_PTR(ret);
-	}
-
-	return failrec;
-}
-
-int btrfs_repair_one_sector(struct inode *inode, struct btrfs_bio *failed_bbio,
-			    u32 bio_offset, struct page *page, unsigned int pgoff,
-			    submit_bio_hook_t *submit_bio_hook)
-{
-	u64 start = failed_bbio->file_offset + bio_offset;
-	struct io_failure_record *failrec;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree;
-	struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree;
-	struct bio *failed_bio = &failed_bbio->bio;
-	const int icsum = bio_offset >> fs_info->sectorsize_bits;
-	struct bio *repair_bio;
-	struct btrfs_bio *repair_bbio;
-
-	btrfs_debug(fs_info,
-		   "repair read error: read error at %llu", start);
-
-	BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE);
-
-	failrec = btrfs_get_io_failure_record(inode, failed_bbio, bio_offset);
-	if (IS_ERR(failrec))
-		return PTR_ERR(failrec);
-
-	/*
-	 * There are two premises:
-	 * a) deliver good data to the caller
-	 * b) correct the bad sectors on disk
-	 *
-	 * Since we're only doing repair for one sector, we only need to get
-	 * a good copy of the failed sector and if we succeed, we have setup
-	 * everything for btrfs_repair_io_failure to do the rest for us.
-	 */
-	failrec->this_mirror = next_mirror(failrec, failrec->this_mirror);
-	if (failrec->this_mirror == failrec->failed_mirror) {
-		btrfs_debug(fs_info,
-			"failed to repair num_copies %d this_mirror %d failed_mirror %d",
-			failrec->num_copies, failrec->this_mirror, failrec->failed_mirror);
-		free_io_failure(failure_tree, tree, failrec);
-		return -EIO;
-	}
-
-	repair_bio = btrfs_bio_alloc(1, REQ_OP_READ, failed_bbio->end_io,
-				     failed_bbio->private);
-	repair_bbio = btrfs_bio(repair_bio);
-	repair_bbio->file_offset = start;
-	repair_bio->bi_iter.bi_sector = failrec->logical >> 9;
-
-	if (failed_bbio->csum) {
-		const u32 csum_size = fs_info->csum_size;
-
-		repair_bbio->csum = repair_bbio->csum_inline;
-		memcpy(repair_bbio->csum,
-		       failed_bbio->csum + csum_size * icsum, csum_size);
-	}
-
-	bio_add_page(repair_bio, page, failrec->len, pgoff);
-	repair_bbio->iter = repair_bio->bi_iter;
-
-	btrfs_debug(btrfs_sb(inode->i_sb),
-		    "repair read error: submitting new read to mirror %d",
-		    failrec->this_mirror);
-
-	/*
-	 * At this point we have a bio, so any errors from submit_bio_hook()
-	 * will be handled by the endio on the repair_bio, so we can't return an
-	 * error here.
-	 */
-	submit_bio_hook(inode, repair_bio, failrec->this_mirror, 0);
-	return BLK_STS_OK;
-}
-
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 {
 	struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb);
@@ -2555,84 +2242,6 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
 	btrfs_subpage_end_reader(fs_info, page, start, len);
 }
 
-static void end_sector_io(struct page *page, u64 offset, bool uptodate)
-{
-	struct btrfs_inode *inode = BTRFS_I(page->mapping->host);
-	const u32 sectorsize = inode->root->fs_info->sectorsize;
-	struct extent_state *cached = NULL;
-
-	end_page_read(page, uptodate, offset, sectorsize);
-	if (uptodate)
-		set_extent_uptodate(&inode->io_tree, offset,
-				    offset + sectorsize - 1, &cached, GFP_ATOMIC);
-	unlock_extent_cached_atomic(&inode->io_tree, offset,
-				    offset + sectorsize - 1, &cached);
-}
-
-static void submit_data_read_repair(struct inode *inode,
-				    struct btrfs_bio *failed_bbio,
-				    u32 bio_offset, const struct bio_vec *bvec,
-				    unsigned int error_bitmap)
-{
-	const unsigned int pgoff = bvec->bv_offset;
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	struct page *page = bvec->bv_page;
-	const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset;
-	const u64 end = start + bvec->bv_len - 1;
-	const u32 sectorsize = fs_info->sectorsize;
-	const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits;
-	int i;
-
-	BUG_ON(bio_op(&failed_bbio->bio) == REQ_OP_WRITE);
-
-	/* This repair is only for data */
-	ASSERT(is_data_inode(inode));
-
-	/* We're here because we had some read errors or csum mismatch */
-	ASSERT(error_bitmap);
-
-	/*
-	 * We only get called on buffered IO, thus page must be mapped and bio
-	 * must not be cloned.
-	 */
-	ASSERT(page->mapping && !bio_flagged(&failed_bbio->bio, BIO_CLONED));
-
-	/* Iterate through all the sectors in the range */
-	for (i = 0; i < nr_bits; i++) {
-		const unsigned int offset = i * sectorsize;
-		bool uptodate = false;
-		int ret;
-
-		if (!(error_bitmap & (1U << i))) {
-			/*
-			 * This sector has no error, just end the page read
-			 * and unlock the range.
-			 */
-			uptodate = true;
-			goto next;
-		}
-
-		ret = btrfs_repair_one_sector(inode, failed_bbio,
-				bio_offset + offset, page, pgoff + offset,
-				btrfs_submit_data_read_bio);
-		if (!ret) {
-			/*
-			 * We have submitted the read repair, the page release
-			 * will be handled by the endio function of the
-			 * submitted repair bio.
-			 * Thus we don't need to do any thing here.
-			 */
-			continue;
-		}
-		/*
-		 * Continue on failed repair, otherwise the remaining sectors
-		 * will not be properly unlocked.
-		 */
-next:
-		end_sector_io(page, start + offset, uptodate);
-	}
-}
-
 /* lots and lots of room for performance fixes in the end_bio funcs */
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end)
@@ -2835,7 +2444,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 {
 	struct bio *bio = &bbio->bio;
 	struct bio_vec *bvec;
-	struct extent_io_tree *tree, *failure_tree;
 	struct processed_extent processed = { 0 };
 	/*
 	 * The offset to the beginning of a bio, since one bio can never be
@@ -2852,8 +2460,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		struct inode *inode = page->mapping->host;
 		struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 		const u32 sectorsize = fs_info->sectorsize;
-		unsigned int error_bitmap = (unsigned int)-1;
-		bool repair = false;
 		u64 start;
 		u64 end;
 		u32 len;
@@ -2862,8 +2468,6 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 			"end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u",
 			bio->bi_iter.bi_sector, bio->bi_status, bbio->mirror_num);
-		tree = &BTRFS_I(inode)->io_tree;
-		failure_tree = &BTRFS_I(inode)->io_failure_tree;
 
 		/*
 		 * We always issue full-sector reads, but if some block in a
@@ -2887,27 +2491,15 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 		len = bvec->bv_len;
 
 		mirror = bbio->mirror_num;
-		if (likely(uptodate)) {
-			if (is_data_inode(inode)) {
-				error_bitmap = btrfs_verify_data_csum(bbio,
-						bio_offset, page, start, end);
-				if (error_bitmap)
-					uptodate = false;
-			} else {
-				if (btrfs_validate_metadata_buffer(bbio,
-						page, start, end, mirror))
-					uptodate = false;
-			}
-		}
+		if (uptodate && !is_data_inode(inode) &&
+		    btrfs_validate_metadata_buffer(bbio, page, start, end,
+						   mirror))
+			uptodate = false;
 
 		if (likely(uptodate)) {
 			loff_t i_size = i_size_read(inode);
 			pgoff_t end_index = i_size >> PAGE_SHIFT;
 
-			clean_io_failure(BTRFS_I(inode)->root->fs_info,
-					 failure_tree, tree, start, page,
-					 btrfs_ino(BTRFS_I(inode)), 0);
-
 			/*
 			 * Zero out the remaining part if this range straddles
 			 * i_size.
@@ -2924,19 +2516,7 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 				zero_user_segment(page, zero_start,
 						  offset_in_page(end) + 1);
 			}
-		} else if (is_data_inode(inode)) {
-			/*
-			 * Only try to repair bios that actually made it to a
-			 * device. If the bio failed to be submitted mirror
-			 * is 0 and we need to fail it without retrying.
-			 *
-			 * This also includes the high level bios for compressed
-			 * extents - these never make it to a device and repair
-			 * is already handled on the lower compressed bio.
-			 */
-			if (mirror > 0)
-				repair = true;
-		} else {
+		} else if (!is_data_inode(inode)) {
 			struct extent_buffer *eb;
 
 			eb = find_extent_buffer_readpage(fs_info, page, start);
@@ -2945,19 +2525,10 @@ static void end_bio_extent_readpage(struct btrfs_bio *bbio)
 			atomic_dec(&eb->io_pages);
 		}
 
-		if (repair) {
-			/*
-			 * submit_data_read_repair() will handle all the good
-			 * and bad sectors, we just continue to the next bvec.
-			 */
-			submit_data_read_repair(inode, bbio, bio_offset, bvec,
-						error_bitmap);
-		} else {
-			/* Update page status and unlock */
-			end_page_read(page, uptodate, start, len);
-			endio_readpage_release_extent(&processed, BTRFS_I(inode),
-					start, end, PageUptodate(page));
-		}
+		/* Update page status and unlock */
+		end_page_read(page, uptodate, start, len);
+		endio_readpage_release_extent(&processed, BTRFS_I(inode),
+				start, end, PageUptodate(page));
 
 		ASSERT(bio_offset + len > bio_offset);
 		bio_offset += len;
 
 	}
 	/* Release the last extent */
 	endio_readpage_release_extent(&processed, NULL, 0, 0, false);
-	btrfs_bio_free_csum(bbio);
 	bio_put(bio);
 }
 
@@ -3158,7 +2728,8 @@ static int alloc_new_bio(struct btrfs_inode *inode,
 	struct bio *bio;
 	int ret;
 
-	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, end_io_func, NULL);
+	bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, &inode->vfs_inode, end_io_func,
+			      NULL);
 	/*
 	 * For compressed page range, its disk_bytenr is always @disk_bytenr
 	 * passed in, no matter if we have added any range into previous bio.
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index e653e64598bf7..caf3343d1a36c 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -57,17 +57,11 @@ enum {
 #define BITMAP_LAST_BYTE_MASK(nbits) \
 	(BYTE_MASK >> (-(nbits) & (BITS_PER_BYTE - 1)))
 
-struct btrfs_bio;
 struct btrfs_root;
 struct btrfs_inode;
 struct btrfs_fs_info;
-struct io_failure_record;
 struct extent_io_tree;
 
-typedef void (submit_bio_hook_t)(struct inode *inode, struct bio *bio,
-				 int mirror_num,
-				 enum btrfs_compression_type compress_type);
-
 typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode,
 						 struct bio *bio, u64 dio_file_offset);
 
@@ -244,28 +238,6 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array);
 
 void end_extent_writepage(struct page *page, int err, u64 start, u64 end);
 
-/*
- * When IO fails, either with EIO or csum verification fails, we
- * try other mirrors that might have a good copy of the data. This
- * io_failure_record is used to record state as we go through all the
- * mirrors. If another mirror has good data, the sector is set up to date
- * and things continue. If a good mirror can't be found, the original
- * bio end_io callback is called to indicate things have failed.
- */
-struct io_failure_record {
-	struct page *page;
-	u64 start;
-	u64 len;
-	u64 logical;
-	int this_mirror;
-	int failed_mirror;
-	int num_copies;
-};
-
-int btrfs_repair_one_sector(struct inode *inode, struct btrfs_bio *failed_bbio,
-			    u32 bio_offset, struct page *page, unsigned int pgoff,
-			    submit_bio_hook_t *submit_bio_hook);
-
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 bool find_lock_delalloc_range(struct inode *inode,
 			      struct page *locked_page, u64 *start,
diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c
index 29999686d234c..ffbac8f257908 100644
--- a/fs/btrfs/file-item.c
+++ b/fs/btrfs/file-item.c
@@ -359,27 +359,27 @@ static int search_file_offset_in_bio(struct bio *bio, struct inode *inode,
  *		NULL, the checksum buffer is allocated and returned in
  *		btrfs_bio(bio)->csum instead.
  *
- * Return: BLK_STS_RESOURCE if allocating memory fails, BLK_STS_OK otherwise.
+ * Return: -ENOMEM if allocating memory fails, 0 otherwise.
  */
-blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst)
+int btrfs_lookup_bio_sums(struct btrfs_bio *bbio)
 {
+	struct inode *inode = bbio->inode;
 	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
 	struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree;
-	struct btrfs_bio *bbio = NULL;
+	struct bio *bio = &bbio->bio;
 	struct btrfs_path *path;
 	const u32 sectorsize = fs_info->sectorsize;
 	const u32 csum_size = fs_info->csum_size;
 	u32 orig_len = bio->bi_iter.bi_size;
 	u64 orig_disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT;
 	u64 cur_disk_bytenr;
-	u8 *csum;
 	const unsigned int nblocks = orig_len >> fs_info->sectorsize_bits;
 	int count = 0;
-	blk_status_t ret = BLK_STS_OK;
+	int ret = 0;
 
 	if ((BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) ||
 	    test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))
-		return BLK_STS_OK;
+		return 0;
 
 	/*
 	 * This function is only called for read bio.
@@ -396,23 +396,16 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 	ASSERT(bio_op(bio) == REQ_OP_READ);
 	path = btrfs_alloc_path();
 	if (!path)
-		return BLK_STS_RESOURCE;
-
-	if (!dst) {
-		bbio = btrfs_bio(bio);
+		return -ENOMEM;
 
-		if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
-			bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
-			if (!bbio->csum) {
-				btrfs_free_path(path);
-				return BLK_STS_RESOURCE;
-			}
-		} else {
-			bbio->csum = bbio->csum_inline;
+	if (nblocks * csum_size > BTRFS_BIO_INLINE_CSUM_SIZE) {
+		bbio->csum = kmalloc_array(nblocks, csum_size, GFP_NOFS);
+		if (!bbio->csum) {
+			btrfs_free_path(path);
+			return -ENOMEM;
 		}
-		csum = bbio->csum;
 	} else {
-		csum = dst;
+		bbio->csum = bbio->csum_inline;
 	}
 
 	/*
@@ -451,14 +444,15 @@ blk_status_t btrfs_lookup_bio_sums(struct inode *inode, struct bio *bio, u8 *dst
 			ASSERT(cur_disk_bytenr - orig_disk_bytenr < UINT_MAX);
 			sector_offset = (cur_disk_bytenr - orig_disk_bytenr) >>
 					fs_info->sectorsize_bits;
-			csum_dst = csum + sector_offset * csum_size;
+			csum_dst = bbio->csum + sector_offset * csum_size;
 
 			count = search_csum_tree(fs_info, path, cur_disk_bytenr,
 						 search_len, csum_dst);
 			if (count < 0) {
-				ret = errno_to_blk_status(count);
-				if (bbio)
-					btrfs_bio_free_csum(bbio);
+				ret = count;
+				if (bbio->csum != bbio->csum_inline)
+					kfree(bbio->csum);
+				bbio->csum = NULL;
 				break;
 			}
 
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b9d40e25d978c..b3466015008c7 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -85,9 +85,6 @@ struct btrfs_dio_private {
 	 */
 	refcount_t refs;
 
-	/* Array of checksums */
-	u8 *csums;
-
 	/* This must be last */
 	struct bio bio;
 };
@@ -2735,9 +2732,6 @@ void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirro
 void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio,
 			int mirror_num, enum btrfs_compression_type compress_type)
 {
-	struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
-	blk_status_t ret;
-
 	if (compress_type != BTRFS_COMPRESS_NONE) {
 		/*
 		 * btrfs_submit_compressed_read will handle completing the bio
@@ -2747,20 +2741,7 @@ void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio,
 		return;
 	}
 
-	/* Save the original iter for read repair */
-	btrfs_bio(bio)->iter = bio->bi_iter;
-
-	/*
-	 * Lookup bio sums does extra checks around whether we need to csum or
-	 * not, which is why we ignore skip_sum here.
- */ - ret = btrfs_lookup_bio_sums(inode, bio, NULL); - if (ret) { - btrfs_bio_end_io(btrfs_bio(bio), ret); - return; - } - - btrfs_submit_bio(fs_info, bio, mirror_num); + btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num); } /* @@ -3238,8 +3219,6 @@ int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) ordered_extent->disk_num_bytes); } - btrfs_free_io_failure_record(inode, start, end); - if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) { truncated = true; logical_len = ordered_extent->truncated_len; @@ -3417,133 +3396,64 @@ void btrfs_writepage_endio_finish_ordered(struct btrfs_inode *inode, } /* - * Verify the checksum for a single sector without any extra action that depend - * on the type of I/O. + * btrfs_data_csum_ok - verify the checksum of single data sector + * @bbio: btrfs_io_bio which contains the csum + * @dev: device the sector is on + * @bio_offset: offset to the beginning of the bio (in bytes) + * @bv: bio_vec to check + * + * Check if the checksum on a data block is valid. When a checksum mismatch is + * detected, report the error and fill the corrupted range with zero. + * + * Return %true if the sector is ok or had no checksum to start with, else + * %false. */ -int btrfs_check_sector_csum(struct btrfs_fs_info *fs_info, struct page *page, - u32 pgoff, u8 *csum, const u8 * const csum_expected) +bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev, + u32 bio_offset, struct bio_vec *bv) { + struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb); + struct btrfs_inode *bi = BTRFS_I(bbio->inode); + u64 file_offset = bbio->file_offset + bio_offset; + u64 end = file_offset + bv->bv_len - 1; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); + u8 *csum_expected; + u8 csum[BTRFS_CSUM_SIZE]; char *kaddr; - ASSERT(pgoff + fs_info->sectorsize <= PAGE_SIZE); + ASSERT(bv->bv_len == fs_info->sectorsize); + + if (!bbio->csum) + return true; + + if (btrfs_is_data_reloc_root(bi->root) && + test_range_bit(&bi->io_tree, file_offset, end, EXTENT_NODATASUM, + 1, NULL)) { + /* Skip the range without csum for data reloc inode */ + clear_extent_bits(&bi->io_tree, file_offset, end, + EXTENT_NODATASUM); + return true; + } + + csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset); shash->tfm = fs_info->csum_shash; - kaddr = kmap_local_page(page) + pgoff; + kaddr = bvec_kmap_local(bv); crypto_shash_digest(shash, kaddr, fs_info->sectorsize, csum); kunmap_local(kaddr); if (memcmp(csum, csum_expected, fs_info->csum_size)) - return -EIO; - return 0; -} - -/* - * check_data_csum - verify checksum of one sector of uncompressed data - * @inode: inode - * @bbio: btrfs_bio which contains the csum - * @bio_offset: offset to the beginning of the bio (in bytes) - * @page: page where is the data to be verified - * @pgoff: offset inside the page - * - * The length of such check is always one sector size. - * - * When csum mismatch is detected, we will also report the error and fill the - * corrupted range with zero. 
(Thus it needs the extra parameters) - */ -int btrfs_check_data_csum(struct inode *inode, struct btrfs_bio *bbio, - u32 bio_offset, struct page *page, u32 pgoff) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - u32 len = fs_info->sectorsize; - u8 *csum_expected; - u8 csum[BTRFS_CSUM_SIZE]; - - ASSERT(pgoff + len <= PAGE_SIZE); - - csum_expected = btrfs_csum_ptr(fs_info, bbio->csum, bio_offset); - - if (btrfs_check_sector_csum(fs_info, page, pgoff, csum, csum_expected)) goto zeroit; - return 0; + return true; zeroit: - btrfs_print_data_csum_error(BTRFS_I(inode), - bbio->file_offset + bio_offset, - csum, csum_expected, bbio->mirror_num); - if (bbio->device) - btrfs_dev_stat_inc_and_print(bbio->device, + btrfs_print_data_csum_error(BTRFS_I(bbio->inode), file_offset, csum, + csum_expected, bbio->mirror_num); + if (dev) + btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_CORRUPTION_ERRS); - memzero_page(page, pgoff, len); - return -EIO; -} - -/* - * When reads are done, we need to check csums to verify the data is correct. - * if there's a match, we allow the bio to finish. If not, the code in - * extent_io.c will try to find good copies for us. - * - * @bio_offset: offset to the beginning of the bio (in bytes) - * @start: file offset of the range start - * @end: file offset of the range end (inclusive) - * - * Return a bitmap where bit set means a csum mismatch, and bit not set means - * csum match. - */ -unsigned int btrfs_verify_data_csum(struct btrfs_bio *bbio, - u32 bio_offset, struct page *page, - u64 start, u64 end) -{ - struct inode *inode = page->mapping->host; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; - struct btrfs_root *root = BTRFS_I(inode)->root; - const u32 sectorsize = root->fs_info->sectorsize; - u32 pg_off; - unsigned int result = 0; - - /* - * This only happens for NODATASUM or compressed read. - * Normally this should be covered by above check for compressed read - * or the next check for NODATASUM. Just do a quicker exit here. 
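The verify-and-zero behaviour of btrfs_data_csum_ok is worth spelling out: on a mismatch the bad sector is reported, counted against the device, and overwritten with zeroes so stale contents cannot leak to the reader. A rough sketch of that shape outside the kernel, using a trivial XOR digest as a stand-in for the crypto_shash call (all names invented):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SECTORSIZE 4096

/* stand-in for the crypto_shash digest: XOR all bytes together */
static uint8_t toy_digest(const uint8_t *data, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum ^= data[i];
	return sum;
}

/*
 * Same shape as btrfs_data_csum_ok(): recompute, compare with the
 * expected value, zero the sector on mismatch so stale data never
 * reaches the caller.
 */
static bool sector_csum_ok(uint8_t *sector, uint8_t expected)
{
	if (toy_digest(sector, SECTORSIZE) == expected)
		return true;
	memset(sector, 0, SECTORSIZE);		/* like memzero_bvec() */
	return false;
}

int main(void)
{
	static uint8_t sector[SECTORSIZE];	/* digest of zeroes is 0 */
	uint8_t expected = toy_digest(sector, SECTORSIZE);

	sector[0] = 0xff;			/* simulate corruption */
	return sector_csum_ok(sector, expected) ? 1 : 0;
}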
- */ - if (bbio->csum == NULL) - return 0; - - if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) - return 0; - - if (unlikely(test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state))) - return 0; - - ASSERT(page_offset(page) <= start && - end <= page_offset(page) + PAGE_SIZE - 1); - for (pg_off = offset_in_page(start); - pg_off < offset_in_page(end); - pg_off += sectorsize, bio_offset += sectorsize) { - u64 file_offset = pg_off + page_offset(page); - int ret; - - if (btrfs_is_data_reloc_root(root) && - test_range_bit(io_tree, file_offset, - file_offset + sectorsize - 1, - EXTENT_NODATASUM, 1, NULL)) { - /* Skip the range without csum for data reloc inode */ - clear_extent_bits(io_tree, file_offset, - file_offset + sectorsize - 1, - EXTENT_NODATASUM); - continue; - } - ret = btrfs_check_data_csum(inode, bbio, bio_offset, page, pg_off); - if (ret < 0) { - const int nr_bit = (pg_off - offset_in_page(start)) >> - root->fs_info->sectorsize_bits; - - result |= (1U << nr_bit); - } - } - return result; + memzero_bvec(bv); + return false; } /* @@ -5437,8 +5347,6 @@ void btrfs_evict_inode(struct inode *inode) if (is_bad_inode(inode)) goto no_delete; - btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1); - if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) goto no_delete; @@ -7974,60 +7882,9 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) dip->file_offset + dip->bytes - 1); } - kfree(dip->csums); bio_endio(&dip->bio); } -static void submit_dio_repair_bio(struct inode *inode, struct bio *bio, - int mirror_num, - enum btrfs_compression_type compress_type) -{ - struct btrfs_dio_private *dip = btrfs_bio(bio)->private; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - - BUG_ON(bio_op(bio) == REQ_OP_WRITE); - - refcount_inc(&dip->refs); - btrfs_submit_bio(fs_info, bio, mirror_num); -} - -static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip, - struct btrfs_bio *bbio, - const bool uptodate) -{ - struct inode *inode = dip->inode; - struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; - const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); - blk_status_t err = BLK_STS_OK; - struct bvec_iter iter; - struct bio_vec bv; - u32 offset; - - btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) { - u64 start = bbio->file_offset + offset; - - if (uptodate && - (!csum || !btrfs_check_data_csum(inode, bbio, offset, bv.bv_page, - bv.bv_offset))) { - clean_io_failure(fs_info, failure_tree, io_tree, start, - bv.bv_page, btrfs_ino(BTRFS_I(inode)), - bv.bv_offset); - } else { - int ret; - - ret = btrfs_repair_one_sector(inode, bbio, offset, - bv.bv_page, bv.bv_offset, - submit_dio_repair_bio); - if (ret) - err = errno_to_blk_status(ret); - } - } - - return err; -} - static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode, struct bio *bio, u64 dio_file_offset) @@ -8041,18 +7898,14 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio) struct bio *bio = &bbio->bio; blk_status_t err = bio->bi_status; - if (err) + if (err) { btrfs_warn(BTRFS_I(dip->inode)->root->fs_info, "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d", btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio), bio->bi_opf, bio->bi_iter.bi_sector, bio->bi_iter.bi_size, err); - - if (bio_op(bio) == REQ_OP_READ) - err = btrfs_check_read_dio_bio(dip, bbio, !err); - - if (err) dip->bio.bi_status = err; + } 
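Even with the repair logic gone from this layer, btrfs_dio_private keeps the usual split-bio completion contract: each cloned bio holds a reference, the first failure records a status on the parent, and the final put ends the parent bio. A compact user-space model of that contract (invented names, C11 atomics):

#include <stdatomic.h>
#include <stdio.h>

struct parent_io {
	atomic_int refs;	/* like dip->refs */
	atomic_int status;	/* 0 = OK; only the first error is kept */
};

static void child_endio(struct parent_io *p, int err)
{
	int zero = 0;

	if (err)
		atomic_compare_exchange_strong(&p->status, &zero, err);
	if (atomic_fetch_sub(&p->refs, 1) == 1)	/* last put */
		printf("parent completes, status %d\n",
		       atomic_load(&p->status));
}

int main(void)
{
	struct parent_io p = { .refs = 3 };

	child_endio(&p, 0);
	child_endio(&p, -5);	/* an -EIO-like failure */
	child_endio(&p, 0);	/* last reference completes the parent */
	return 0;
}

The kernel code plainly assigns dip->bio.bi_status instead of using a compare-and-swap; that is equivalent for this purpose, since any recorded error fails the whole direct I/O.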
btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio); @@ -8064,13 +7917,8 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct inode *inode, u64 file_offset, int async_submit) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_dio_private *dip = btrfs_bio(bio)->private; blk_status_t ret; - - /* Save the original iter for read repair */ - if (btrfs_op(bio) == BTRFS_MAP_READ) - btrfs_bio(bio)->iter = bio->bi_iter; - + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) goto map; @@ -8090,9 +7938,6 @@ static void btrfs_submit_dio_bio(struct bio *bio, struct inode *inode, btrfs_bio_end_io(btrfs_bio(bio), ret); return; } - } else { - btrfs_bio(bio)->csum = btrfs_csum_ptr(fs_info, dip->csums, - file_offset - dip->file_offset); } map: btrfs_submit_bio(fs_info, bio, 0); @@ -8104,7 +7949,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, struct btrfs_dio_private *dip = container_of(dio_bio, struct btrfs_dio_private, bio); struct inode *inode = iter->inode; - const bool write = (btrfs_op(dio_bio) == BTRFS_MAP_WRITE); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); const bool raid56 = (btrfs_data_alloc_profile(fs_info) & BTRFS_BLOCK_GROUP_RAID56_MASK); @@ -8125,25 +7969,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, dip->file_offset = file_offset; dip->bytes = dio_bio->bi_iter.bi_size; refcount_set(&dip->refs, 1); - dip->csums = NULL; - - if (!write && !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) { - unsigned int nr_sectors = - (dio_bio->bi_iter.bi_size >> fs_info->sectorsize_bits); - - /* - * Load the csums up front to reduce csum tree searches and - * contention when submitting bios. - */ - status = BLK_STS_RESOURCE; - dip->csums = kcalloc(nr_sectors, fs_info->csum_size, GFP_NOFS); - if (!dip) - goto out_err; - - status = btrfs_lookup_bio_sums(inode, dio_bio, dip->csums); - if (status != BLK_STS_OK) - goto out_err; - } start_sector = dio_bio->bi_iter.bi_sector; submit_len = dio_bio->bi_iter.bi_size; @@ -8171,7 +7996,7 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, * the allocation is backed by btrfs_bioset. 
*/ bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len, - btrfs_end_dio_bio, dip); + inode, btrfs_end_dio_bio, dip); btrfs_bio(bio)->file_offset = file_offset; if (bio_op(bio) == REQ_OP_ZONE_APPEND) { @@ -8918,12 +8743,9 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) inode = &ei->vfs_inode; extent_map_tree_init(&ei->extent_tree); extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode); - extent_io_tree_init(fs_info, &ei->io_failure_tree, - IO_TREE_INODE_IO_FAILURE, inode); extent_io_tree_init(fs_info, &ei->file_extent_tree, IO_TREE_INODE_FILE_EXTENT, inode); ei->io_tree.track_uptodate = true; - ei->io_failure_tree.track_uptodate = true; atomic_set(&ei->sync_writers, 0); mutex_init(&ei->log_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); @@ -10370,7 +10192,6 @@ struct btrfs_encoded_read_private { wait_queue_head_t wait; atomic_t pending; blk_status_t status; - bool skip_csum; }; static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode, @@ -10378,57 +10199,17 @@ static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode, { struct btrfs_encoded_read_private *priv = btrfs_bio(bio)->private; struct btrfs_fs_info *fs_info = inode->root->fs_info; - blk_status_t ret; - - if (!priv->skip_csum) { - ret = btrfs_lookup_bio_sums(&inode->vfs_inode, bio, NULL); - if (ret) - return ret; - } atomic_inc(&priv->pending); btrfs_submit_bio(fs_info, bio, mirror_num); return BLK_STS_OK; } -static blk_status_t btrfs_encoded_read_verify_csum(struct btrfs_bio *bbio) -{ - const bool uptodate = (bbio->bio.bi_status == BLK_STS_OK); - struct btrfs_encoded_read_private *priv = bbio->private; - struct btrfs_inode *inode = priv->inode; - struct btrfs_fs_info *fs_info = inode->root->fs_info; - u32 sectorsize = fs_info->sectorsize; - struct bio_vec *bvec; - struct bvec_iter_all iter_all; - u32 bio_offset = 0; - - if (priv->skip_csum || !uptodate) - return bbio->bio.bi_status; - - bio_for_each_segment_all(bvec, &bbio->bio, iter_all) { - unsigned int i, nr_sectors, pgoff; - - nr_sectors = BTRFS_BYTES_TO_BLKS(fs_info, bvec->bv_len); - pgoff = bvec->bv_offset; - for (i = 0; i < nr_sectors; i++) { - ASSERT(pgoff < PAGE_SIZE); - if (btrfs_check_data_csum(&inode->vfs_inode, bbio, bio_offset, - bvec->bv_page, pgoff)) - return BLK_STS_IOERR; - bio_offset += sectorsize; - pgoff += sectorsize; - } - } - return BLK_STS_OK; -} - static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) { struct btrfs_encoded_read_private *priv = bbio->private; - blk_status_t status; - status = btrfs_encoded_read_verify_csum(bbio); - if (status) { + if (bbio->bio.bi_status) { /* * The memory barrier implied by the atomic_dec_return() here * pairs with the memory barrier implied by the @@ -10437,11 +10218,10 @@ static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) * write is observed before the load of status in * btrfs_encoded_read_regular_fill_pages(). 
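The barrier comment here can be restated in C11 terms: the read-modify-write on the pending counter orders the status store before the decrement, so a waiter that observes the counter at zero is guaranteed to also see the status. A single-waiter sketch of that handshake (toy code, not the kernel primitives):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending = 1;	/* like priv->pending */
static int status;		/* written before the final decrement */

static void *endio_thread(void *arg)
{
	(void)arg;
	status = -5;	/* the WRITE_ONCE(priv->status, ...) analogue */
	/*
	 * atomic_fetch_sub() is a seq_cst read-modify-write, so the
	 * status store above is visible to anyone who observes the
	 * counter reaching zero.
	 */
	atomic_fetch_sub(&pending, 1);	/* last dec would wake_up() */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, endio_thread, NULL);
	while (atomic_load(&pending))	/* stand-in for wait_event() */
		;
	printf("status %d\n", status);	/* guaranteed to print -5 */
	pthread_join(t, NULL);
	return 0;
}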
*/ - WRITE_ONCE(priv->status, status); + WRITE_ONCE(priv->status, bbio->bio.bi_status); } if (!atomic_dec_return(&priv->pending)) wake_up(&priv->wait); - btrfs_bio_free_csum(bbio); bio_put(&bbio->bio); } @@ -10454,7 +10234,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, .inode = inode, .file_offset = file_offset, .pending = ATOMIC_INIT(1), - .skip_csum = (inode->flags & BTRFS_INODE_NODATASUM), }; unsigned long i = 0; u64 cur = 0; @@ -10490,6 +10269,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, if (!bio) { bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, + &inode->vfs_inode, btrfs_encoded_read_endio, &priv); bio->bi_iter.bi_sector = diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index dff735e36da96..b8472ab466abe 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -35,6 +35,14 @@ #include "zoned.h" static struct bio_set btrfs_bioset; +static struct bio_set btrfs_repair_bioset; +static mempool_t btrfs_failed_bio_pool; + +struct btrfs_failed_bio { + struct btrfs_bio *bbio; + int num_copies; + atomic_t repair_count; +}; #define BTRFS_BLOCK_GROUP_STRIPE_MASK (BTRFS_BLOCK_GROUP_RAID0 | \ BTRFS_BLOCK_GROUP_RAID10 | \ @@ -6646,10 +6654,11 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, * Initialize a btrfs_bio structure. This skips the embedded bio itself as it * is already initialized by the block layer. */ -static inline void btrfs_bio_init(struct btrfs_bio *bbio, - btrfs_bio_end_io_t end_io, void *private) +static void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode, + btrfs_bio_end_io_t end_io, void *private) { memset(bbio, 0, offsetof(struct btrfs_bio, bio)); + bbio->inode = inode; bbio->end_io = end_io; bbio->private = private; } @@ -6662,16 +6671,18 @@ static inline void btrfs_bio_init(struct btrfs_bio *bbio, * a mempool. 
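All of this relies on btrfs_bio embedding the generic bio at the end of a private container: btrfs_bio_init() clears just the private part via offsetof, and btrfs_bio() maps a bio handed back by the block layer to its container. The layout trick in isolation (toy types, user-space container_of):

#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct bio {		/* toy stand-in for the block layer bio */
	unsigned int opf;
};

struct my_bio {
	void *inode;	/* private state comes first ... */
	int mirror_num;
	struct bio bio;	/* ... the embedded bio must be last */
};

/* clear only the private part; the embedded bio is owned by the bioset */
static void my_bio_init(struct my_bio *b)
{
	memset(b, 0, offsetof(struct my_bio, bio));
}

int main(void)
{
	struct my_bio b;
	struct bio *bio = &b.bio;	/* what completion handlers see */

	my_bio_init(&b);
	printf("%d\n", container_of(bio, struct my_bio, bio) == &b);
	return 0;
}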
*/ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf, - btrfs_bio_end_io_t end_io, void *private) + struct inode *inode, btrfs_bio_end_io_t end_io, + void *private) { struct bio *bio; bio = bio_alloc_bioset(NULL, nr_vecs, opf, GFP_NOFS, &btrfs_bioset); - btrfs_bio_init(btrfs_bio(bio), end_io, private); + btrfs_bio_init(btrfs_bio(bio), inode, end_io, private); return bio; } struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, + struct inode *inode, btrfs_bio_end_io_t end_io, void *private) { struct bio *bio; @@ -6681,13 +6692,174 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset); bbio = btrfs_bio(bio); - btrfs_bio_init(bbio, end_io, private); + btrfs_bio_init(bbio, inode, end_io, private); bio_trim(bio, offset >> 9, size >> 9); - bbio->iter = bio->bi_iter; return bio; } +static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) +{ + if (cur_mirror == fbio->num_copies) + return cur_mirror + 1 - fbio->num_copies; + return cur_mirror + 1; +} + +static int prev_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) +{ + if (cur_mirror == 1) + return fbio->num_copies; + return cur_mirror - 1; +} + +static void btrfs_repair_done(struct btrfs_failed_bio *fbio) +{ + if (atomic_dec_and_test(&fbio->repair_count)) { + fbio->bbio->end_io(fbio->bbio); + mempool_free(fbio, &btrfs_failed_bio_pool); + } +} + +static void btrfs_end_repair_bio(struct btrfs_bio *repair_bbio, + struct btrfs_device *dev) +{ + struct btrfs_failed_bio *fbio = repair_bbio->private; + struct inode *inode = repair_bbio->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct bio_vec *bv = bio_first_bvec_all(&repair_bbio->bio); + int mirror = repair_bbio->mirror_num; + + if (repair_bbio->bio.bi_status || + !btrfs_data_csum_ok(repair_bbio, dev, 0, bv)) { + bio_reset(&repair_bbio->bio, NULL, REQ_OP_READ); + repair_bbio->bio.bi_iter = repair_bbio->saved_iter; + + mirror = next_repair_mirror(fbio, mirror); + if (mirror == fbio->bbio->mirror_num) { + btrfs_debug(fs_info, "no mirror left"); + fbio->bbio->bio.bi_status = BLK_STS_IOERR; + goto done; + } + + btrfs_submit_bio(fs_info, &repair_bbio->bio, mirror); + return; + } + + do { + mirror = prev_repair_mirror(fbio, mirror); + btrfs_repair_io_failure(fs_info, btrfs_ino(BTRFS_I(inode)), + repair_bbio->file_offset, fs_info->sectorsize, + repair_bbio->saved_iter.bi_sector << + SECTOR_SHIFT, + bv->bv_page, bv->bv_offset, mirror); + } while (mirror != fbio->bbio->mirror_num); + +done: + btrfs_repair_done(fbio); + bio_put(&repair_bbio->bio); +} + +/* + * Try to kick off a repair read to the next available mirror for a bad + * sector. + * + * This primarily tries to recover good data to serve the actual read request, + * but also tries to write the good data back to the bad mirror(s) when a + * read succeeded to restore the redundancy. 
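next_repair_mirror() and prev_repair_mirror() above are 1-based modular arithmetic over the available copies: a repair loop starts at the mirror after the failed one and stops once it wraps back around to it. A small demo of the rotation, reimplemented on plain ints (num_copies = 3, failed read from mirror 2):

#include <stdio.h>

/* same arithmetic as next_repair_mirror(), on plain ints */
static int next_mirror(int cur, int num_copies)
{
	return cur == num_copies ? 1 : cur + 1;
}

int main(void)
{
	const int num_copies = 3, failed_mirror = 2;

	for (int m = next_mirror(failed_mirror, num_copies);
	     m != failed_mirror; m = next_mirror(m, num_copies))
		printf("try mirror %d\n", m);	/* prints 3, then 1 */
	return 0;
}

The termination test matches btrfs_end_repair_bio(): once the rotation returns to the originally failed mirror, every copy has been tried and the I/O error stands.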
+ */ +static void repair_one_sector(struct btrfs_bio *failed_bbio, u32 bio_offset, + struct bio_vec *bv, + struct btrfs_failed_bio **fbio) +{ + struct inode *inode = failed_bbio->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + const u32 sectorsize = fs_info->sectorsize; + const u64 logical = failed_bbio->saved_iter.bi_sector << SECTOR_SHIFT; + struct btrfs_bio *repair_bbio; + struct bio *repair_bio; + int num_copies; + int mirror; + + btrfs_debug(fs_info, "repair read error: read error at %llu", + failed_bbio->file_offset + bio_offset); + + num_copies = btrfs_num_copies(fs_info, logical, sectorsize); + if (num_copies == 1) { + btrfs_debug(fs_info, "no copy to repair from"); + failed_bbio->bio.bi_status = BLK_STS_IOERR; + return; + } + + if (!*fbio) { + *fbio = mempool_alloc(&btrfs_failed_bio_pool, GFP_NOFS); + (*fbio)->bbio = failed_bbio; + (*fbio)->num_copies = num_copies; + atomic_set(&(*fbio)->repair_count, 1); + } + + atomic_inc(&(*fbio)->repair_count); + + repair_bio = bio_alloc_bioset(NULL, 1, REQ_OP_READ, GFP_NOFS, + &btrfs_repair_bioset); + repair_bio->bi_iter.bi_sector = failed_bbio->saved_iter.bi_sector; + bio_add_page(repair_bio, bv->bv_page, bv->bv_len, bv->bv_offset); + + repair_bbio = btrfs_bio(repair_bio); + btrfs_bio_init(repair_bbio, failed_bbio->inode, NULL, *fbio); + repair_bbio->file_offset = failed_bbio->file_offset + bio_offset; + + mirror = next_repair_mirror(*fbio, failed_bbio->mirror_num); + btrfs_debug(fs_info, "submitting repair read to mirror %d", mirror); + btrfs_submit_bio(fs_info, repair_bio, mirror); +} + +static void btrfs_check_read_bio(struct btrfs_bio *bbio, + struct btrfs_device *dev) +{ + struct inode *inode = bbio->inode; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + unsigned int sectorsize = fs_info->sectorsize; + struct bvec_iter *iter = &bbio->saved_iter; + blk_status_t status = bbio->bio.bi_status; + struct btrfs_failed_bio *fbio = NULL; + u32 offset = 0; + + /* + * Hand off repair bios to the repair code as there is no upper level + * submitter for them. + */ + if (unlikely(bbio->bio.bi_pool == &btrfs_repair_bioset)) { + btrfs_end_repair_bio(bbio, dev); + return; + } + + /* Metadata reads are checked and repaired by the submitter */ + if (bbio->bio.bi_opf & REQ_META) + goto done; + + /* Clear the I/O error. 
A failed repair will reset it */ + bbio->bio.bi_status = BLK_STS_OK; + + while (iter->bi_size) { + struct bio_vec bv = bio_iter_iovec(&bbio->bio, *iter); + + bv.bv_len = min(bv.bv_len, sectorsize); + if (status || !btrfs_data_csum_ok(bbio, dev, offset, &bv)) + repair_one_sector(bbio, offset, &bv, &fbio); + + bio_advance_iter_single(&bbio->bio, iter, sectorsize); + offset += sectorsize; + } + + if (bbio->csum != bbio->csum_inline) + kfree(bbio->csum); +done: + if (unlikely(fbio)) + btrfs_repair_done(fbio); + else + bbio->end_io(bbio); +} + static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev) { if (!dev || !dev->bdev) @@ -6716,18 +6888,19 @@ static void btrfs_end_bio_work(struct work_struct *work) struct btrfs_bio *bbio = container_of(work, struct btrfs_bio, end_io_work); - bbio->end_io(bbio); + btrfs_check_read_bio(bbio, bbio->bio.bi_private); } static void btrfs_simple_end_io(struct bio *bio) { - struct btrfs_fs_info *fs_info = bio->bi_private; struct btrfs_bio *bbio = btrfs_bio(bio); + struct btrfs_device *dev = bio->bi_private; + struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb); btrfs_bio_counter_dec(fs_info); if (bio->bi_status) - btrfs_log_dev_io_error(bio, bbio->device); + btrfs_log_dev_io_error(bio, dev); if (bio_op(bio) == REQ_OP_READ) { INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work); @@ -6744,7 +6917,10 @@ static void btrfs_raid56_end_io(struct bio *bio) btrfs_bio_counter_dec(bioc->fs_info); bbio->mirror_num = bioc->mirror_num; - bbio->end_io(bbio); + if (bio_op(bio) == REQ_OP_READ) + btrfs_check_read_bio(bbio, NULL); + else + bbio->end_io(bbio); btrfs_put_bioc(bioc); } @@ -6852,6 +7028,7 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr) void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) { + struct btrfs_bio *bbio = btrfs_bio(bio); u64 logical = bio->bi_iter.bi_sector << 9; u64 length = bio->bi_iter.bi_size; u64 map_length = length; @@ -6862,11 +7039,8 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror btrfs_bio_counter_inc_blocked(fs_info); ret = __btrfs_map_block(fs_info, btrfs_op(bio), logical, &map_length, &bioc, &smap, &mirror_num, 1); - if (ret) { - btrfs_bio_counter_dec(fs_info); - btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret)); - return; - } + if (ret) + goto fail; if (map_length < length) { btrfs_crit(fs_info, @@ -6875,12 +7049,22 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror BUG(); } + /* + * Save the iter for the end_io handler and preload the checksums for + * data reads. 
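By completion time the bio's own bi_iter has been consumed by the device, which is why the loop above re-walks the data from the saved_iter copy one sector at a time. The shape of that walk, reduced to a plain buffer (invented callback names; assumes whole sectors, as btrfs guarantees):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTORSIZE 4096

static bool check_sector(const uint8_t *sector, uint32_t offset)
{
	(void)sector;
	(void)offset;
	return true;		/* csum verification would happen here */
}

static void note_repair(uint32_t offset)
{
	(void)offset;		/* a real version would try other mirrors */
}

static void check_read(uint8_t *data, size_t size,
		       void (*repair)(uint32_t offset))
{
	uint32_t offset = 0;

	while (size) {			/* like 'while (iter->bi_size)' */
		if (!check_sector(data + offset, offset))
			repair(offset);
		size -= SECTORSIZE;	/* bio_advance_iter_single() */
		offset += SECTORSIZE;
	}
}

int main(void)
{
	static uint8_t buf[2 * SECTORSIZE];

	check_read(buf, sizeof(buf), note_repair);
	return 0;
}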
+ */ + if (bio_op(bio) == REQ_OP_READ && !(bio->bi_opf & REQ_META)) { + bbio->saved_iter = bio->bi_iter; + ret = btrfs_lookup_bio_sums(bbio); + if (ret) + goto fail; + } + if (!bioc) { /* Single mirror read/write fast path */ btrfs_bio(bio)->mirror_num = mirror_num; - btrfs_bio(bio)->device = smap.dev; bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT; - bio->bi_private = fs_info; + bio->bi_private = smap.dev; bio->bi_end_io = btrfs_simple_end_io; btrfs_submit_dev_bio(smap.dev, bio); } else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) { @@ -6900,6 +7084,11 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror for (dev_nr = 0; dev_nr < total_devs; dev_nr++) btrfs_submit_mirrored_bio(bioc, dev_nr); } + + return; +fail: + btrfs_bio_counter_dec(fs_info); + btrfs_bio_end_io(bbio, errno_to_blk_status(ret)); } /* @@ -8499,10 +8688,25 @@ int __init btrfs_bioset_init(void) offsetof(struct btrfs_bio, bio), BIOSET_NEED_BVECS)) return -ENOMEM; + if (bioset_init(&btrfs_repair_bioset, BIO_POOL_SIZE, + offsetof(struct btrfs_bio, bio), + BIOSET_NEED_BVECS)) + goto out_free_bioset; + if (mempool_init_kmalloc_pool(&btrfs_failed_bio_pool, BIO_POOL_SIZE, + sizeof(struct btrfs_failed_bio))) + goto out_free_repair_bioset; return 0; + +out_free_repair_bioset: + bioset_exit(&btrfs_repair_bioset); +out_free_bioset: + bioset_exit(&btrfs_bioset); + return -ENOMEM; } void __cold btrfs_bioset_exit(void) { + mempool_exit(&btrfs_failed_bio_pool); + bioset_exit(&btrfs_repair_bioset); bioset_exit(&btrfs_bioset); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index b368356fa78a1..58c4156caa736 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -364,27 +364,28 @@ struct btrfs_fs_devices { typedef void (*btrfs_bio_end_io_t)(struct btrfs_bio *bbio); /* - * Additional info to pass along bio. - * - * Mostly for btrfs specific features like csum and mirror_num. + * Highlevel btrfs I/O structure. It is allocated by btrfs_bio_alloc and + * passed to btrfs_submit_bio for mapping to the physical devices. */ struct btrfs_bio { - unsigned int mirror_num; - - /* for direct I/O */ + /* Inode and offset into it that this I/O operates on. */ + struct inode *inode; u64 file_offset; - /* @device is for stripe IO submission. */ - struct btrfs_device *device; + /* + * Checksumming and original I/O information for internal use in the + * btrfs_submit_bio machinery. 
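The bioset init above uses the usual kernel unwind idiom: each allocation gets a goto label, and a failure frees the earlier resources in reverse order, mirroring the exit function. The skeleton of that idiom with the btrfs specifics stripped out (three stand-in pools mirroring the two biosets and the mempool):

#include <stdlib.h>

static void *data_pool, *repair_pool, *failed_pool;

static int pools_init(void)
{
	data_pool = malloc(4096);
	if (!data_pool)
		return -1;			/* nothing to unwind yet */
	repair_pool = malloc(4096);
	if (!repair_pool)
		goto out_free_data;
	failed_pool = malloc(4096);
	if (!failed_pool)
		goto out_free_repair;
	return 0;

out_free_repair:				/* unwind in reverse order */
	free(repair_pool);
out_free_data:
	free(data_pool);
	return -1;
}

static void pools_exit(void)
{
	free(failed_pool);
	free(repair_pool);
	free(data_pool);
}

int main(void)
{
	if (pools_init())
		return 1;
	pools_exit();
	return 0;
}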
+ */ u8 *csum; u8 csum_inline[BTRFS_BIO_INLINE_CSUM_SIZE]; - struct bvec_iter iter; + struct bvec_iter saved_iter; /* End I/O information supplied to btrfs_bio_alloc */ btrfs_bio_end_io_t end_io; void *private; - /* For read end I/O handling */ + /* For internal use in read end I/O handling */ + unsigned int mirror_num; struct work_struct end_io_work; /* @@ -403,8 +404,10 @@ int __init btrfs_bioset_init(void); void __cold btrfs_bioset_exit(void); struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf, - btrfs_bio_end_io_t end_io, void *private); + struct inode *inode, btrfs_bio_end_io_t end_io, + void *private); struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, + struct inode *inode, btrfs_bio_end_io_t end_io, void *private); static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status) @@ -413,30 +416,6 @@ static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status) bbio->end_io(bbio); } -static inline void btrfs_bio_free_csum(struct btrfs_bio *bbio) -{ - if (bbio->csum != bbio->csum_inline) { - kfree(bbio->csum); - bbio->csum = NULL; - } -} - -/* - * Iterate through a btrfs_bio (@bbio) on a per-sector basis. - * - * bvl - struct bio_vec - * bbio - struct btrfs_bio - * iters - struct bvec_iter - * bio_offset - unsigned int - */ -#define btrfs_bio_for_each_sector(fs_info, bvl, bbio, iter, bio_offset) \ - for ((iter) = (bbio)->iter, (bio_offset) = 0; \ - (iter).bi_size && \ - (((bvl) = bio_iter_iovec((&(bbio)->bio), (iter))), 1); \ - (bio_offset) += fs_info->sectorsize, \ - bio_advance_iter_single(&(bbio)->bio, &(iter), \ - (fs_info)->sectorsize)) - struct btrfs_io_stripe { struct btrfs_device *dev; union { diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index f8a4118b16574..ed50e81174bf4 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -84,7 +84,6 @@ struct raid56_bio_trace_info; EM( IO_TREE_FS_EXCLUDED_EXTENTS, "EXCLUDED_EXTENTS") \ EM( IO_TREE_BTREE_INODE_IO, "BTREE_INODE_IO") \ EM( IO_TREE_INODE_IO, "INODE_IO") \ - EM( IO_TREE_INODE_IO_FAILURE, "INODE_IO_FAILURE") \ EM( IO_TREE_RELOC_BLOCKS, "RELOC_BLOCKS") \ EM( IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES") \ EM( IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES") \ From patchwork Thu Sep 1 07:42:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962034 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21D1FC54EE9 for ; Thu, 1 Sep 2022 07:43:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234028AbiIAHnJ (ORCPT ); Thu, 1 Sep 2022 03:43:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233696AbiIAHmy (ORCPT ); Thu, 1 Sep 2022 03:42:54 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72033125B84; Thu, 1 Sep 2022 00:42:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender 
:Reply-To:Content-Type:Content-ID:Content-Description; bh=mn7IyLnu9OyHB9Afty6QvRyM2MaZlJlCS5YdvUAQ14Y=; b=PqPSgQNF3R4TbkeB1FuC8l1tvp bV97FNLFFBVocnSgZ9IwOF+CVxiWt4cSeK09oG6DtW470CqC7czHCt+oK54OV0EZ0p/GKy+lMg0bq x1TyvU6tlbDMsmeNxm9L6sA1keVLReZp5STEHkwVD1YsuXcFWbl6JbFDzKMm0Lrm/FJmrRxrRxiBs y0cWD+gVJv8bgS3eMQPhf+JYSNncH9TyzD2uwjzwDq0xdNHim8e0EMj9JNxRPkxKwtbKm2Q0uvGh7 W3aUOiRb9msuFtSsaBMqKzbuD3hyScsKzkwXv5boE7VhhwzPC+ev4/4M82NPJDo75YZTU7qxAIG6g jyd0J9Dg==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqX-00ANak-Du; Thu, 01 Sep 2022 07:42:42 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 05/17] btrfs: handle checksum generation in the storage layer Date: Thu, 1 Sep 2022 10:42:04 +0300 Message-Id: <20220901074216.1849941-6-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Instead of letting the callers of btrfs_submit_bio deal with checksumming the (meta)data in the bio and making decisions on when to offload the checksumming to a workqueue, leave that to btrfs_submit_bio. To do so, the existing btrfs_submit_bio function is split into an upper and a lower half, so that the lower half can be offloaded to a workqueue. The driver-private REQ_DRV flag is used to indicate the special 'bio must be contained in a single ordered extent' case that is used by the compressed write case instead of passing a new flag all the way down the stack. Note that this changes the behavior for direct writes to raid56 volumes so that async checksum offloading is not skipped when more I/O is expected. This runs counter to the argument explaining why it was done, although I can't measure any effects of the change. Commits later in this series will make sure the entire direct write is offloaded to the workqueue at once and thus is sent to the raid56 code from a single thread. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/compression.c | 13 +-- fs/btrfs/ctree.h | 4 +- fs/btrfs/disk-io.c | 170 ++------------------------------- fs/btrfs/disk-io.h | 5 - fs/btrfs/extent_io.h | 3 - fs/btrfs/file-item.c | 25 ++--- fs/btrfs/inode.c | 89 +----------------- fs/btrfs/volumes.c | 208 ++++++++++++++++++++++++++++++++++++----- fs/btrfs/volumes.h | 7 +- 9 files changed, 215 insertions(+), 309 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index f932415a4f1df..53f9e123712b0 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -351,9 +351,9 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, u64 cur_disk_bytenr = disk_start; u64 next_stripe_start; blk_status_t ret = BLK_STS_OK; - int skip_sum = inode->flags & BTRFS_INODE_NODATASUM; const bool use_append = btrfs_use_zone_append(inode, disk_start); - const enum req_op bio_op = use_append ? REQ_OP_ZONE_APPEND : REQ_OP_WRITE; + const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED | (use_append ?
REQ_OP_ZONE_APPEND : REQ_OP_WRITE); ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && IS_ALIGNED(len, fs_info->sectorsize)); @@ -431,15 +431,8 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, submit = true; if (submit) { - if (!skip_sum) { - ret = btrfs_csum_one_bio(inode, bio, start, true); - if (ret) { - btrfs_bio_end_io(btrfs_bio(bio), ret); - break; - } - } - ASSERT(bio->bi_iter.bi_size); + btrfs_bio(bio)->file_offset = start; btrfs_submit_bio(fs_info, bio, 0); bio = NULL; } diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 3dcb0d5f8faa0..33c3c394e43e3 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3355,8 +3355,8 @@ int btrfs_lookup_file_extent(struct btrfs_trans_handle *trans, int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct btrfs_ordered_sum *sums); -blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, - u64 offset, bool one_ordered); +int btrfs_csum_one_bio(struct btrfs_bio *bbio); +int btree_csum_one_bio(struct btrfs_bio *bbio); int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end, struct list_head *list, int search_commit); void btrfs_extent_item_to_extent_map(struct btrfs_inode *inode, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index a88d6c3b59042..ceee039b65ea0 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -69,23 +69,6 @@ static void btrfs_free_csum_hash(struct btrfs_fs_info *fs_info) crypto_free_shash(fs_info->csum_shash); } -/* - * async submit bios are used to offload expensive checksumming - * onto the worker threads. They checksum file and metadata bios - * just before they are sent down the IO stack. - */ -struct async_submit_bio { - struct inode *inode; - struct bio *bio; - extent_submit_bio_start_t *submit_bio_start; - int mirror_num; - - /* Optional parameter for submit_bio_start used by direct io */ - u64 dio_file_offset; - struct btrfs_work work; - blk_status_t status; -}; - /* * Compute the csum of a btree block and store the result to provided buffer. */ @@ -649,161 +632,26 @@ int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio, return ret; } -static void run_one_async_start(struct btrfs_work *work) -{ - struct async_submit_bio *async; - blk_status_t ret; - - async = container_of(work, struct async_submit_bio, work); - ret = async->submit_bio_start(async->inode, async->bio, - async->dio_file_offset); - if (ret) - async->status = ret; -} - -/* - * In order to insert checksums into the metadata in large chunks, we wait - * until bio submission time. All the pages in the bio are checksummed and - * sums are attached onto the ordered extent record. - * - * At IO completion time the csums attached on the ordered extent record are - * inserted into the tree. - */ -static void run_one_async_done(struct btrfs_work *work) -{ - struct async_submit_bio *async = - container_of(work, struct async_submit_bio, work); - struct inode *inode = async->inode; - struct btrfs_bio *bbio = btrfs_bio(async->bio); - - /* If an error occurred we just want to clean up the bio and move on */ - if (async->status) { - btrfs_bio_end_io(bbio, async->status); - return; - } - - /* - * All of the bios that pass through here are from async helpers. - * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's context. - * This changes nothing when cgroups aren't in use. 
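REQ_BTRFS_ONE_ORDERED repurposes the block layer's driver-private REQ_DRV bit, so btrfs may use it internally but must strip it before the bio leaves the file system; the __btrfs_submit_bio() hunk later in this patch does exactly that. The pattern in isolation, with toy flag values rather than the real REQ_* constants:

#include <assert.h>
#include <stdint.h>

#define REQ_OP_WRITE	1u
#define REQ_DRV		(1u << 29)	/* toy value; reserved for the owner */
#define REQ_ONE_ORDERED	REQ_DRV		/* fs-private meaning of that bit */

struct toy_bio {
	uint32_t bi_opf;
};

static void submit_to_device(struct toy_bio *bio)
{
	/* the private bit must never reach the device layer */
	assert(!(bio->bi_opf & REQ_DRV));
}

int main(void)
{
	struct toy_bio bio = { .bi_opf = REQ_OP_WRITE | REQ_ONE_ORDERED };

	if (bio.bi_opf & REQ_ONE_ORDERED) {
		/* e.g. skip splitting the bio across ordered extents */
	}

	/* strip it on the way down, like __btrfs_submit_bio() does */
	bio.bi_opf &= ~REQ_ONE_ORDERED;
	submit_to_device(&bio);
	return 0;
}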
- */ - async->bio->bi_opf |= REQ_CGROUP_PUNT; - btrfs_submit_bio(btrfs_sb(inode->i_sb), async->bio, async->mirror_num); -} - -static void run_one_async_free(struct btrfs_work *work) -{ - struct async_submit_bio *async; - - async = container_of(work, struct async_submit_bio, work); - kfree(async); -} - -/* - * Submit bio to an async queue. - * - * Retrun: - * - true if the work has been succesfuly submitted - * - false in case of error - */ -bool btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, int mirror_num, - u64 dio_file_offset, - extent_submit_bio_start_t *submit_bio_start) -{ - struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; - struct async_submit_bio *async; - - async = kmalloc(sizeof(*async), GFP_NOFS); - if (!async) - return false; - - async->inode = inode; - async->bio = bio; - async->mirror_num = mirror_num; - async->submit_bio_start = submit_bio_start; - - btrfs_init_work(&async->work, run_one_async_start, run_one_async_done, - run_one_async_free); - - async->dio_file_offset = dio_file_offset; - - async->status = 0; - - if (op_is_sync(bio->bi_opf)) - btrfs_queue_work(fs_info->hipri_workers, &async->work); - else - btrfs_queue_work(fs_info->workers, &async->work); - return true; -} - -static blk_status_t btree_csum_one_bio(struct bio *bio) +int btree_csum_one_bio(struct btrfs_bio *bbio) { - struct bio_vec *bvec; - struct btrfs_root *root; - int ret = 0; - struct bvec_iter_all iter_all; + struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb); + struct bvec_iter iter; + struct bio_vec bvec; + int ret; - ASSERT(!bio_flagged(bio, BIO_CLONED)); - bio_for_each_segment_all(bvec, bio, iter_all) { - root = BTRFS_I(bvec->bv_page->mapping->host)->root; - ret = csum_dirty_buffer(root->fs_info, bvec); + bio_for_each_segment(bvec, &bbio->bio, iter) { + ret = csum_dirty_buffer(fs_info, &bvec); if (ret) break; } - return errno_to_blk_status(ret); -} - -static blk_status_t btree_submit_bio_start(struct inode *inode, struct bio *bio, - u64 dio_file_offset) -{ - /* - * when we're called for a write, we're already in the async - * submission context. Just jump into btrfs_submit_bio. - */ - return btree_csum_one_bio(bio); -} - -static bool should_async_write(struct btrfs_fs_info *fs_info, - struct btrfs_inode *bi) -{ - if (btrfs_is_zoned(fs_info)) - return false; - if (atomic_read(&bi->sync_writers)) - return false; - if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags)) - return false; - return true; + return ret; } void btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int mirror_num) { - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_bio *bbio = btrfs_bio(bio); - blk_status_t ret; - bio->bi_opf |= REQ_META; - - if (btrfs_op(bio) != BTRFS_MAP_WRITE) { - btrfs_submit_bio(fs_info, bio, mirror_num); - return; - } - - /* - * Kthread helpers are used to submit writes so that checksumming can - * happen in parallel across all CPUs. 
- */ - if (should_async_write(fs_info, BTRFS_I(inode)) && - btrfs_wq_submit_bio(inode, bio, mirror_num, 0, btree_submit_bio_start)) - return; - - ret = btree_csum_one_bio(bio); - if (ret) { - btrfs_bio_end_io(bbio, ret); - return; - } - - btrfs_submit_bio(fs_info, bio, mirror_num); + btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num); } #ifdef CONFIG_MIGRATION diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 47ad8e0a2d33f..9d4e0e36f7bb9 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -114,11 +114,6 @@ int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid, int atomic); int btrfs_read_extent_buffer(struct extent_buffer *buf, u64 parent_transid, int level, struct btrfs_key *first_key); -bool btrfs_wq_submit_bio(struct inode *inode, struct bio *bio, int mirror_num, - u64 dio_file_offset, - extent_submit_bio_start_t *submit_bio_start); -blk_status_t btrfs_submit_bio_done(void *private_data, struct bio *bio, - int mirror_num); int btrfs_alloc_log_tree_node(struct btrfs_trans_handle *trans, struct btrfs_root *root); int btrfs_init_log_root_tree(struct btrfs_trans_handle *trans, diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index caf3343d1a36c..ddbeba7c6118a 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -62,9 +62,6 @@ struct btrfs_inode; struct btrfs_fs_info; struct extent_io_tree; -typedef blk_status_t (extent_submit_bio_start_t)(struct inode *inode, - struct bio *bio, u64 dio_file_offset); - #define INLINE_EXTENT_BUFFER_PAGES (BTRFS_MAX_METADATA_BLOCKSIZE / PAGE_SIZE) struct extent_buffer { u64 start; diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index ffbac8f257908..5b3279e38665b 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -613,23 +613,17 @@ int btrfs_lookup_csums_range(struct btrfs_root *root, u64 start, u64 end, /** * Calculate checksums of the data contained inside a bio - * - * @inode: Owner of the data inside the bio - * @bio: Contains the data to be checksummed - * @offset: If (u64)-1, @bio may contain discontiguous bio vecs, so the - * file offsets are determined from the page offsets in the bio. - * Otherwise, this is the starting file offset of the bio vecs in - * @bio, which must be contiguous. - * @one_ordered: If true, @bio only refers to one ordered extent. 
+ * @bbio: Contains the data to be checksummed */ -blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, - u64 offset, bool one_ordered) +int btrfs_csum_one_bio(struct btrfs_bio *bbio) { + struct btrfs_inode *inode = BTRFS_I(bbio->inode); struct btrfs_fs_info *fs_info = inode->root->fs_info; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash); + struct bio *bio = &bbio->bio; + u64 offset = bbio->file_offset; struct btrfs_ordered_sum *sums; struct btrfs_ordered_extent *ordered = NULL; - const bool use_page_offsets = (offset == (u64)-1); char *data; struct bvec_iter iter; struct bio_vec bvec; @@ -646,7 +640,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, memalloc_nofs_restore(nofs_flag); if (!sums) - return BLK_STS_RESOURCE; + return -ENOMEM; sums->len = bio->bi_iter.bi_size; INIT_LIST_HEAD(&sums->list); @@ -657,9 +651,6 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, shash->tfm = fs_info->csum_shash; bio_for_each_segment(bvec, bio, iter) { - if (use_page_offsets) - offset = page_offset(bvec.bv_page) + bvec.bv_offset; - if (!ordered) { ordered = btrfs_lookup_ordered_extent(inode, offset); /* @@ -672,7 +663,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, inode->root->root_key.objectid, btrfs_ino(inode), offset); kvfree(sums); - return BLK_STS_IOERR; + return -EIO; } } @@ -681,7 +672,7 @@ blk_status_t btrfs_csum_one_bio(struct btrfs_inode *inode, struct bio *bio, - 1); for (i = 0; i < blockcount; i++) { - if (!one_ordered && + if (!(bio->bi_opf & REQ_BTRFS_ONE_ORDERED) && !in_range(offset, ordered->file_offset, ordered->num_bytes)) { unsigned long bytes_left; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b3466015008c7..88dd99997631a 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2500,20 +2500,6 @@ void btrfs_clear_delalloc_extent(struct inode *vfs_inode, } } -/* - * in order to insert checksums into the metadata in large chunks, - * we wait until bio submission time. All the pages in the bio are - * checksummed and sums are attached onto the ordered extent record. - * - * At IO completion time the cums attached on the ordered extent record - * are inserted into the btree - */ -static blk_status_t btrfs_submit_bio_start(struct inode *inode, struct bio *bio, - u64 dio_file_offset) -{ - return btrfs_csum_one_bio(BTRFS_I(inode), bio, (u64)-1, false); -} - /* * Split an extent_map at [start, start + len] * @@ -2704,28 +2690,6 @@ void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirro } } - /* - * If we need to checksum, and the I/O is not issued by fsync and - * friends, that is ->sync_writers != 0, defer the submission to a - * workqueue to parallelize it. - * - * Csum items for reloc roots have already been cloned at this point, - * so they are handled as part of the no-checksum case. 
- */ - if (!(bi->flags & BTRFS_INODE_NODATASUM) && - !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) && - !btrfs_is_data_reloc_root(bi->root)) { - if (!atomic_read(&bi->sync_writers) && - btrfs_wq_submit_bio(inode, bio, mirror_num, 0, - btrfs_submit_bio_start)) - return; - - ret = btrfs_csum_one_bio(bi, bio, (u64)-1, false); - if (ret) { - btrfs_bio_end_io(btrfs_bio(bio), ret); - return; - } - } btrfs_submit_bio(fs_info, bio, mirror_num); } @@ -7885,13 +7849,6 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) bio_endio(&dip->bio); } -static blk_status_t btrfs_submit_bio_start_direct_io(struct inode *inode, - struct bio *bio, - u64 dio_file_offset) -{ - return btrfs_csum_one_bio(BTRFS_I(inode), bio, dio_file_offset, false); -} - static void btrfs_end_dio_bio(struct btrfs_bio *bbio) { struct btrfs_dio_private *dip = bbio->private; @@ -7913,36 +7870,6 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio) btrfs_dio_private_put(dip); } -static void btrfs_submit_dio_bio(struct bio *bio, struct inode *inode, - u64 file_offset, int async_submit) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - blk_status_t ret; - - if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) - goto map; - - if (btrfs_op(bio) == BTRFS_MAP_WRITE) { - /* Check btrfs_submit_data_write_bio() for async submit rules */ - if (async_submit && !atomic_read(&BTRFS_I(inode)->sync_writers) && - btrfs_wq_submit_bio(inode, bio, 0, file_offset, - btrfs_submit_bio_start_direct_io)) - return; - - /* - * If we aren't doing async submit, calculate the csum of the - * bio now. - */ - ret = btrfs_csum_one_bio(BTRFS_I(inode), bio, file_offset, false); - if (ret) { - btrfs_bio_end_io(btrfs_bio(bio), ret); - return; - } - } -map: - btrfs_submit_bio(fs_info, bio, 0); -} - static void btrfs_submit_direct(const struct iomap_iter *iter, struct bio *dio_bio, loff_t file_offset) { @@ -7950,11 +7877,8 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, container_of(dio_bio, struct btrfs_dio_private, bio); struct inode *inode = iter->inode; struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - const bool raid56 = (btrfs_data_alloc_profile(fs_info) & - BTRFS_BLOCK_GROUP_RAID56_MASK); struct bio *bio; u64 start_sector; - int async_submit = 0; u64 submit_len; u64 clone_offset = 0; u64 clone_len; @@ -8020,19 +7944,10 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, * We transfer the initial reference to the last bio, so we * don't need to increment the reference count for the last one. */ - if (submit_len > 0) { + if (submit_len > 0) refcount_inc(&dip->refs); - /* - * If we are submitting more than one bio, submit them - * all asynchronously. The exception is RAID 5 or 6, as - * asynchronous checksums make it difficult to collect - * full stripe writes. 
- */ - if (!raid56) - async_submit = 1; - } - btrfs_submit_dio_bio(bio, inode, file_offset, async_submit); + btrfs_submit_bio(fs_info, bio, 0); dio_data->submitted += clone_len; clone_offset += clone_len; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index b8472ab466abe..2d13e8b52c94f 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7026,7 +7026,170 @@ static void btrfs_submit_mirrored_bio(struct btrfs_io_context *bioc, int dev_nr) btrfs_submit_dev_bio(bioc->stripes[dev_nr].dev, bio); } -void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) +static void __btrfs_submit_bio(struct bio *bio, struct btrfs_io_context *bioc, + struct btrfs_io_stripe *smap, int mirror_num) +{ + /* Do not leak our private flag into the block layer */ + bio->bi_opf &= ~REQ_BTRFS_ONE_ORDERED; + + if (!bioc) { + /* Single mirror read/write fast path */ + btrfs_bio(bio)->mirror_num = mirror_num; + bio->bi_iter.bi_sector = smap->physical >> SECTOR_SHIFT; + bio->bi_private = smap->dev; + bio->bi_end_io = btrfs_simple_end_io; + btrfs_submit_dev_bio(smap->dev, bio); + } else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) { + /* Parity RAID write or read recovery */ + bio->bi_private = bioc; + bio->bi_end_io = btrfs_raid56_end_io; + if (bio_op(bio) == REQ_OP_READ) + raid56_parity_recover(bio, bioc, mirror_num); + else + raid56_parity_write(bio, bioc); + } else { + /* Write to multiple mirrors */ + int total_devs = bioc->num_stripes; + int dev_nr; + + bioc->orig_bio = bio; + for (dev_nr = 0; dev_nr < total_devs; dev_nr++) + btrfs_submit_mirrored_bio(bioc, dev_nr); + } +} + +/* + * async submit bios are used to offload expensive checksumming + * onto the worker threads. + */ +struct async_submit_bio { + struct btrfs_bio *bbio; + struct btrfs_io_context *bioc; + struct btrfs_io_stripe smap; + int mirror_num; + struct btrfs_work work; +}; + +/* + * In order to insert checksums into the metadata in large chunks, + * we wait until bio submission time. All the pages in the bio are + * checksummed and sums are attached onto the ordered extent record. + * + * At IO completion time the csums attached on the ordered extent record + * are inserted into the btree. + */ +static void run_one_async_start(struct btrfs_work *work) +{ + struct async_submit_bio *async = + container_of(work, struct async_submit_bio, work); + struct btrfs_bio *bbio = async->bbio; + int ret; + + if (bbio->bio.bi_opf & REQ_META) + ret = btree_csum_one_bio(bbio); + else + ret = btrfs_csum_one_bio(bbio); + if (ret) + bbio->bio.bi_status = errno_to_blk_status(ret); +} + +/* + * In order to insert checksums into the metadata in large chunks, we wait + * until bio submission time. All the pages in the bio are checksummed and + * sums are attached onto the ordered extent record. + * + * At IO completion time the csums attached on the ordered extent record are + * inserted into the tree. + */ +static void run_one_async_done(struct btrfs_work *work) +{ + struct async_submit_bio *async = + container_of(work, struct async_submit_bio, work); + struct bio *bio = &async->bbio->bio; + + /* If an error occurred we just want to clean up the bio and move on */ + if (bio->bi_status) { + btrfs_bio_end_io(async->bbio, bio->bi_status); + return; + } + + /* + * All of the bios that pass through here are from async helpers. + * Use REQ_CGROUP_PUNT to issue them from the owning cgroup's context. + * This changes nothing when cgroups aren't in use.
+ */ + bio->bi_opf |= REQ_CGROUP_PUNT; + __btrfs_submit_bio(bio, async->bioc, &async->smap, async->mirror_num); +} + +static void run_one_async_free(struct btrfs_work *work) +{ + kfree(container_of(work, struct async_submit_bio, work)); +} + +static bool should_async_write(struct btrfs_bio *bbio) +{ + struct btrfs_inode *bi = BTRFS_I(bbio->inode); + + /* + * If the I/O is not issued by fsync and friends (->sync_writers == 0), + * then try to defer the submission to a workqueue to parallelize the + * checksum calculation. + */ + if (atomic_read(&bi->sync_writers)) + return false; + + /* + * Submit metadata writes synchronously if the checksum implementation + * is fast, or we are on a zoned device that wants I/O to be submitted + * in order. + */ + if (bbio->bio.bi_opf & REQ_META) { + struct btrfs_fs_info *fs_info = bi->root->fs_info; + + if (btrfs_is_zoned(fs_info)) + return false; + if (test_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags)) + return false; + } + + return true; +} + +/* + * Submit bio to an async queue. + * + * Return: + * - true if the work has been successfully submitted + * - false in case of error + */ +static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio, + struct btrfs_io_context *bioc, + struct btrfs_io_stripe *smap, int mirror_num) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(bbio->inode->i_sb); + struct async_submit_bio *async; + + async = kmalloc(sizeof(*async), GFP_NOFS); + if (!async) + return false; + + async->bbio = bbio; + async->bioc = bioc; + async->smap = *smap; + async->mirror_num = mirror_num; + + btrfs_init_work(&async->work, run_one_async_start, run_one_async_done, + run_one_async_free); + if (op_is_sync(bbio->bio.bi_opf)) + btrfs_queue_work(fs_info->hipri_workers, &async->work); + else + btrfs_queue_work(fs_info->workers, &async->work); + return true; +} + +void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num) { struct btrfs_bio *bbio = btrfs_bio(bio); u64 logical = bio->bi_iter.bi_sector << 9; @@ -7060,31 +7223,30 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror goto fail; } - if (!bioc) { - /* Single mirror read/write fast path */ - btrfs_bio(bio)->mirror_num = mirror_num; - bio->bi_iter.bi_sector = smap.physical >> SECTOR_SHIFT; - bio->bi_private = smap.dev; - bio->bi_end_io = btrfs_simple_end_io; - btrfs_submit_dev_bio(smap.dev, bio); - } else if (bioc->map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) { - /* Parity RAID write or read recovery */ - bio->bi_private = bioc; - bio->bi_end_io = btrfs_raid56_end_io; - if (bio_op(bio) == REQ_OP_READ) - raid56_parity_recover(bio, bioc, mirror_num); - else - raid56_parity_write(bio, bioc); - } else { - /* Write to multiple mirrors */ - int total_devs = bioc->num_stripes; - int dev_nr; + if (btrfs_op(bio) == BTRFS_MAP_WRITE) { + struct btrfs_inode *bi = BTRFS_I(bbio->inode); - bioc->orig_bio = bio; - for (dev_nr = 0; dev_nr < total_devs; dev_nr++) - btrfs_submit_mirrored_bio(bioc, dev_nr); + /* + * Csum items for reloc roots have already been cloned at this + * point, so they are handled as part of the no-checksum case.
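With the policy centralized, the submission path boils down to two questions: does this write need checksums at all, and if so, may the work go to a workqueue. Restated as standalone predicates over simplified stand-in fields (not the kernel structures):

#include <stdbool.h>

/* simplified stand-ins for inode/fs state consulted at submit time */
struct submit_ctx {
	bool is_write;
	bool nodatasum;		/* BTRFS_INODE_NODATASUM */
	bool fs_no_csums;	/* BTRFS_FS_STATE_NO_CSUMS */
	bool data_reloc_root;	/* csum items already cloned */
	bool sync_writers;	/* fsync is waiting on this I/O */
	bool is_meta;		/* REQ_META */
	bool zoned;		/* device requires in-order submission */
	bool csum_impl_fast;	/* BTRFS_FS_CSUM_IMPL_FAST */
};

static bool needs_csum(const struct submit_ctx *c)
{
	return c->is_write && !c->nodatasum && !c->fs_no_csums &&
	       !c->data_reloc_root;
}

/* mirrors the shape of should_async_write() above */
static bool csum_async(const struct submit_ctx *c)
{
	if (c->sync_writers)	/* keep fsync latency low */
		return false;
	if (c->is_meta && (c->zoned || c->csum_impl_fast))
		return false;
	return true;
}

int main(void)
{
	struct submit_ctx c = { .is_write = true };

	/* a plain buffered data write: csum it, on a workqueue */
	return (needs_csum(&c) && csum_async(&c)) ? 0 : 1;
}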
+ */ + if (!(bi->flags & BTRFS_INODE_NODATASUM) && + !test_bit(BTRFS_FS_STATE_NO_CSUMS, &fs_info->fs_state) && + !btrfs_is_data_reloc_root(bi->root)) { + if (should_async_write(bbio) && + btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num)) + return; + + if (bio->bi_opf & REQ_META) + ret = btree_csum_one_bio(bbio); + else + ret = btrfs_csum_one_bio(bbio); + if (ret) + goto fail; + } } + __btrfs_submit_bio(bio, bioc, &smap, mirror_num); return; fail: btrfs_bio_counter_dec(fs_info); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 58c4156caa736..8b248c9bd602b 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -576,7 +576,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info); struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans, u64 type); void btrfs_mapping_tree_free(struct extent_map_tree *tree); -void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num); + +/* bio only refers to one ordered extent */ +#define REQ_BTRFS_ONE_ORDERED REQ_DRV + +void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num); int btrfs_repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, u64 length, u64 logical, struct page *page, unsigned int pg_offset, int mirror_num); From patchwork Thu Sep 1 07:42:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962035 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7CFAFECAAD1 for ; Thu, 1 Sep 2022 07:43:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233696AbiIAHnM (ORCPT ); Thu, 1 Sep 2022 03:43:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234039AbiIAHmz (ORCPT ); Thu, 1 Sep 2022 03:42:55 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 82B9B11E821; Thu, 1 Sep 2022 00:42:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=n33iwIKld7aEIuI5+A0J12bO2Y5oj8H2AV3grZDE9Po=; b=tNtruTtNwaODP5yLbgU7AgK7UP zjHy7T1JFIfCRUSjK153xaVpBrt++1TJGOeD9nW/2Teahoi7i7TTqjrc/xP72lHq6t6RwzWDYkkSp /ornB7EKbbnqEGjhQFtLrWnEteD0Uoyh8XY0psK26rcaPFVjzIW53bnjp9iJPqO+zinBBpABLkthn YvvcwiIGEJUR7jOSVwA2+7hz5doPfatmA9vgLhapaF6TDeaOC83FnlbtrUs/jB9pipeFAZa4/FLHX GyZ+IZVciGAybVTV4k4UtRS1Vsx4KPjWlTbdLEAfVzHdj7vvlOD8ApW0hRCzS86ahatJITf8Jo4ox v3d602AA==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqb-00ANcF-KW; Thu, 01 Sep 2022 07:42:46 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. 
Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 06/17] btrfs: handle recording of zoned writes in the storage layer Date: Thu, 1 Sep 2022 10:42:05 +0300 Message-Id: <20220901074216.1849941-7-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Move the code that splits the ordered extents and records the physical location for them to the storage layer so that the higher level consumers don't have to care about physical block numbers at all. This will also allow to eventually remove accounting for the zone append write sizes in the upper layer with a little bit more block layer work. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Josef Bacik Reviewed-by: Naohiro Aota --- fs/btrfs/compression.c | 1 - fs/btrfs/extent_io.c | 6 ------ fs/btrfs/inode.c | 40 ++++++++-------------------------------- fs/btrfs/ordered-data.h | 1 + fs/btrfs/volumes.c | 8 ++++++++ fs/btrfs/zoned.c | 13 +++++-------- fs/btrfs/zoned.h | 6 ++---- 7 files changed, 24 insertions(+), 51 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 53f9e123712b0..1f10f86e70557 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -270,7 +270,6 @@ static void end_compressed_bio_write(struct btrfs_bio *bbio) if (refcount_dec_and_test(&cb->pending_ios)) { struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - btrfs_record_physical_zoned(cb->inode, cb->start, &bbio->bio); queue_work(fs_info->compressed_write_workers, &cb->write_end_work); } bio_put(&bbio->bio); diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index d8c43e2111a99..4c00bdefe5b45 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2285,7 +2285,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio) u64 start; u64 end; struct bvec_iter_all iter_all; - bool first_bvec = true; ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { @@ -2307,11 +2306,6 @@ static void end_bio_extent_writepage(struct btrfs_bio *bbio) start = page_offset(page) + bvec->bv_offset; end = start + bvec->bv_len - 1; - if (first_bvec) { - btrfs_record_physical_zoned(inode, start, bio); - first_bvec = false; - } - end_extent_writepage(page, error, start, end); btrfs_page_clear_writeback(fs_info, page, start, bvec->bv_len); diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 88dd99997631a..03953c1f176dd 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2615,21 +2615,21 @@ static int split_zoned_em(struct btrfs_inode *inode, u64 start, u64 len, return ret; } -static blk_status_t extract_ordered_extent(struct btrfs_inode *inode, - struct bio *bio, loff_t file_offset) +int btrfs_extract_ordered_extent(struct btrfs_bio *bbio) { + u64 start = (u64)bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT; + u64 len = bbio->bio.bi_iter.bi_size; + struct btrfs_inode *bi = BTRFS_I(bbio->inode); struct btrfs_ordered_extent *ordered; - u64 start = (u64)bio->bi_iter.bi_sector << SECTOR_SHIFT; u64 file_len; - u64 len = bio->bi_iter.bi_size; u64 end = start + len; u64 ordered_end; u64 pre, post; int ret = 0; - ordered = btrfs_lookup_ordered_extent(inode, file_offset); + ordered = btrfs_lookup_ordered_extent(bi, bbio->file_offset); if 
(WARN_ON_ONCE(!ordered)) - return BLK_STS_IOERR; + return -EIO; /* No need to split */ if (ordered->disk_num_bytes == len) @@ -2667,28 +2667,16 @@ static blk_status_t extract_ordered_extent(struct btrfs_inode *inode, ret = btrfs_split_ordered_extent(ordered, pre, post); if (ret) goto out; - ret = split_zoned_em(inode, file_offset, file_len, pre, post); + ret = split_zoned_em(bi, bbio->file_offset, file_len, pre, post); out: btrfs_put_ordered_extent(ordered); - - return errno_to_blk_status(ret); + return ret; } void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirror_num) { struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct btrfs_inode *bi = BTRFS_I(inode); - blk_status_t ret; - - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - ret = extract_ordered_extent(bi, bio, - page_offset(bio_first_bvec_all(bio)->bv_page)); - if (ret) { - btrfs_bio_end_io(btrfs_bio(bio), ret); - return; - } - } btrfs_submit_bio(fs_info, bio, mirror_num); } @@ -7864,8 +7852,6 @@ static void btrfs_end_dio_bio(struct btrfs_bio *bbio) dip->bio.bi_status = err; } - btrfs_record_physical_zoned(dip->inode, bbio->file_offset, bio); - bio_put(bio); btrfs_dio_private_put(dip); } @@ -7923,15 +7909,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, inode, btrfs_end_dio_bio, dip); btrfs_bio(bio)->file_offset = file_offset; - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - status = extract_ordered_extent(BTRFS_I(inode), bio, - file_offset); - if (status) { - bio_put(bio); - goto out_err; - } - } - ASSERT(submit_len >= clone_len); submit_len -= clone_len; @@ -7960,7 +7937,6 @@ static void btrfs_submit_direct(const struct iomap_iter *iter, out_err_em: free_extent_map(em); -out_err: dio_bio->bi_status = status; btrfs_dio_private_put(dip); } diff --git a/fs/btrfs/ordered-data.h b/fs/btrfs/ordered-data.h index 87792f85e2c4a..0cef17f4b752f 100644 --- a/fs/btrfs/ordered-data.h +++ b/fs/btrfs/ordered-data.h @@ -220,6 +220,7 @@ void btrfs_lock_and_flush_ordered_range(struct btrfs_inode *inode, u64 start, struct extent_state **cached_state); int btrfs_split_ordered_extent(struct btrfs_ordered_extent *ordered, u64 pre, u64 post); +int btrfs_extract_ordered_extent(struct btrfs_bio *bbio); int __init ordered_data_init(void); void __cold ordered_data_exit(void); diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 2d13e8b52c94f..5c6535e10085d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6906,6 +6906,8 @@ static void btrfs_simple_end_io(struct bio *bio) INIT_WORK(&bbio->end_io_work, btrfs_end_bio_work); queue_work(btrfs_end_io_wq(fs_info, bio), &bbio->end_io_work); } else { + if (bio_op(bio) == REQ_OP_ZONE_APPEND) + btrfs_record_physical_zoned(bbio); bbio->end_io(bbio); } } @@ -7226,6 +7228,12 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, if (btrfs_op(bio) == BTRFS_MAP_WRITE) { struct btrfs_inode *bi = BTRFS_I(bbio->inode); + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + ret = btrfs_extract_ordered_extent(btrfs_bio(bio)); + if (ret) + goto fail; + } + /* * Csum items for reloc roots have already been cloned at this * point, so they are handled as part of the no-checksum case. 
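[Editorial sketch, not part of the patch: before the zoned.c changes below, it may help to see the new flow in isolation. The physical address of a zone append write is now captured once, in the storage layer's completion handler, rather than in every submitter. The following standalone userspace C model only illustrates that idea; all names (model_bio, model_end_io, ordered_extent fields) are invented stand-ins, none of this is kernel code.]

	#include <stdint.h>
	#include <stdio.h>

	/* Stand-ins for the btrfs structures, reduced to the essentials. */
	struct ordered_extent {
		uint64_t file_offset;
		uint64_t physical;	/* filled in at I/O completion */
	};

	struct model_bio {
		uint64_t bi_sector;	/* where the device placed the append */
		int is_zone_append;
		struct ordered_extent *ordered;
	};

	/*
	 * Single completion-time hook: the one place that records where a
	 * zone append write actually landed, loosely mirroring
	 * btrfs_record_physical_zoned() being called from the end_io path
	 * after this patch.
	 */
	static void model_end_io(struct model_bio *bio)
	{
		if (bio->is_zone_append)
			bio->ordered->physical = bio->bi_sector << 9;
	}

	int main(void)
	{
		struct ordered_extent oe = { .file_offset = 1 << 20 };
		struct model_bio bio = {
			.bi_sector = 4096, .is_zone_append = 1, .ordered = &oe,
		};

		model_end_io(&bio);
		printf("zone append landed at physical byte %llu\n",
		       (unsigned long long)oe.physical);
		return 0;
	}
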
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index dc96b3331bfb7..2638f71eec4b6 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1633,21 +1633,18 @@ bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start) return ret; } -void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, - struct bio *bio) +void btrfs_record_physical_zoned(struct btrfs_bio *bbio) { + const u64 physical = bbio->bio.bi_iter.bi_sector << SECTOR_SHIFT; + struct btrfs_inode *bi = BTRFS_I(bbio->inode); struct btrfs_ordered_extent *ordered; - const u64 physical = bio->bi_iter.bi_sector << SECTOR_SHIFT; - if (bio_op(bio) != REQ_OP_ZONE_APPEND) - return; - - ordered = btrfs_lookup_ordered_extent(BTRFS_I(inode), file_offset); + ordered = btrfs_lookup_ordered_extent(bi, bbio->file_offset); if (WARN_ON(!ordered)) return; ordered->physical = physical; - ordered->bdev = bio->bi_bdev; + ordered->bdev = bbio->bio.bi_bdev; btrfs_put_ordered_extent(ordered); } diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index e17462db3a842..cafa639927050 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -55,8 +55,7 @@ void btrfs_redirty_list_add(struct btrfs_transaction *trans, struct extent_buffer *eb); void btrfs_free_redirty_list(struct btrfs_transaction *trans); bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start); -void btrfs_record_physical_zoned(struct inode *inode, u64 file_offset, - struct bio *bio); +void btrfs_record_physical_zoned(struct btrfs_bio *bbio); void btrfs_rewrite_logical_zoned(struct btrfs_ordered_extent *ordered); bool btrfs_check_meta_write_pointer(struct btrfs_fs_info *fs_info, struct extent_buffer *eb, @@ -178,8 +177,7 @@ static inline bool btrfs_use_zone_append(struct btrfs_inode *inode, u64 start) return false; } -static inline void btrfs_record_physical_zoned(struct inode *inode, - u64 file_offset, struct bio *bio) +static inline void btrfs_record_physical_zoned(struct btrfs_bio *bbio) { } From patchwork Thu Sep 1 07:42:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962037 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52B72ECAAD8 for ; Thu, 1 Sep 2022 07:43:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233874AbiIAHng (ORCPT ); Thu, 1 Sep 2022 03:43:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34414 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234015AbiIAHm4 (ORCPT ); Thu, 1 Sep 2022 03:42:56 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21ABC11EB44; Thu, 1 Sep 2022 00:42:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=JYdpoe6k7CY+uRGDzKFOf7z0fsAL04X/cS+MN4M+zIg=; b=LTZy62o35zMsEaIUWz6yHjx0aL 1pCR46hziQMA5hNBK05rGagU5JlDcJ/vZyWF0LV+7iceXt5aNg/3efxngoItoL4JotV1BQ80vKA5w MQlJXcJCO5q3Ga3P8p9WS8NDFVi44269xJaMUmHKuwcg00KaB5mteHEdFR2JbEbolheRKhWxTt/KU EHOupbERKoFN3e8AnQ49tnmDi4J+BIodRlTZ6yTzXm+yk5S2wFk5rIGRrMHodFjcl3WjPcxVIrlgO 
fahJ9YWO+u1iqLnKAQm9JZ8ThtfVxPsePcrNVh68t9cyoAcS5j9PLXhvUoMnR58WZWNmM/BTRI1Vz MMA54RlA==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqe-00ANco-VJ; Thu, 01 Sep 2022 07:42:49 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 07/17] btrfs: allow btrfs_submit_bio to split bios Date: Thu, 1 Sep 2022 10:42:06 +0300 Message-Id: <20220901074216.1849941-8-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Currently the I/O submitters have to split bios according to the chunk stripe boundaries. This leads to extra lookups in the extent trees and a lot of boilerplate code. To drop this requirement, split the bio when __btrfs_map_block returns a mapping that is smaller than the requested size and keep a count of pending bios in the original btrfs_bio so that the upper level completion is only invoked when all clones have completed. Based on a patch from Qu Wenruo. Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Qu Wenruo --- fs/btrfs/volumes.c | 106 +++++++++++++++++++++++++++++++++++++-------- fs/btrfs/volumes.h | 1 + 2 files changed, 90 insertions(+), 17 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 5c6535e10085d..0a2d144c20604 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -35,6 +35,7 @@ #include "zoned.h" static struct bio_set btrfs_bioset; +static struct bio_set btrfs_clone_bioset; static struct bio_set btrfs_repair_bioset; static mempool_t btrfs_failed_bio_pool; @@ -6661,6 +6662,7 @@ static void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode, bbio->inode = inode; bbio->end_io = end_io; bbio->private = private; + atomic_set(&bbio->pending_ios, 1); } /* @@ -6698,6 +6700,57 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, return bio; } +static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length) +{ + struct btrfs_bio *orig_bbio = btrfs_bio(orig); + struct bio *bio; + + bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS, + &btrfs_clone_bioset); + btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio); + + btrfs_bio(bio)->file_offset = orig_bbio->file_offset; + orig_bbio->file_offset += map_length; + + atomic_inc(&orig_bbio->pending_ios); + return bio; +} + +static void btrfs_orig_write_end_io(struct bio *bio); +static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio, + struct btrfs_bio *orig_bbio) +{ + /* + * For writes btrfs tolerates nr_mirrors - 1 write failures, so we + * can't just blindly propagate a write failure here. + * Instead increment the error count in the original I/O context so + * that it is guaranteed to be larger than the error tolerance. 
+ */ + if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) { + struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private; + struct btrfs_io_context *orig_bioc = orig_stripe->bioc; + + atomic_add(orig_bioc->max_errors, &orig_bioc->error); + } else { + orig_bbio->bio.bi_status = bbio->bio.bi_status; + } +} + +static void btrfs_orig_bbio_end_io(struct btrfs_bio *bbio) +{ + if (bbio->bio.bi_pool == &btrfs_clone_bioset) { + struct btrfs_bio *orig_bbio = bbio->private; + + if (bbio->bio.bi_status) + btrfs_bbio_propagate_error(bbio, orig_bbio); + bio_put(&bbio->bio); + bbio = orig_bbio; + } + + if (atomic_dec_and_test(&bbio->pending_ios)) + bbio->end_io(bbio); +} + static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) { if (cur_mirror == fbio->num_copies) @@ -6715,7 +6768,7 @@ static int prev_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) static void btrfs_repair_done(struct btrfs_failed_bio *fbio) { if (atomic_dec_and_test(&fbio->repair_count)) { - fbio->bbio->end_io(fbio->bbio); + btrfs_orig_bbio_end_io(fbio->bbio); mempool_free(fbio, &btrfs_failed_bio_pool); } } @@ -6857,7 +6910,7 @@ static void btrfs_check_read_bio(struct btrfs_bio *bbio, if (unlikely(fbio)) btrfs_repair_done(fbio); else - bbio->end_io(bbio); + btrfs_orig_bbio_end_io(bbio); } static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev) @@ -6908,7 +6961,7 @@ static void btrfs_simple_end_io(struct bio *bio) } else { if (bio_op(bio) == REQ_OP_ZONE_APPEND) btrfs_record_physical_zoned(bbio); - bbio->end_io(bbio); + btrfs_orig_bbio_end_io(bbio); } } @@ -6922,7 +6975,7 @@ static void btrfs_raid56_end_io(struct bio *bio) if (bio_op(bio) == REQ_OP_READ) btrfs_check_read_bio(bbio, NULL); else - bbio->end_io(bbio); + btrfs_orig_bbio_end_io(bbio); btrfs_put_bioc(bioc); } @@ -6949,7 +7002,7 @@ static void btrfs_orig_write_end_io(struct bio *bio) else bio->bi_status = BLK_STS_OK; - bbio->end_io(bbio); + btrfs_orig_bbio_end_io(bbio); btrfs_put_bioc(bioc); } @@ -7190,8 +7243,8 @@ static bool btrfs_wq_submit_bio(struct btrfs_bio *bbio, return true; } -void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, - int mirror_num) +static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num) { struct btrfs_bio *bbio = btrfs_bio(bio); u64 logical = bio->bi_iter.bi_sector << 9; @@ -7207,11 +7260,10 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, if (ret) goto fail; + map_length = min(map_length, length); if (map_length < length) { - btrfs_crit(fs_info, - "mapping failed logical %llu bio len %llu len %llu", - logical, length, map_length); - BUG(); + bio = btrfs_split_bio(bio, map_length); + bbio = btrfs_bio(bio); } /* @@ -7222,7 +7274,7 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, bbio->saved_iter = bio->bi_iter; ret = btrfs_lookup_bio_sums(bbio); if (ret) - goto fail; + goto fail_put_bio; } if (btrfs_op(bio) == BTRFS_MAP_WRITE) { @@ -7231,7 +7283,7 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, if (bio_op(bio) == REQ_OP_ZONE_APPEND) { ret = btrfs_extract_ordered_extent(btrfs_bio(bio)); if (ret) - goto fail; + goto fail_put_bio; } /* @@ -7243,22 +7295,36 @@ void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, !btrfs_is_data_reloc_root(bi->root)) { if (should_async_write(bbio) && btrfs_wq_submit_bio(bbio, bioc, &smap, mirror_num)) - return; + goto done; if (bio->bi_opf & REQ_META) ret = btree_csum_one_bio(bbio); else ret = btrfs_csum_one_bio(bbio); 
if (ret) - goto fail; + goto fail_put_bio; } } __btrfs_submit_bio(bio, bioc, &smap, mirror_num); - return; +done: + return map_length == length; + +fail_put_bio: + if (map_length < length) + bio_put(bio); fail: btrfs_bio_counter_dec(fs_info); btrfs_bio_end_io(bbio, errno_to_blk_status(ret)); + /* Do not submit another chunk */ + return true; +} + +void btrfs_submit_bio(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num) +{ + while (!btrfs_submit_chunk(fs_info, bio, mirror_num)) + ; } /* @@ -8858,10 +8924,13 @@ int __init btrfs_bioset_init(void) offsetof(struct btrfs_bio, bio), BIOSET_NEED_BVECS)) return -ENOMEM; + if (bioset_init(&btrfs_clone_bioset, BIO_POOL_SIZE, + offsetof(struct btrfs_bio, bio), 0)) + goto out_free_bioset; if (bioset_init(&btrfs_repair_bioset, BIO_POOL_SIZE, offsetof(struct btrfs_bio, bio), BIOSET_NEED_BVECS)) - goto out_free_bioset; + goto out_free_clone_bioset; if (mempool_init_kmalloc_pool(&btrfs_failed_bio_pool, BIO_POOL_SIZE, sizeof(struct btrfs_failed_bio))) goto out_free_repair_bioset; @@ -8869,6 +8938,8 @@ int __init btrfs_bioset_init(void) out_free_repair_bioset: bioset_exit(&btrfs_repair_bioset); +out_free_clone_bioset: + bioset_exit(&btrfs_clone_bioset); out_free_bioset: bioset_exit(&btrfs_bioset); return -ENOMEM; @@ -8878,5 +8949,6 @@ void __cold btrfs_bioset_exit(void) { mempool_exit(&btrfs_failed_bio_pool); bioset_exit(&btrfs_repair_bioset); + bioset_exit(&btrfs_clone_bioset); bioset_exit(&btrfs_bioset); } diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 8b248c9bd602b..97877184d0db1 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -386,6 +386,7 @@ struct btrfs_bio { /* For internal use in read end I/O handling */ unsigned int mirror_num; + atomic_t pending_ios; struct work_struct end_io_work; /* From patchwork Thu Sep 1 07:42:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962038 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8731DECAAD3 for ; Thu, 1 Sep 2022 07:44:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234061AbiIAHni (ORCPT ); Thu, 1 Sep 2022 03:43:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233888AbiIAHm6 (ORCPT ); Thu, 1 Sep 2022 03:42:58 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8AB531257D7; Thu, 1 Sep 2022 00:42:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=TnqeoHFrh1MxV+hta84LlYAbhOieggrswEFSvOzv2cA=; b=b6AQffJfPnbPaEK83CapsmmIhB /Ctwrxd9/M1sHpZv7Corv7kiollJ8TNqnTbz9zNQVcGtefiAT3RjZBkGwlu0dBJ/1lO0pDIBkZrAS JC/iuHVfWbXs3afJ9TVFBVs7/YUU95c6sJrcp0h8+zeCP3v793pGtnAazv1KUHpS+mc4mkjVV96fK eEJT7+SwytBzm3IyhMXzr3GzylxHnNbWxpfFxR+fkAIXSJe9rcwpm8zPGhbm+Esvn6dDUkIfa4EA0 8DjIqzSsbfoXOv346C+R2wRamKfPinqZwq8G62tt9JQWxkhSOVM+7PxtAQMUWQb9KwBrqI7SyaPpA svaUXphA==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] 
helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqj-00ANdq-4g; Thu, 01 Sep 2022 07:42:53 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 08/17] btrfs: pass the iomap bio to btrfs_submit_bio Date: Thu, 1 Sep 2022 10:42:07 +0300 Message-Id: <20220901074216.1849941-9-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Now that btrfs_submit_bio splits the bio when crossing stripe boundaries, there is no need for the higher level code to do that manually. For direct I/O this is really helpful, as btrfs_dio_submit_io can now simply take the bio allocated by iomap and send it on to btrfs_submit_bio instead of allocating clones. For that to work, the bio embedded into struct btrfs_dio_private needs to become a full btrfs_bio as expected by btrfs_submit_bio. With this change there is a single work item to offload the entire iomap bio, so the heuristics to skip async processing for bios that were split aren't needed anymore either. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 159 +++++++++------------------------------ fs/btrfs/volumes.c | 21 +----- fs/btrfs/volumes.h | 7 +- 3 files changed, 37 insertions(+), 150 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 03953c1f176dd..833ea647f7887 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -69,24 +69,12 @@ struct btrfs_dio_data { }; struct btrfs_dio_private { - struct inode *inode; - - /* - * Since DIO can use anonymous page, we cannot use page_offset() to - * grab the file offset, thus need a dedicated member for file offset. - */ + /* Range of I/O */ u64 file_offset; - /* Used for bio::bi_size */ u32 bytes; - /* - * References to this structure. There is one reference per in-flight - * bio plus one while we're still setting up. - */ - refcount_t refs; - /* This must be last */ - struct bio bio; + struct btrfs_bio bbio; }; static struct bio_set btrfs_dio_bioset; @@ -7815,130 +7803,47 @@ static int btrfs_dio_iomap_end(struct inode *inode, loff_t pos, loff_t length, return ret; } -static void btrfs_dio_private_put(struct btrfs_dio_private *dip) -{ - /* - * This implies a barrier so that stores to dio_bio->bi_status before - * this and loads of dio_bio->bi_status after this are fully ordered.
- */ - if (!refcount_dec_and_test(&dip->refs)) - return; - - if (btrfs_op(&dip->bio) == BTRFS_MAP_WRITE) { - btrfs_mark_ordered_io_finished(BTRFS_I(dip->inode), NULL, - dip->file_offset, dip->bytes, - !dip->bio.bi_status); - } else { - unlock_extent(&BTRFS_I(dip->inode)->io_tree, - dip->file_offset, - dip->file_offset + dip->bytes - 1); - } - - bio_endio(&dip->bio); -} - -static void btrfs_end_dio_bio(struct btrfs_bio *bbio) +static void btrfs_dio_end_io(struct btrfs_bio *bbio) { - struct btrfs_dio_private *dip = bbio->private; + struct btrfs_dio_private *dip = + container_of(bbio, struct btrfs_dio_private, bbio); + struct btrfs_inode *bi = BTRFS_I(bbio->inode); struct bio *bio = &bbio->bio; - blk_status_t err = bio->bi_status; - if (err) { - btrfs_warn(BTRFS_I(dip->inode)->root->fs_info, - "direct IO failed ino %llu rw %d,%u sector %#Lx len %u err no %d", - btrfs_ino(BTRFS_I(dip->inode)), bio_op(bio), - bio->bi_opf, bio->bi_iter.bi_sector, - bio->bi_iter.bi_size, err); - dip->bio.bi_status = err; + if (bio->bi_status) { + btrfs_warn(bi->root->fs_info, + "direct IO failed ino %llu op 0x%0x offset %#llx len %u err no %d", + btrfs_ino(bi), bio->bi_opf, + dip->file_offset, dip->bytes, bio->bi_status); } - bio_put(bio); - btrfs_dio_private_put(dip); + if (btrfs_op(bio) == BTRFS_MAP_WRITE) + btrfs_mark_ordered_io_finished(bi, NULL, dip->file_offset, + dip->bytes, !bio->bi_status); + else + unlock_extent(&bi->io_tree, dip->file_offset, + dip->file_offset + dip->bytes - 1); + + bbio->bio.bi_private = bbio->private; + iomap_dio_bio_end_io(bio); } -static void btrfs_submit_direct(const struct iomap_iter *iter, - struct bio *dio_bio, loff_t file_offset) +static void btrfs_dio_submit_io(const struct iomap_iter *iter, struct bio *bio, + loff_t file_offset) { + struct btrfs_bio *bbio = btrfs_bio(bio); struct btrfs_dio_private *dip = - container_of(dio_bio, struct btrfs_dio_private, bio); - struct inode *inode = iter->inode; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct bio *bio; - u64 start_sector; - u64 submit_len; - u64 clone_offset = 0; - u64 clone_len; - u64 logical; - int ret; - blk_status_t status; - struct btrfs_io_geometry geom; + container_of(bbio, struct btrfs_dio_private, bbio); struct btrfs_dio_data *dio_data = iter->private; - struct extent_map *em = NULL; - - dip->inode = inode; - dip->file_offset = file_offset; - dip->bytes = dio_bio->bi_iter.bi_size; - refcount_set(&dip->refs, 1); - start_sector = dio_bio->bi_iter.bi_sector; - submit_len = dio_bio->bi_iter.bi_size; + btrfs_bio_init(bbio, iter->inode, btrfs_dio_end_io, bio->bi_private); + bbio->file_offset = file_offset; - do { - logical = start_sector << 9; - em = btrfs_get_chunk_map(fs_info, logical, submit_len); - if (IS_ERR(em)) { - status = errno_to_blk_status(PTR_ERR(em)); - em = NULL; - goto out_err_em; - } - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(dio_bio), - logical, &geom); - if (ret) { - status = errno_to_blk_status(ret); - goto out_err_em; - } - - clone_len = min(submit_len, geom.len); - ASSERT(clone_len <= UINT_MAX); - - /* - * This will never fail as it's passing GPF_NOFS and - * the allocation is backed by btrfs_bioset. - */ - bio = btrfs_bio_clone_partial(dio_bio, clone_offset, clone_len, - inode, btrfs_end_dio_bio, dip); - btrfs_bio(bio)->file_offset = file_offset; - - ASSERT(submit_len >= clone_len); - submit_len -= clone_len; - - /* - * Increase the count before we submit the bio so we know - * the end IO handler won't happen before we increase the - * count. 
Otherwise, the dip might get freed before we're - * done setting it up. - * - * We transfer the initial reference to the last bio, so we - * don't need to increment the reference count for the last one. - */ - if (submit_len > 0) - refcount_inc(&dip->refs); - - btrfs_submit_bio(fs_info, bio, 0); - - dio_data->submitted += clone_len; - clone_offset += clone_len; - start_sector += clone_len >> 9; - file_offset += clone_len; - - free_extent_map(em); - } while (submit_len > 0); - return; + dip->file_offset = file_offset; + dip->bytes = bio->bi_iter.bi_size; -out_err_em: - free_extent_map(em); - dio_bio->bi_status = status; - btrfs_dio_private_put(dip); + dio_data->submitted += bio->bi_iter.bi_size; + btrfs_submit_bio(btrfs_sb(iter->inode->i_sb), bio, 0); } static const struct iomap_ops btrfs_dio_iomap_ops = { @@ -7947,7 +7852,7 @@ static const struct iomap_ops btrfs_dio_iomap_ops = { }; static const struct iomap_dio_ops btrfs_dio_ops = { - .submit_io = btrfs_submit_direct, + .submit_io = btrfs_dio_submit_io, .bio_set = &btrfs_dio_bioset, }; @@ -8788,7 +8693,7 @@ int __init btrfs_init_cachep(void) goto fail; if (bioset_init(&btrfs_dio_bioset, BIO_POOL_SIZE, - offsetof(struct btrfs_dio_private, bio), + offsetof(struct btrfs_dio_private, bbio.bio), BIOSET_NEED_BVECS)) goto fail; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 0a2d144c20604..dba8e53101ed9 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6655,8 +6655,8 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, * Initialize a btrfs_bio structure. This skips the embedded bio itself as it * is already initialized by the block layer. */ -static void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode, - btrfs_bio_end_io_t end_io, void *private) +void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode, + btrfs_bio_end_io_t end_io, void *private) { memset(bbio, 0, offsetof(struct btrfs_bio, bio)); bbio->inode = inode; @@ -6683,23 +6683,6 @@ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf, return bio; } -struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, - struct inode *inode, - btrfs_bio_end_io_t end_io, void *private) -{ - struct bio *bio; - struct btrfs_bio *bbio; - - ASSERT(offset <= UINT_MAX && size <= UINT_MAX); - - bio = bio_alloc_clone(orig->bi_bdev, orig, GFP_NOFS, &btrfs_bioset); - bbio = btrfs_bio(bio); - btrfs_bio_init(bbio, inode, end_io, private); - - bio_trim(bio, offset >> 9, size >> 9); - return bio; -} - static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length) { struct btrfs_bio *orig_bbio = btrfs_bio(orig); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 97877184d0db1..82bbc0aa7081d 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -404,12 +404,11 @@ static inline struct btrfs_bio *btrfs_bio(struct bio *bio) int __init btrfs_bioset_init(void); void __cold btrfs_bioset_exit(void); -struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf, +void btrfs_bio_init(struct btrfs_bio *bbio, struct inode *inode, + btrfs_bio_end_io_t end_io, void *private); +struct bio *btrfs_bio_alloc(unsigned int nr_vecs, unsigned int opf, struct inode *inode, btrfs_bio_end_io_t end_io, void *private); -struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size, - struct inode *inode, - btrfs_bio_end_io_t end_io, void *private); static inline void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status) { From patchwork Thu Sep 1 07:42:08 2022 Content-Type: text/plain; charset="utf-8" 
MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962039 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA73AECAAD8 for ; Thu, 1 Sep 2022 07:44:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234089AbiIAHnn (ORCPT ); Thu, 1 Sep 2022 03:43:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233173AbiIAHnC (ORCPT ); Thu, 1 Sep 2022 03:43:02 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C1D261090AC; Thu, 1 Sep 2022 00:43:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=IV+WAuANa6nQ5NvpFQWf0zbFE6Say9xQtaAnlrgmYx0=; b=njEAEUz2E0R7VlYFin50NIWhAf xcExK6kpBNt/XyTRY7mLbkIvRNJKbwd0g7X3MAabR9stZqcU7xfUhMv9EX0IOUOFRg9zGaYwMBqt8 iwlcbPXWYjbHOntlkhvUJWf8+NxrZRPzE2ErYVTohY/DmwhwcH1yZQHtRJK67RIbOQznsU82jzcNq wsE/Su55jQ8hP8RYjAJ11qjDxO05CadOvuzPVMcxUpUelDD57vVx+NQA8mlsGhv77PWP0J7tbQ1YB xQHeR+ahKk5ZFeeV8J/XjyyMn6I5o1kLBkHPq9Bih2oVFGWc3uIEZI3BZs81Dayw+zmemwGsYMMdh IQGkX8Rg==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqm-00ANfR-Tl; Thu, 01 Sep 2022 07:42:57 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 09/17] btrfs: remove stripe boundary calculation for buffered I/O Date: Thu, 1 Sep 2022 10:42:08 +0300 Message-Id: <20220901074216.1849941-10-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Qu Wenruo Remove btrfs_bio_ctrl::len_to_stripe_boundary, so that buffered I/O will no longer limit its bio size according to stripe length now that btrfs_submit_bio can split bios at stripe boundaries.
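[Editorial sketch, not part of the patch: with stripe splitting handled inside btrfs_submit_bio, the only per-bio cap left at this level is the ordered extent boundary used for zone append. The standalone C below is a minimal, compilable model of that rule; the helper name and parameters are assumptions for illustration, not the kernel API.]

	#include <stdint.h>

	#define NO_BOUNDARY UINT32_MAX	/* "no per-bio length limit" */

	/*
	 * Sketch: how many bytes a bio may still grow before crossing the
	 * ordered extent that backs a zone append write. Anything that is
	 * not zone append has no limit left at this level.
	 */
	uint32_t len_to_oe_boundary(int is_zone_append,
				    uint64_t oe_disk_bytenr,
				    uint64_t oe_disk_num_bytes,
				    uint64_t logical)
	{
		uint64_t remaining;

		if (!is_zone_append)
			return NO_BOUNDARY;

		remaining = oe_disk_bytenr + oe_disk_num_bytes - logical;
		return remaining > UINT32_MAX ?
		       UINT32_MAX : (uint32_t)remaining;
	}

[Compare this with the removed calc_bio_boundaries() in the diff below, which additionally had to look up chunk geometry for every new bio.]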
Signed-off-by: Qu Wenruo [hch: simplify calc_bio_boundaries a little more] Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/extent_io.c | 71 ++++++++++++-------------------------------- 1 file changed, 19 insertions(+), 52 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 4c00bdefe5b45..46a3f0e33fb69 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -145,7 +145,6 @@ struct btrfs_bio_ctrl { struct bio *bio; int mirror_num; enum btrfs_compression_type compress_type; - u32 len_to_stripe_boundary; u32 len_to_oe_boundary; }; @@ -2601,7 +2600,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, ASSERT(bio); /* The limit should be calculated when bio_ctrl->bio is allocated */ - ASSERT(bio_ctrl->len_to_oe_boundary && bio_ctrl->len_to_stripe_boundary); + ASSERT(bio_ctrl->len_to_oe_boundary); if (bio_ctrl->compress_type != compress_type) return 0; @@ -2637,9 +2636,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, if (!contig) return 0; - real_size = min(bio_ctrl->len_to_oe_boundary, - bio_ctrl->len_to_stripe_boundary) - bio_size; - real_size = min(real_size, size); + real_size = min(bio_ctrl->len_to_oe_boundary - bio_size, size); /* * If real_size is 0, never call bio_add_*_page(), as even size is 0, @@ -2656,58 +2653,30 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, return ret; } -static int calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, - struct btrfs_inode *inode, u64 file_offset) +static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, + struct btrfs_inode *inode, u64 file_offset) { - struct btrfs_fs_info *fs_info = inode->root->fs_info; - struct btrfs_io_geometry geom; struct btrfs_ordered_extent *ordered; - struct extent_map *em; u64 logical = (bio_ctrl->bio->bi_iter.bi_sector << SECTOR_SHIFT); - int ret; /* - * Pages for compressed extent are never submitted to disk directly, - * thus it has no real boundary, just set them to U32_MAX. - * - * The split happens for real compressed bio, which happens in - * btrfs_submit_compressed_read/write(). + * Limit the extent to the ordered boundary for Zone Append. + * Compressed bios aren't submitted directly, so it doesn't apply + * to them. 
*/ - if (bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) { - bio_ctrl->len_to_oe_boundary = U32_MAX; - bio_ctrl->len_to_stripe_boundary = U32_MAX; - return 0; - } - em = btrfs_get_chunk_map(fs_info, logical, fs_info->sectorsize); - if (IS_ERR(em)) - return PTR_ERR(em); - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio_ctrl->bio), - logical, &geom); - free_extent_map(em); - if (ret < 0) { - return ret; - } - if (geom.len > U32_MAX) - bio_ctrl->len_to_stripe_boundary = U32_MAX; - else - bio_ctrl->len_to_stripe_boundary = (u32)geom.len; - - if (bio_op(bio_ctrl->bio) != REQ_OP_ZONE_APPEND) { - bio_ctrl->len_to_oe_boundary = U32_MAX; - return 0; - } - - /* Ordered extent not yet created, so we're good */ - ordered = btrfs_lookup_ordered_extent(inode, file_offset); - if (!ordered) { - bio_ctrl->len_to_oe_boundary = U32_MAX; - return 0; + if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE && + bio_op(bio_ctrl->bio) == REQ_OP_ZONE_APPEND) { + ordered = btrfs_lookup_ordered_extent(inode, file_offset); + if (ordered) { + bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX, + ordered->disk_bytenr + + ordered->disk_num_bytes - logical); + btrfs_put_ordered_extent(ordered); + return; + } } - bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX, - ordered->disk_bytenr + ordered->disk_num_bytes - logical); - btrfs_put_ordered_extent(ordered); - return 0; + bio_ctrl->len_to_oe_boundary = U32_MAX; } static int alloc_new_bio(struct btrfs_inode *inode, @@ -2734,9 +2703,7 @@ static int alloc_new_bio(struct btrfs_inode *inode, bio->bi_iter.bi_sector = (disk_bytenr + offset) >> SECTOR_SHIFT; bio_ctrl->bio = bio; bio_ctrl->compress_type = compress_type; - ret = calc_bio_boundaries(bio_ctrl, inode, file_offset); - if (ret < 0) - goto error; + calc_bio_boundaries(bio_ctrl, inode, file_offset); if (wbc) { /* From patchwork Thu Sep 1 07:42:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962040 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2393FECAAD8 for ; Thu, 1 Sep 2022 07:44:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233113AbiIAHoI (ORCPT ); Thu, 1 Sep 2022 03:44:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34296 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234046AbiIAHnJ (ORCPT ); Thu, 1 Sep 2022 03:43:09 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D7244125BB1; Thu, 1 Sep 2022 00:43:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=1roTzywvarz5LabvDIpocxFQR4qcmEp4AsSgGivBf9o=; b=wcFJqS2KyxKdZQic//Zj+0seFB a594R/zA3VclUTm6Tb3boiKY75uDWTzsov4lN/4kduPGcTHB+xSKk9CHBbvdV93TQK8zn2kVUBcSg G/a3tKjP7fyAzWRhHVhy5NGMYrGe+CoCQlqXJrra8SF22bjNGRUzpMPeU/3hs/jFc1KNWqaykVwyQ nu0Hpad5hkRQHigs0h4uOCVqj9y57t+pAu0TkhaaVLQJLRHNWPUsJxpN4yf9EK8RvNA4CZCFakyuu Qcbdgdd7kk7CT0BHHefyuuUJSVr37iYBKHKRaqR/At4EzKbJOAF8X1ei79fMBNGMWw6rHkUkn7A4R MunIC/kQ==; Received: from 213-225-1-14.nat.highway.a1.net 
([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqq-00ANhi-T7; Thu, 01 Sep 2022 07:43:01 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 10/17] btrfs: remove stripe boundary calculation for compressed I/O Date: Thu, 1 Sep 2022 10:42:09 +0300 Message-Id: <20220901074216.1849941-11-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Qu Wenruo Stop looking at the stripe boundary in alloc_compressed_bio() now that btrfs_submit_bio can split bios, open code the now trivial remainder of alloc_compressed_bio() in btrfs_submit_compressed_read, and stop maintaining the pending_ios count for reads as there is always just a single bio now. Signed-off-by: Qu Wenruo [hch: remove more cruft in btrfs_submit_compressed_read] Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/compression.c | 131 +++++++++++------------------------------ 1 file changed, 34 insertions(+), 97 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 1f10f86e70557..5e8b75b030ace 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -136,12 +136,15 @@ static int compression_decompress(int type, struct list_head *ws, static int btrfs_decompress_bio(struct compressed_bio *cb); -static void finish_compressed_bio_read(struct compressed_bio *cb) +static void end_compressed_bio_read(struct btrfs_bio *bbio) { + struct compressed_bio *cb = bbio->private; unsigned int index; struct page *page; - if (cb->status == BLK_STS_OK) + if (bbio->bio.bi_status) + cb->status = bbio->bio.bi_status; + else cb->status = errno_to_blk_status(btrfs_decompress_bio(cb)); /* Release the compressed pages */ @@ -157,17 +160,6 @@ static void finish_compressed_bio_read(struct compressed_bio *cb) /* Finally free the cb struct */ kfree(cb->compressed_pages); kfree(cb); -} - -static void end_compressed_bio_read(struct btrfs_bio *bbio) -{ - struct compressed_bio *cb = bbio->private; - - if (bbio->bio.bi_status) - cb->status = bbio->bio.bi_status; - - if (refcount_dec_and_test(&cb->pending_ios)) - finish_compressed_bio_read(cb); bio_put(&bbio->bio); } @@ -286,42 +278,30 @@ static void end_compressed_bio_write(struct btrfs_bio *bbio) * from or written to. * @endio_func: The endio function to call after the IO for compressed data * is finished. - * @next_stripe_start: Return value of logical bytenr of where next stripe starts. - * Let the caller know to only fill the bio up to the stripe - * boundary.
*/ - - static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr, blk_opf_t opf, - btrfs_bio_end_io_t endio_func, - u64 *next_stripe_start) + btrfs_bio_end_io_t endio_func) { - struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - struct btrfs_io_geometry geom; - struct extent_map *em; struct bio *bio; - int ret; bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, cb->inode, endio_func, cb); bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT; - em = btrfs_get_chunk_map(fs_info, disk_bytenr, fs_info->sectorsize); - if (IS_ERR(em)) { - bio_put(bio); - return ERR_CAST(em); - } + if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); + struct extent_map *em; - if (bio_op(bio) == REQ_OP_ZONE_APPEND) - bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev); + em = btrfs_get_chunk_map(fs_info, disk_bytenr, + fs_info->sectorsize); + if (IS_ERR(em)) { + bio_put(bio); + return ERR_CAST(em); + } - ret = btrfs_get_io_geometry(fs_info, em, btrfs_op(bio), disk_bytenr, &geom); - free_extent_map(em); - if (ret < 0) { - bio_put(bio); - return ERR_PTR(ret); + bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev); + free_extent_map(em); } - *next_stripe_start = disk_bytenr + geom.len; refcount_inc(&cb->pending_ios); return bio; } @@ -348,7 +328,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct bio *bio = NULL; struct compressed_bio *cb; u64 cur_disk_bytenr = disk_start; - u64 next_stripe_start; blk_status_t ret = BLK_STS_OK; const bool use_append = btrfs_use_zone_append(inode, disk_start); const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED | @@ -384,8 +363,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, /* Allocate new bio if submitted or not yet allocated */ if (!bio) { bio = alloc_compressed_bio(cb, cur_disk_bytenr, - bio_op | write_flags, end_compressed_bio_write, - &next_stripe_start); + bio_op | write_flags, end_compressed_bio_write); if (IS_ERR(bio)) { ret = errno_to_blk_status(PTR_ERR(bio)); break; @@ -393,20 +371,12 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, if (blkcg_css) bio->bi_opf |= REQ_CGROUP_PUNT; } - /* - * We should never reach next_stripe_start start as we will - * submit comp_bio when reach the boundary immediately. 
- */ - ASSERT(cur_disk_bytenr != next_stripe_start); - /* * We have various limits on the real read size: - * - stripe boundary * - page boundary * - compressed length boundary */ - real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_bytenr); - real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset)); + real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset)); real_size = min_t(u64, real_size, compressed_len - offset); ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); @@ -421,9 +391,6 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, submit = true; cur_disk_bytenr += added; - /* Reached stripe boundary */ - if (cur_disk_bytenr == next_stripe_start) - submit = true; /* Finished the range */ if (cur_disk_bytenr == disk_start + compressed_len) @@ -613,10 +580,9 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, struct extent_map_tree *em_tree; struct compressed_bio *cb; unsigned int compressed_len; - struct bio *comp_bio = NULL; + struct bio *comp_bio; const u64 disk_bytenr = bio->bi_iter.bi_sector << SECTOR_SHIFT; u64 cur_disk_byte = disk_bytenr; - u64 next_stripe_start; u64 file_offset; u64 em_len; u64 em_start; @@ -681,37 +647,23 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, /* include any pages we added in add_ra-bio_pages */ cb->len = bio->bi_iter.bi_size; + comp_bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_READ, cb->inode, + end_compressed_bio_read, cb); + comp_bio->bi_iter.bi_sector = cur_disk_byte >> SECTOR_SHIFT; + while (cur_disk_byte < disk_bytenr + compressed_len) { u64 offset = cur_disk_byte - disk_bytenr; unsigned int index = offset >> PAGE_SHIFT; unsigned int real_size; unsigned int added; struct page *page = cb->compressed_pages[index]; - bool submit = false; - /* Allocate new bio if submitted or not yet allocated */ - if (!comp_bio) { - comp_bio = alloc_compressed_bio(cb, cur_disk_byte, - REQ_OP_READ, end_compressed_bio_read, - &next_stripe_start); - if (IS_ERR(comp_bio)) { - cb->status = errno_to_blk_status(PTR_ERR(comp_bio)); - break; - } - } - /* - * We should never reach next_stripe_start start as we will - * submit comp_bio when reach the boundary immediately. - */ - ASSERT(cur_disk_byte != next_stripe_start); /* * We have various limit on the real read size: - * - stripe boundary * - page boundary * - compressed length boundary */ - real_size = min_t(u64, U32_MAX, next_stripe_start - cur_disk_byte); - real_size = min_t(u64, real_size, PAGE_SIZE - offset_in_page(offset)); + real_size = min_t(u64, U32_MAX, PAGE_SIZE - offset_in_page(offset)); real_size = min_t(u64, real_size, compressed_len - offset); ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); @@ -722,32 +674,17 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, */ ASSERT(added == real_size); cur_disk_byte += added; - - /* Reached stripe boundary, need to submit */ - if (cur_disk_byte == next_stripe_start) - submit = true; - - /* Has finished the range, need to submit */ - if (cur_disk_byte == disk_bytenr + compressed_len) - submit = true; - - if (submit) { - /* - * Save the initial offset of this chunk, as there - * is no direct correlation between compressed pages and - * the original file offset. The field is only used for - * priting error messages. 
- */ - btrfs_bio(comp_bio)->file_offset = file_offset; - - ASSERT(comp_bio->bi_iter.bi_size); - btrfs_submit_bio(fs_info, comp_bio, mirror_num); - comp_bio = NULL; - } } - if (refcount_dec_and_test(&cb->pending_ios)) - finish_compressed_bio_read(cb); + /* + * Just stash the initial offset of this chunk, as there is no direct + * correlation between compressed pages and the original file offset. + * The field is only used for printing error messages anyway. + */ + btrfs_bio(comp_bio)->file_offset = file_offset; + + ASSERT(comp_bio->bi_iter.bi_size); + btrfs_submit_bio(fs_info, comp_bio, mirror_num); return; fail: From patchwork Thu Sep 1 07:42:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962041 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C2CC7ECAAD3 for ; Thu, 1 Sep 2022 07:44:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234225AbiIAHoK (ORCPT ); Thu, 1 Sep 2022 03:44:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34806 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233756AbiIAHnM (ORCPT ); Thu, 1 Sep 2022 03:43:12 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C58B81257FC; Thu, 1 Sep 2022 00:43:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=k/bCRqhIEe5EhCH7REQnZWBBj0b2P9XHUMRdc2s+emM=; b=RJro6QENp+ATJxjQYZBSEm2m6k H9IF1Owaw6OuVJZmyYl3RtIdOuu2muedMSl8OC099YK+EJXkDH4LLua9j0GUjK2GT/MnHuHhDAnR0 /S+Cycno1M4TpgpltNq/mTv4wZL6pM07ekiX4mGanWFKT+pL/IBuD5BF3e85X4dKxa0srgrx5iBxb 1N7F3srEeW3h/BNPVF5HxYXvdHFfW7rCZcK4r+GgtiegnakSgWRYTG3Qh8qnvwIc/7CZ8BRB2nSOY aI18DV5QFCYLsIgHKBH15smQTpjvVwIxDxLUnZs83iLdOunBGc0aRUKAZ6FH0cRPmhnBFP4GDpSrc PUsdyPXQ==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqv-00ANj6-36; Thu, 01 Sep 2022 07:43:06 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 11/17] btrfs: remove stripe boundary calculation for encoded I/O Date: Thu, 1 Sep 2022 10:42:10 +0300 Message-Id: <20220901074216.1849941-12-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Qu Wenruo Stop looking at the stripe boundary in btrfs_encoded_read_regular_fill_pages() now that btrfs_submit_bio can split bios.
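[Editorial sketch, not part of the patch: once chunk geometry no longer matters, the fill loop reduces to walking the remaining byte count. The standalone C below models that shape only; the function name and simplified signature are assumptions, not the kernel function.]

	#include <stdint.h>

	#define MODEL_PAGE_SIZE 4096u

	/*
	 * Model of the simplified loop in
	 * btrfs_encoded_read_regular_fill_pages(): consume the remaining
	 * byte count page by page; any stripe crossing is resolved later
	 * by the submission path splitting the bio.
	 */
	uint64_t model_fill_pages(uint64_t disk_io_size)
	{
		uint64_t cur = 0;

		while (cur < disk_io_size) {
			uint64_t remaining = disk_io_size - cur;
			uint64_t bytes = remaining < MODEL_PAGE_SIZE ?
					 remaining : MODEL_PAGE_SIZE;

			/* the real code adds 'bytes' worth of pages to a bio */
			cur += bytes;
		}
		return cur;	/* always equals disk_io_size */
	}
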
Signed-off-by: Qu Wenruo Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 23 ++--------------------- 1 file changed, 2 insertions(+), 21 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 833ea647f7887..399381a4f8e69 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -10025,7 +10025,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 file_offset, u64 disk_bytenr, u64 disk_io_size, struct page **pages) { - struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_encoded_read_private priv = { .inode = inode, .file_offset = file_offset, @@ -10033,33 +10032,15 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, }; unsigned long i = 0; u64 cur = 0; - int ret; init_waitqueue_head(&priv.wait); /* - * Submit bios for the extent, splitting due to bio or stripe limits as - * necessary. + * Submit bios for the extent, splitting due to bio limits as necessary. */ while (cur < disk_io_size) { - struct extent_map *em; - struct btrfs_io_geometry geom; struct bio *bio = NULL; - u64 remaining; + u64 remaining = disk_io_size - cur; - em = btrfs_get_chunk_map(fs_info, disk_bytenr + cur, - disk_io_size - cur); - if (IS_ERR(em)) { - ret = PTR_ERR(em); - } else { - ret = btrfs_get_io_geometry(fs_info, em, BTRFS_MAP_READ, - disk_bytenr + cur, &geom); - free_extent_map(em); - } - if (ret) { - WRITE_ONCE(priv.status, errno_to_blk_status(ret)); - break; - } - remaining = min(geom.len, disk_io_size - cur); while (bio || remaining) { size_t bytes = min_t(u64, remaining, PAGE_SIZE); From patchwork Thu Sep 1 07:42:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962045 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF6F8C65C0D for ; Thu, 1 Sep 2022 07:44:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232651AbiIAHoN (ORCPT ); Thu, 1 Sep 2022 03:44:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34362 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233189AbiIAHng (ORCPT ); Thu, 1 Sep 2022 03:43:36 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8426E126490; Thu, 1 Sep 2022 00:43:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=qlQrvrfSDmRPTKG1VSQz5QlosmwvOgEgSdThS8yBvS4=; b=l4/IuagDyCS96qIIWB6A53qLBs An1coXtEKAgoLBEyY4yl3h4C3/yWtIWhGcnbFsl7nCIP/nJt3FM5pJLGoXpZXfV4oiph8AU/5tAla qU9LcjBrZzqzkptGeKtf/De9DOpm6J7XyufSgptIVtz62/SdHm4fWNnG5NfriLHIbUyRylUZ++TsY T+oyU1vSoSysUijHwOwF1Fxd8R2AwhX84FJ3SNqQWtMvNhfxIsdfDrLWq0efsYLPZmD8m5pNTWfKm ry9EaeNSI+y6q2Gze/e94WhqZZbFMvzcbMEA/NMRXWU/8iRFvBGUny1zwVrz3xVp21Okm/Nyc/8Op Uj8ybzkg==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTeqz-00ANjw-RI; Thu, 01 Sep 2022 07:43:10 +0000 From: Christoph Hellwig To: Chris Mason , 
Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 12/17] btrfs: remove struct btrfs_io_geometry Date: Thu, 1 Sep 2022 10:42:11 +0300 Message-Id: <20220901074216.1849941-13-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Now that btrfs_get_io_geometry has a single caller, we can massage it into a form that is more suitable for that caller and remove the marshalling into and out of struct btrfs_io_geometry. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/volumes.c | 115 +++++++++++++-------------------------------- fs/btrfs/volumes.h | 18 ------- 2 files changed, 32 insertions(+), 101 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index dba8e53101ed9..e497b63238189 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6269,91 +6269,43 @@ static bool need_full_stripe(enum btrfs_map_op op) return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS); } -/* - * Calculate the geometry of a particular (address, len) tuple. This - * information is used to calculate how big a particular bio can get before it - * straddles a stripe. - * - * @fs_info: the filesystem - * @em: mapping containing the logical extent - * @op: type of operation - write or read - * @logical: address that we want to figure out the geometry of - * @io_geom: pointer used to return values - * - * Returns < 0 in case a chunk for the given logical address cannot be found, - * usually shouldn't happen unless @logical is corrupted, 0 otherwise. - */ -int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em, - enum btrfs_map_op op, u64 logical, - struct btrfs_io_geometry *io_geom) +static u64 btrfs_max_io_len(struct map_lookup *map, enum btrfs_map_op op, + u64 offset, u64 *stripe_nr, u64 *stripe_offset, + u64 *full_stripe_start) { - struct map_lookup *map; - u64 len; - u64 offset; - u64 stripe_offset; - u64 stripe_nr; - u32 stripe_len; - u64 raid56_full_stripe_start = (u64)-1; - int data_stripes; + u32 stripe_len = map->stripe_len; ASSERT(op != BTRFS_MAP_DISCARD); - map = em->map_lookup; - /* Offset of this logical address in the chunk */ - offset = logical - em->start; - /* Len of a stripe in a chunk */ - stripe_len = map->stripe_len; /* - * Stripe_nr is where this block falls in - * stripe_offset is the offset of this block in its stripe. + * Stripe_nr is the stripe where this block falls. + * Stripe_offset is the offset of this block in its stripe. */ - stripe_nr = div64_u64_rem(offset, stripe_len, &stripe_offset); - ASSERT(stripe_offset < U32_MAX); + *stripe_nr = div64_u64_rem(offset, stripe_len, stripe_offset); + ASSERT(*stripe_offset < U32_MAX); - data_stripes = nr_data_stripes(map); + if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { + unsigned long full_stripe_len = + stripe_len * nr_data_stripes(map); - /* Only stripe based profiles needs to check against stripe length. 
*/ - if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) { - u64 max_len = stripe_len - stripe_offset; + *full_stripe_start = + div64_u64(offset, full_stripe_len) * full_stripe_len; /* - * In case of raid56, we need to know the stripe aligned start + * For writes to RAID[56], allow to write a full stripe set, but + * no straddling of stripe sets. */ - if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK) { - unsigned long full_stripe_len = stripe_len * data_stripes; - raid56_full_stripe_start = offset; - - /* - * Allow a write of a full stripe, but make sure we - * don't allow straddling of stripes - */ - raid56_full_stripe_start = div64_u64(raid56_full_stripe_start, - full_stripe_len); - raid56_full_stripe_start *= full_stripe_len; - - /* - * For writes to RAID[56], allow a full stripeset across - * all disks. For other RAID types and for RAID[56] - * reads, just allow a single stripe (on a single disk). - */ - if (op == BTRFS_MAP_WRITE) { - max_len = stripe_len * data_stripes - - (offset - raid56_full_stripe_start); - } - } - len = min_t(u64, em->len - offset, max_len); - } else { - len = em->len - offset; + if (op == BTRFS_MAP_WRITE) + return full_stripe_len - (offset - *full_stripe_start); } - io_geom->len = len; - io_geom->offset = offset; - io_geom->stripe_len = stripe_len; - io_geom->stripe_nr = stripe_nr; - io_geom->stripe_offset = stripe_offset; - io_geom->raid56_stripe_offset = raid56_full_stripe_start; - - return 0; + /* + * For other RAID types and for RAID[56] reads, just allow a single + * stripe (on a single disk). + */ + if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) + return stripe_len - *stripe_offset; + return U64_MAX; } static void set_io_stripe(struct btrfs_io_stripe *dst, const struct map_lookup *map, @@ -6372,6 +6324,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, { struct extent_map *em; struct map_lookup *map; + u64 map_offset; u64 stripe_offset; u64 stripe_nr; u64 stripe_len; @@ -6390,7 +6343,7 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, int patch_the_first_stripe_for_dev_replace = 0; u64 physical_to_patch_in_first_stripe = 0; u64 raid56_full_stripe_start = (u64)-1; - struct btrfs_io_geometry geom; + u64 max_len; ASSERT(bioc_ret); ASSERT(op != BTRFS_MAP_DISCARD); @@ -6398,18 +6351,14 @@ static int __btrfs_map_block(struct btrfs_fs_info *fs_info, em = btrfs_get_chunk_map(fs_info, logical, *length); ASSERT(!IS_ERR(em)); - ret = btrfs_get_io_geometry(fs_info, em, op, logical, &geom); - if (ret < 0) - return ret; - map = em->map_lookup; - - *length = geom.len; - stripe_len = geom.stripe_len; - stripe_nr = geom.stripe_nr; - stripe_offset = geom.stripe_offset; - raid56_full_stripe_start = geom.raid56_stripe_offset; data_stripes = nr_data_stripes(map); + stripe_len = map->stripe_len; + + map_offset = logical - em->start; + max_len = btrfs_max_io_len(map, op, map_offset, &stripe_nr, + &stripe_offset, &raid56_full_stripe_start); + *length = min_t(u64, em->len - map_offset, max_len); down_read(&dev_replace->rwsem); dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace); diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 82bbc0aa7081d..3b1fe04ff078e 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -51,21 +51,6 @@ enum btrfs_raid_types { BTRFS_NR_RAID_TYPES }; -struct btrfs_io_geometry { - /* remaining bytes before crossing a stripe */ - u64 len; - /* offset of logical address in chunk */ - u64 offset; - /* length of single IO stripe */ - u32 stripe_len; - /* offset of address in stripe */ - u32 stripe_offset; - /* number 
of stripe where address falls */ - u64 stripe_nr; - /* offset of raid56 stripe into the chunk */ - u64 raid56_stripe_offset; -}; - /* * Use sequence counter to get consistent device stat data on * 32-bit processors. @@ -568,9 +553,6 @@ int btrfs_map_sblock(struct btrfs_fs_info *fs_info, enum btrfs_map_op op, struct btrfs_discard_stripe *btrfs_map_discard(struct btrfs_fs_info *fs_info, u64 logical, u64 *length_ret, u32 *num_stripes); -int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *map, - enum btrfs_map_op op, u64 logical, - struct btrfs_io_geometry *io_geom); int btrfs_read_sys_array(struct btrfs_fs_info *fs_info); int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info); struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans, From patchwork Thu Sep 1 07:42:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962042 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 703F2C64991 for ; Thu, 1 Sep 2022 07:44:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233898AbiIAHoM (ORCPT ); Thu, 1 Sep 2022 03:44:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233324AbiIAHnh (ORCPT ); Thu, 1 Sep 2022 03:43:37 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 171711264A7; Thu, 1 Sep 2022 00:43:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=xsLsepdDzI92/2ENAfo4C5Xjkuy4xHWvAk9PUbduG/M=; b=vEsy/WwxjHl81dUC4T+ib9A5WG O+OA/hTvVc7qOewlAMaHxouFYT6JSAOfNAxw86BondRgWcyEUTUxV2O1Ej3DQUL4VDJYA9u9mJfhH SoFVlIAPnJLsh7Qy/eDtyEZnM2f/akRx+KpQBuIccRfcnocp+2G+aNpoTrvXE4hBA5Do4ZqHrb8tE uJIa+XE+qLHC6OhbUdzd8ETKoA8aMKhV5oJ/EOdhhFDELAv0/TS6Inl+YYYC7Hr3QsVvHZ73xkusi US17JEnxKK8OUEfZte+MBTgFFGvIpF1Nh9h0zBKBAubJzgpkC6/SBiOCoHNKzBpuMQqQ/Zfim2CoM xfUZdgAg==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTer3-00ANlX-DP; Thu, 01 Sep 2022 07:43:14 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 13/17] btrfs: remove submit_encoded_read_bio Date: Thu, 1 Sep 2022 10:42:12 +0300 Message-Id: <20220901074216.1849941-14-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Just open code the functionality in the only caller and remove the now superfluous error handling there.
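For reference, the submission path inside the fill-pages loop then reduces to the following shape (a sketch reconstructed from the inode.c hunks below; the page-adding leg of the loop is elided):

	while (bio || remaining) {
		size_t bytes = min_t(u64, remaining, PAGE_SIZE);

		if (!bytes ||
		    bio_add_page(bio, pages[i], bytes, 0) < bytes) {
			/* Bio is full (or the range is done): submit it. */
			atomic_inc(&priv.pending);
			btrfs_submit_bio(fs_info, bio, 0);
			bio = NULL;
			continue;
		}
		/* ... otherwise advance cur, remaining and i ... */
	}

btrfs_submit_bio() does not return an error, which is what lets the old status propagation and the out: label go away.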
Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Josef Bacik --- fs/btrfs/inode.c | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 399381a4f8e69..25194e75c0812 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -9990,17 +9990,6 @@ struct btrfs_encoded_read_private { blk_status_t status; }; -static blk_status_t submit_encoded_read_bio(struct btrfs_inode *inode, - struct bio *bio, int mirror_num) -{ - struct btrfs_encoded_read_private *priv = btrfs_bio(bio)->private; - struct btrfs_fs_info *fs_info = inode->root->fs_info; - - atomic_inc(&priv->pending); - btrfs_submit_bio(fs_info, bio, mirror_num); - return BLK_STS_OK; -} - static void btrfs_encoded_read_endio(struct btrfs_bio *bbio) { struct btrfs_encoded_read_private *priv = bbio->private; @@ -10025,6 +10014,7 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, u64 file_offset, u64 disk_bytenr, u64 disk_io_size, struct page **pages) { + struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_encoded_read_private priv = { .inode = inode, .file_offset = file_offset, @@ -10055,14 +10045,8 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, if (!bytes || bio_add_page(bio, pages[i], bytes, 0) < bytes) { - blk_status_t status; - - status = submit_encoded_read_bio(inode, bio, 0); - if (status) { - WRITE_ONCE(priv.status, status); - bio_put(bio); - goto out; - } + atomic_inc(&priv.pending); + btrfs_submit_bio(fs_info, bio, 0); bio = NULL; continue; } @@ -10073,7 +10057,6 @@ int btrfs_encoded_read_regular_fill_pages(struct btrfs_inode *inode, } } -out: if (atomic_dec_return(&priv.pending)) io_wait_event(priv.wait, !atomic_read(&priv.pending)); /* See btrfs_encoded_read_endio() for ordering. 
*/ From patchwork Thu Sep 1 07:42:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962043 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3BD73ECAAD1 for ; Thu, 1 Sep 2022 07:44:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232768AbiIAHoL (ORCPT ); Thu, 1 Sep 2022 03:44:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35750 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234053AbiIAHnh (ORCPT ); Thu, 1 Sep 2022 03:43:37 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0970B1257F1; Thu, 1 Sep 2022 00:43:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=BcknqeZiBU7AfUTFnagzOoaBAn7P6DlrRAHe430S5NY=; b=l8ySUqRFcax7ifTq824G4bo2uG +ez/WKKM34hhZ1j9okNSPxOPmm83PfkHWYcMdBfcO4f9eL9Uv43YsRUTVliezK4uGKPXdx7AkUZzq nTIWjqzmbjdOyEdSOQh9b80OI4ns1n/UyUtGDGoCWVT4GS549E+22DoHEDdMTN8SDJ9LnpexTnjBF PXjDuiGe0yxMe9/jE2yIc6o91Q/3f2wuMELt82+ZYriwEkpuk6gH0LxGA8TC4kAVvn3irnhWz/cqO sUIqBFA+wjDkJ6VDnMwO5Q6fcxj7JY30s7Y/ghKE/G2LOcP4qcTrA1b9jfW509FTnKzzOSniiHSzE 40YUsBow==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTer6-00ANmz-SF; Thu, 01 Sep 2022 07:43:17 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 14/17] btrfs: remove now spurious bio submission helpers Date: Thu, 1 Sep 2022 10:42:13 +0300 Message-Id: <20220901074216.1849941-15-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Just call btrfs_submit_bio and btrfs_submit_compressed_read directly from submit_one_bio now that all additional functionality has moved into btrfs_submit_bio. 
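The end state of submit_one_bio() is then a flag tweak plus a two-way dispatch (a sketch assembled from the extent_io.c hunk below):

	/* Metadata bios now only differ by the REQ_META flag. */
	if (!is_data_inode(inode))
		bio->bi_opf |= REQ_META;

	if (btrfs_op(bio) == BTRFS_MAP_READ &&
	    bio_ctrl->compress_type != BTRFS_COMPRESS_NONE)
		/* Compressed reads still need the decompression machinery. */
		btrfs_submit_compressed_read(inode, bio, mirror_num);
	else
		/* Everything else goes straight to the common submission path. */
		btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num);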
Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Josef Bacik --- fs/btrfs/ctree.h | 3 --- fs/btrfs/disk-io.c | 6 ------ fs/btrfs/disk-io.h | 1 - fs/btrfs/extent_io.c | 11 ++++++----- fs/btrfs/inode.c | 22 ---------------------- 5 files changed, 6 insertions(+), 37 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 33c3c394e43e3..5e57e3c6a1fd6 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -3372,9 +3372,6 @@ void btrfs_inode_safe_disk_i_size_write(struct btrfs_inode *inode, u64 new_i_siz u64 btrfs_file_extent_end(const struct btrfs_path *path); /* inode.c */ -void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirror_num); -void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio, - int mirror_num, enum btrfs_compression_type compress_type); bool btrfs_data_csum_ok(struct btrfs_bio *bbio, struct btrfs_device *dev, u32 bio_offset, struct bio_vec *bv); struct extent_map *btrfs_get_extent_fiemap(struct btrfs_inode *inode, diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ceee039b65ea0..014c06c74155f 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -648,12 +648,6 @@ int btree_csum_one_bio(struct btrfs_bio *bbio) return ret; } -void btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int mirror_num) -{ - bio->bi_opf |= REQ_META; - btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num); -} - #ifdef CONFIG_MIGRATION static int btree_migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode) diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 9d4e0e36f7bb9..3a7ef2352c968 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -80,7 +80,6 @@ void btrfs_drop_and_free_fs_root(struct btrfs_fs_info *fs_info, int btrfs_validate_metadata_buffer(struct btrfs_bio *bbio, struct page *page, u64 start, u64 end, int mirror); -void btrfs_submit_metadata_bio(struct inode *inode, struct bio *bio, int mirror_num); #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS struct btrfs_root *btrfs_alloc_dummy_root(struct btrfs_fs_info *fs_info); #endif diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 46a3f0e33fb69..33e80f8dd0b1b 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -198,12 +198,13 @@ static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl) btrfs_bio(bio)->file_offset = page_offset(bv->bv_page) + bv->bv_offset; if (!is_data_inode(inode)) - btrfs_submit_metadata_bio(inode, bio, mirror_num); - else if (btrfs_op(bio) == BTRFS_MAP_WRITE) - btrfs_submit_data_write_bio(inode, bio, mirror_num); + bio->bi_opf |= REQ_META; + + if (btrfs_op(bio) == BTRFS_MAP_READ && + bio_ctrl->compress_type != BTRFS_COMPRESS_NONE) + btrfs_submit_compressed_read(inode, bio, mirror_num); else - btrfs_submit_data_read_bio(inode, bio, mirror_num, - bio_ctrl->compress_type); + btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num); /* The bio is owned by the end_io handler now */ bio_ctrl->bio = NULL; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 25194e75c0812..9c562d36e4570 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2662,28 +2662,6 @@ int btrfs_extract_ordered_extent(struct btrfs_bio *bbio) return ret; } -void btrfs_submit_data_write_bio(struct inode *inode, struct bio *bio, int mirror_num) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - - btrfs_submit_bio(fs_info, bio, mirror_num); -} - -void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio, - int mirror_num, enum btrfs_compression_type 
compress_type) -{ - if (compress_type != BTRFS_COMPRESS_NONE) { - /* - * btrfs_submit_compressed_read will handle completing the bio - * if there were any errors, so just return here. - */ - btrfs_submit_compressed_read(inode, bio, mirror_num); - return; - } - - btrfs_submit_bio(btrfs_sb(inode->i_sb), bio, mirror_num); -} - /* * given a list of ordered sums record them in the inode. This happens * at IO completion time based on sums calculated at bio submission time. From patchwork Thu Sep 1 07:42:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962044 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6FFCC6FA85 for ; Thu, 1 Sep 2022 07:44:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233459AbiIAHoQ (ORCPT ); Thu, 1 Sep 2022 03:44:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35772 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230136AbiIAHnh (ORCPT ); Thu, 1 Sep 2022 03:43:37 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF4781257FE; Thu, 1 Sep 2022 00:43:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=STwYbSO0Izmp+1f3nnFS1hp97GJ6kHn4j/tdcQ/e/24=; b=xH18PWLefCm9R1AjbvzOcidgo4 nukbqy9Qggn1Vlgq/8yPXV7djtDV+08cNMHoJLDRpzwTCLWHa/eEqcpgFrWZ4R6VJoeart4lNjd2L c41NXI+lH5ffd9RaElqT3pRqUKjLjkkPiJjDS6L7m2e4yzQceMwwHPqtAxXsuEPDpeeW62AJYd88s WrFjP+FLK8BIIq5XE0eDZwdOlwlCcZmFDs9S6VbOoO2TJvS4VcYNZfghbi0cFlrgbT713VO+suZHK 3xmyNlm8+m7hm4ehNxOnScG7rqo8qmBqlZv94tQ0GTeaag6VUhWgDmEK+GotGr/3XD2qzmCiTLqmn XusE6y9A==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTerA-00ANo2-5B; Thu, 01 Sep 2022 07:43:20 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 15/17] btrfs: calculate file system wide queue limit for zoned mode Date: Thu, 1 Sep 2022 10:42:14 +0300 Message-Id: <20220901074216.1849941-16-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org To be able to split a write into properly sized zone append commands, we need a queue_limits structure that contains the least common denominator suitable for all devices. 
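The mechanism is the block layer's standard one for stacked devices: initialize the limits to the permissive stacking defaults with blk_set_stacking_limits(), then fold in every member device with blk_stack_limits() so each field ends up at the least common denominator. In outline (a simplified sketch; the real loop in the zoned.c hunk below runs inside the existing zoned-device checks):

	struct queue_limits *lim = &fs_info->limits;
	struct btrfs_device *device;

	/* Start from the most permissive limits... */
	blk_set_stacking_limits(lim);

	/* ...and tighten them with each member device's queue limits. */
	list_for_each_entry(device, &fs_devices->devices, dev_list)
		blk_stack_limits(lim, &bdev_get_queue(device->bdev)->limits, 0);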
Signed-off-by: Christoph Hellwig --- fs/btrfs/ctree.h | 4 +++- fs/btrfs/zoned.c | 36 ++++++++++++++++++------------------ fs/btrfs/zoned.h | 1 - 3 files changed, 21 insertions(+), 20 deletions(-) diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 5e57e3c6a1fd6..a37129363e184 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -1071,8 +1071,10 @@ struct btrfs_fs_info { */ u64 zone_size; - /* Max size to emit ZONE_APPEND write command */ + /* Constraints for ZONE_APPEND commands: */ + struct queue_limits limits; u64 max_zone_append_size; + struct mutex zoned_meta_io_lock; spinlock_t treelog_bg_lock; u64 treelog_bg; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 2638f71eec4b6..6e04fbbd76b92 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -415,16 +415,6 @@ int btrfs_get_dev_zone_info(struct btrfs_device *device, bool populate_cache) nr_sectors = bdev_nr_sectors(bdev); zone_info->zone_size_shift = ilog2(zone_info->zone_size); zone_info->nr_zones = nr_sectors >> ilog2(zone_sectors); - /* - * We limit max_zone_append_size also by max_segments * - * PAGE_SIZE. Technically, we can have multiple pages per segment. But, - * since btrfs adds the pages one by one to a bio, and btrfs cannot - * increase the metadata reservation even if it increases the number of - * extents, it is safe to stick with the limit. - */ - zone_info->max_zone_append_size = - min_t(u64, (u64)bdev_max_zone_append_sectors(bdev) << SECTOR_SHIFT, - (u64)bdev_max_segments(bdev) << PAGE_SHIFT); if (!IS_ALIGNED(nr_sectors, zone_sectors)) zone_info->nr_zones++; @@ -646,14 +636,16 @@ int btrfs_get_dev_zone(struct btrfs_device *device, u64 pos, int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) { struct btrfs_fs_devices *fs_devices = fs_info->fs_devices; + struct queue_limits *lim = &fs_info->limits; struct btrfs_device *device; u64 zoned_devices = 0; u64 nr_devices = 0; u64 zone_size = 0; - u64 max_zone_append_size = 0; const bool incompat_zoned = btrfs_fs_incompat(fs_info, ZONED); int ret = 0; + blk_set_stacking_limits(lim); + /* Count zoned devices */ list_for_each_entry(device, &fs_devices->devices, dev_list) { enum blk_zoned_model model; @@ -685,11 +677,9 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) ret = -EINVAL; goto out; } - if (!max_zone_append_size || - (zone_info->max_zone_append_size && - zone_info->max_zone_append_size < max_zone_append_size)) - max_zone_append_size = - zone_info->max_zone_append_size; + blk_stack_limits(lim, + &bdev_get_queue(device->bdev)->limits, + 0); } nr_devices++; } @@ -739,8 +729,18 @@ int btrfs_check_zoned_mode(struct btrfs_fs_info *fs_info) } fs_info->zone_size = zone_size; - fs_info->max_zone_append_size = ALIGN_DOWN(max_zone_append_size, - fs_info->sectorsize); + /* + * Also limit max_zone_append_size by max_segments * PAGE_SIZE. + * Technically, we can have multiple pages per segment. But, + * since btrfs adds the pages one by one to a bio, and btrfs cannot + * increase the metadata reservation even if it increases the number of + * extents, it is safe to stick with the limit. 
+ */ + fs_info->max_zone_append_size = ALIGN_DOWN( + min3((u64)lim->max_zone_append_sectors << SECTOR_SHIFT, + (u64)lim->max_sectors << SECTOR_SHIFT, + (u64)lim->max_segments << PAGE_SHIFT), + fs_info->sectorsize); fs_info->fs_devices->chunk_alloc_policy = BTRFS_CHUNK_ALLOC_ZONED; if (fs_info->max_zone_append_size < fs_info->max_extent_size) fs_info->max_extent_size = fs_info->max_zone_append_size; diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index cafa639927050..0f22b22fe359f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -19,7 +19,6 @@ struct btrfs_zoned_device_info { */ u64 zone_size; u8 zone_size_shift; - u64 max_zone_append_size; u32 nr_zones; unsigned int max_active_zones; atomic_t active_zones_left; From patchwork Thu Sep 1 07:42:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962046 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 797D8C6FA89 for ; Thu, 1 Sep 2022 07:44:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233824AbiIAHoR (ORCPT ); Thu, 1 Sep 2022 03:44:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234075AbiIAHni (ORCPT ); Thu, 1 Sep 2022 03:43:38 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7F7312693B; Thu, 1 Sep 2022 00:43:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=5iC1mZMgXVVxWk77LtnoSdDtdc2kx+mMMMthlFAifwU=; b=G6voUUBkf24dJwqFnHdyZFDKeu 2ErVE0svQ0pz3RM1KFr8MpEc4Vl2djVgl8O5xM6qDsfx6MjXRTL54qSLlFXFejr3vo6+AjwC25FeJ IkUL9Wq0pWFAK1ag0qMRXZJpNwqCH0q1sdb7tBDEJqicLK9CpXJhL8t18OoVyeaTsUJnIXRebnaT/ +fUcwochpqYGdtH49Od+Km/VbZxMsZ6PsKZAAniYqXQfRiAyKojbJQHrXC2T8givSshwNT2uidEi8 XZimm3W2dAJypitRpifFnS32wdbjZcSLRyLVEcaMDlyPvmorsvxI7eHC7XbBZfNNOta8tMvlMHvfn z4Tq5Auw==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTerE-00ANoi-67; Thu, 01 Sep 2022 07:43:24 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 16/17] btrfs: split zone append bios in btrfs_submit_bio Date: Thu, 1 Sep 2022 10:42:15 +0300 Message-Id: <20220901074216.1849941-17-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. 
See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The current btrfs zoned device support is a little cumbersome in the data I/O path as it requires the callers to never issue more I/O than the ZONE_APPEND size supported by the underlying device. This leads to a lot of extra accounting. Instead change btrfs_submit_bio so that it can take write bios of arbitrary size and form from the upper layers, and just split them internally to the ZONE_APPEND queue limits. Then remove all the upper layer warts catering to limited write sizes on zoned devices, including the extra refcount in the compressed_bio. Signed-off-by: Christoph Hellwig Reviewed-by: Josef Bacik --- fs/btrfs/compression.c | 112 ++++++--------------------- fs/btrfs/compression.h | 3 -- fs/btrfs/extent_io.c | 74 ++++++--------------- fs/btrfs/inode.c | 4 -- fs/btrfs/volumes.c | 40 +++++++++------ fs/btrfs/zoned.c | 20 -------- fs/btrfs/zoned.h | 9 ---- 7 files changed, 62 insertions(+), 200 deletions(-) diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 5e8b75b030ace..f89cac08dc4a4 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -255,57 +255,14 @@ static void btrfs_finish_compressed_write_work(struct work_struct *work) static void end_compressed_bio_write(struct btrfs_bio *bbio) { struct compressed_bio *cb = bbio->private; + struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - if (bbio->bio.bi_status) - cb->status = bbio->bio.bi_status; - - if (refcount_dec_and_test(&cb->pending_ios)) { - struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); + cb->status = bbio->bio.bi_status; + queue_work(fs_info->compressed_write_workers, &cb->write_end_work); - queue_work(fs_info->compressed_write_workers, &cb->write_end_work); - } bio_put(&bbio->bio); } -/* - * Allocate a compressed_bio, which will be used to read/write on-disk - * (aka, compressed) * data. - * - * @cb: The compressed_bio structure, which records all the needed - * information to bind the compressed data to the uncompressed - * page cache. - * @disk_byten: The logical bytenr where the compressed data will be read - * from or written to. - * @endio_func: The endio function to call after the IO for compressed data - * is finished. - */ -static struct bio *alloc_compressed_bio(struct compressed_bio *cb, u64 disk_bytenr, - blk_opf_t opf, - btrfs_bio_end_io_t endio_func) -{ - struct bio *bio; - - bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, cb->inode, endio_func, cb); - bio->bi_iter.bi_sector = disk_bytenr >> SECTOR_SHIFT; - - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - struct btrfs_fs_info *fs_info = btrfs_sb(cb->inode->i_sb); - struct extent_map *em; - - em = btrfs_get_chunk_map(fs_info, disk_bytenr, - fs_info->sectorsize); - if (IS_ERR(em)) { - bio_put(bio); - return ERR_CAST(em); - } - - bio_set_dev(bio, em->map_lookup->stripes[0].dev->bdev); - free_extent_map(em); - } - refcount_inc(&cb->pending_ios); - return bio; -} - /* * worker function to build and submit bios for previously compressed pages. * The corresponding pages in the inode should be marked for writeback @@ -329,16 +286,12 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, struct compressed_bio *cb; u64 cur_disk_bytenr = disk_start; blk_status_t ret = BLK_STS_OK; - const bool use_append = btrfs_use_zone_append(inode, disk_start); - const enum req_op bio_op = REQ_BTRFS_ONE_ORDERED | - (use_append ? 
REQ_OP_ZONE_APPEND : REQ_OP_WRITE); ASSERT(IS_ALIGNED(start, fs_info->sectorsize) && IS_ALIGNED(len, fs_info->sectorsize)); cb = kmalloc(sizeof(struct compressed_bio), GFP_NOFS); if (!cb) return BLK_STS_RESOURCE; - refcount_set(&cb->pending_ios, 1); cb->status = BLK_STS_OK; cb->inode = &inode->vfs_inode; cb->start = start; @@ -349,8 +302,15 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, INIT_WORK(&cb->write_end_work, btrfs_finish_compressed_write_work); cb->nr_pages = nr_pages; - if (blkcg_css) + if (blkcg_css) { kthread_associate_blkcg(blkcg_css); + write_flags |= REQ_CGROUP_PUNT; + } + + write_flags |= REQ_BTRFS_ONE_ORDERED; + bio = btrfs_bio_alloc(BIO_MAX_VECS, REQ_OP_WRITE | write_flags, + cb->inode, end_compressed_bio_write, cb); + bio->bi_iter.bi_sector = cur_disk_bytenr >> SECTOR_SHIFT; while (cur_disk_bytenr < disk_start + compressed_len) { u64 offset = cur_disk_bytenr - disk_start; @@ -358,19 +318,7 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, unsigned int real_size; unsigned int added; struct page *page = compressed_pages[index]; - bool submit = false; - - /* Allocate new bio if submitted or not yet allocated */ - if (!bio) { - bio = alloc_compressed_bio(cb, cur_disk_bytenr, - bio_op | write_flags, end_compressed_bio_write); - if (IS_ERR(bio)) { - ret = errno_to_blk_status(PTR_ERR(bio)); - break; - } - if (blkcg_css) - bio->bi_opf |= REQ_CGROUP_PUNT; - } + /* * We have various limits on the real read size: * - page boundary @@ -380,36 +328,21 @@ blk_status_t btrfs_submit_compressed_write(struct btrfs_inode *inode, u64 start, real_size = min_t(u64, real_size, compressed_len - offset); ASSERT(IS_ALIGNED(real_size, fs_info->sectorsize)); - if (use_append) - added = bio_add_zone_append_page(bio, page, real_size, - offset_in_page(offset)); - else - added = bio_add_page(bio, page, real_size, - offset_in_page(offset)); - /* Reached zoned boundary */ - if (added == 0) - submit = true; - + added = bio_add_page(bio, page, real_size, offset_in_page(offset)); + /* + * Maximum compressed extent is smaller than bio size limit, + * thus bio_add_page() should always success. 
+ */ + ASSERT(added == real_size); cur_disk_bytenr += added; - - /* Finished the range */ - if (cur_disk_bytenr == disk_start + compressed_len) - submit = true; - - if (submit) { - ASSERT(bio->bi_iter.bi_size); - btrfs_bio(bio)->file_offset = start; - btrfs_submit_bio(fs_info, bio, 0); - bio = NULL; - } - cond_resched(); } + /* Finished the range */ + ASSERT(bio->bi_iter.bi_size); + btrfs_bio(bio)->file_offset = start; + btrfs_submit_bio(fs_info, bio, 0); if (blkcg_css) kthread_associate_blkcg(NULL); - - if (refcount_dec_and_test(&cb->pending_ios)) - finish_compressed_bio_write(cb); return ret; } @@ -613,7 +546,6 @@ void btrfs_submit_compressed_read(struct inode *inode, struct bio *bio, goto out; } - refcount_set(&cb->pending_ios, 1); cb->status = BLK_STS_OK; cb->inode = inode; diff --git a/fs/btrfs/compression.h b/fs/btrfs/compression.h index 1aa02903de697..25876f7a26949 100644 --- a/fs/btrfs/compression.h +++ b/fs/btrfs/compression.h @@ -30,9 +30,6 @@ static_assert((BTRFS_MAX_COMPRESSED % PAGE_SIZE) == 0); #define BTRFS_ZLIB_DEFAULT_LEVEL 3 struct compressed_bio { - /* Number of outstanding bios */ - refcount_t pending_ios; - /* Number of compressed pages in the array */ unsigned int nr_pages; diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 33e80f8dd0b1b..40dadc46e00d8 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2597,7 +2597,6 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, u32 real_size; const sector_t sector = disk_bytenr >> SECTOR_SHIFT; bool contig = false; - int ret; ASSERT(bio); /* The limit should be calculated when bio_ctrl->bio is allocated */ @@ -2646,12 +2645,7 @@ static int btrfs_bio_add_page(struct btrfs_bio_ctrl *bio_ctrl, if (real_size == 0) return 0; - if (bio_op(bio) == REQ_OP_ZONE_APPEND) - ret = bio_add_zone_append_page(bio, page, real_size, pg_offset); - else - ret = bio_add_page(bio, page, real_size, pg_offset); - - return ret; + return bio_add_page(bio, page, real_size, pg_offset); } static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, @@ -2666,7 +2660,7 @@ static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, * to them. */ if (bio_ctrl->compress_type == BTRFS_COMPRESS_NONE && - bio_op(bio_ctrl->bio) == REQ_OP_ZONE_APPEND) { + btrfs_use_zone_append(inode, logical)) { ordered = btrfs_lookup_ordered_extent(inode, file_offset); if (ordered) { bio_ctrl->len_to_oe_boundary = min_t(u32, U32_MAX, @@ -2680,17 +2674,15 @@ static void calc_bio_boundaries(struct btrfs_bio_ctrl *bio_ctrl, bio_ctrl->len_to_oe_boundary = U32_MAX; } -static int alloc_new_bio(struct btrfs_inode *inode, - struct btrfs_bio_ctrl *bio_ctrl, - struct writeback_control *wbc, - blk_opf_t opf, - btrfs_bio_end_io_t end_io_func, - u64 disk_bytenr, u32 offset, u64 file_offset, - enum btrfs_compression_type compress_type) +static void alloc_new_bio(struct btrfs_inode *inode, + struct btrfs_bio_ctrl *bio_ctrl, + struct writeback_control *wbc, blk_opf_t opf, + btrfs_bio_end_io_t end_io_func, + u64 disk_bytenr, u32 offset, u64 file_offset, + enum btrfs_compression_type compress_type) { struct btrfs_fs_info *fs_info = inode->root->fs_info; struct bio *bio; - int ret; bio = btrfs_bio_alloc(BIO_MAX_VECS, opf, &inode->vfs_inode, end_io_func, NULL); @@ -2708,40 +2700,14 @@ static int alloc_new_bio(struct btrfs_inode *inode, if (wbc) { /* - * For Zone append we need the correct block_device that we are - * going to write to set in the bio to be able to respect the - * hardware limitation. 
Look it up here: + * Pick the last added device to support cgroup writeback. For + * multi-device file systems this means blk-cgroup policies have + * to always be set on the last added/replaced device. + * This is a bit odd but has been like that for a long time. */ - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { - struct btrfs_device *dev; - - dev = btrfs_zoned_get_device(fs_info, disk_bytenr, - fs_info->sectorsize); - if (IS_ERR(dev)) { - ret = PTR_ERR(dev); - goto error; - } - - bio_set_dev(bio, dev->bdev); - } else { - /* - * Otherwise pick the last added device to support - * cgroup writeback. For multi-device file systems this - * means blk-cgroup policies have to always be set on the - * last added/replaced device. This is a bit odd but has - * been like that for a long time. - */ - bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev); - } + bio_set_dev(bio, fs_info->fs_devices->latest_dev->bdev); wbc_init_bio(wbc, bio); - } else { - ASSERT(bio_op(bio) != REQ_OP_ZONE_APPEND); } - return 0; -error: - bio_ctrl->bio = NULL; - btrfs_bio_end_io(btrfs_bio(bio), errno_to_blk_status(ret)); - return ret; } /* @@ -2767,7 +2733,6 @@ static int submit_extent_page(blk_opf_t opf, enum btrfs_compression_type compress_type, bool force_bio_submit) { - int ret = 0; struct btrfs_inode *inode = BTRFS_I(page->mapping->host); unsigned int cur = pg_offset; @@ -2784,12 +2749,9 @@ static int submit_extent_page(blk_opf_t opf, /* Allocate new bio if needed */ if (!bio_ctrl->bio) { - ret = alloc_new_bio(inode, bio_ctrl, wbc, opf, - end_io_func, disk_bytenr, offset, - page_offset(page) + cur, - compress_type); - if (ret < 0) - return ret; + alloc_new_bio(inode, bio_ctrl, wbc, opf, end_io_func, + disk_bytenr, offset, + page_offset(page) + cur, compress_type); } /* * We must go through btrfs_bio_add_page() to ensure each @@ -3354,10 +3316,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, * find_next_dirty_byte() are all exclusive */ iosize = min(min(em_end, end + 1), dirty_range_end) - cur; - - if (btrfs_use_zone_append(inode, em->block_start)) - op = REQ_OP_ZONE_APPEND; - free_extent_map(em); em = NULL; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9c562d36e4570..1a0bf381f2437 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7727,10 +7727,6 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start, iomap->offset = start; iomap->bdev = fs_info->fs_devices->latest_dev->bdev; iomap->length = len; - - if (write && btrfs_use_zone_append(BTRFS_I(inode), em->block_start)) - iomap->flags |= IOMAP_F_ZONE_APPEND; - free_extent_map(em); return 0; diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index e497b63238189..0d828b58cc9c3 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6632,13 +6632,22 @@ struct bio *btrfs_bio_alloc(unsigned int nr_vecs, blk_opf_t opf, return bio; } -static struct bio *btrfs_split_bio(struct bio *orig, u64 map_length) +static struct bio *btrfs_split_bio(struct btrfs_fs_info *fs_info, + struct bio *orig, u64 map_length, + bool use_append) { struct btrfs_bio *orig_bbio = btrfs_bio(orig); struct bio *bio; - bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS, - &btrfs_clone_bioset); + if (use_append) { + unsigned int nr_segs; + + bio = bio_split_rw(orig, &fs_info->limits, &nr_segs, + &btrfs_clone_bioset, map_length); + } else { + bio = bio_split(orig, map_length >> SECTOR_SHIFT, GFP_NOFS, + &btrfs_clone_bioset); + } btrfs_bio_init(btrfs_bio(bio), orig_bbio->inode, NULL, orig_bbio); btrfs_bio(bio)->file_offset = 
orig_bbio->file_offset; @@ -6970,16 +6979,10 @@ static void btrfs_submit_dev_bio(struct btrfs_device *dev, struct bio *bio) */ if (bio_op(bio) == REQ_OP_ZONE_APPEND) { u64 physical = bio->bi_iter.bi_sector << SECTOR_SHIFT; + u64 zone_start = round_down(physical, dev->fs_info->zone_size); - if (btrfs_dev_is_sequential(dev, physical)) { - u64 zone_start = round_down(physical, - dev->fs_info->zone_size); - - bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; - } else { - bio->bi_opf &= ~REQ_OP_ZONE_APPEND; - bio->bi_opf |= REQ_OP_WRITE; - } + ASSERT(btrfs_dev_is_sequential(dev, physical)); + bio->bi_iter.bi_sector = zone_start >> SECTOR_SHIFT; } btrfs_debug_in_rcu(dev->fs_info, "%s: rw %d 0x%x, sector=%llu, dev=%lu (%s id %llu), size=%u", @@ -7179,9 +7182,11 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) { struct btrfs_bio *bbio = btrfs_bio(bio); + struct btrfs_inode *bi = BTRFS_I(bbio->inode); u64 logical = bio->bi_iter.bi_sector << 9; u64 length = bio->bi_iter.bi_size; u64 map_length = length; + bool use_append = btrfs_use_zone_append(bi, logical); struct btrfs_io_context *bioc = NULL; struct btrfs_io_stripe smap; int ret; @@ -7193,8 +7198,11 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio, goto fail; map_length = min(map_length, length); + if (use_append) + map_length = min(map_length, fs_info->max_zone_append_size); + if (map_length < length) { - bio = btrfs_split_bio(bio, map_length); + bio = btrfs_split_bio(fs_info, bio, map_length, use_append); bbio = btrfs_bio(bio); } @@ -7210,9 +7218,9 @@ static bool btrfs_submit_chunk(struct btrfs_fs_info *fs_info, struct bio *bio, } if (btrfs_op(bio) == BTRFS_MAP_WRITE) { - struct btrfs_inode *bi = BTRFS_I(bbio->inode); - - if (bio_op(bio) == REQ_OP_ZONE_APPEND) { + if (use_append) { + bio->bi_opf &= ~REQ_OP_WRITE; + bio->bi_opf |= REQ_OP_ZONE_APPEND; ret = btrfs_extract_ordered_extent(btrfs_bio(bio)); if (ret) goto fail_put_bio; diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c index 6e04fbbd76b92..988e9fc5a6b7b 100644 --- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1818,26 +1818,6 @@ int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, return btrfs_zoned_issue_zeroout(tgt_dev, physical_pos, length); } -struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info, - u64 logical, u64 length) -{ - struct btrfs_device *device; - struct extent_map *em; - struct map_lookup *map; - - em = btrfs_get_chunk_map(fs_info, logical, length); - if (IS_ERR(em)) - return ERR_CAST(em); - - map = em->map_lookup; - /* We only support single profile for now */ - device = map->stripes[0].dev; - - free_extent_map(em); - - return device; -} - /** * Activate block group and underlying device zones * diff --git a/fs/btrfs/zoned.h b/fs/btrfs/zoned.h index 0f22b22fe359f..74153ab52169f 100644 --- a/fs/btrfs/zoned.h +++ b/fs/btrfs/zoned.h @@ -64,8 +64,6 @@ void btrfs_revert_meta_write_pointer(struct btrfs_block_group *cache, int btrfs_zoned_issue_zeroout(struct btrfs_device *device, u64 physical, u64 length); int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, u64 logical, u64 physical_start, u64 physical_pos); -struct btrfs_device *btrfs_zoned_get_device(struct btrfs_fs_info *fs_info, - u64 logical, u64 length); bool btrfs_zone_activate(struct btrfs_block_group *block_group); int btrfs_zone_finish(struct btrfs_block_group *block_group); bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags); @@ -209,13 +207,6 @@ static 
inline int btrfs_sync_zone_write_pointer(struct btrfs_device *tgt_dev, return -EOPNOTSUPP; } -static inline struct btrfs_device *btrfs_zoned_get_device( - struct btrfs_fs_info *fs_info, - u64 logical, u64 length) -{ - return ERR_PTR(-EOPNOTSUPP); -} - static inline bool btrfs_zone_activate(struct btrfs_block_group *block_group) { return true; From patchwork Thu Sep 1 07:42:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12962047 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id ACC3FECAAD8 for ; Thu, 1 Sep 2022 07:44:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234033AbiIAHoU (ORCPT ); Thu, 1 Sep 2022 03:44:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234118AbiIAHnq (ORCPT ); Thu, 1 Sep 2022 03:43:46 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0F5D6126DC6; Thu, 1 Sep 2022 00:43:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=ZKNiS3a/9C2boxpP0LYq6qVDpc8a6QliuN3Wa2lrTSg=; b=V24jpI2xK/h3II61ZVjkylZDJu euH7JDCPXV21T0EBqJL266HsMXmDD8VuZp0SuGsifikk1x+Xr8PNBL9FZBPd9nrlwQy10QI0kbO/I QRXDqMACU7h4t1iIm+dHWeIr2dMxfIZaHJdi5j0Hum/yRhJ4OADG2V8vM6rxeTeefBc5ShCK7Xagg FNjUz0b4Ab9Vt3AknW8tvl7f/IzUPSk7SEjx+oUSyK0Ikri3jDR+qypoy1Fdc3NyUx44cc6PGRexK UMhxB3EyZPCnaYPQwiEkA26/AxgchXKEAstgOQt2lCnQ8Qe6keG4KojsLpRHGRjmJHq6vGJSexgaX Qok/yxcw==; Received: from 213-225-1-14.nat.highway.a1.net ([213.225.1.14] helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1oTerI-00ANpU-ON; Thu, 01 Sep 2022 07:43:29 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Damien Le Moal , Naohiro Aota , Johannes Thumshirn , Qu Wenruo , Jens Axboe , "Darrick J. Wong" , linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: [PATCH 17/17] iomap: remove IOMAP_F_ZONE_APPEND Date: Thu, 1 Sep 2022 10:42:16 +0300 Message-Id: <20220901074216.1849941-18-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220901074216.1849941-1-hch@lst.de> References: <20220901074216.1849941-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally. 
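The btrfs counterpart that makes this flag unnecessary lives in btrfs_submit_chunk() (see the volumes.c hunk in the previous patch): writes arrive from iomap as plain REQ_OP_WRITE bios and are only retagged once btrfs has split them to fit a single zone append, roughly:

	if (use_append) {
		/* The split bio now fits the hardware limit, so retag it. */
		bio->bi_opf &= ~REQ_OP_WRITE;
		bio->bi_opf |= REQ_OP_ZONE_APPEND;
	}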
Signed-off-by: Christoph Hellwig Reviewed-by: Johannes Thumshirn Reviewed-by: Damien Le Moal Reviewed-by: Josef Bacik --- fs/iomap/direct-io.c | 10 ++-------- include/linux/iomap.h | 1 - 2 files changed, 2 insertions(+), 9 deletions(-) diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c index 4eb559a16c9ed..9e883a9f80388 100644 --- a/fs/iomap/direct-io.c +++ b/fs/iomap/direct-io.c @@ -217,16 +217,10 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio, { blk_opf_t opflags = REQ_SYNC | REQ_IDLE; - if (!(dio->flags & IOMAP_DIO_WRITE)) { - WARN_ON_ONCE(iomap->flags & IOMAP_F_ZONE_APPEND); + if (!(dio->flags & IOMAP_DIO_WRITE)) return REQ_OP_READ; - } - - if (iomap->flags & IOMAP_F_ZONE_APPEND) - opflags |= REQ_OP_ZONE_APPEND; - else - opflags |= REQ_OP_WRITE; + opflags |= REQ_OP_WRITE; if (use_fua) opflags |= REQ_FUA; else diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 238a03087e17e..ee6d511ef29dd 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -55,7 +55,6 @@ struct vm_fault; #define IOMAP_F_SHARED 0x04 #define IOMAP_F_MERGED 0x08 #define IOMAP_F_BUFFER_HEAD 0x10 -#define IOMAP_F_ZONE_APPEND 0x20 /* * Flags set by the core iomap code during operations: