From patchwork Fri May 27 08:43:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863144 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 01C0AC433EF for ; Fri, 27 May 2022 08:43:31 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349287AbiE0Ina (ORCPT ); Fri, 27 May 2022 04:43:30 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244989AbiE0In2 (ORCPT ); Fri, 27 May 2022 04:43:28 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 859CB2AC64 for ; Fri, 27 May 2022 01:43:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=eB+5gB9D9FX3TVoZQAoCn6FKb/EkoZKAq3CE2JBaW3A=; b=TyuHvVF68kbsr8qrNM1LcNrGz+ 14+pmUw453zKee2MZvkE3DhSW++6Sip+86XVqhzcy8fijDO7sST3SgLWQpwktKNNCUq8Xcni7LXlG IoClhyZwHFpRBgFrf5lVT9v5xdgkPTDOBTvEaE89MEfmyLmR3qRkUKPfVTApPxblWwhGgsJq6btwk xa0xz0ousMagLqHYcmFY11/6BQIz4qFimHOB86LL7G57xl3dR4x/rD6SW3nYoA5/15kyX4JxuLOlu /afsgPWfCef1VIwuJGyLVud6U6KStONlFN9WfI8rovYr0Q+1JsroTuo7Elxi/d3UGiMH32qzW9FnE 4Alo4vFQ==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZ6-00H3UM-RR; Fri, 27 May 2022 08:43:25 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , 
linux-btrfs@vger.kernel.org
Subject: [PATCH 1/9] btrfs: save the original bi_iter into btrfs_bio for buffered read
Date: Fri, 27 May 2022 10:43:12 +0200
Message-Id: <20220527084320.2130831-2-hch@lst.de>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220527084320.2130831-1-hch@lst.de>
References: <20220527084320.2130831-1-hch@lst.de>
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

From: Qu Wenruo

Although we have btrfs_bio::iter, it currently has very limited usage:

- RAID56
  Which is not needed at all

- btrfs_bio_clone()
  This is used mostly for direct IO.

For the incoming read repair patches, we want to grab the original
logical bytenr, and be able to iterate the range of the bio (no matter
if it's cloned).

So this patch also saves btrfs_bio::iter for buffered read bios in
btrfs_submit_data_read_bio().
And for the sake of consistency, it also saves btrfs_bio::iter for
direct IO in btrfs_submit_dio_bio().

The reason we don't save the iter in btrfs_map_bio() is that
btrfs_map_bio() has to handle various bios, with or without the
btrfs_bio bioset.
We want btrfs_map_bio() to handle plain bios only, without having to
care about the bioset.

Signed-off-by: Qu Wenruo Signed-off-by: Christoph Hellwig --- fs/btrfs/inode.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index eaeef64ea3486..025444aba2847 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -2633,6 +2633,10 @@ void btrfs_submit_data_read_bio(struct inode *inode, struct bio *bio, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); blk_status_t ret; + /* save the original iter for read repair */ + if (bio_op(bio) == REQ_OP_READ) + btrfs_bio(bio)->iter = bio->bi_iter; + if (compress_type != BTRFS_COMPRESS_NONE) { /* * btrfs_submit_compressed_read will handle completing the bio @@ -7947,6 +7951,10 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); struct btrfs_dio_private *dip = bio->bi_private; blk_status_t ret; + + /* save the original iter for read repair */ + if (btrfs_op(bio) == BTRFS_MAP_READ) { + btrfs_bio(bio)->iter = bio->bi_iter; if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) goto map; From patchwork Fri May 27 08:43:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863145 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6F11AC433F5 for ; Fri, 27 May 2022 08:43:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349788AbiE0Inf (ORCPT ); Fri, 27 May 2022 04:43:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40536 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238691AbiE0Ind (ORCPT ); Fri, 27 May 2022 04:43:33 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by 
lindbergh.monkeyblade.net (Postfix) with ESMTPS id EEF5C2AC64 for ; Fri, 27 May 2022 01:43:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=q1WJFJG/fxQxNU+bCa3zOYf9I1HWEMKAcH1bhOogkRo=; b=fQoImXxn6883oYrB1VNjCL3wjW C9KWKGrGDGiSPUa+6y+qA2I3Rv1fKSENTOuXEj13CUOG/m+ptcSbWpUQLyfTeWNajwS5FbqqiQWJ3 6Dha5Hv7Eg8T8WLaonJ+iNDEZUlhtQJzee5AX6xzesE/Qeoa4NtKVbr3PNr16nGZWvQkWB8gMDXgo 4rYtoekQF/1bwZKa2/CZIgYpnIhUY1rp51YBWa/SPEwRue+O6trsllZRnnizhrDk/bTJqjhm+MvNE Okc+Ulec3274d/CJk9Wj4Ls+PDYtDzfoZXvB9FUbADeqgu7W4RHhrL7kxBkeofExd6+7ghLymkF68 rvjjRWow==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZ9-00H3V0-Cu; Fri, 27 May 2022 08:43:27 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 2/9] btrfs: set ->file_offset in end_bio_extent_readpage Date: Fri, 27 May 2022 10:43:13 +0200 Message-Id: <20220527084320.2130831-3-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org The new repair code expects ->file_offset to be set for all bios. Set it just after entering end_bio_extent_readpage. As that requires looking at the first vector before the first loop iteration also use that opportunity to set various file-wide variables just once instead of once per loop iteration. 
Signed-off-by: Christoph Hellwig Reviewed-by: Qu Wenruo --- fs/btrfs/extent_io.c | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 48463705ef0e6..54a0ec62c7b0d 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -3009,25 +3009,30 @@ static struct extent_buffer *find_extent_buffer_readpage( */ static void end_bio_extent_readpage(struct bio *bio) { + struct bio_vec *first_vec = bio_first_bvec_all(bio); + struct inode *inode = first_vec->bv_page->mapping->host; + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + const u32 sectorsize = fs_info->sectorsize; struct bio_vec *bvec; struct btrfs_bio *bbio = btrfs_bio(bio); - struct extent_io_tree *tree, *failure_tree; + int mirror = bbio->mirror_num; + struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; + struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; + bool uptodate = !bio->bi_status; struct processed_extent processed = { 0 }; /* * The offset to the beginning of a bio, since one bio can never be * larger than UINT_MAX, u32 here is enough. 
*/ u32 bio_offset = 0; - int mirror; struct bvec_iter_all iter_all; + btrfs_bio(bio)->file_offset = + page_offset(first_vec->bv_page) + first_vec->bv_offset; + ASSERT(!bio_flagged(bio, BIO_CLONED)); bio_for_each_segment_all(bvec, bio, iter_all) { - bool uptodate = !bio->bi_status; struct page *page = bvec->bv_page; - struct inode *inode = page->mapping->host; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - const u32 sectorsize = fs_info->sectorsize; unsigned int error_bitmap = (unsigned int)-1; bool repair = false; u64 start; @@ -3037,9 +3042,7 @@ static void end_bio_extent_readpage(struct bio *bio) btrfs_debug(fs_info, "end_bio_extent_readpage: bi_sector=%llu, err=%d, mirror=%u", bio->bi_iter.bi_sector, bio->bi_status, - bbio->mirror_num); - tree = &BTRFS_I(inode)->io_tree; - failure_tree = &BTRFS_I(inode)->io_failure_tree; + mirror); /* * We always issue full-sector reads, but if some block in a @@ -3062,7 +3065,6 @@ static void end_bio_extent_readpage(struct bio *bio) end = start + bvec->bv_len - 1; len = bvec->bv_len; - mirror = bbio->mirror_num; if (likely(uptodate)) { if (is_data_inode(inode)) { error_bitmap = btrfs_verify_data_csum(bbio, From patchwork Fri May 27 08:43:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863146 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93056C433EF for ; Fri, 27 May 2022 08:43:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349799AbiE0Inh (ORCPT ); Fri, 27 May 2022 04:43:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1346268AbiE0Ind (ORCPT ); Fri, 27 May 2022 04:43:33 
-0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA1192AC53 for ; Fri, 27 May 2022 01:43:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=UhD5RxF+3M0A6imDKah3y64o23a8aOPSlCwhwsAxI6E=; b=JtDnxTGnwGJw6UJxEm8H6WWDvB H6E/JG+X5IVnCioPpxbQCyEnyHX4ukiX9aEUUGLdpiU3V24oZuambBIXOyNE8hCi9e9fvK+gMt7g5 Sey8Ndae0sbinkkdgjR1ORYpvdxi/m/ayzzV572jbvYXwdQ716WyRH1la4uioEPQ310PB5taiHv2Q w2bFi/NpnfjiSSbI40mpRFQcizIgkPxN1m2+0TSYJsSFZq82boAi13pqo967TRAHIcdWXCeBIwJ3q s5RfmmTCaWuYs47NbUwsMUXNhiPFaon4h+igzeQQ9C/mDz1FARoay6blvJwqsWaNZXem8OfJtkVUo VNocV6wg==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZB-00H3Vf-VI; Fri, 27 May 2022 08:43:30 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 3/9] btrfs: factor out a btrfs_map_repair_bio helper Date: Fri, 27 May 2022 10:43:14 +0200 Message-Id: <20220527084320.2130831-4-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Factor out the guts of repair_io_failure so that we have a new helper to submit synchronous I/O for repair. Unlike repair_io_failure itself this helper also handles reads. 
Signed-off-by: Christoph Hellwig --- fs/btrfs/extent_io.c | 82 ++++++-------------------------------------- fs/btrfs/volumes.c | 69 +++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.h | 2 ++ 3 files changed, 81 insertions(+), 72 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 54a0ec62c7b0d..27775031ed2d4 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2308,27 +2308,13 @@ int free_io_failure(struct extent_io_tree *failure_tree, return err; } -/* - * this bypasses the standard btrfs submit functions deliberately, as - * the standard behavior is to write all copies in a raid setup. here we only - * want to write the one bad copy. so we do the mapping for ourselves and issue - * submit_bio directly. - * to avoid any synchronization issues, wait for the data after writing, which - * actually prevents the read that triggered the error from finishing. - * currently, there can be no more than two copies of every data bit. thus, - * exactly one rewrite is required. - */ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, u64 length, u64 logical, struct page *page, unsigned int pg_offset, int mirror_num) { - struct btrfs_device *dev; struct bio_vec bvec; struct bio bio; - u64 map_length = 0; - u64 sector; - struct btrfs_io_context *bioc = NULL; - int ret = 0; + int ret; ASSERT(!(fs_info->sb->s_flags & SB_RDONLY)); BUG_ON(!mirror_num); @@ -2336,67 +2322,19 @@ static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, if (btrfs_repair_one_zone(fs_info, logical)) return 0; - map_length = length; - - /* - * Avoid races with device replace and make sure our bioc has devices - * associated to its stripes that don't go away while we are doing the - * read repair operation. 
- */ - btrfs_bio_counter_inc_blocked(fs_info); - if (btrfs_is_parity_mirror(fs_info, logical, length)) { - /* - * Note that we don't use BTRFS_MAP_WRITE because it's supposed - * to update all raid stripes, but here we just want to correct - * bad stripe, thus BTRFS_MAP_READ is abused to only get the bad - * stripe's dev and sector. - */ - ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical, - &map_length, &bioc, 0); - if (ret) - goto out_counter_dec; - ASSERT(bioc->mirror_num == 1); - } else { - ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, - &map_length, &bioc, mirror_num); - if (ret) - goto out_counter_dec; - BUG_ON(mirror_num != bioc->mirror_num); - } - - sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9; - dev = bioc->stripes[bioc->mirror_num - 1].dev; - btrfs_put_bioc(bioc); - - if (!dev || !dev->bdev || - !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) { - ret = -EIO; - goto out_counter_dec; - } - - bio_init(&bio, dev->bdev, &bvec, 1, REQ_OP_WRITE | REQ_SYNC); - bio.bi_iter.bi_sector = sector; + bio_init(&bio, NULL, &bvec, 1, REQ_OP_WRITE | REQ_SYNC); + bio.bi_iter.bi_sector = logical >> 9; __bio_add_page(&bio, page, length, pg_offset); + ret = btrfs_map_repair_bio(fs_info, &bio, mirror_num); + bio_uninit(&bio); - btrfsic_check_bio(&bio); - ret = submit_bio_wait(&bio); - if (ret) { - /* try to remap that extent elsewhere? 
*/ - btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); - goto out_bio_uninit; - } + if (ret) + return ret; btrfs_info_rl_in_rcu(fs_info, - "read error corrected: ino %llu off %llu (dev %s sector %llu)", - ino, start, - rcu_str_deref(dev->name), sector); - ret = 0; - -out_bio_uninit: - bio_uninit(&bio); -out_counter_dec: - btrfs_bio_counter_dec(fs_info); - return ret; + "read error corrected: ino %llu off %llu (logical %llu)", + ino, start, logical); + return 0; } int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 1954a3e1c93a9..515f5fccf3d17 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6805,6 +6805,75 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, return errno_to_blk_status(ret); } +/* + * This bypasses the standard btrfs submit functions deliberately, as the + * standard behavior is to write all copies in a raid setup. Here we only want + * to write the one bad copy. So do the mapping ourselves and submit directly + * and synchronously. + */ +int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num) +{ + u64 logical = bio->bi_iter.bi_sector << 9; + u64 map_length = bio->bi_iter.bi_size; + struct btrfs_io_context *bioc = NULL; + struct btrfs_device *dev; + u64 sector; + int ret; + + ASSERT(mirror_num); + ASSERT(bio_op(bio) == REQ_OP_WRITE); + + /* + * Avoid races with device replace and make sure our bioc has devices + * associated to its stripes that don't go away while we are doing the + * read repair operation. + */ + btrfs_bio_counter_inc_blocked(fs_info); + if (btrfs_is_parity_mirror(fs_info, logical, map_length)) { + /* + * Note that we don't use BTRFS_MAP_WRITE because it's supposed + * to update all raid stripes, but here we just want to correct + * bad stripe, thus BTRFS_MAP_READ is abused to only get the bad + * stripe's dev and sector. 
+ */ + ret = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical, + &map_length, &bioc, 0); + if (ret) + goto out_counter_dec; + ASSERT(bioc->mirror_num == 1); + } else { + ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, + &map_length, &bioc, mirror_num); + if (ret) + goto out_counter_dec; + BUG_ON(mirror_num != bioc->mirror_num); + } + + sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9; + dev = bioc->stripes[bioc->mirror_num - 1].dev; + btrfs_put_bioc(bioc); + + if (!dev || !dev->bdev || + !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) { + ret = -EIO; + goto out_counter_dec; + } + + bio_set_dev(bio, dev->bdev); + bio->bi_iter.bi_sector = sector; + + btrfsic_check_bio(bio); + submit_bio_wait(bio); + + ret = blk_status_to_errno(bio->bi_status); + if (ret) + btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); +out_counter_dec: + btrfs_bio_counter_dec(fs_info); + return ret; +} + static bool dev_args_match_fs_devices(const struct btrfs_dev_lookup_args *args, const struct btrfs_fs_devices *fs_devices) { diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index 517288d46ecf5..00c87833ce841 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -565,6 +565,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans, void btrfs_mapping_tree_free(struct extent_map_tree *tree); blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num); +int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, + int mirror_num); int btrfs_open_devices(struct btrfs_fs_devices *fs_devices, fmode_t flags, void *holder); struct btrfs_device *btrfs_scan_one_device(const char *path, From patchwork Fri May 27 08:43:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863148 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 54598C433F5 for ; Fri, 27 May 2022 08:43:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236422AbiE0Inm (ORCPT ); Fri, 27 May 2022 04:43:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40574 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349794AbiE0Ing (ORCPT ); Fri, 27 May 2022 04:43:36 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CCD242AC53 for ; Fri, 27 May 2022 01:43:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=pLz7fCaRYTbnnB67rTLJ5HL/EzUvu0brbWAz/EHB/NQ=; b=DLhfjS8Ot/Ywi6hqkRI+yuQfEG tv75UkYcBH5CEV6IfYDC2BcLWPGIzcGUBC84CQYhwkdqyQ9qO9yZauJvbr3za3QTnRPJtXP9HJArv msWZNpIb+MOFVDrhdSuRFsm8dV4seC+pUNWeIITGbmIliN4oT1FRM/cAqclEWcg4jACZGMy8e1VD5 IgOt3QZpnKmZVWIa2iXZzehKc7SkkTqF2ToXkkgGPDUcznbWfVrqKjwskKkXUsH92I47Ku9Gl/MiP FbAnRpv0hh5ii9/Z3n5IAX1y34IhL3JGBDTp6KjRk6siC2ii+fZpliv6X2F7L8h6wVINNuNu5P9zJ 8zU6pQUQ==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZE-00H3W9-I3; Fri, 27 May 2022 08:43:32 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 4/9] btrfs: support read bios in btrfs_map_repair_bio Date: Fri, 27 May 2022 10:43:15 +0200 Message-Id: <20220527084320.2130831-5-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: 
<20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Enhance btrfs_map_repair_bio to also support reading so that we have a single function dealing with all synchronous bio I/O for the repair code. Signed-off-by: Christoph Hellwig --- fs/btrfs/volumes.c | 48 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 10 deletions(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 515f5fccf3d17..9053b62af3607 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6805,6 +6805,11 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, return errno_to_blk_status(ret); } +static void btrfs_bio_end_io_sync(struct bio *bio) +{ + complete(bio->bi_private); +} + /* * This bypasses the standard btrfs submit functions deliberately, as the * standard behavior is to write all copies in a raid setup. Here we only want @@ -6814,15 +6819,17 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, int mirror_num) { + enum btrfs_map_op op = btrfs_op(bio); u64 logical = bio->bi_iter.bi_sector << 9; u64 map_length = bio->bi_iter.bi_size; + bool is_raid56 = btrfs_is_parity_mirror(fs_info, logical, map_length); struct btrfs_io_context *bioc = NULL; + unsigned int stripe_idx = 0; struct btrfs_device *dev; u64 sector; int ret; ASSERT(mirror_num); - ASSERT(bio_op(bio) == REQ_OP_WRITE); /* * Avoid races with device replace and make sure our bioc has devices @@ -6830,7 +6837,23 @@ int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, * read repair operation. 
*/ btrfs_bio_counter_inc_blocked(fs_info); - if (btrfs_is_parity_mirror(fs_info, logical, map_length)) { + if (is_raid56) { + if (op == BTRFS_MAP_READ) { + DECLARE_COMPLETION_ONSTACK(done); + + ret = __btrfs_map_block(fs_info, op, logical, + &map_length, &bioc, mirror_num, 1); + if (ret) + goto out_counter_dec; + + bio->bi_private = &done; + bio->bi_end_io = btrfs_bio_end_io_sync; + ret = raid56_parity_recover(bio, bioc, + map_length, mirror_num, 1); + wait_for_completion_io(&done); + goto out_bio_status; + } + /* * Note that we don't use BTRFS_MAP_WRITE because it's supposed * to update all raid stripes, but here we just want to correct @@ -6843,19 +6866,24 @@ int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, goto out_counter_dec; ASSERT(bioc->mirror_num == 1); } else { - ret = btrfs_map_block(fs_info, BTRFS_MAP_WRITE, logical, - &map_length, &bioc, mirror_num); + ret = btrfs_map_block(fs_info, op, logical, &map_length, &bioc, + mirror_num); if (ret) goto out_counter_dec; - BUG_ON(mirror_num != bioc->mirror_num); + + if (op == BTRFS_MAP_WRITE) { + ASSERT(mirror_num == bioc->mirror_num); + stripe_idx = bioc->mirror_num - 1; + } } - sector = bioc->stripes[bioc->mirror_num - 1].physical >> 9; - dev = bioc->stripes[bioc->mirror_num - 1].dev; + sector = bioc->stripes[stripe_idx].physical >> 9; + dev = bioc->stripes[stripe_idx].dev; btrfs_put_bioc(bioc); if (!dev || !dev->bdev || - !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state)) { + (op == BTRFS_MAP_WRITE && + !test_bit(BTRFS_DEV_STATE_WRITEABLE, &dev->dev_state))) { ret = -EIO; goto out_counter_dec; } @@ -6865,9 +6893,9 @@ int btrfs_map_repair_bio(struct btrfs_fs_info *fs_info, struct bio *bio, btrfsic_check_bio(bio); submit_bio_wait(bio); - +out_bio_status: ret = blk_status_to_errno(bio->bi_status); - if (ret) + if (ret && op == BTRFS_MAP_WRITE) btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS); out_counter_dec: btrfs_bio_counter_dec(fs_info); From patchwork Fri May 27 08:43:16 
2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863147 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03A8DC433EF for ; Fri, 27 May 2022 08:43:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1346268AbiE0Inl (ORCPT ); Fri, 27 May 2022 04:43:41 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40612 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236422AbiE0Ink (ORCPT ); Fri, 27 May 2022 04:43:40 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AF4C2B1A7 for ; Fri, 27 May 2022 01:43:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=i/0s3jvAhx4j07ncel33kgZWsLi3W57LabdsyvkmSeA=; b=HqSNleLWusA6eP0/iPzasX9ngC 09HRdqZJQYFDdsJHmJTMnKc31QimWmpGJguRZPOKlANUUvUQpxwq/14YrQ1APnKuMrobVxCXrRgZr d5IaXFC1oT4c3UGmLpG87AX2kXeACdMpcF9yEEHC25jXtvypt1i2n3mpQFWM7fHCovJ4ncsuJom6/ iU+WqQqbBx+MZcvmTE/mo15spZlVTlcJt+WAs6oTyx5OTy5F3F/i+YdVJ83tJNrzJ+OJEmNb7jrmk asXfIK1NtfkQfYeq0wpuoidgka68rGSe9oAlAdSaCxPpknhC5lPitFAPXscnxnhRaJlCZpVK5MEIe iyKKH4HQ==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZH-00H3XS-3Y; Fri, 27 May 2022 08:43:35 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 5/9] btrfs: 
add new read repair infrastructure
Date: Fri, 27 May 2022 10:43:16 +0200
Message-Id: <20220527084320.2130831-6-hch@lst.de>
X-Mailer: git-send-email 2.30.2
In-Reply-To: <20220527084320.2130831-1-hch@lst.de>
References: <20220527084320.2130831-1-hch@lst.de>
MIME-Version: 1.0
X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html
Precedence: bulk
List-ID:
X-Mailing-List: linux-btrfs@vger.kernel.org

This adds a new read repair implementation for btrfs. It is synchronous
in that the end I/O handlers call it and get the results back directly,
instead of potentially getting multiple concurrent calls back into the
original end I/O handler. The synchronous nature has the following
advantages:

 - there is no need for a per-I/O tree of I/O failures, as everything
   related to the I/O failure can be handled locally
 - not having separate repair end I/O helpers will in the future help
   to reuse the direct I/O bio from iomap for the actual submission and
   thus remove the btrfs_dio_private infrastructure

Because submitting many sector size synchronous I/Os would be very slow
when multiple sectors (or a whole read) fail, this new code instead
submits a single read and repair write bio for each contiguous section.
It uses a clone of the bio to do that and thus does not need to
allocate any extra bio_vecs. Note that this cloning is open coded
instead of using the block layer clone helpers, as the clone is based on
the saved iter in the btrfs_bio and not on bio.bi_iter, which has
already been advanced by the low-level driver by the time the repair
code is called.

Signed-off-by: Christoph Hellwig --- fs/btrfs/Makefile | 2 +- fs/btrfs/inode.c | 2 +- fs/btrfs/read-repair.c | 248 +++++++++++++++++++++++++++++++++++++++++ fs/btrfs/read-repair.h | 33 ++++++ fs/btrfs/super.c | 9 +- 5 files changed, 290 insertions(+), 4 deletions(-) create mode 100644 fs/btrfs/read-repair.c create mode 100644 fs/btrfs/read-repair.h diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index 99f9995670ea3..0b2605c750cab 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -31,7 +31,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ uuid-tree.o props.o free-space-tree.o tree-checker.o space-info.o \ block-rsv.o delalloc-space.o block-group.o discard.o reflink.o \ - subpage.o tree-mod-log.o + subpage.o tree-mod-log.o read-repair.o btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 025444aba2847..e6195b9490b6b 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -7953,7 +7953,7 @@ static inline blk_status_t btrfs_submit_dio_bio(struct bio *bio, blk_status_t ret; /* save the original iter for read repair */ - if (btrfs_op(bio) == BTRFS_MAP_READ) { + if (btrfs_op(bio) == BTRFS_MAP_READ) btrfs_bio(bio)->iter = bio->bi_iter; if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) diff --git a/fs/btrfs/read-repair.c b/fs/btrfs/read-repair.c new file mode 100644 index 0000000000000..b7010a4589953 --- /dev/null +++ b/fs/btrfs/read-repair.c @@ -0,0 +1,248 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2022 Christoph Hellwig. 
+ */ +#include "ctree.h" +#include "volumes.h" +#include "read-repair.h" +#include "btrfs_inode.h" + +static struct bio_set read_repair_bioset; + +static int next_mirror(struct btrfs_read_repair *rr, int cur_mirror) +{ + if (cur_mirror == rr->num_copies) + return cur_mirror + 1 - rr->num_copies; + return cur_mirror + 1; +} + +static int prev_mirror(struct btrfs_read_repair *rr, int cur_mirror) +{ + if (cur_mirror == 1) + return rr->num_copies; + return cur_mirror - 1; +} + +/* + * Clone a new bio from the src_bbio, using the saved iter in the btrfs_bio + * instead of using bio->bi_iter like the block layer cloning helpers. + */ +static struct btrfs_bio *btrfs_repair_bio_clone(struct btrfs_bio *src_bbio, + u64 offset, u32 size, unsigned int op) +{ + struct btrfs_bio *bbio; + struct bio *bio; + + bio = bio_alloc_bioset(NULL, 0, op | REQ_SYNC, GFP_NOFS, + &read_repair_bioset); + bio_set_flag(bio, BIO_CLONED); + + bio->bi_io_vec = src_bbio->bio.bi_io_vec; + bio->bi_iter = src_bbio->iter; + bio_advance(bio, offset); + bio->bi_iter.bi_size = size; + + bbio = btrfs_bio(bio); + memset(bbio, 0, offsetof(struct btrfs_bio, bio)); + bbio->iter = bbio->bio.bi_iter; + bbio->file_offset = src_bbio->file_offset + offset; + + return bbio; +} + +static void btrfs_repair_one_mirror(struct btrfs_bio *read_bbio, + struct btrfs_bio *failed_bbio, struct inode *inode, + u32 good_size, int bad_mirror) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct btrfs_inode *bi = BTRFS_I(inode); + u64 logical = read_bbio->iter.bi_sector << SECTOR_SHIFT; + u64 file_offset = read_bbio->file_offset; + struct btrfs_bio *write_bbio; + int ret; + + /* + * For zoned file systems repair has to relocate the whole zone. + */ + if (btrfs_repair_one_zone(fs_info, logical)) + return; + + /* + * Otherwise just clone good part of the read bio and write it back to + * the previously bad mirror. 
+ */ + write_bbio = btrfs_repair_bio_clone(read_bbio, 0, good_size, + REQ_OP_WRITE); + ret = btrfs_map_repair_bio(fs_info, &write_bbio->bio, bad_mirror); + bio_put(&write_bbio->bio); + + btrfs_info_rl(fs_info, + "%s: root %lld ino %llu off %llu logical %llu/%u from good mirror %d", + ret ? "failed to correct read error" : "read error corrected", + bi->root->root_key.objectid, btrfs_ino(bi), + file_offset, logical, read_bbio->iter.bi_size, bad_mirror); +} + +static bool btrfs_repair_read_bio(struct btrfs_bio *bbio, + struct btrfs_bio *failed_bbio, struct inode *inode, + u32 *good_size, int read_mirror) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + u32 start_offset = bbio->file_offset - failed_bbio->file_offset; + u8 csum[BTRFS_CSUM_SIZE]; + struct bvec_iter iter; + struct bio_vec bv; + u32 offset; + + if (btrfs_map_repair_bio(fs_info, &bbio->bio, read_mirror)) + return false; + + *good_size = bbio->iter.bi_size; + if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM) + return true; + + btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) { + u8 *expected_csum = + btrfs_csum_ptr(fs_info, failed_bbio->csum, + start_offset + offset); + + if (btrfs_check_sector_csum(fs_info, bv.bv_page, bv.bv_offset, + csum, expected_csum)) { + /* + * Just fail if checksum verification failed for the + * very first sector. Else return how much good data we + * found so that we can only write back as much to the + * bad mirror(s). 
+ */ + if (offset == 0) + return false; + *good_size = offset; + break; + } + } + + return true; +} + +bool __btrfs_read_repair_finish(struct btrfs_read_repair *rr, + struct btrfs_bio *failed_bbio, struct inode *inode, + u64 end_offset, repair_endio_t endio) +{ + u8 first_mirror, bad_mirror, read_mirror; + u64 start_offset = rr->start_offset; + struct btrfs_bio *read_bbio = NULL; + bool uptodate = false; + u32 good_size; + + bad_mirror = first_mirror = failed_bbio->mirror_num; + while ((read_mirror = next_mirror(rr, bad_mirror)) != first_mirror) { + if (read_bbio) + bio_put(&read_bbio->bio); + + /* + * Try to read the entire failed range from a presumably good + * range. + */ + read_bbio = btrfs_repair_bio_clone(failed_bbio, + start_offset, end_offset - start_offset, + REQ_OP_READ); + if (!btrfs_repair_read_bio(read_bbio, failed_bbio, inode, + &good_size, read_mirror)) { + /* + * If we failed to read any data at all, go straight to + * the next mirror. + */ + bad_mirror = read_mirror; + continue; + } + + /* + * If we have some good data write it back to all the previously + * bad mirrors. + */ + for (;;) { + btrfs_repair_one_mirror(read_bbio, failed_bbio, inode, + good_size, bad_mirror); + if (bad_mirror == first_mirror) + break; + bad_mirror = prev_mirror(rr, bad_mirror); + } + + /* + * If the whole bio was good, we are done now. + */ + if (good_size == read_bbio->iter.bi_size) { + uptodate = true; + break; + } + + /* + * Only the start of the bio was good. Complete the good bytes + * and fix up the iter to cover bad sectors so that the bad + * range can be passed to the endio handler in case there is no + * good mirror left. + */ + if (endio) + endio(read_bbio, inode, true); + start_offset += good_size; + read_bbio->file_offset += good_size; + bio_advance_iter(&read_bbio->bio, &read_bbio->iter, good_size); + + /* + * Restart the loop now that we've made some progress.
+ * + * This ensures we go back to mirrors that returned bad data for + * earlier sectors as they might have good data for subsequent sectors. + */ + first_mirror = bad_mirror = read_mirror; + } + + if (endio) + endio(read_bbio, inode, uptodate); + bio_put(&read_bbio->bio); + + rr->in_use = false; + return uptodate; +} + +bool btrfs_read_repair_add(struct btrfs_read_repair *rr, + struct btrfs_bio *failed_bbio, struct inode *inode, + u64 start_offset) +{ + if (rr->in_use) + return true; + + /* + * Only set ->num_copies once as it must be the same for the whole + * I/O that the repair code iterates over. + */ + if (!rr->num_copies) { + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + + rr->num_copies = btrfs_num_copies(fs_info, + failed_bbio->iter.bi_sector << SECTOR_SHIFT, + failed_bbio->iter.bi_size); + } + + /* + * If there is no other copy of the data to recover from, give up now + * and don't even try to build up a larger batch. + */ + if (rr->num_copies < 2) + return false; + + rr->in_use = true; + rr->start_offset = start_offset; + return true; +} + +int __init btrfs_read_repair_init(void) +{ + return bioset_init(&read_repair_bioset, BIO_POOL_SIZE, + offsetof(struct btrfs_bio, bio), 0); +} + +void btrfs_read_repair_exit(void) +{ + bioset_exit(&read_repair_bioset); +} diff --git a/fs/btrfs/read-repair.h b/fs/btrfs/read-repair.h new file mode 100644 index 0000000000000..20366f5f0a239 --- /dev/null +++ b/fs/btrfs/read-repair.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef BTRFS_READ_REPAIR_H +#define BTRFS_READ_REPAIR_H + +struct btrfs_read_repair { + u64 start_offset; + bool in_use; + int num_copies; +}; + +typedef void (*repair_endio_t)(struct btrfs_bio *repair_bbio, + struct inode *inode, bool uptodate); + +bool btrfs_read_repair_add(struct btrfs_read_repair *rr, + struct btrfs_bio *failed_bbio, struct inode *inode, + u64 bio_offset); +bool __btrfs_read_repair_finish(struct btrfs_read_repair *rr, + struct btrfs_bio *failed_bbio, struct
inode *inode, + u64 end_offset, repair_endio_t end_io); +static inline bool btrfs_read_repair_finish(struct btrfs_read_repair *rr, + struct btrfs_bio *failed_bbio, struct inode *inode, + u64 end_offset, repair_endio_t endio) +{ + if (!rr->in_use) + return true; + return __btrfs_read_repair_finish(rr, failed_bbio, inode, end_offset, + endio); +} + +int __init btrfs_read_repair_init(void); +void btrfs_read_repair_exit(void); + +#endif /* BTRFS_READ_REPAIR_H */ diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 938f42e53a8f8..11270bd78cd71 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -48,6 +48,7 @@ #include "block-group.h" #include "discard.h" #include "qgroup.h" +#include "read-repair.h" #define CREATE_TRACE_POINTS #include @@ -2645,10 +2646,12 @@ static int __init init_btrfs_fs(void) err = extent_io_init(); if (err) goto free_cachep; - - err = extent_state_cache_init(); + err = btrfs_read_repair_init(); if (err) goto free_extent_io; + err = extent_state_cache_init(); + if (err) + goto free_read_repair; err = extent_map_init(); if (err) @@ -2706,6 +2709,8 @@ static int __init init_btrfs_fs(void) extent_map_exit(); free_extent_state_cache: extent_state_cache_exit(); +free_read_repair: + btrfs_read_repair_exit(); free_extent_io: extent_io_exit(); free_cachep: From patchwork Fri May 27 08:43:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863149 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AE4A2C433EF for ; Fri, 27 May 2022 08:43:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349814AbiE0Ino (ORCPT ); Fri, 27 May 2022 04:43:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40640 "EHLO 
lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349807AbiE0Inm (ORCPT ); Fri, 27 May 2022 04:43:42 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 53A542AC53 for ; Fri, 27 May 2022 01:43:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=8yjf+T+H978dkR6H1s5RjqZsxsoHY9pvIhFsqRmgP8Q=; b=hUiNsVJlWxaLrGOU4shwE9eEo5 yJlANk2Y8F52KbG95wmAudb9+yf912Uqf3Q2PQL07qgk4kjdyu9zvTwNomLwe4ct3mQdAvz0Mt4kx CLLSotM3RXOrL6VXIsYGvSMBRBjPdmt2YX360QCiWYa9i+CeEAjcZBLUDXwylW2otOtvWOgRolKvn JUwCBJF9ZpI5USzraLQtW43C4l9UGL5kzPXQo2SP1UiXmgCBTfnXg/Jb8vNWX3njJt+g8C5+FIRWR KqtTTMZdm/4tfPhKQ7GwgdL9m+ufqfVeAARiS7z7Q2NIt1oHJosNLSomECFlV4Q/BdjfdKMsdW/cf o6l8QIHg==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZJ-00H3YA-NC; Fri, 27 May 2022 08:43:38 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 6/9] btrfs: use the new read repair code for direct I/O Date: Fri, 27 May 2022 10:43:17 +0200 Message-Id: <20220527084320.2130831-7-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Rewrite btrfs_check_read_dio_bio to use btrfs_bio_for_each_sector and start/end a repair session as needed. 
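The start/end session pattern described above can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: `repair_session`, `range_log`, `session_add`, `session_finish` and `collect_bad_ranges` are hypothetical names, the 4096-byte sector size is assumed, and no I/O or checksum work is done. It only shows how consecutive bad sectors end up batched into one range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 4096u	/* assumed sector size, for illustration only */
#define MAX_RANGES  8

/* Hypothetical userspace stand-in for struct btrfs_read_repair. */
struct repair_session {
	uint64_t start_offset;	/* first bad byte of the current batch */
	bool in_use;		/* true while a batch is being built */
};

/* One contiguous bad range, [start, end), that a session covered. */
struct bad_range {
	uint64_t start;
	uint64_t end;
};

struct range_log {
	struct bad_range ranges[MAX_RANGES];
	size_t count;
};

/* Start a batch at @offset unless one is already open. */
static void session_add(struct repair_session *rs, uint64_t offset)
{
	if (rs->in_use)
		return;
	rs->in_use = true;
	rs->start_offset = offset;
}

/* Close the open batch (if any) at @end_offset and record it. */
static void session_finish(struct repair_session *rs, uint64_t end_offset,
			   struct range_log *log)
{
	if (!rs->in_use)
		return;
	if (log->count < MAX_RANGES) {
		log->ranges[log->count].start = rs->start_offset;
		log->ranges[log->count].end = end_offset;
		log->count++;
	}
	rs->in_use = false;
}

/*
 * Walk the per-sector results: a good sector finishes any open batch, a bad
 * sector opens or extends one, and a final finish catches a batch that runs
 * to the end of the bio.
 */
static struct range_log collect_bad_ranges(const bool *sector_ok, size_t nr)
{
	struct repair_session rs = { 0 };
	struct range_log log = { 0 };
	uint64_t offset = 0;
	size_t i;

	for (i = 0; i < nr; i++, offset += SECTOR_SIZE) {
		if (sector_ok[i])
			session_finish(&rs, offset, &log);
		else
			session_add(&rs, offset);
	}
	session_finish(&rs, offset, &log);
	return log;
}
```

With five sectors where the second, third and fifth are bad, the sketch records two ranges instead of three single-sector repairs, which is the point of batching.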
Signed-off-by: Christoph Hellwig --- fs/btrfs/inode.c | 40 ++++++++++------------------------------ 1 file changed, 10 insertions(+), 30 deletions(-) diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index e6195b9490b6b..76575b1bf30ad 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -55,6 +55,7 @@ #include "zoned.h" #include "subpage.h" #include "inode-item.h" +#include "read-repair.h" struct btrfs_iget_args { u64 ino; @@ -7853,55 +7854,34 @@ static void btrfs_dio_private_put(struct btrfs_dio_private *dip) bio_endio(&dip->bio); } -static void submit_dio_repair_bio(struct inode *inode, struct bio *bio, - int mirror_num, - enum btrfs_compression_type compress_type) -{ - struct btrfs_dio_private *dip = bio->bi_private; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - - BUG_ON(bio_op(bio) == REQ_OP_WRITE); - - refcount_inc(&dip->refs); - if (btrfs_map_bio(fs_info, bio, mirror_num)) - refcount_dec(&dip->refs); -} - static blk_status_t btrfs_check_read_dio_bio(struct btrfs_dio_private *dip, struct btrfs_bio *bbio, const bool uptodate) { struct inode *inode = dip->inode; struct btrfs_fs_info *fs_info = BTRFS_I(inode)->root->fs_info; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct extent_io_tree *io_tree = &BTRFS_I(inode)->io_tree; const bool csum = !(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM); + struct btrfs_read_repair rr = { }; blk_status_t err = BLK_STS_OK; struct bvec_iter iter; struct bio_vec bv; u32 offset; btrfs_bio_for_each_sector(fs_info, bv, bbio, iter, offset) { - u64 start = bbio->file_offset + offset; - if (uptodate && (!csum || !check_data_csum(inode, bbio, offset, bv.bv_page, - bv.bv_offset, start))) { - clean_io_failure(fs_info, failure_tree, io_tree, start, - bv.bv_page, btrfs_ino(BTRFS_I(inode)), - bv.bv_offset); + bv.bv_offset, bbio->file_offset + offset))) { + if (!btrfs_read_repair_finish(&rr, bbio, inode, offset, + NULL)) + err = BLK_STS_IOERR; } else { - int ret; - - ret = 
btrfs_repair_one_sector(inode, &bbio->bio, offset, - bv.bv_page, bv.bv_offset, start, - bbio->mirror_num, - submit_dio_repair_bio); - if (ret) - err = errno_to_blk_status(ret); + if (!btrfs_read_repair_add(&rr, bbio, inode, offset)) + err = BLK_STS_IOERR; } } + if (!btrfs_read_repair_finish(&rr, bbio, inode, offset, NULL)) + err = BLK_STS_IOERR; return err; } From patchwork Fri May 27 08:43:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863150 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E3961C433EF for ; Fri, 27 May 2022 08:43:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349813AbiE0Inr (ORCPT ); Fri, 27 May 2022 04:43:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240856AbiE0Inp (ORCPT ); Fri, 27 May 2022 04:43:45 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AB9A04B421 for ; Fri, 27 May 2022 01:43:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=C/d6+ung3MOjefm2dIiytyBUmOopj0MfgDrP/CBpcwU=; b=dwSJSVIn3I12gxUilfgDPM1bUB LQQQ956vimxF53sUj6V0ge9tYSkJsIuHBSyxU8xIgDZRSBx/IpCz7GMIWj0TtPNd7TjZ6u85x3Mbm 4QGn16PoSu9Wh/9VHYrx6fNZX5QZHxMgG8liBPLy8CENRkO++X1az5JOs1+8+mbOGO98w3jsh07zX 2Sv1CGV6PTRhcb8+8yGs+kB2hXICt3OPjjgmRRjykMeaJ3+7xgHUsvyGAg3Lk50eJYiNVwA+qLqbs 
LKMbBFtAXYWmZlwJaZZI4r0UY9oV+dnvA9cfdLcPUB2joxOgjBXUyM05YzHgaIPu/ICaQCH8lebtw a6ijLw/w==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZM-00H3Yt-9h; Fri, 27 May 2022 08:43:40 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 7/9] btrfs: use the new read repair code for buffered reads Date: Fri, 27 May 2022 10:43:18 +0200 Message-Id: <20220527084320.2130831-8-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Start/end a repair session as needed in end_bio_extent_readpage and submit_data_read_repair. Unlike direct I/O, the buffered I/O handler completes I/O on a per-sector basis and thus needs to pass an endio handler to the repair code, which unlocks all pages and marks them as either uptodate or not. 
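As a rough illustration of why a per-sector endio handler is needed, the following userspace sketch completes every sector of a range through a callback. It is an assumption-laden model rather than the kernel implementation: `sector_endio_t`, `endio_stats`, `count_sector`, `complete_sectors` and the `repaired_mask` parameter are hypothetical, and the 4096-byte sector size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 4096u	/* assumed sector size, for illustration only */

/* Hypothetical per-sector completion hook, loosely modelled on end_sector_io(). */
typedef void (*sector_endio_t)(void *ctx, uint64_t file_offset, bool uptodate);

/* Tally kept by the example callback. */
struct endio_stats {
	unsigned int good;
	unsigned int bad;
};

static void count_sector(void *ctx, uint64_t file_offset, bool uptodate)
{
	struct endio_stats *stats = ctx;

	(void)file_offset;	/* a real handler would unlock this sector */
	if (uptodate)
		stats->good++;
	else
		stats->bad++;
}

/*
 * Walk a per-sector error bitmap (bit i set means sector i failed) and
 * complete every sector through @endio. @repaired_mask says which failed
 * sectors a hypothetical repair pass managed to fix. @nr_sectors must not
 * exceed the number of bits in unsigned int.
 */
static void complete_sectors(uint64_t file_offset, unsigned int nr_sectors,
			     unsigned int error_bitmap,
			     unsigned int repaired_mask,
			     sector_endio_t endio, void *ctx)
{
	unsigned int i;

	for (i = 0; i < nr_sectors; i++) {
		bool failed = error_bitmap & (1U << i);
		bool uptodate = !failed || (repaired_mask & (1U << i));

		endio(ctx, file_offset + (uint64_t)i * SECTOR_SIZE, uptodate);
	}
}
```

Every sector is completed exactly once, whether it was good from the start, repaired, or left bad, so no sector stays locked.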
Signed-off-by: Christoph Hellwig --- fs/btrfs/extent_io.c | 76 ++++++++++++++++++++------------------------ 1 file changed, 35 insertions(+), 41 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 27775031ed2d4..9d7835ba6d396 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -30,6 +30,7 @@ #include "zoned.h" #include "block-group.h" #include "compression.h" +#include "read-repair.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -2683,14 +2684,29 @@ static void end_sector_io(struct page *page, u64 offset, bool uptodate) offset + sectorsize - 1, &cached); } +static void end_read_repair(struct btrfs_bio *repair_bbio, struct inode *inode, + bool uptodate) +{ + struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + struct bvec_iter iter; + struct bio_vec bv; + u32 offset; + + btrfs_bio_for_each_sector(fs_info, bv, repair_bbio, iter, offset) + end_sector_io(bv.bv_page, repair_bbio->file_offset + offset, + uptodate); +} + static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio, u32 bio_offset, const struct bio_vec *bvec, - int failed_mirror, unsigned int error_bitmap) + int failed_mirror, + unsigned int error_bitmap, + struct btrfs_read_repair *rr) { - const unsigned int pgoff = bvec->bv_offset; + struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio); struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); + u64 start = page_offset(bvec->bv_page) + bvec->bv_offset; struct page *page = bvec->bv_page; - const u64 start = page_offset(bvec->bv_page) + bvec->bv_offset; const u64 end = start + bvec->bv_len - 1; const u32 sectorsize = fs_info->sectorsize; const int nr_bits = (end + 1 - start) >> fs_info->sectorsize_bits; @@ -2712,38 +2728,17 @@ static void submit_data_read_repair(struct inode *inode, struct bio *failed_bio, /* Iterate through all the sectors in the range */ for (i = 0; i < nr_bits; i++) { - const unsigned int offset = i * sectorsize; - bool uptodate = 
false; - int ret; - - if (!(error_bitmap & (1U << i))) { - /* - * This sector has no error, just end the page read - * and unlock the range. - */ - uptodate = true; - goto next; + bool uptodate = !(error_bitmap & (1U << i)); + + if (uptodate || + !btrfs_read_repair_add(rr, failed_bbio, inode, + bio_offset)) { + btrfs_read_repair_finish(rr, failed_bbio, inode, + bio_offset, end_read_repair); + end_sector_io(page, start, uptodate); } - - ret = btrfs_repair_one_sector(inode, failed_bio, - bio_offset + offset, - page, pgoff + offset, start + offset, - failed_mirror, btrfs_submit_data_read_bio); - if (!ret) { - /* - * We have submitted the read repair, the page release - * will be handled by the endio function of the - * submitted repair bio. - * Thus we don't need to do any thing here. - */ - continue; - } - /* - * Continue on failed repair, otherwise the remaining sectors - * will not be properly unlocked. - */ -next: - end_sector_io(page, start + offset, uptodate); + bio_offset += sectorsize; + start += sectorsize; } } @@ -2954,8 +2949,6 @@ static void end_bio_extent_readpage(struct bio *bio) struct bio_vec *bvec; struct btrfs_bio *bbio = btrfs_bio(bio); int mirror = bbio->mirror_num; - struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; bool uptodate = !bio->bi_status; struct processed_extent processed = { 0 }; /* @@ -2964,6 +2957,7 @@ static void end_bio_extent_readpage(struct bio *bio) */ u32 bio_offset = 0; struct bvec_iter_all iter_all; + struct btrfs_read_repair rr = { }; btrfs_bio(bio)->file_offset = page_offset(first_vec->bv_page) + first_vec->bv_offset; @@ -3020,10 +3014,6 @@ static void end_bio_extent_readpage(struct bio *bio) loff_t i_size = i_size_read(inode); pgoff_t end_index = i_size >> PAGE_SHIFT; - clean_io_failure(BTRFS_I(inode)->root->fs_info, - failure_tree, tree, start, page, - btrfs_ino(BTRFS_I(inode)), 0); - /* * Zero out the remaining part if this range straddles * 
i_size. @@ -3063,9 +3053,11 @@ static void end_bio_extent_readpage(struct bio *bio) * and bad sectors, we just continue to the next bvec. */ submit_data_read_repair(inode, bio, bio_offset, bvec, - mirror, error_bitmap); + mirror, error_bitmap, &rr); } else { /* Update page status and unlock */ + btrfs_read_repair_finish(&rr, btrfs_bio(bio), inode, + bio_offset, end_read_repair); end_page_read(page, uptodate, start, len); endio_readpage_release_extent(&processed, BTRFS_I(inode), start, end, PageUptodate(page)); @@ -3076,6 +3068,8 @@ static void end_bio_extent_readpage(struct bio *bio) } /* Release the last extent */ + btrfs_read_repair_finish(&rr, btrfs_bio(bio), inode, bio_offset, + end_read_repair); endio_readpage_release_extent(&processed, NULL, 0, 0, false); btrfs_bio_free_csum(bbio); bio_put(bio); From patchwork Fri May 27 08:43:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863152 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CAB2C433F5 for ; Fri, 27 May 2022 08:43:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349829AbiE0Inv (ORCPT ); Fri, 27 May 2022 04:43:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1349819AbiE0Ins (ORCPT ); Fri, 27 May 2022 04:43:48 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8BE825DA3A for ; Fri, 27 May 2022 01:43:47 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: 
 MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=CUYycioeNFVnoT311yJjyp+gJdK8USpqzZV71Whaozk=; b=XY/FjmU7z8kmGF8JfVaf8+a8I6 oxihEYhNMmgwpuMlnnDYSXGdRGKUZMiGv9sGHiKpYqzgPtf4uyWipsZ7wayJj42Mf6YeiSZeD4Q1f peesfOdV+bR77gbT+YEL472A32fn5UJRq8r+P44Gx2iJftYgfw5jLleper09iTaTQ60LeHTy2I/Z8 IqhuKGuLlrO6KtSl2yZuuhMFSYJK26bhaKDQl/6dMh3F8OpQf5EorRBiUm5lQ+k+lT7In4Fo6T8lN NK8pNiVBKkgrIe+0edigpMN6sZgdPUQZKxa5HCF4vsx/LREiBbrfaeHGqQ9Hqy/rq3cM05RX1R22d 6nVgdenw==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZO-00H3a9-QN; Fri, 27 May 2022 08:43:43 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 8/9] btrfs: remove io_failure_record infrastructure completely Date: Fri, 27 May 2022 10:43:19 +0200 Message-Id: <20220527084320.2130831-9-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org From: Qu Wenruo Read repair is now always handled by btrfs_read_repair_ctrl, which only lives for the duration of the endio function. This means we no longer need to record the failed range and its mirror number. If we fail to read a data page, we have already tried every mirror we have, so there is no need to record the failed range. Thus this patch removes the whole io_failure_record structure and its related functions.
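To illustrate why no failure record has to outlive the endio function, here is a minimal userspace sketch of trying every other mirror in a single pass. It is a simplification under stated assumptions: `find_good_mirror` and the `mirror_ok` array are hypothetical, `next_mirror` takes the copy count directly instead of a repair structure, and the restart on partial progress that the real repair loop performs is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors are numbered 1..num_copies; step forward with wrap-around. */
static int next_mirror(int num_copies, int cur_mirror)
{
	if (cur_mirror == num_copies)
		return 1;
	return cur_mirror + 1;
}

/*
 * Try every mirror other than @failed_mirror exactly once, starting with its
 * successor. @mirror_ok is indexed by mirror number (slot 0 is unused) and
 * stands in for "the read succeeded and the checksum matched".
 * Returns the first good mirror, or 0 if every copy has been tried and failed.
 */
static int find_good_mirror(int num_copies, int failed_mirror,
			    const bool *mirror_ok)
{
	int mirror;

	if (num_copies < 2)
		return 0;	/* no other copy to recover from */

	for (mirror = next_mirror(num_copies, failed_mirror);
	     mirror != failed_mirror;
	     mirror = next_mirror(num_copies, mirror)) {
		if (mirror_ok[mirror])
			return mirror;
	}
	return 0;
}
```

Because the loop ends only after a good copy is found or the walk returns to the failed mirror, the caller knows the outcome is final and has nothing to remember for a later retry.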
Signed-off-by: Qu Wenruo Signed-off-by: Christoph Hellwig --- fs/btrfs/btrfs_inode.h | 5 - fs/btrfs/extent-io-tree.h | 15 -- fs/btrfs/extent_io.c | 365 ----------------------------------- fs/btrfs/extent_io.h | 25 --- fs/btrfs/inode.c | 7 - include/trace/events/btrfs.h | 1 - 6 files changed, 418 deletions(-) diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h index 33811e896623f..3eeba0eb9f16b 100644 --- a/fs/btrfs/btrfs_inode.h +++ b/fs/btrfs/btrfs_inode.h @@ -91,11 +91,6 @@ struct btrfs_inode { /* the io_tree does range state (DIRTY, LOCKED etc) */ struct extent_io_tree io_tree; - /* special utility tree used to record which mirrors have already been - * tried when checksums fail for a given block - */ - struct extent_io_tree io_failure_tree; - /* * Keep track of where the inode has extent items mapped in order to * make sure the i_size adjustments are accurate diff --git a/fs/btrfs/extent-io-tree.h b/fs/btrfs/extent-io-tree.h index c3eb52dbe61cc..8ab9b6cd53ed4 100644 --- a/fs/btrfs/extent-io-tree.h +++ b/fs/btrfs/extent-io-tree.h @@ -56,7 +56,6 @@ enum { IO_TREE_FS_EXCLUDED_EXTENTS, IO_TREE_BTREE_INODE_IO, IO_TREE_INODE_IO, - IO_TREE_INODE_IO_FAILURE, IO_TREE_RELOC_BLOCKS, IO_TREE_TRANS_DIRTY_PAGES, IO_TREE_ROOT_DIRTY_LOG_PAGES, @@ -250,18 +249,4 @@ bool btrfs_find_delalloc_range(struct extent_io_tree *tree, u64 *start, u64 *end, u64 max_bytes, struct extent_state **cached_state); -/* This should be reworked in the future and put elsewhere. 
*/ -struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start); -int set_state_failrec(struct extent_io_tree *tree, u64 start, - struct io_failure_record *failrec); -void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, - u64 end); -int free_io_failure(struct extent_io_tree *failure_tree, - struct extent_io_tree *io_tree, - struct io_failure_record *rec); -int clean_io_failure(struct btrfs_fs_info *fs_info, - struct extent_io_tree *failure_tree, - struct extent_io_tree *io_tree, u64 start, - struct page *page, u64 ino, unsigned int pg_offset); - #endif /* BTRFS_EXTENT_IO_TREE_H */ diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 9d7835ba6d396..c0cb3d4f5440f 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2169,66 +2169,6 @@ u64 count_range_bits(struct extent_io_tree *tree, return total_bytes; } -/* - * set the private field for a given byte offset in the tree. If there isn't - * an extent_state there already, this does nothing. - */ -int set_state_failrec(struct extent_io_tree *tree, u64 start, - struct io_failure_record *failrec) -{ - struct rb_node *node; - struct extent_state *state; - int ret = 0; - - spin_lock(&tree->lock); - /* - * this search will find all the extents that end after - * our range starts. - */ - node = tree_search(tree, start); - if (!node) { - ret = -ENOENT; - goto out; - } - state = rb_entry(node, struct extent_state, rb_node); - if (state->start != start) { - ret = -ENOENT; - goto out; - } - state->failrec = failrec; -out: - spin_unlock(&tree->lock); - return ret; -} - -struct io_failure_record *get_state_failrec(struct extent_io_tree *tree, u64 start) -{ - struct rb_node *node; - struct extent_state *state; - struct io_failure_record *failrec; - - spin_lock(&tree->lock); - /* - * this search will find all the extents that end after - * our range starts. 
- */ - node = tree_search(tree, start); - if (!node) { - failrec = ERR_PTR(-ENOENT); - goto out; - } - state = rb_entry(node, struct extent_state, rb_node); - if (state->start != start) { - failrec = ERR_PTR(-ENOENT); - goto out; - } - - failrec = state->failrec; -out: - spin_unlock(&tree->lock); - return failrec; -} - /* * searches a range in the state tree for a given mask. * If 'filled' == 1, this returns 1 only if every extent in the tree @@ -2285,30 +2225,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end, return bitset; } -int free_io_failure(struct extent_io_tree *failure_tree, - struct extent_io_tree *io_tree, - struct io_failure_record *rec) -{ - int ret; - int err = 0; - - set_state_failrec(failure_tree, rec->start, NULL); - ret = clear_extent_bits(failure_tree, rec->start, - rec->start + rec->len - 1, - EXTENT_LOCKED | EXTENT_DIRTY); - if (ret) - err = ret; - - ret = clear_extent_bits(io_tree, rec->start, - rec->start + rec->len - 1, - EXTENT_DAMAGED); - if (ret && !err) - err = ret; - - kfree(rec); - return err; -} - static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start, u64 length, u64 logical, struct page *page, unsigned int pg_offset, int mirror_num) @@ -2361,287 +2277,6 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num) return ret; } -/* - * each time an IO finishes, we do a fast check in the IO failure tree - * to see if we need to process or clean up an io_failure_record - */ -int clean_io_failure(struct btrfs_fs_info *fs_info, - struct extent_io_tree *failure_tree, - struct extent_io_tree *io_tree, u64 start, - struct page *page, u64 ino, unsigned int pg_offset) -{ - u64 private; - struct io_failure_record *failrec; - struct extent_state *state; - int num_copies; - int ret; - - private = 0; - ret = count_range_bits(failure_tree, &private, (u64)-1, 1, - EXTENT_DIRTY, 0); - if (!ret) - return 0; - - failrec = get_state_failrec(failure_tree, start); - if (IS_ERR(failrec)) - 
return 0; - - BUG_ON(!failrec->this_mirror); - - if (sb_rdonly(fs_info->sb)) - goto out; - - spin_lock(&io_tree->lock); - state = find_first_extent_bit_state(io_tree, - failrec->start, - EXTENT_LOCKED); - spin_unlock(&io_tree->lock); - - if (state && state->start <= failrec->start && - state->end >= failrec->start + failrec->len - 1) { - num_copies = btrfs_num_copies(fs_info, failrec->logical, - failrec->len); - if (num_copies > 1) { - repair_io_failure(fs_info, ino, start, failrec->len, - failrec->logical, page, pg_offset, - failrec->failed_mirror); - } - } - -out: - free_io_failure(failure_tree, io_tree, failrec); - - return 0; -} - -/* - * Can be called when - * - hold extent lock - * - under ordered extent - * - the inode is freeing - */ -void btrfs_free_io_failure_record(struct btrfs_inode *inode, u64 start, u64 end) -{ - struct extent_io_tree *failure_tree = &inode->io_failure_tree; - struct io_failure_record *failrec; - struct extent_state *state, *next; - - if (RB_EMPTY_ROOT(&failure_tree->state)) - return; - - spin_lock(&failure_tree->lock); - state = find_first_extent_bit_state(failure_tree, start, EXTENT_DIRTY); - while (state) { - if (state->start > end) - break; - - ASSERT(state->end <= end); - - next = next_state(state); - - failrec = state->failrec; - free_extent_state(state); - kfree(failrec); - - state = next; - } - spin_unlock(&failure_tree->lock); -} - -static struct io_failure_record *btrfs_get_io_failure_record(struct inode *inode, - u64 start) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct io_failure_record *failrec; - struct extent_map *em; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; - struct extent_map_tree *em_tree = &BTRFS_I(inode)->extent_tree; - const u32 sectorsize = fs_info->sectorsize; - int ret; - u64 logical; - - failrec = get_state_failrec(failure_tree, start); - if (!IS_ERR(failrec)) { - btrfs_debug(fs_info, - "Get IO 
Failure Record: (found) logical=%llu, start=%llu, len=%llu", - failrec->logical, failrec->start, failrec->len); - /* - * when data can be on disk more than twice, add to failrec here - * (e.g. with a list for failed_mirror) to make - * clean_io_failure() clean all those errors at once. - */ - - return failrec; - } - - failrec = kzalloc(sizeof(*failrec), GFP_NOFS); - if (!failrec) - return ERR_PTR(-ENOMEM); - - failrec->start = start; - failrec->len = sectorsize; - failrec->this_mirror = 0; - failrec->compress_type = BTRFS_COMPRESS_NONE; - - read_lock(&em_tree->lock); - em = lookup_extent_mapping(em_tree, start, failrec->len); - if (!em) { - read_unlock(&em_tree->lock); - kfree(failrec); - return ERR_PTR(-EIO); - } - - if (em->start > start || em->start + em->len <= start) { - free_extent_map(em); - em = NULL; - } - read_unlock(&em_tree->lock); - if (!em) { - kfree(failrec); - return ERR_PTR(-EIO); - } - - logical = start - em->start; - logical = em->block_start + logical; - if (test_bit(EXTENT_FLAG_COMPRESSED, &em->flags)) { - logical = em->block_start; - failrec->compress_type = em->compress_type; - } - - btrfs_debug(fs_info, - "Get IO Failure Record: (new) logical=%llu, start=%llu, len=%llu", - logical, start, failrec->len); - - failrec->logical = logical; - free_extent_map(em); - - /* Set the bits in the private failure tree */ - ret = set_extent_bits(failure_tree, start, start + sectorsize - 1, - EXTENT_LOCKED | EXTENT_DIRTY); - if (ret >= 0) { - ret = set_state_failrec(failure_tree, start, failrec); - /* Set the bits in the inode's tree */ - ret = set_extent_bits(tree, start, start + sectorsize - 1, - EXTENT_DAMAGED); - } else if (ret < 0) { - kfree(failrec); - return ERR_PTR(ret); - } - - return failrec; -} - -static bool btrfs_check_repairable(struct inode *inode, - struct io_failure_record *failrec, - int failed_mirror) -{ - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - int num_copies; - - num_copies = btrfs_num_copies(fs_info, failrec->logical, 
failrec->len); - if (num_copies == 1) { - /* - * we only have a single copy of the data, so don't bother with - * all the retry and error correction code that follows. no - * matter what the error is, it is very likely to persist. - */ - btrfs_debug(fs_info, - "Check Repairable: cannot repair, num_copies=%d, next_mirror %d, failed_mirror %d", - num_copies, failrec->this_mirror, failed_mirror); - return false; - } - - /* The failure record should only contain one sector */ - ASSERT(failrec->len == fs_info->sectorsize); - - /* - * There are two premises: - * a) deliver good data to the caller - * b) correct the bad sectors on disk - * - * Since we're only doing repair for one sector, we only need to get - * a good copy of the failed sector and if we succeed, we have setup - * everything for repair_io_failure to do the rest for us. - */ - ASSERT(failed_mirror); - failrec->failed_mirror = failed_mirror; - failrec->this_mirror++; - if (failrec->this_mirror == failed_mirror) - failrec->this_mirror++; - - if (failrec->this_mirror > num_copies) { - btrfs_debug(fs_info, - "Check Repairable: (fail) num_copies=%d, next_mirror %d, failed_mirror %d", - num_copies, failrec->this_mirror, failed_mirror); - return false; - } - - return true; -} - -int btrfs_repair_one_sector(struct inode *inode, - struct bio *failed_bio, u32 bio_offset, - struct page *page, unsigned int pgoff, - u64 start, int failed_mirror, - submit_bio_hook_t *submit_bio_hook) -{ - struct io_failure_record *failrec; - struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb); - struct extent_io_tree *tree = &BTRFS_I(inode)->io_tree; - struct extent_io_tree *failure_tree = &BTRFS_I(inode)->io_failure_tree; - struct btrfs_bio *failed_bbio = btrfs_bio(failed_bio); - const int icsum = bio_offset >> fs_info->sectorsize_bits; - struct bio *repair_bio; - struct btrfs_bio *repair_bbio; - - btrfs_debug(fs_info, - "repair read error: read error at %llu", start); - - BUG_ON(bio_op(failed_bio) == REQ_OP_WRITE); - - failrec = 
btrfs_get_io_failure_record(inode, start); - if (IS_ERR(failrec)) - return PTR_ERR(failrec); - - - if (!btrfs_check_repairable(inode, failrec, failed_mirror)) { - free_io_failure(failure_tree, tree, failrec); - return -EIO; - } - - repair_bio = btrfs_bio_alloc(1); - repair_bbio = btrfs_bio(repair_bio); - repair_bbio->file_offset = start; - repair_bio->bi_opf = REQ_OP_READ; - repair_bio->bi_end_io = failed_bio->bi_end_io; - repair_bio->bi_iter.bi_sector = failrec->logical >> 9; - repair_bio->bi_private = failed_bio->bi_private; - - if (failed_bbio->csum) { - const u32 csum_size = fs_info->csum_size; - - repair_bbio->csum = repair_bbio->csum_inline; - memcpy(repair_bbio->csum, - failed_bbio->csum + csum_size * icsum, csum_size); - } - - bio_add_page(repair_bio, page, failrec->len, pgoff); - repair_bbio->iter = repair_bio->bi_iter; - - btrfs_debug(btrfs_sb(inode->i_sb), - "repair read error: submitting new read to mirror %d", - failrec->this_mirror); - - /* - * At this point we have a bio, so any errors from submit_bio_hook() - * will be handled by the endio on the repair_bio, so we can't return an - * error here. 
- */ - submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->compress_type); - return BLK_STS_OK; -} - static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len) { struct btrfs_fs_info *fs_info = btrfs_sb(page->mapping->host->i_sb); diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h index 72966cf21961e..901f24cf2de28 100644 --- a/fs/btrfs/extent_io.h +++ b/fs/btrfs/extent_io.h @@ -61,7 +61,6 @@ struct btrfs_root; struct btrfs_inode; struct btrfs_io_bio; struct btrfs_fs_info; -struct io_failure_record; struct extent_io_tree; typedef void (submit_bio_hook_t)(struct inode *inode, struct bio *bio, @@ -252,30 +251,6 @@ struct bio *btrfs_bio_clone_partial(struct bio *orig, u64 offset, u64 size); void end_extent_writepage(struct page *page, int err, u64 start, u64 end); int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num); -/* - * When IO fails, either with EIO or csum verification fails, we - * try other mirrors that might have a good copy of the data. This - * io_failure_record is used to record state as we go through all the - * mirrors. If another mirror has good data, the sector is set up to date - * and things continue. If a good mirror can't be found, the original - * bio end_io callback is called to indicate things have failed. 
- */ -struct io_failure_record { - struct page *page; - u64 start; - u64 len; - u64 logical; - enum btrfs_compression_type compress_type; - int this_mirror; - int failed_mirror; -}; - -int btrfs_repair_one_sector(struct inode *inode, - struct bio *failed_bio, u32 bio_offset, - struct page *page, unsigned int pgoff, - u64 start, int failed_mirror, - submit_bio_hook_t *submit_bio_hook); - #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS bool find_lock_delalloc_range(struct inode *inode, struct page *locked_page, u64 *start, diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 76575b1bf30ad..b6186fc4466a6 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -3133,8 +3133,6 @@ static int btrfs_finish_ordered_io(struct btrfs_ordered_extent *ordered_extent) ordered_extent->disk_num_bytes); } - btrfs_free_io_failure_record(inode, start, end); - if (test_bit(BTRFS_ORDERED_TRUNCATED, &ordered_extent->flags)) { truncated = true; logical_len = ordered_extent->truncated_len; @@ -5345,8 +5343,6 @@ void btrfs_evict_inode(struct inode *inode) if (is_bad_inode(inode)) goto no_delete; - btrfs_free_io_failure_record(BTRFS_I(inode), 0, (u64)-1); - if (test_bit(BTRFS_FS_LOG_RECOVERING, &fs_info->flags)) goto no_delete; @@ -8818,12 +8814,9 @@ struct inode *btrfs_alloc_inode(struct super_block *sb) inode = &ei->vfs_inode; extent_map_tree_init(&ei->extent_tree); extent_io_tree_init(fs_info, &ei->io_tree, IO_TREE_INODE_IO, inode); - extent_io_tree_init(fs_info, &ei->io_failure_tree, - IO_TREE_INODE_IO_FAILURE, inode); extent_io_tree_init(fs_info, &ei->file_extent_tree, IO_TREE_INODE_FILE_EXTENT, inode); ei->io_tree.track_uptodate = true; - ei->io_failure_tree.track_uptodate = true; atomic_set(&ei->sync_writers, 0); mutex_init(&ei->log_mutex); btrfs_ordered_inode_tree_init(&ei->ordered_tree); diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h index 290f07eb050af..764e9643c123c 100644 --- a/include/trace/events/btrfs.h +++ b/include/trace/events/btrfs.h @@ -82,7 +82,6 @@ 
struct btrfs_space_info; EM( IO_TREE_FS_EXCLUDED_EXTENTS, "EXCLUDED_EXTENTS") \ EM( IO_TREE_BTREE_INODE_IO, "BTREE_INODE_IO") \ EM( IO_TREE_INODE_IO, "INODE_IO") \ - EM( IO_TREE_INODE_IO_FAILURE, "INODE_IO_FAILURE") \ EM( IO_TREE_RELOC_BLOCKS, "RELOC_BLOCKS") \ EM( IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES") \ EM( IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES") \ From patchwork Fri May 27 08:43:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christoph Hellwig X-Patchwork-Id: 12863151 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CEFE9C433EF for ; Fri, 27 May 2022 08:43:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349824AbiE0Inu (ORCPT ); Fri, 27 May 2022 04:43:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40756 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240856AbiE0Ins (ORCPT ); Fri, 27 May 2022 04:43:48 -0400 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10DDC5DE6B for ; Fri, 27 May 2022 01:43:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From:Sender :Reply-To:Content-Type:Content-ID:Content-Description; bh=BXrGfpWzqdUsxu1+Zb9reao9QfbVhtK7A+dHaXIVexk=; b=sRVQ6GIyg1PnwaeNzsXdENvGVZ ZlG5SUd1oMvGPMk2FhzugDlj0S3UNL6hRI9r9lDM5BXCfe65eLy43hcT2wwLlcQ2RbnYHGcVEzp/y ecrBQJrpyKrMeYV+h71mE7xS/kMHEiO3zBrRxX+/XHKgvdYolGnAs2LsodILQra6oBJn6gFrNWH8C h9Csy4GfzGE1pW7J8ZW7J+6MhyO1HU0dDA8b6bzg0v1cGLQ8bOnsYdNZlbB8RDirq/nawsnNC2Yxx 
mjTpUD3oAiPt4al3biSh5NF7uW7xzf2a3SmeOFyT8Hc4I4mbWklxHxShITpTLz2LOprOYX7BpNyd4 FIAEkVjg==; Received: from [2001:4bb8:18c:7298:b5ab:7d49:c6be:2011] (helo=localhost) by bombadil.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux)) id 1nuVZR-00H3an-Bw; Fri, 27 May 2022 08:43:45 +0000 From: Christoph Hellwig To: Chris Mason , Josef Bacik , David Sterba Cc: Qu Wenruo , linux-btrfs@vger.kernel.org Subject: [PATCH 9/9] btrfs: fold repair_io_failure into btrfs_repair_eb_io_failure Date: Fri, 27 May 2022 10:43:20 +0200 Message-Id: <20220527084320.2130831-10-hch@lst.de> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20220527084320.2130831-1-hch@lst.de> References: <20220527084320.2130831-1-hch@lst.de> MIME-Version: 1.0 X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Precedence: bulk List-ID: X-Mailing-List: linux-btrfs@vger.kernel.org Fold repair_io_failure into the only remaining caller. This is still inefficient with single page I/Os, but I have some ideas on how to improve the metadata repair in the future. 
Signed-off-by: Christoph Hellwig
---
 fs/btrfs/extent_io.c | 51 +++++++++++++++------------------------------
 1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c0cb3d4f5440f..57a709262b730 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2225,35 +2225,6 @@ int test_range_bit(struct extent_io_tree *tree, u64 start, u64 end,
 	return bitset;
 }
 
-static int repair_io_failure(struct btrfs_fs_info *fs_info, u64 ino, u64 start,
-			     u64 length, u64 logical, struct page *page,
-			     unsigned int pg_offset, int mirror_num)
-{
-	struct bio_vec bvec;
-	struct bio bio;
-	int ret;
-
-	ASSERT(!(fs_info->sb->s_flags & SB_RDONLY));
-	BUG_ON(!mirror_num);
-
-	if (btrfs_repair_one_zone(fs_info, logical))
-		return 0;
-
-	bio_init(&bio, NULL, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
-	bio.bi_iter.bi_sector = logical >> 9;
-	__bio_add_page(&bio, page, length, pg_offset);
-	ret = btrfs_map_repair_bio(fs_info, &bio, mirror_num);
-	bio_uninit(&bio);
-
-	if (ret)
-		return ret;
-
-	btrfs_info_rl_in_rcu(fs_info,
-		"read error corrected: ino %llu off %llu (logical %llu)",
-		ino, start, logical);
-	return 0;
-}
-
 int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 {
 	struct btrfs_fs_info *fs_info = eb->fs_info;
@@ -2261,20 +2232,32 @@ int btrfs_repair_eb_io_failure(const struct extent_buffer *eb, int mirror_num)
 	int i, num_pages = num_extent_pages(eb);
 	int ret = 0;
 
+	WARN_ON_ONCE(!mirror_num);
+
 	if (sb_rdonly(fs_info->sb))
 		return -EROFS;
 
+	if (btrfs_repair_one_zone(fs_info, eb->start))
+		return 0;
+
 	for (i = 0; i < num_pages; i++) {
 		struct page *p = eb->pages[i];
+		struct bio_vec bvec;
+		struct bio bio;
+
+		bio_init(&bio, NULL, &bvec, 1, REQ_OP_WRITE | REQ_SYNC);
+		bio.bi_iter.bi_sector = start >> 9;
+		__bio_add_page(&bio, p, PAGE_SIZE, start - page_offset(p));
+		ret = btrfs_map_repair_bio(fs_info, &bio, mirror_num);
+		bio_uninit(&bio);
 
-		ret = repair_io_failure(fs_info, 0, start, PAGE_SIZE, start, p,
-					start - page_offset(p), mirror_num);
 		if (ret)
-			break;
-		start += PAGE_SIZE;
+			return ret;
 	}
 
-	return ret;
+	btrfs_info_rl_in_rcu(fs_info,
+		"metadata read error corrected: logical %llu.", eb->start);
+	return 0;
 }
 
 static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
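
For readers following the per-page arithmetic in btrfs_repair_eb_io_failure(), here is a minimal userspace sketch of how a logical byte address maps to a 512-byte block-layer sector (the `>> 9` in the patch) for each page of an extent buffer. It is not kernel code. The names sketch_page_io and sketch_plan_eb_repair and the 4096-byte page size are hypothetical, and the sketch assumes the logical address advances by one page per iteration, as the removed repair_io_failure() call site did with `start += PAGE_SIZE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    4096u /* assumed page size, not taken from the patch */
#define SKETCH_SECTOR_SHIFT 9     /* block layer sectors are 512 bytes */

/* Hypothetical record of one single-page repair write. */
struct sketch_page_io {
	uint64_t sector; /* what the patch stores in bio.bi_iter.bi_sector */
	uint32_t len;    /* bytes added to the bio for this page */
};

/*
 * Fill out[] with one entry per page of an extent buffer that starts at
 * logical byte address eb_start. Returns the number of entries written.
 */
static int sketch_plan_eb_repair(uint64_t eb_start, int num_pages,
				 struct sketch_page_io *out)
{
	uint64_t start = eb_start;
	int i;

	assert(out != NULL);

	for (i = 0; i < num_pages; i++) {
		out[i].sector = start >> SKETCH_SECTOR_SHIFT;
		out[i].len = SKETCH_PAGE_SIZE;
		/* Assumption: move to the next page's logical address. */
		start += SKETCH_PAGE_SIZE;
	}
	return num_pages;
}
```

With a 16 KiB extent buffer at logical address 16384, the sketch yields four single-page writes whose sectors are 8 apart, since 4096 / 512 = 8.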