[v2] btrfs: scrub: handle RST lookup error correctly

Message ID	67e8eaaf47d261c3fb3f26dd1463c06dd3412d5e.1718688466.git.wqu@suse.com (mailing list archive)
State	New, archived
Headers	show Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8C9F9FC18 for <linux-btrfs@vger.kernel.org>; Tue, 18 Jun 2024 05:28:42 +0000 (UTC) From: Qu Wenruo <wqu@suse.com> To: linux-btrfs@vger.kernel.org Cc: Johannes.Thumshirn@wdc.com Subject: [PATCH v2] btrfs: scrub: handle RST lookup error correctly Date: Tue, 18 Jun 2024 14:58:20 +0930 Message-ID: <67e8eaaf47d261c3fb3f26dd1463c06dd3412d5e.1718688466.git.wqu@suse.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	[v2] btrfs: scrub: handle RST lookup error correctly \| expand [v2] btrfs: scrub: handle RST lookup error correctly

Message ID

67e8eaaf47d261c3fb3f26dd1463c06dd3412d5e.1718688466.git.wqu@suse.com (mailing list archive)

State

New, archived

Headers

From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org
Cc: Johannes.Thumshirn@wdc.com
Subject: [PATCH v2] btrfs: scrub: handle RST lookup error correctly
Date: Tue, 18 Jun 2024 14:58:20 +0930
Message-ID: 
 <67e8eaaf47d261c3fb3f26dd1463c06dd3412d5e.1718688466.git.wqu@suse.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[v2] btrfs: scrub: handle RST lookup error correctly | expand

Commit Message

Qu Wenruo June 18, 2024, 5:28 a.m. UTC

[BUG]
When running btrfs/060 with forced RST feature, it would crash the
following ASSERT() inside scrub_read_endio():

	ASSERT(sector_nr < stripe->nr_sectors);

Before that, we would have tree dump from
btrfs_get_raid_extent_offset(), as we failed to find the RST entry for
the range.

[CAUSE]
Inside scrub_submit_extent_sector_read() every time we allocated a new
bbio we immediate call btrfs_map_block() to make sure there is some RST
range covering the scrub target.

But if btrfs_map_block() failed, we immediately call endio for the bbio,
meanwhile since the bbio is newly allocated, it's completely empty.

Then inside scrub_read_endio(), we go through the bvecs to find
the sector number (as bi_sector is no longer reliable if the bio is
submitted to lower layers).

And since the bio is empty, such bvecs iteration would not find any
sector matching the sector, and return sector_nr == stripe->nr_sectors,
triggering the ASSERT().

[FIX]
Instead of calling btrfs_map_block() after allocating a new bbio, call
btrfs_map_block() first.

Since our only objective of calling btrfs_map_block() is only to update
stripe_len, there is really no need to do that after btrfs_alloc_bio().

This new timing would avoid the problem of handling empty bbio
completely, and in fact fixes a possible race window for the old code,
where if the submission thread is the only owner of the pending_io, the
scrub would never finish (since we didn't decrease the pending_io
counter).

Although the root cause of RST lookup failure still needs to be
addressed.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
In fact, with this bug patched, I still hit an RST related problem, and
it's from OE finishing, thus I believe there are still other bugs in the
RST related code.

Changelog:
v2:
- Only mark the current sector as error and continue to the next sector
  This would make the error handling of RST error more accurate, other
  than marking all the remaining sectors as error.

RFC->v1:
- Fix the bug by altering the btrfs_map_block()
  Now we have no chance to create empty bbio, and much easier to handle
  the RST error.
---
 fs/btrfs/scrub.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

Comments

Johannes Thumshirn June 18, 2024, 8:13 a.m. UTC | #1

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 188a9c42c9eb..35cf23a3a432 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1688,20 +1688,24 @@  static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
 					    (i << fs_info->sectorsize_bits);
 			int err;
 
+			io_stripe.is_scrub = true;
+			stripe_len = (nr_sectors - i) << fs_info->sectorsize_bits;
+			/*
+			 * For RST cases, we need to manually split the bbio to
+			 * follow the RST boundary.
+			 */
+			err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
+					&stripe_len, &bioc, &io_stripe, &mirror);
+			btrfs_put_bioc(bioc);
+			if (err < 0) {
+				set_bit(i, &stripe->io_error_bitmap);
+				set_bit(i, &stripe->error_bitmap);
+				continue;
+			}
+
 			bbio = btrfs_bio_alloc(stripe->nr_sectors, REQ_OP_READ,
 					       fs_info, scrub_read_endio, stripe);
 			bbio->bio.bi_iter.bi_sector = logical >> SECTOR_SHIFT;
-
-			io_stripe.is_scrub = true;
-			err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
-					      &stripe_len, &bioc, &io_stripe,
-					      &mirror);
-			btrfs_put_bioc(bioc);
-			if (err) {
-				btrfs_bio_end_io(bbio,
-						 errno_to_blk_status(err));
-				return;
-			}
 		}
 
 		__bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff);

[v2] btrfs: scrub: handle RST lookup error correctly

Commit Message

Comments

Patch