Message ID | 4db115f8781fbf68f30ca82a431cdd931767638d.1680765987.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: dev-replace: error out if we have unrepaired metadata error during | expand |
On Thu, Apr 06, 2023 at 03:26:29PM +0800, Qu Wenruo wrote: > [BUG] > Even before the scrub rework, if we have some corrupted metadata failed > to be repaired during replace, we still continue replace and let it > finish just as there is nothing wrong: > > BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 > BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 > BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished > > This can lead to unexpected problems for the result fs. > > [CAUSE] > Btrfs reuses scrub code path for dev-replace to iterate all dev extents. > > But unlike scrub, dev-replace doesn't really bother to check the scrub > progress, which records all the errors found during replace. > > And even if we checks the progress, we can not really determine which > errors are minor, which are critical just by the plain numbers. > (remember we don't treat metadata/data checksum error differently). > > This behavior is there from the very beginning. The code depends on the scrub rewrite, should we do a backport for <=6.3 kernels?
On 2023/4/7 01:09, David Sterba wrote: > On Thu, Apr 06, 2023 at 03:26:29PM +0800, Qu Wenruo wrote: >> [BUG] >> Even before the scrub rework, if we have some corrupted metadata failed >> to be repaired during replace, we still continue replace and let it >> finish just as there is nothing wrong: >> >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 >> BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 >> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished >> >> This can lead to unexpected problems for the result fs. >> >> [CAUSE] >> Btrfs reuses scrub code path for dev-replace to iterate all dev extents. >> >> But unlike scrub, dev-replace doesn't really bother to check the scrub >> progress, which records all the errors found during replace. >> >> And even if we checks the progress, we can not really determine which >> errors are minor, which are critical just by the plain numbers. >> (remember we don't treat metadata/data checksum error differently). >> >> This behavior is there from the very beginning. > > The code depends on the scrub rewrite, should we do a backport for <=6.3 > kernels? For older kernels, the backport can be more complex, thus I'm still trying to figure out a way to do the same work, but no good progress yet. Thanks, Qu
On Thu, Apr 06, 2023 at 03:26:29PM +0800, Qu Wenruo wrote: > [BUG] > Even before the scrub rework, if we have some corrupted metadata failed > to be repaired during replace, we still continue replace and let it > finish just as there is nothing wrong: > > BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 > BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 > BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished > > This can lead to unexpected problems for the result fs. > > [CAUSE] > Btrfs reuses scrub code path for dev-replace to iterate all dev extents. > > But unlike scrub, dev-replace doesn't really bother to check the scrub > progress, which records all the errors found during replace. > > And even if we checks the progress, we can not really determine which > errors are minor, which are critical just by the plain numbers. > (remember we don't treat metadata/data checksum error differently). > > This behavior is there from the very beginning. > > [FIX] > Instead of continue the replace, just error out if we hit an unrepaired > metadata sector. > > Now the dev-replace would be rejected with -EIO, to inform the user. > Although it also means, the fs has some metadata error which can not be > repaired, the user would be super upset anyway. > > The new dmesg would look like this: > > BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560 > BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 > BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 > BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752 > BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5 > > Signed-off-by: Qu Wenruo <wqu@suse.com> Where is this patch supposed to be applied? I tried the 6.3 base and misc-next but both have several conflicts. The other scrub patches https://lore.kernel.org/linux-btrfs/cover.1681364951.git.wqu@suse.com/ also don't apply either separately or on top of this patch.
On 2023/4/18 06:11, David Sterba wrote: > On Thu, Apr 06, 2023 at 03:26:29PM +0800, Qu Wenruo wrote: >> [BUG] >> Even before the scrub rework, if we have some corrupted metadata failed >> to be repaired during replace, we still continue replace and let it >> finish just as there is nothing wrong: >> >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 >> BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 >> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished >> >> This can lead to unexpected problems for the result fs. >> >> [CAUSE] >> Btrfs reuses scrub code path for dev-replace to iterate all dev extents. >> >> But unlike scrub, dev-replace doesn't really bother to check the scrub >> progress, which records all the errors found during replace. >> >> And even if we checks the progress, we can not really determine which >> errors are minor, which are critical just by the plain numbers. >> (remember we don't treat metadata/data checksum error differently). >> >> This behavior is there from the very beginning. >> >> [FIX] >> Instead of continue the replace, just error out if we hit an unrepaired >> metadata sector. >> >> Now the dev-replace would be rejected with -EIO, to inform the user. >> Although it also means, the fs has some metadata error which can not be >> repaired, the user would be super upset anyway. >> >> The new dmesg would look like this: >> >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 >> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560 >> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 >> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 >> BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752 >> BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5 >> >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > Where is this patch supposed to be applied? I tried the 6.3 base and > misc-next but both have several conflicts. It is based on misc-next of the time, which has all the scrub rework. I'll refresh the patch soon. Thanks, Qu > The other scrub patches > https://lore.kernel.org/linux-btrfs/cover.1681364951.git.wqu@suse.com/ > also don't apply either separately or on top of this patch.
On Tue, Apr 18, 2023 at 06:53:02AM +0800, Qu Wenruo wrote: > > > On 2023/4/18 06:11, David Sterba wrote: > > On Thu, Apr 06, 2023 at 03:26:29PM +0800, Qu Wenruo wrote: > >> [BUG] > >> Even before the scrub rework, if we have some corrupted metadata failed > >> to be repaired during replace, we still continue replace and let it > >> finish just as there is nothing wrong: > >> > >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started > >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > >> BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 > >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > >> BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 > >> BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 > >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 > >> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 > >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished > >> > >> This can lead to unexpected problems for the result fs. > >> > >> [CAUSE] > >> Btrfs reuses scrub code path for dev-replace to iterate all dev extents. > >> > >> But unlike scrub, dev-replace doesn't really bother to check the scrub > >> progress, which records all the errors found during replace. > >> > >> And even if we checks the progress, we can not really determine which > >> errors are minor, which are critical just by the plain numbers. > >> (remember we don't treat metadata/data checksum error differently). > >> > >> This behavior is there from the very beginning. > >> > >> [FIX] > >> Instead of continue the replace, just error out if we hit an unrepaired > >> metadata sector. > >> > >> Now the dev-replace would be rejected with -EIO, to inform the user. > >> Although it also means, the fs has some metadata error which can not be > >> repaired, the user would be super upset anyway. > >> > >> The new dmesg would look like this: > >> > >> BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started > >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > >> BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 > >> BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560 > >> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 > >> BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 > >> BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752 > >> BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5 > >> > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > > > Where is this patch supposed to be applied? I tried the 6.3 base and > > misc-next but both have several conflicts. > > It is based on misc-next of the time, which has all the scrub rework. > > I'll refresh the patch soon. Hold on, I did not notice the patch was already in misc-next, no need for resend.
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 8450c7b4b8b9..ea5e1df8d513 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1685,14 +1685,33 @@ static void scrub_submit_initial_read(struct scrub_ctx *sctx, btrfs_submit_bio(bbio, mirror); } -static void flush_scrub_stripes(struct scrub_ctx *sctx) +static bool stripe_has_metadata_error(struct scrub_stripe *stripe) +{ + int i; + + for_each_set_bit(i, &stripe->error_bitmap, stripe->nr_sectors) { + if (stripe->sectors[i].is_metadata) { + struct btrfs_fs_info *fs_info = stripe->bg->fs_info; + + btrfs_err(fs_info, + "stripe %llu has unrepaired metadata sector at %llu", + stripe->logical, + stripe->logical + (i << fs_info->sectorsize_bits)); + return true; + } + } + return false; +} + +static int flush_scrub_stripes(struct scrub_ctx *sctx) { struct btrfs_fs_info *fs_info = sctx->fs_info; struct scrub_stripe *stripe; const int nr_stripes = sctx->cur_stripe; + int ret = 0; if (!nr_stripes) - return; + return ret; ASSERT(test_bit(SCRUB_STRIPE_FLAG_INITIALIZED, &sctx->stripes[0].state)); @@ -1739,6 +1758,18 @@ static void flush_scrub_stripes(struct scrub_ctx *sctx) /* Submit for dev-replace. */ if (sctx->is_dev_replace) { + /* + * For dev-replace, if we know there is something wrong with + * metadata, we should immedately abort. + */ + for (int i = 0; i < nr_stripes; i++) { + stripe = &sctx->stripes[i]; + + if (stripe_has_metadata_error(stripe)) { + ret = -EIO; + goto out; + } + } for (int i = 0; i < nr_stripes; i++) { unsigned long good; @@ -1759,7 +1790,9 @@ static void flush_scrub_stripes(struct scrub_ctx *sctx) wait_scrub_stripe_io(stripe); scrub_reset_stripe(stripe); } +out: sctx->cur_stripe = 0; + return ret; } static void raid56_scrub_wait_endio(struct bio *bio) @@ -1776,8 +1809,11 @@ static int queue_scrub_stripe(struct scrub_ctx *sctx, int ret; /* No available slot, submit all stripes and wait for them. */ - if (sctx->cur_stripe >= SCRUB_STRIPES_PER_SCTX) - flush_scrub_stripes(sctx); + if (sctx->cur_stripe >= SCRUB_STRIPES_PER_SCTX) { + ret = flush_scrub_stripes(sctx); + if (ret < 0) + return ret; + } stripe = &sctx->stripes[sctx->cur_stripe]; @@ -2108,6 +2144,7 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, const u64 profile = map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK; const u64 chunk_logical = bg->start; int ret; + int tmp; u64 physical = map->stripes[stripe_index].physical; const u64 dev_stripe_len = btrfs_calc_stripe_length(em); const u64 physical_end = physical + dev_stripe_len; @@ -2234,7 +2271,9 @@ static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx, break; } out: - flush_scrub_stripes(sctx); + tmp = flush_scrub_stripes(sctx); + if (!ret) + ret = tmp; if (sctx->raid56_data_stripes) { for (int i = 0; i < nr_data_stripes(map); i++) release_scrub_stripe(&sctx->raid56_data_stripes[i]);
[BUG] Even before the scrub rework, if we have some corrupted metadata failed to be repaired during replace, we still continue replace and let it finish just as there is nothing wrong: BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 BTRFS warning (device dm-4): tree block 5578752 mirror 0 has bad csum, has 0x00000000 want 0xade80ca1 BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 BTRFS warning (device dm-4): checksum error at logical 5578752 on dev /dev/mapper/test-scratch1, physical 5578752: metadata leaf (level 0) in tree 5 BTRFS error (device dm-4): bdev /dev/mapper/test-scratch1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0 BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad bytenr, has 0 want 5578752 BTRFS error (device dm-4): unable to fixup (regular) error at logical 5578752 on dev /dev/mapper/test-scratch1 BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 finished This can lead to unexpected problems for the result fs. [CAUSE] Btrfs reuses scrub code path for dev-replace to iterate all dev extents. But unlike scrub, dev-replace doesn't really bother to check the scrub progress, which records all the errors found during replace. And even if we checks the progress, we can not really determine which errors are minor, which are critical just by the plain numbers. (remember we don't treat metadata/data checksum error differently). This behavior is there from the very beginning. [FIX] Instead of continue the replace, just error out if we hit an unrepaired metadata sector. Now the dev-replace would be rejected with -EIO, to inform the user. Although it also means, the fs has some metadata error which can not be repaired, the user would be super upset anyway. The new dmesg would look like this: BTRFS info (device dm-4): dev_replace from /dev/mapper/test-scratch1 (devid 1) to /dev/mapper/test-scratch2 started BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 BTRFS warning (device dm-4): tree block 5578752 mirror 1 has bad csum, has 0x00000000 want 0xade80ca1 BTRFS error (device dm-4): unable to fixup (regular) error at logical 5570560 on dev /dev/mapper/test-scratch1 physical 5570560 BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 BTRFS warning (device dm-4): header error at logical 5570560 on dev /dev/mapper/test-scratch1, physical 5570560: metadata leaf (level 0) in tree 5 BTRFS error (device dm-4): stripe 5570560 has unrepaired metadata sector at 5578752 BTRFS error (device dm-4): btrfs_scrub_dev(/dev/mapper/test-scratch1, 1, /dev/mapper/test-scratch2) failed -5 Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/scrub.c | 49 +++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 44 insertions(+), 5 deletions(-)