Message ID | 91a3647a1f2657b89bd63c12fa466c6c70965d22.1709606883.git.wqu@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: scrub: fix false alerts on zoned device scrubing | expand |
On Tue, Mar 05, 2024 at 01:18:22PM +1030, Qu Wenruo wrote: > [BUG] > When using zoned devices (zbc), scrub would always report super block > errors like the following: > > # btrfs scrub start -fB /mnt/btrfs/ > Starting scrub on devid 1 > scrub done for b7b5c759-1baa-4561-a0ca-b8d0babcde56 > Scrub started: Tue Mar 5 12:49:14 2024 > Status: finished > Duration: 0:00:00 > Total to scrub: 288.00KiB > Rate: 288.00KiB/s > Error summary: super=2 > Corrected: 0 > Uncorrectable: 0 > Unverified: 0 > > [CAUSE] > Since the very beginning of scrub, we always go with btrfs_sb_offset() > to grab the super blocks. > This is fine for regular btrfs filesystems, but for zoned btrfs, super > blocks are stored in dedicated zones with a ring buffer like structure. > > This means the old btrfs_sb_offset() is not able to give the correct > bytenr for us to grabbing the super blocks, thus except the primary > super block, the rest would be garbage and cause the above false alerts. > > [FIX] > Instead of btrfs_sb_offset(), go with btrfs_sb_log_location() which is > zoned friendly, to grab the correct super block location. > > This would introduce new error patterns, as btrfs_sb_log_location() can > fail with extra errors. > > Here for -ENOENT we just end the scrub as there are no more super > blocks. > For other errors, we record it as a super block error and exit. > > Reported-by: WA AM <waautomata@gmail.com> > Link: https://lore.kernel.org/all/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com/ > CC: stable@vger.kernel.org # 5.15+ > Signed-off-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/scrub.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index c4bd0e60db59..e1b67baa4072 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -2788,7 +2788,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, > struct btrfs_device *scrub_dev) > { > int i; > - u64 bytenr; > u64 gen; > int ret = 0; > struct page *page; > @@ -2812,7 +2811,17 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, > gen = btrfs_get_last_trans_committed(fs_info); > > for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { > - bytenr = btrfs_sb_offset(i); > + u64 bytenr; > + > + ret = btrfs_sb_log_location(scrub_dev, i, 0, &bytenr); > + if (ret == -ENOENT) > + break; > + if (ret < 0) { > + spin_lock(&sctx->stat_lock); > + sctx->stat.super_errors++; > + spin_unlock(&sctx->stat_lock); > + break; Since an error from scrub_one_super can continue, this can be "continue" also? E.g, if btrfs_sb_log_location() returns -EUCLEAN on the 2nd SB, it fails to detect the 3rd SB's corruption. Other than that, looks good. Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> > + } > if (bytenr + BTRFS_SUPER_INFO_SIZE > > scrub_dev->commit_total_bytes) > break; > -- > 2.44.0 >
在 2024/3/5 19:18, Naohiro Aota 写道: > On Tue, Mar 05, 2024 at 01:18:22PM +1030, Qu Wenruo wrote: >> [BUG] >> When using zoned devices (zbc), scrub would always report super block >> errors like the following: >> >> # btrfs scrub start -fB /mnt/btrfs/ >> Starting scrub on devid 1 >> scrub done for b7b5c759-1baa-4561-a0ca-b8d0babcde56 >> Scrub started: Tue Mar 5 12:49:14 2024 >> Status: finished >> Duration: 0:00:00 >> Total to scrub: 288.00KiB >> Rate: 288.00KiB/s >> Error summary: super=2 >> Corrected: 0 >> Uncorrectable: 0 >> Unverified: 0 >> >> [CAUSE] >> Since the very beginning of scrub, we always go with btrfs_sb_offset() >> to grab the super blocks. >> This is fine for regular btrfs filesystems, but for zoned btrfs, super >> blocks are stored in dedicated zones with a ring buffer like structure. >> >> This means the old btrfs_sb_offset() is not able to give the correct >> bytenr for us to grabbing the super blocks, thus except the primary >> super block, the rest would be garbage and cause the above false alerts. >> >> [FIX] >> Instead of btrfs_sb_offset(), go with btrfs_sb_log_location() which is >> zoned friendly, to grab the correct super block location. >> >> This would introduce new error patterns, as btrfs_sb_log_location() can >> fail with extra errors. >> >> Here for -ENOENT we just end the scrub as there are no more super >> blocks. >> For other errors, we record it as a super block error and exit. >> >> Reported-by: WA AM <waautomata@gmail.com> >> Link: https://lore.kernel.org/all/CANU2Z0EvUzfYxczLgGUiREoMndE9WdQnbaawV5Fv5gNXptPUKw@mail.gmail.com/ >> CC: stable@vger.kernel.org # 5.15+ >> Signed-off-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com> >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/scrub.c | 13 +++++++++++-- >> 1 file changed, 11 insertions(+), 2 deletions(-) >> >> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c >> index c4bd0e60db59..e1b67baa4072 100644 >> --- a/fs/btrfs/scrub.c >> +++ b/fs/btrfs/scrub.c >> @@ -2788,7 +2788,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, >> struct btrfs_device *scrub_dev) >> { >> int i; >> - u64 bytenr; >> u64 gen; >> int ret = 0; >> struct page *page; >> @@ -2812,7 +2811,17 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, >> gen = btrfs_get_last_trans_committed(fs_info); >> >> for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { >> - bytenr = btrfs_sb_offset(i); >> + u64 bytenr; >> + >> + ret = btrfs_sb_log_location(scrub_dev, i, 0, &bytenr); >> + if (ret == -ENOENT) >> + break; >> + if (ret < 0) { >> + spin_lock(&sctx->stat_lock); >> + sctx->stat.super_errors++; >> + spin_unlock(&sctx->stat_lock); >> + break; > > Since an error from scrub_one_super can continue, this can be "continue" > also? > E.g, if btrfs_sb_log_location() returns -EUCLEAN on the 2nd SB, it > fails to detect the 3rd SB's corruption. You're right, I originally though it can return error from blkdev_zone_mgmt() and if that failed we can no longer continue. But that can only happen in write case, meanwhile we're in the READ path. And in that case, sb_write_pointer() can return -ECULEAN and we can still continue. Would allow continue for non-ENOENT errors. Thanks, Qu > > Other than that, looks good. > > Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> > >> + } >> if (bytenr + BTRFS_SUPER_INFO_SIZE > >> scrub_dev->commit_total_bytes) >> break; >> -- >> 2.44.0
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index c4bd0e60db59..e1b67baa4072 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -2788,7 +2788,6 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, struct btrfs_device *scrub_dev) { int i; - u64 bytenr; u64 gen; int ret = 0; struct page *page; @@ -2812,7 +2811,17 @@ static noinline_for_stack int scrub_supers(struct scrub_ctx *sctx, gen = btrfs_get_last_trans_committed(fs_info); for (i = 0; i < BTRFS_SUPER_MIRROR_MAX; i++) { - bytenr = btrfs_sb_offset(i); + u64 bytenr; + + ret = btrfs_sb_log_location(scrub_dev, i, 0, &bytenr); + if (ret == -ENOENT) + break; + if (ret < 0) { + spin_lock(&sctx->stat_lock); + sctx->stat.super_errors++; + spin_unlock(&sctx->stat_lock); + break; + } if (bytenr + BTRFS_SUPER_INFO_SIZE > scrub_dev->commit_total_bytes) break;