Message ID | 20240912143312.14442-1-jth@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: scrub: skip PREALLOC extents on RAID stripe-tree | expand |
在 2024/9/13 00:03, Johannes Thumshirn 写道: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > When scrubbing a RAID stripe-tree based filesystem with preallocated > extents, the btrfs_map_block() called by > scrub_submit_extent_sector_read() will return ENOENT, because there is > no RAID stripe-tree entry for preallocated extents. This then causes > the sector to be marked as a sector with an I/O error. > > To prevent this false alert don't mark secotors for that > btrfs_map_block() returned an ENOENT as I/O errors but skip them. > > This results for example in errors in fstests' btrfs/060 .. btrfs/074 > which all perform fsstress and scrub operations. Whit this fix, these > errors are gone and the tests pass again. > > Cc: Qu Wenru <wqu@suse.com> My concern is, ENOENT can be some real problems other than PREALLOC. I'd prefer this to be the last-resort method. Would it be possible to create an RST entry for preallocated operations manually? E.g. without creating a dummy OE, but just insert the needed RST entries into RST tree at fallocate time? Thanks, Qu > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/scrub.c | 15 +++++++++++++-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c > index 3a3427428074..b195c41676e3 100644 > --- a/fs/btrfs/scrub.c > +++ b/fs/btrfs/scrub.c > @@ -1657,7 +1657,7 @@ static u32 stripe_length(const struct scrub_stripe *stripe) > } > > static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx, > - struct scrub_stripe *stripe) > + struct scrub_stripe *stripe) > { > struct btrfs_fs_info *fs_info = stripe->bg->fs_info; > struct btrfs_bio *bbio = NULL; > @@ -1703,10 +1703,21 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx, > err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical, > &stripe_len, &bioc, &io_stripe, &mirror); > btrfs_put_bioc(bioc); > - if (err < 0) { > + if (err < 0 && err != -ENOENT) { > set_bit(i, &stripe->io_error_bitmap); > set_bit(i, &stripe->error_bitmap); > continue; > + } else if (err == -ENOENT) { > + /* > + * btrfs_map_block() returns -ENOENT if it can't > + * find the logical address in the RAID stripe > + * tree. This can happens on PREALLOC extents. > + * As we know the extent tree has an extent > + * recorded there, we can be sure this is a > + * PREALLOC extent, so skip this sector and > + * continue to the next. > + */ > + continue; > } > > bbio = btrfs_bio_alloc(stripe->nr_sectors, REQ_OP_READ,
On 12.09.24 23:32, Qu Wenruo wrote: > > > 在 2024/9/13 00:03, Johannes Thumshirn 写道: >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> >> When scrubbing a RAID stripe-tree based filesystem with preallocated >> extents, the btrfs_map_block() called by >> scrub_submit_extent_sector_read() will return ENOENT, because there is >> no RAID stripe-tree entry for preallocated extents. This then causes >> the sector to be marked as a sector with an I/O error. >> >> To prevent this false alert don't mark secotors for that >> btrfs_map_block() returned an ENOENT as I/O errors but skip them. >> >> This results for example in errors in fstests' btrfs/060 .. btrfs/074 >> which all perform fsstress and scrub operations. Whit this fix, these >> errors are gone and the tests pass again. >> >> Cc: Qu Wenru <wqu@suse.com> > > My concern is, ENOENT can be some real problems other than PREALLOC. > I'd prefer this to be the last-resort method. Hm but what else could create an entry in the extent tree without having it in the stripe tree? I can't really think of a situation creating this layout. > Would it be possible to create an RST entry for preallocated operations > manually? E.g. without creating a dummy OE, but just insert the needed > RST entries into RST tree at fallocate time? Let me give it a try. But I'm a bit less happy to do so, as RST already increases the write amplification.
在 2024/9/13 15:12, Johannes Thumshirn 写道: > On 12.09.24 23:32, Qu Wenruo wrote: >> >> >> 在 2024/9/13 00:03, Johannes Thumshirn 写道: >>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >>> >>> When scrubbing a RAID stripe-tree based filesystem with preallocated >>> extents, the btrfs_map_block() called by >>> scrub_submit_extent_sector_read() will return ENOENT, because there is >>> no RAID stripe-tree entry for preallocated extents. This then causes >>> the sector to be marked as a sector with an I/O error. >>> >>> To prevent this false alert don't mark secotors for that >>> btrfs_map_block() returned an ENOENT as I/O errors but skip them. >>> >>> This results for example in errors in fstests' btrfs/060 .. btrfs/074 >>> which all perform fsstress and scrub operations. Whit this fix, these >>> errors are gone and the tests pass again. >>> >>> Cc: Qu Wenru <wqu@suse.com> >> >> My concern is, ENOENT can be some real problems other than PREALLOC. >> I'd prefer this to be the last-resort method. > > Hm but what else could create an entry in the extent tree without having > it in the stripe tree? I can't really think of a situation creating this > layout. My concern is that, if by some other bug that certain writes didn't create needed RST entry, we will always treat them as preallocated during scrub. Thus it may be better to have a way to distinguish a real missing entry and preallocated extents. > > >> Would it be possible to create an RST entry for preallocated operations >> manually? E.g. without creating a dummy OE, but just insert the needed >> RST entries into RST tree at fallocate time? > > Let me give it a try. But I'm a bit less happy to do so, as RST already > increases the write amplification. Well, write amplification is always a big problem for btrfs... Thanks, Qu
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c index 3a3427428074..b195c41676e3 100644 --- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1657,7 +1657,7 @@ static u32 stripe_length(const struct scrub_stripe *stripe) } static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx, - struct scrub_stripe *stripe) + struct scrub_stripe *stripe) { struct btrfs_fs_info *fs_info = stripe->bg->fs_info; struct btrfs_bio *bbio = NULL; @@ -1703,10 +1703,21 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx, err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical, &stripe_len, &bioc, &io_stripe, &mirror); btrfs_put_bioc(bioc); - if (err < 0) { + if (err < 0 && err != -ENOENT) { set_bit(i, &stripe->io_error_bitmap); set_bit(i, &stripe->error_bitmap); continue; + } else if (err == -ENOENT) { + /* + * btrfs_map_block() returns -ENOENT if it can't + * find the logical address in the RAID stripe + * tree. This can happens on PREALLOC extents. + * As we know the extent tree has an extent + * recorded there, we can be sure this is a + * PREALLOC extent, so skip this sector and + * continue to the next. + */ + continue; } bbio = btrfs_bio_alloc(stripe->nr_sectors, REQ_OP_READ,