diff mbox series

btrfs: scrub: skip PREALLOC extents on RAID stripe-tree

Message ID 20240912143312.14442-1-jth@kernel.org (mailing list archive)
State New
Headers show
Series btrfs: scrub: skip PREALLOC extents on RAID stripe-tree | expand

Commit Message

Johannes Thumshirn Sept. 12, 2024, 2:33 p.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

When scrubbing a RAID stripe-tree based filesystem with preallocated
extents, the btrfs_map_block() called by
scrub_submit_extent_sector_read() will return ENOENT, because there is
no RAID stripe-tree entry for preallocated extents. This then causes
the sector to be marked as a sector with an I/O error.

To prevent this false alert don't mark secotors for that
btrfs_map_block() returned an ENOENT as I/O errors but skip them.

This results for example in errors in fstests' btrfs/060 .. btrfs/074
which all perform fsstress and scrub operations. Whit this fix, these
errors are gone and the tests pass again.

Cc: Qu Wenru <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/scrub.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

Comments

Qu Wenruo Sept. 12, 2024, 9:32 p.m. UTC | #1
在 2024/9/13 00:03, Johannes Thumshirn 写道:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
> When scrubbing a RAID stripe-tree based filesystem with preallocated
> extents, the btrfs_map_block() called by
> scrub_submit_extent_sector_read() will return ENOENT, because there is
> no RAID stripe-tree entry for preallocated extents. This then causes
> the sector to be marked as a sector with an I/O error.
>
> To prevent this false alert don't mark secotors for that
> btrfs_map_block() returned an ENOENT as I/O errors but skip them.
>
> This results for example in errors in fstests' btrfs/060 .. btrfs/074
> which all perform fsstress and scrub operations. Whit this fix, these
> errors are gone and the tests pass again.
>
> Cc: Qu Wenru <wqu@suse.com>

My concern is, ENOENT can be some real problems other than PREALLOC.
I'd prefer this to be the last-resort method.

Would it be possible to create an RST entry for preallocated operations
manually? E.g. without creating a dummy OE, but just insert the needed
RST entries into RST tree at fallocate time?

Thanks,
Qu

> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>   fs/btrfs/scrub.c | 15 +++++++++++++--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 3a3427428074..b195c41676e3 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -1657,7 +1657,7 @@ static u32 stripe_length(const struct scrub_stripe *stripe)
>   }
>
>   static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
> -					    struct scrub_stripe *stripe)
> +					   struct scrub_stripe *stripe)
>   {
>   	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
>   	struct btrfs_bio *bbio = NULL;
> @@ -1703,10 +1703,21 @@ static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
>   			err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
>   					      &stripe_len, &bioc, &io_stripe, &mirror);
>   			btrfs_put_bioc(bioc);
> -			if (err < 0) {
> +			if (err < 0 && err != -ENOENT) {
>   				set_bit(i, &stripe->io_error_bitmap);
>   				set_bit(i, &stripe->error_bitmap);
>   				continue;
> +			} else if (err == -ENOENT) {
> +				/*
> +				 * btrfs_map_block() returns -ENOENT if it can't
> +				 * find the logical  address in the RAID stripe
> +				 * tree. This can happens on PREALLOC  extents.
> +				 * As we know the extent tree  has an extent
> +				 * recorded there, we can be sure this is a
> +				 * PREALLOC  extent, so skip this sector and
> +				 * continue to the next.
> +				 */
> +				continue;
>   			}
>
>   			bbio = btrfs_bio_alloc(stripe->nr_sectors, REQ_OP_READ,
Johannes Thumshirn Sept. 13, 2024, 5:42 a.m. UTC | #2
On 12.09.24 23:32, Qu Wenruo wrote:
> 
> 
> 在 2024/9/13 00:03, Johannes Thumshirn 写道:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> When scrubbing a RAID stripe-tree based filesystem with preallocated
>> extents, the btrfs_map_block() called by
>> scrub_submit_extent_sector_read() will return ENOENT, because there is
>> no RAID stripe-tree entry for preallocated extents. This then causes
>> the sector to be marked as a sector with an I/O error.
>>
>> To prevent this false alert don't mark secotors for that
>> btrfs_map_block() returned an ENOENT as I/O errors but skip them.
>>
>> This results for example in errors in fstests' btrfs/060 .. btrfs/074
>> which all perform fsstress and scrub operations. Whit this fix, these
>> errors are gone and the tests pass again.
>>
>> Cc: Qu Wenru <wqu@suse.com>
> 
> My concern is, ENOENT can be some real problems other than PREALLOC.
> I'd prefer this to be the last-resort method.

Hm but what else could create an entry in the extent tree without having 
it in the stripe tree? I can't really think of a situation creating this 
layout.


> Would it be possible to create an RST entry for preallocated operations
> manually? E.g. without creating a dummy OE, but just insert the needed
> RST entries into RST tree at fallocate time?

Let me give it a try. But I'm a bit less happy to do so, as RST already 
increases the write amplification.
Qu Wenruo Sept. 13, 2024, 5:47 a.m. UTC | #3
在 2024/9/13 15:12, Johannes Thumshirn 写道:
> On 12.09.24 23:32, Qu Wenruo wrote:
>>
>>
>> 在 2024/9/13 00:03, Johannes Thumshirn 写道:
>>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>>
>>> When scrubbing a RAID stripe-tree based filesystem with preallocated
>>> extents, the btrfs_map_block() called by
>>> scrub_submit_extent_sector_read() will return ENOENT, because there is
>>> no RAID stripe-tree entry for preallocated extents. This then causes
>>> the sector to be marked as a sector with an I/O error.
>>>
>>> To prevent this false alert don't mark secotors for that
>>> btrfs_map_block() returned an ENOENT as I/O errors but skip them.
>>>
>>> This results for example in errors in fstests' btrfs/060 .. btrfs/074
>>> which all perform fsstress and scrub operations. Whit this fix, these
>>> errors are gone and the tests pass again.
>>>
>>> Cc: Qu Wenru <wqu@suse.com>
>>
>> My concern is, ENOENT can be some real problems other than PREALLOC.
>> I'd prefer this to be the last-resort method.
>
> Hm but what else could create an entry in the extent tree without having
> it in the stripe tree? I can't really think of a situation creating this
> layout.

My concern is that, if by some other bug that certain writes didn't
create needed RST entry, we will always treat them as preallocated
during scrub.

Thus it may be better to have a way to distinguish a real missing entry
and preallocated extents.

>
>
>> Would it be possible to create an RST entry for preallocated operations
>> manually? E.g. without creating a dummy OE, but just insert the needed
>> RST entries into RST tree at fallocate time?
>
> Let me give it a try. But I'm a bit less happy to do so, as RST already
> increases the write amplification.

Well, write amplification is always a big problem for btrfs...

Thanks,
Qu
diff mbox series

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 3a3427428074..b195c41676e3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1657,7 +1657,7 @@  static u32 stripe_length(const struct scrub_stripe *stripe)
 }
 
 static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
-					    struct scrub_stripe *stripe)
+					   struct scrub_stripe *stripe)
 {
 	struct btrfs_fs_info *fs_info = stripe->bg->fs_info;
 	struct btrfs_bio *bbio = NULL;
@@ -1703,10 +1703,21 @@  static void scrub_submit_extent_sector_read(struct scrub_ctx *sctx,
 			err = btrfs_map_block(fs_info, BTRFS_MAP_READ, logical,
 					      &stripe_len, &bioc, &io_stripe, &mirror);
 			btrfs_put_bioc(bioc);
-			if (err < 0) {
+			if (err < 0 && err != -ENOENT) {
 				set_bit(i, &stripe->io_error_bitmap);
 				set_bit(i, &stripe->error_bitmap);
 				continue;
+			} else if (err == -ENOENT) {
+				/*
+				 * btrfs_map_block() returns -ENOENT if it can't
+				 * find the logical  address in the RAID stripe
+				 * tree. This can happens on PREALLOC  extents.
+				 * As we know the extent tree  has an extent
+				 * recorded there, we can be sure this is a
+				 * PREALLOC  extent, so skip this sector and
+				 * continue to the next.
+				 */
+				continue;
 			}
 
 			bbio = btrfs_bio_alloc(stripe->nr_sectors, REQ_OP_READ,