diff mbox series

btrfs: scrub: avoid crash if scrub is trying to do recovery for a removed block group

Message ID 45841a7e90525bf1efa2324ab9d80aeb9e20457c.1682477110.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: scrub: avoid crash if scrub is trying to do recovery for a removed block group | expand

Commit Message

Qu Wenruo April 26, 2023, 2:45 a.m. UTC
[BUG]
Syzbot reported an ASSERT() got triggered during a scrub repair along
with balance:

 BTRFS info (device loop5): balance: start -d -m
 BTRFS info (device loop5): relocating block group 6881280 flags data|metadata
 BTRFS info (device loop5): found 3 extents, stage: move data extents
 BTRFS info (device loop5): scrub: started on devid 1
 BTRFS info (device loop5): relocating block group 5242880 flags data|metadata
 BTRFS info (device loop5): found 6 extents, stage: move data extents
 BTRFS info (device loop5): found 1 extents, stage: update data pointers
 BTRFS warning (device loop5): tree block 5500928 mirror 1 has bad bytenr, has 0 want 5500928
 BTRFS info (device loop5): balance: ended with status: 0
 BTRFS warning (device loop5): tree block 5435392 mirror 1 has bad bytenr, has 0 want 5435392
 BTRFS warning (device loop5): tree block 5423104 mirror 1 has bad bytenr, has 0 want 5423104
 assertion failed: 0, in fs/btrfs/scrub.c:614
 ------------[ cut here ]------------
 kernel BUG at fs/btrfs/messages.c:259!
 invalid opcode: 0000 [#2] PREEMPT SMP KASAN
 Call Trace:
   <TASK>
   lock_full_stripe fs/btrfs/scrub.c:614 [inline]
   scrub_handle_errored_block+0x1ee1/0x4730 fs/btrfs/scrub.c:1067
   scrub_bio_end_io_worker+0x9bb/0x1370 fs/btrfs/scrub.c:2559
   process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2390
   worker_thread+0xa63/0x1210 kernel/workqueue.c:2537
   kthread+0x270/0x300 kernel/kthread.c:376
   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
   </TASK>

[CAUSE]
Btrfs can delete empty block groups either through auto-cleanup or
relcation.

Scrub normally is able to handle this situation well by doing extra
checking, and holding the block group cache pointer during the whole
scrub lifespan.

But unfortunately for lock_full_stripe() and unlock_full_stripe()
functions, due to the context restriction, they have to do an extra
search on the block group cache.
(While the main scrub threads holds a proper btrfs_block_group, but we
have no way to directly use that in repair context).

Thus it can happen that the target block group is already deleted by
relocation.

In that case, we trigger the above ASSERT().

[FIX]
Instead of triggering the ASSERT(), let's just return 0 and continue,
this would leave @locked_ret to be false, and we won't try to unlock
later.

CC: stable@vger.kernel.org
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
There would be no upstream commit, as upstream has completely rewritten
the scrub code in v6.4 merge window, and gets rid of the
lock_full_stripe()/unlock_full_stripe() functions.

I hope we don't have more scrub fixes which would only apply to older
kernels.
---
 fs/btrfs/scrub.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

David Sterba April 28, 2023, 5:15 p.m. UTC | #1
On Wed, Apr 26, 2023 at 10:45:59AM +0800, Qu Wenruo wrote:
> [BUG]
> Syzbot reported an ASSERT() got triggered during a scrub repair along
> with balance:
> 
>  BTRFS info (device loop5): balance: start -d -m
>  BTRFS info (device loop5): relocating block group 6881280 flags data|metadata
>  BTRFS info (device loop5): found 3 extents, stage: move data extents
>  BTRFS info (device loop5): scrub: started on devid 1
>  BTRFS info (device loop5): relocating block group 5242880 flags data|metadata
>  BTRFS info (device loop5): found 6 extents, stage: move data extents
>  BTRFS info (device loop5): found 1 extents, stage: update data pointers
>  BTRFS warning (device loop5): tree block 5500928 mirror 1 has bad bytenr, has 0 want 5500928
>  BTRFS info (device loop5): balance: ended with status: 0
>  BTRFS warning (device loop5): tree block 5435392 mirror 1 has bad bytenr, has 0 want 5435392
>  BTRFS warning (device loop5): tree block 5423104 mirror 1 has bad bytenr, has 0 want 5423104
>  assertion failed: 0, in fs/btrfs/scrub.c:614
>  ------------[ cut here ]------------
>  kernel BUG at fs/btrfs/messages.c:259!
>  invalid opcode: 0000 [#2] PREEMPT SMP KASAN
>  Call Trace:
>    <TASK>
>    lock_full_stripe fs/btrfs/scrub.c:614 [inline]
>    scrub_handle_errored_block+0x1ee1/0x4730 fs/btrfs/scrub.c:1067
>    scrub_bio_end_io_worker+0x9bb/0x1370 fs/btrfs/scrub.c:2559
>    process_one_work+0x8a0/0x10e0 kernel/workqueue.c:2390
>    worker_thread+0xa63/0x1210 kernel/workqueue.c:2537
>    kthread+0x270/0x300 kernel/kthread.c:376
>    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
>    </TASK>
> 
> [CAUSE]
> Btrfs can delete empty block groups either through auto-cleanup or
> relcation.
> 
> Scrub normally is able to handle this situation well by doing extra
> checking, and holding the block group cache pointer during the whole
> scrub lifespan.
> 
> But unfortunately for lock_full_stripe() and unlock_full_stripe()
> functions, due to the context restriction, they have to do an extra
> search on the block group cache.
> (While the main scrub threads holds a proper btrfs_block_group, but we
> have no way to directly use that in repair context).
> 
> Thus it can happen that the target block group is already deleted by
> relocation.
> 
> In that case, we trigger the above ASSERT().
> 
> [FIX]
> Instead of triggering the ASSERT(), let's just return 0 and continue,
> this would leave @locked_ret to be false, and we won't try to unlock
> later.
> 
> CC: stable@vger.kernel.org
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> There would be no upstream commit, as upstream has completely rewritten
> the scrub code in v6.4 merge window, and gets rid of the
> lock_full_stripe()/unlock_full_stripe() functions.

I think we can explain and justify such change:

- there's a report and reproducer, so this is an existing problem

- it's a short diff

- we should do extra testing of the stable trees + this patch before
  submitting

> I hope we don't have more scrub fixes which would only apply to older
> kernels.

We'll handle that case by case, fixing problems in old stable tree is
desirable. Patches without an exact corresponding upstream commit should
be rare but with rewrites this could happen.

> ---
>  fs/btrfs/scrub.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
> index 69c93ae333f6..43d0613c0dd3 100644
> --- a/fs/btrfs/scrub.c
> +++ b/fs/btrfs/scrub.c
> @@ -610,10 +610,9 @@ static int lock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr,
>  
>  	*locked_ret = false;
>  	bg_cache = btrfs_lookup_block_group(fs_info, bytenr);
> -	if (!bg_cache) {
> -		ASSERT(0);

I like the using ASSERT(0) less and less each time I see it. We now have
about 33 of them and it looks like a misuse. It's a BUG() in disguise.
We should handle the error properly, either return a code and let
callers handle it or return -EUCLEAN if it's an unrepairable condition.
Assertions should be for code invariants, API misuse prevention, not for
error handling.
diff mbox series

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 69c93ae333f6..43d0613c0dd3 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -610,10 +610,9 @@  static int lock_full_stripe(struct btrfs_fs_info *fs_info, u64 bytenr,
 
 	*locked_ret = false;
 	bg_cache = btrfs_lookup_block_group(fs_info, bytenr);
-	if (!bg_cache) {
-		ASSERT(0);
-		return -ENOENT;
-	}
+	/* The block group is removed, no need to do any lock. */
+	if (!bg_cache)
+		return 0;
 
 	/* Profiles not based on parity don't need full stripe lock */
 	if (!(bg_cache->flags & BTRFS_BLOCK_GROUP_RAID56_MASK))