diff mbox series

btrfs: fix unmountable seed device after fstrim

Message ID c7715d09a67f212e0ecb5fea2d598513912092f4.1619443900.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix unmountable seed device after fstrim | expand

Commit Message

Anand Jain April 30, 2021, 9:10 a.m. UTC
The following test case reproduces an issue of wrongly freeing the in-use
block on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

create a seed device and add a sprout device to it
	mkfs.btrfs -fq -dsingle -msingle /dev/loop0
	btrfstune -S 1 /dev/loop0
	mount /dev/loop0 /btrfs
	btrfs dev add -f /dev/loop1 /btrfs

  BTRFS info (device loop0): relocating block group 290455552 flags system
  BTRFS info (device loop0): relocating block group 1048576 flags system
  BTRFS info (device loop0): disk added /dev/loop1

	umount /btrfs
mount the sprout device and run fstrim
	mount /dev/loop1 /btrfs
	fstrim /btrfs
	umount /btrfs

now try to mount the seed device, and it fails...
	mount /dev/loop0 /btrfs

  mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

  BTRFS error (device loop0): bad tree block start, want 5292032 have 0
  BTRFS warning (device loop0): couldn't read-tree root
  BTRFS error (device loop0): open_ctree failed

Block 5292032 is missing on the readonly seed device.

From the dump-tree of the seed device taken before the fstrim. Block
5292032 belonged to the block group starting at 5242880

<snip>
 item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
	block group used 114688 chunk_objectid 256 flags METADATA
<snip>

From the dump-tree of the sprout device (taken before the fstrim).
fstrim used the block-group 5242880 to find the free space to free.

<snip>

 item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
	block group used 32768 chunk_objectid 256 flags METADATA
<snip>


bpf tracing the fstrim command finds the missing block 5292032 within the
range of the discarded blocks...

  kprobe:btrfs_discard_extent {
    printf("free start %llu end %llu num_bytes %llu\n", arg1, arg1+arg2, arg2);
  }

btrfs_discard_extent(..., start, num_bytes, ...):

 free start 5259264 end 5406720 num_bytes 147456
<snip>

Fix this by avoiding the discard command to the readonly seed device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
---

A xfstests case to follow.

 fs/btrfs/extent-tree.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Filipe Manana April 30, 2021, 10:11 a.m. UTC | #1
On Fri, Apr 30, 2021 at 10:14 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> The following test case reproduces an issue of wrongly freeing the in-use
> block on the readonly seed device when fstrim is called on the rw sprout
> device. As shown below.
>
> create a seed device and add a sprout device to it
>         mkfs.btrfs -fq -dsingle -msingle /dev/loop0
>         btrfstune -S 1 /dev/loop0
>         mount /dev/loop0 /btrfs
>         btrfs dev add -f /dev/loop1 /btrfs
>
>   BTRFS info (device loop0): relocating block group 290455552 flags system
>   BTRFS info (device loop0): relocating block group 1048576 flags system
>   BTRFS info (device loop0): disk added /dev/loop1
>
>         umount /btrfs
> mount the sprout device and run fstrim
>         mount /dev/loop1 /btrfs
>         fstrim /btrfs
>         umount /btrfs
>
> now try to mount the seed device, and it fails...
>         mount /dev/loop0 /btrfs
>
>   mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
>
>   BTRFS error (device loop0): bad tree block start, want 5292032 have 0
>   BTRFS warning (device loop0): couldn't read-tree root
>   BTRFS error (device loop0): open_ctree failed
>
> Block 5292032 is missing on the readonly seed device.
>
> From the dump-tree of the seed device taken before the fstrim. Block
> 5292032 belonged to the block group starting at 5242880
>
> <snip>
>  item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
>         block group used 114688 chunk_objectid 256 flags METADATA
> <snip>
>
> From the dump-tree of the sprout device (taken before the fstrim).
> fstrim used the block-group 5242880 to find the free space to free.
>
> <snip>
>
>  item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
>         block group used 32768 chunk_objectid 256 flags METADATA
> <snip>
>
>
> bpf tracing the fstrim command finds the missing block 5292032 within the
> range of the discarded blocks...
>
>   kprobe:btrfs_discard_extent {
>     printf("free start %llu end %llu num_bytes %llu\n", arg1, arg1+arg2, arg2);
>   }
>
> btrfs_discard_extent(..., start, num_bytes, ...):
>
>  free start 5259264 end 5406720 num_bytes 147456
> <snip>
>
> Fix this by avoiding the discard command to the readonly seed device.

Quite hard to read the changelog.

It could be better organized, indentation is screwed up in many
places, shell command lines should have some prefix like $ or # to
make it clear what is a command and what is the output of a command,
missing full collons at the end of sentences, sentences not starting
with a capital letter, etc.

>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reported-by: Chris Murphy <lists@colorremedies.com>
> ---
>
> A xfstests case to follow.
>
>  fs/btrfs/extent-tree.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 7a28314189b4..0d19bd213715 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1340,12 +1340,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>                 stripe = bbio->stripes;
>                 for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>                         u64 bytes;
> +                       struct btrfs_device *device = stripe->dev;
>
> -                       if (!stripe->dev->bdev) {
> +                       if (!device->bdev) {
>                                 ASSERT(btrfs_test_opt(fs_info, DEGRADED));
>                                 continue;
>                         }
>
> +                       /*
> +                        * Skip sending discard command to a non-writeable device.
> +                        */
> +                       if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
> +                               continue;

With such an obvious and easy to read conditional, the comment is
completely unnecessary, it doesn't add any value.

Other than that, the fix itself looks good.

Thanks.

> +
>                         ret = do_discard_extent(stripe, &bytes);
>                         if (!ret) {
>                                 discarded_bytes += bytes;
> --
> 2.29.2
>
Anand Jain April 30, 2021, 11:06 a.m. UTC | #2
On 30/4/21 6:11 pm, Filipe Manana wrote:
> On Fri, Apr 30, 2021 at 10:14 AM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> The following test case reproduces an issue of wrongly freeing the in-use
>> block on the readonly seed device when fstrim is called on the rw sprout
>> device. As shown below.
>>
>> create a seed device and add a sprout device to it
>>          mkfs.btrfs -fq -dsingle -msingle /dev/loop0
>>          btrfstune -S 1 /dev/loop0
>>          mount /dev/loop0 /btrfs
>>          btrfs dev add -f /dev/loop1 /btrfs
>>
>>    BTRFS info (device loop0): relocating block group 290455552 flags system
>>    BTRFS info (device loop0): relocating block group 1048576 flags system
>>    BTRFS info (device loop0): disk added /dev/loop1
>>
>>          umount /btrfs
>> mount the sprout device and run fstrim
>>          mount /dev/loop1 /btrfs
>>          fstrim /btrfs
>>          umount /btrfs
>>
>> now try to mount the seed device, and it fails...
>>          mount /dev/loop0 /btrfs
>>
>>    mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
>>
>>    BTRFS error (device loop0): bad tree block start, want 5292032 have 0
>>    BTRFS warning (device loop0): couldn't read-tree root
>>    BTRFS error (device loop0): open_ctree failed
>>
>> Block 5292032 is missing on the readonly seed device.
>>
>>  From the dump-tree of the seed device taken before the fstrim. Block
>> 5292032 belonged to the block group starting at 5242880
>>
>> <snip>
>>   item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
>>          block group used 114688 chunk_objectid 256 flags METADATA
>> <snip>
>>
>>  From the dump-tree of the sprout device (taken before the fstrim).
>> fstrim used the block-group 5242880 to find the free space to free.
>>
>> <snip>
>>
>>   item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
>>          block group used 32768 chunk_objectid 256 flags METADATA
>> <snip>
>>
>>
>> bpf tracing the fstrim command finds the missing block 5292032 within the
>> range of the discarded blocks...
>>
>>    kprobe:btrfs_discard_extent {
>>      printf("free start %llu end %llu num_bytes %llu\n", arg1, arg1+arg2, arg2);
>>    }
>>
>> btrfs_discard_extent(..., start, num_bytes, ...):
>>
>>   free start 5259264 end 5406720 num_bytes 147456
>> <snip>
>>
>> Fix this by avoiding the discard command to the readonly seed device.
> 
> Quite hard to read the changelog.
> 
> It could be better organized, indentation is screwed up in many
> places, shell command lines should have some prefix like $ or # to
> make it clear what is a command and what is the output of a command,
> missing full collons at the end of sentences, sentences not starting
> with a capital letter, etc.

   OK. Let me try to fix it.

> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Reported-by: Chris Murphy <lists@colorremedies.com>
>> ---
>>
>> A xfstests case to follow.
>>
>>   fs/btrfs/extent-tree.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 7a28314189b4..0d19bd213715 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1340,12 +1340,19 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>>                  stripe = bbio->stripes;
>>                  for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>>                          u64 bytes;
>> +                       struct btrfs_device *device = stripe->dev;
>>
>> -                       if (!stripe->dev->bdev) {
>> +                       if (!device->bdev) {
>>                                  ASSERT(btrfs_test_opt(fs_info, DEGRADED));
>>                                  continue;
>>                          }
>>
>> +                       /*
>> +                        * Skip sending discard command to a non-writeable device.
>> +                        */
>> +                       if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
>> +                               continue;
> 
> With such an obvious and easy to read conditional, the comment is
> completely unnecessary, it doesn't add any value.
> 
  Will remove.

> Other than that, the fix itself looks good.
> 

Thanks

> Thanks.
> 
>> +
>>                          ret = do_discard_extent(stripe, &bytes);
>>                          if (!ret) {
>>                                  discarded_bytes += bytes;
>> --
>> 2.29.2
>>
> 
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7a28314189b4..0d19bd213715 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1340,12 +1340,19 @@  int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 		stripe = bbio->stripes;
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
 			u64 bytes;
+			struct btrfs_device *device = stripe->dev;
 
-			if (!stripe->dev->bdev) {
+			if (!device->bdev) {
 				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 				continue;
 			}
 
+			/*
+			 * Skip sending discard command to a non-writeable device.
+			 */
+			if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
+				continue;
+
 			ret = do_discard_extent(stripe, &bytes);
 			if (!ret) {
 				discarded_bytes += bytes;