diff mbox series

btrfs: fix unmountable seed device after fstrim

Message ID d6fcae756c5ce47da3527e5db4760d676420d950.1619783910.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix unmountable seed device after fstrim | expand

Commit Message

Anand Jain April 30, 2021, 11:59 a.m. UTC
The following test case reproduces an issue of wrongly freeing in-use
blocks on the readonly seed device when fstrim is called on the rw sprout
device. As shown below.

Create a seed device and add a sprout device to it:
	$ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
	$ btrfstune -S 1 /dev/loop0
	$ mount /dev/loop0 /btrfs
	$ btrfs dev add -f /dev/loop1 /btrfs
	BTRFS info (device loop0): relocating block group 290455552 flags system
	BTRFS info (device loop0): relocating block group 1048576 flags system
	BTRFS info (device loop0): disk added /dev/loop1
	$ umount /btrfs

Mount the sprout device and run fstrim:
	$ mount /dev/loop1 /btrfs
	$ fstrim /btrfs
	$ umount /btrfs

Now try to mount the seed device, and it fails:
	$ mount /dev/loop0 /btrfs
	mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.

Block 5292032 is missing on the readonly seed device.
	$ dmesg -kt | tail
	<snip>
	BTRFS error (device loop0): bad tree block start, want 5292032 have 0
	BTRFS warning (device loop0): couldn't read-tree root
	BTRFS error (device loop0): open_ctree failed

From the dump-tree of the seed device (taken before the fstrim). Block
5292032 belonged to the block group starting at 5242880
	$ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
	<snip>
	item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
		block group used 114688 chunk_objectid 256 flags METADATA
	<snip>

From the dump-tree of the sprout device (taken before the fstrim).
fstrim(8) used block-group 5242880 to find the related free space to free.
	$ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
	<snip>
	item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
		block group used 32768 chunk_objectid 256 flags METADATA
	<snip>

Bpf kernel tracing the fstrim(8) command finds the missing block 5292032
within the range of the discarded blocks as below.
	kprobe:btrfs_discard_extent {
		printf("freeing start %llu end %llu num_bytes %llu:\n",
			arg1, arg1+arg2, arg2);
	}

	freeing start 5259264 end 5406720 num_bytes 147456
	<snip>

Fix this by avoiding the discard command to the readonly seed device.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reported-by: Chris Murphy <lists@colorremedies.com>
---
 fs/btrfs/extent-tree.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Filipe Manana April 30, 2021, 12:14 p.m. UTC | #1
On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> The following test case reproduces an issue of wrongly freeing in-use
> blocks on the readonly seed device when fstrim is called on the rw sprout
> device. As shown below.
>
> Create a seed device and add a sprout device to it:
>         $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0

An example of making things easier to the eye here, is adding a blank
line before the mkfs line.
The same applies to all the other similar places below.

>         $ btrfstune -S 1 /dev/loop0
>         $ mount /dev/loop0 /btrfs
>         $ btrfs dev add -f /dev/loop1 /btrfs
>         BTRFS info (device loop0): relocating block group 290455552 flags system
>         BTRFS info (device loop0): relocating block group 1048576 flags system
>         BTRFS info (device loop0): disk added /dev/loop1
>         $ umount /btrfs
>
> Mount the sprout device and run fstrim:
>         $ mount /dev/loop1 /btrfs
>         $ fstrim /btrfs
>         $ umount /btrfs
>
> Now try to mount the seed device, and it fails:
>         $ mount /dev/loop0 /btrfs
>         mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
>
> Block 5292032 is missing on the readonly seed device.

Colon ":" instead of ".", plus blank line.

>         $ dmesg -kt | tail
>         <snip>
>         BTRFS error (device loop0): bad tree block start, want 5292032 have 0
>         BTRFS warning (device loop0): couldn't read-tree root
>         BTRFS error (device loop0): open_ctree failed
>
> From the dump-tree of the seed device (taken before the fstrim). Block
> 5292032 belonged to the block group starting at 5242880

Missing colon and blank line too.

>         $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
>         <snip>
>         item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
>                 block group used 114688 chunk_objectid 256 flags METADATA
>         <snip>
>
> From the dump-tree of the sprout device (taken before the fstrim).
> fstrim(8) used block-group 5242880 to find the related free space to free.

Colon ":" and not ".", plus blank line.

>         $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
>         <snip>
>         item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
>                 block group used 32768 chunk_objectid 256 flags METADATA
>         <snip>
>
> Bpf kernel tracing the fstrim(8) command finds the missing block 5292032
> within the range of the discarded blocks as below.

Same as before.

>         kprobe:btrfs_discard_extent {
>                 printf("freeing start %llu end %llu num_bytes %llu:\n",
>                         arg1, arg1+arg2, arg2);
>         }
>
>         freeing start 5259264 end 5406720 num_bytes 147456
>         <snip>
>
> Fix this by avoiding the discard command to the readonly seed device.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> Reported-by: Chris Murphy <lists@colorremedies.com>

The fix looks good. Don't feel forced to address the style comments
above, consider them more a recommendation for the future.

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Thanks.

> ---
>  fs/btrfs/extent-tree.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 7a28314189b4..f1d15b68994a 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1340,12 +1340,16 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>                 stripe = bbio->stripes;
>                 for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>                         u64 bytes;
> +                       struct btrfs_device *device = stripe->dev;
>
> -                       if (!stripe->dev->bdev) {
> +                       if (!device->bdev) {
>                                 ASSERT(btrfs_test_opt(fs_info, DEGRADED));
>                                 continue;
>                         }
>
> +                       if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
> +                               continue;
> +
>                         ret = do_discard_extent(stripe, &bytes);
>                         if (!ret) {
>                                 discarded_bytes += bytes;
> --
> 2.29.2
>
Anand Jain April 30, 2021, 12:48 p.m. UTC | #2
On 30/4/21 8:14 pm, Filipe Manana wrote:
> On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote:
>>
>> The following test case reproduces an issue of wrongly freeing in-use
>> blocks on the readonly seed device when fstrim is called on the rw sprout
>> device. As shown below.
>>
>> Create a seed device and add a sprout device to it:
>>          $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0
> 
> An example of making things easier to the eye here, is adding a blank
> line before the mkfs line.
> The same applies to all the other similar places below.
> 
>>          $ btrfstune -S 1 /dev/loop0
>>          $ mount /dev/loop0 /btrfs
>>          $ btrfs dev add -f /dev/loop1 /btrfs
>>          BTRFS info (device loop0): relocating block group 290455552 flags system
>>          BTRFS info (device loop0): relocating block group 1048576 flags system
>>          BTRFS info (device loop0): disk added /dev/loop1
>>          $ umount /btrfs
>>
>> Mount the sprout device and run fstrim:
>>          $ mount /dev/loop1 /btrfs
>>          $ fstrim /btrfs
>>          $ umount /btrfs
>>
>> Now try to mount the seed device, and it fails:
>>          $ mount /dev/loop0 /btrfs
>>          mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
>>
>> Block 5292032 is missing on the readonly seed device.
> 
> Colon ":" instead of ".", plus blank line.
> 
>>          $ dmesg -kt | tail
>>          <snip>
>>          BTRFS error (device loop0): bad tree block start, want 5292032 have 0
>>          BTRFS warning (device loop0): couldn't read-tree root
>>          BTRFS error (device loop0): open_ctree failed
>>
>>  From the dump-tree of the seed device (taken before the fstrim). Block
>> 5292032 belonged to the block group starting at 5242880
> 
> Missing colon and blank line too.
> 
>>          $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP
>>          <snip>
>>          item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24
>>                  block group used 114688 chunk_objectid 256 flags METADATA
>>          <snip>
>>
>>  From the dump-tree of the sprout device (taken before the fstrim).
>> fstrim(8) used block-group 5242880 to find the related free space to free.
> 
> Colon ":" and not ".", plus blank line.
> 
>>          $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP
>>          <snip>
>>          item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24
>>                  block group used 32768 chunk_objectid 256 flags METADATA
>>          <snip>
>>
>> Bpf kernel tracing the fstrim(8) command finds the missing block 5292032
>> within the range of the discarded blocks as below.
> 
> Same as before.
> 
>>          kprobe:btrfs_discard_extent {
>>                  printf("freeing start %llu end %llu num_bytes %llu:\n",
>>                          arg1, arg1+arg2, arg2);
>>          }
>>
>>          freeing start 5259264 end 5406720 num_bytes 147456
>>          <snip>
>>
>> Fix this by avoiding the discard command to the readonly seed device.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> Reported-by: Chris Murphy <lists@colorremedies.com>
> 
> The fix looks good. Don't feel forced to address the style comments
> above, consider them more a recommendation for the future.
> 

Yep. Thanks.
  Also, I have missed the re-roll count and its log here.
  I will just mention that here.
     v2:
        Fix commit changelog.
        Drop a code comment.
If David needs, I don't mind resending with these changes.

> Reviewed-by: Filipe Manana <fdmanana@suse.com>
> 
> Thanks.
> 
>> ---
>>   fs/btrfs/extent-tree.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 7a28314189b4..f1d15b68994a 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1340,12 +1340,16 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
>>                  stripe = bbio->stripes;
>>                  for (i = 0; i < bbio->num_stripes; i++, stripe++) {
>>                          u64 bytes;
>> +                       struct btrfs_device *device = stripe->dev;
>>
>> -                       if (!stripe->dev->bdev) {
>> +                       if (!device->bdev) {
>>                                  ASSERT(btrfs_test_opt(fs_info, DEGRADED));
>>                                  continue;
>>                          }
>>
>> +                       if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
>> +                               continue;
>> +
>>                          ret = do_discard_extent(stripe, &bytes);
>>                          if (!ret) {
>>                                  discarded_bytes += bytes;
>> --
>> 2.29.2
>>
> 
>
David Sterba May 3, 2021, 1:34 p.m. UTC | #3
On Fri, Apr 30, 2021 at 08:48:56PM +0800, Anand Jain wrote:
> On 30/4/21 8:14 pm, Filipe Manana wrote:
> > On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote:
> > 
> > The fix looks good. Don't feel forced to address the style comments
> > above, consider them more a recommendation for the future.
> > 
> 
> Yep. Thanks.
>   Also, I have missed the re-roll count and its log here.
>   I will just mention that here.
>      v2:
>         Fix commit changelog.
>         Drop a code comment.
> If David needs, I don't mind resending with these changes.

Not needed, I'm often adjusting changelog formatting so I'd do the
suggested fixups anyway. Patch added to misc-next, thanks.
diff mbox series

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7a28314189b4..f1d15b68994a 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1340,12 +1340,16 @@  int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr,
 		stripe = bbio->stripes;
 		for (i = 0; i < bbio->num_stripes; i++, stripe++) {
 			u64 bytes;
+			struct btrfs_device *device = stripe->dev;
 
-			if (!stripe->dev->bdev) {
+			if (!device->bdev) {
 				ASSERT(btrfs_test_opt(fs_info, DEGRADED));
 				continue;
 			}
 
+			if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
+				continue;
+
 			ret = do_discard_extent(stripe, &bytes);
 			if (!ret) {
 				discarded_bytes += bytes;