Message ID | d6fcae756c5ce47da3527e5db4760d676420d950.1619783910.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix unmountable seed device after fstrim | expand |
On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote: > > The following test case reproduces an issue of wrongly freeing in-use > blocks on the readonly seed device when fstrim is called on the rw sprout > device. As shown below. > > Create a seed device and add a sprout device to it: > $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0 An example of making things easier to the eye here, is adding a blank line before the mkfs line. The same applies to all the other similar places below. > $ btrfstune -S 1 /dev/loop0 > $ mount /dev/loop0 /btrfs > $ btrfs dev add -f /dev/loop1 /btrfs > BTRFS info (device loop0): relocating block group 290455552 flags system > BTRFS info (device loop0): relocating block group 1048576 flags system > BTRFS info (device loop0): disk added /dev/loop1 > $ umount /btrfs > > Mount the sprout device and run fstrim: > $ mount /dev/loop1 /btrfs > $ fstrim /btrfs > $ umount /btrfs > > Now try to mount the seed device, and it fails: > $ mount /dev/loop0 /btrfs > mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. > > Block 5292032 is missing on the readonly seed device. Colon ":" instead of ".", plus blank line. > $ dmesg -kt | tail > <snip> > BTRFS error (device loop0): bad tree block start, want 5292032 have 0 > BTRFS warning (device loop0): couldn't read-tree root > BTRFS error (device loop0): open_ctree failed > > From the dump-tree of the seed device (taken before the fstrim). Block > 5292032 belonged to the block group starting at 5242880 Missing colon and blank line too. > $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP > <snip> > item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24 > block group used 114688 chunk_objectid 256 flags METADATA > <snip> > > From the dump-tree of the sprout device (taken before the fstrim). > fstrim(8) used block-group 5242880 to find the related free space to free. Colon ":" and not ".", plus blank line. > $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP > <snip> > item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24 > block group used 32768 chunk_objectid 256 flags METADATA > <snip> > > Bpf kernel tracing the fstrim(8) command finds the missing block 5292032 > within the range of the discarded blocks as below. Same as before. > kprobe:btrfs_discard_extent { > printf("freeing start %llu end %llu num_bytes %llu:\n", > arg1, arg1+arg2, arg2); > } > > freeing start 5259264 end 5406720 num_bytes 147456 > <snip> > > Fix this by avoiding the discard command to the readonly seed device. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > Reported-by: Chris Murphy <lists@colorremedies.com> The fix looks good. Don't feel forced to address the style comments above, consider them more a recommendation for the future. Reviewed-by: Filipe Manana <fdmanana@suse.com> Thanks. > --- > fs/btrfs/extent-tree.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c > index 7a28314189b4..f1d15b68994a 100644 > --- a/fs/btrfs/extent-tree.c > +++ b/fs/btrfs/extent-tree.c > @@ -1340,12 +1340,16 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, > stripe = bbio->stripes; > for (i = 0; i < bbio->num_stripes; i++, stripe++) { > u64 bytes; > + struct btrfs_device *device = stripe->dev; > > - if (!stripe->dev->bdev) { > + if (!device->bdev) { > ASSERT(btrfs_test_opt(fs_info, DEGRADED)); > continue; > } > > + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) > + continue; > + > ret = do_discard_extent(stripe, &bytes); > if (!ret) { > discarded_bytes += bytes; > -- > 2.29.2 >
On 30/4/21 8:14 pm, Filipe Manana wrote: > On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote: >> >> The following test case reproduces an issue of wrongly freeing in-use >> blocks on the readonly seed device when fstrim is called on the rw sprout >> device. As shown below. >> >> Create a seed device and add a sprout device to it: >> $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0 > > An example of making things easier to the eye here, is adding a blank > line before the mkfs line. > The same applies to all the other similar places below. > >> $ btrfstune -S 1 /dev/loop0 >> $ mount /dev/loop0 /btrfs >> $ btrfs dev add -f /dev/loop1 /btrfs >> BTRFS info (device loop0): relocating block group 290455552 flags system >> BTRFS info (device loop0): relocating block group 1048576 flags system >> BTRFS info (device loop0): disk added /dev/loop1 >> $ umount /btrfs >> >> Mount the sprout device and run fstrim: >> $ mount /dev/loop1 /btrfs >> $ fstrim /btrfs >> $ umount /btrfs >> >> Now try to mount the seed device, and it fails: >> $ mount /dev/loop0 /btrfs >> mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. >> >> Block 5292032 is missing on the readonly seed device. > > Colon ":" instead of ".", plus blank line. > >> $ dmesg -kt | tail >> <snip> >> BTRFS error (device loop0): bad tree block start, want 5292032 have 0 >> BTRFS warning (device loop0): couldn't read-tree root >> BTRFS error (device loop0): open_ctree failed >> >> From the dump-tree of the seed device (taken before the fstrim). Block >> 5292032 belonged to the block group starting at 5242880 > > Missing colon and blank line too. > >> $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP >> <snip> >> item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24 >> block group used 114688 chunk_objectid 256 flags METADATA >> <snip> >> >> From the dump-tree of the sprout device (taken before the fstrim). >> fstrim(8) used block-group 5242880 to find the related free space to free. > > Colon ":" and not ".", plus blank line. > >> $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP >> <snip> >> item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24 >> block group used 32768 chunk_objectid 256 flags METADATA >> <snip> >> >> Bpf kernel tracing the fstrim(8) command finds the missing block 5292032 >> within the range of the discarded blocks as below. > > Same as before. > >> kprobe:btrfs_discard_extent { >> printf("freeing start %llu end %llu num_bytes %llu:\n", >> arg1, arg1+arg2, arg2); >> } >> >> freeing start 5259264 end 5406720 num_bytes 147456 >> <snip> >> >> Fix this by avoiding the discard command to the readonly seed device. >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> Reported-by: Chris Murphy <lists@colorremedies.com> > > The fix looks good. Don't feel forced to address the style comments > above, consider them more a recommendation for the future. > Yep. Thanks. Also, I have missed the re-roll count and its log here. I will just mention that here. v2: Fix commit changelog. Drop a code comment. If David needs, I don't mind resending with these changes. > Reviewed-by: Filipe Manana <fdmanana@suse.com> > > Thanks. > >> --- >> fs/btrfs/extent-tree.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c >> index 7a28314189b4..f1d15b68994a 100644 >> --- a/fs/btrfs/extent-tree.c >> +++ b/fs/btrfs/extent-tree.c >> @@ -1340,12 +1340,16 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, >> stripe = bbio->stripes; >> for (i = 0; i < bbio->num_stripes; i++, stripe++) { >> u64 bytes; >> + struct btrfs_device *device = stripe->dev; >> >> - if (!stripe->dev->bdev) { >> + if (!device->bdev) { >> ASSERT(btrfs_test_opt(fs_info, DEGRADED)); >> continue; >> } >> >> + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) >> + continue; >> + >> ret = do_discard_extent(stripe, &bytes); >> if (!ret) { >> discarded_bytes += bytes; >> -- >> 2.29.2 >> > >
On Fri, Apr 30, 2021 at 08:48:56PM +0800, Anand Jain wrote: > On 30/4/21 8:14 pm, Filipe Manana wrote: > > On Fri, Apr 30, 2021 at 1:03 PM Anand Jain <anand.jain@oracle.com> wrote: > > > > The fix looks good. Don't feel forced to address the style comments > > above, consider them more a recommendation for the future. > > > > Yep. Thanks. > Also, I have missed the re-roll count and its log here. > I will just mention that here. > v2: > Fix commit changelog. > Drop a code comment. > If David needs, I don't mind resending with these changes. Not needed, I'm often adjusting changelog formatting so I'd do the suggested fixups anyway. Patch added to misc-next, thanks.
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 7a28314189b4..f1d15b68994a 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1340,12 +1340,16 @@ int btrfs_discard_extent(struct btrfs_fs_info *fs_info, u64 bytenr, stripe = bbio->stripes; for (i = 0; i < bbio->num_stripes; i++, stripe++) { u64 bytes; + struct btrfs_device *device = stripe->dev; - if (!stripe->dev->bdev) { + if (!device->bdev) { ASSERT(btrfs_test_opt(fs_info, DEGRADED)); continue; } + if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) + continue; + ret = do_discard_extent(stripe, &bytes); if (!ret) { discarded_bytes += bytes;
The following test case reproduces an issue of wrongly freeing in-use blocks on the readonly seed device when fstrim is called on the rw sprout device. As shown below. Create a seed device and add a sprout device to it: $ mkfs.btrfs -fq -dsingle -msingle /dev/loop0 $ btrfstune -S 1 /dev/loop0 $ mount /dev/loop0 /btrfs $ btrfs dev add -f /dev/loop1 /btrfs BTRFS info (device loop0): relocating block group 290455552 flags system BTRFS info (device loop0): relocating block group 1048576 flags system BTRFS info (device loop0): disk added /dev/loop1 $ umount /btrfs Mount the sprout device and run fstrim: $ mount /dev/loop1 /btrfs $ fstrim /btrfs $ umount /btrfs Now try to mount the seed device, and it fails: $ mount /dev/loop0 /btrfs mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error. Block 5292032 is missing on the readonly seed device. $ dmesg -kt | tail <snip> BTRFS error (device loop0): bad tree block start, want 5292032 have 0 BTRFS warning (device loop0): couldn't read-tree root BTRFS error (device loop0): open_ctree failed From the dump-tree of the seed device (taken before the fstrim). Block 5292032 belonged to the block group starting at 5242880 $ btrfs inspect dump-tree -e /dev/loop0 | grep -A1 BLOCK_GROUP <snip> item 3 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16169 itemsize 24 block group used 114688 chunk_objectid 256 flags METADATA <snip> From the dump-tree of the sprout device (taken before the fstrim). fstrim(8) used block-group 5242880 to find the related free space to free. $ btrfs inspect dump-tree -e /dev/loop1 | grep -A1 BLOCK_GROUP <snip> item 1 key (5242880 BLOCK_GROUP_ITEM 8388608) itemoff 16226 itemsize 24 block group used 32768 chunk_objectid 256 flags METADATA <snip> Bpf kernel tracing the fstrim(8) command finds the missing block 5292032 within the range of the discarded blocks as below. kprobe:btrfs_discard_extent { printf("freeing start %llu end %llu num_bytes %llu:\n", arg1, arg1+arg2, arg2); } freeing start 5259264 end 5406720 num_bytes 147456 <snip> Fix this by avoiding the discard command to the readonly seed device. Signed-off-by: Anand Jain <anand.jain@oracle.com> Reported-by: Chris Murphy <lists@colorremedies.com> --- fs/btrfs/extent-tree.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)