diff mbox series

btrfs: fix the mount crash caused by confusing return value

Message ID 2de92bdcebd36e4119467797dedae8a9d97a9df3.1677314616.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix the mount crash caused by confusing return value | expand

Commit Message

Qu Wenruo Feb. 25, 2023, 8:44 a.m. UTC
[BUG]
Btrfs mount can lead to crash if the fs has critical trees corrupted:

 # mkfs.btrfs  -f /dev/test/scratch1
 # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is tree root
 mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1
 mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1
 # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" /dev/test/scratch1
 # mount /dev/test/scratch1 /mnt/btrfs

And the above mount would crash with the following dmesg:

 BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0
 BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0
 BTRFS warning (device dm-4): couldn't read tree root
 ==================================================================
 BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs]
 Read of size 8 at addr 00000000000001f7 by task mount/4040

[CAUSE]
In open_ctree(), we have two variables to indicates errors: @ret and
@err.

Unfortunately such confusion leads to the above crash, as in the error
handling of load_super_root(), we just goto fail_tree_roots label, but
at the end of error handling, we return @err instead of @ret.

Thus even we failed to load tree root, we still return 0 for
open_ctree(), thus later btrfs_iget() would fail.

[FIX]
Instead of such dangerous mixing, remove @err variable completely, and
use @ret only.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Unfortunately I failed to pin down a single patch causing the problem.
Would craft a test case for it.
---
 fs/btrfs/disk-io.c | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

Comments

Anand Jain Feb. 27, 2023, 6:14 a.m. UTC | #1
On 2/25/23 16:44, Qu Wenruo wrote:
> [BUG]
> Btrfs mount can lead to crash if the fs has critical trees corrupted:
> 
>   # mkfs.btrfs  -f /dev/test/scratch1
>   # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is tree root
>   mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1
>   mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1
>   # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" /dev/test/scratch1
>   # mount /dev/test/scratch1 /mnt/btrfs
> 
> And the above mount would crash with the following dmesg:
> 
>   BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0
>   BTRFS warning (device dm-4): checksum verify failed on logical 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0
>   BTRFS warning (device dm-4): couldn't read tree root
>   ==================================================================
>   BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs]
>   Read of size 8 at addr 00000000000001f7 by task mount/4040
> 
> [CAUSE]
> In open_ctree(), we have two variables to indicates errors: @ret and
> @err.
> 
> Unfortunately such confusion leads to the above crash, as in the error
> handling of load_super_root(), we just goto fail_tree_roots label, but
> at the end of error handling, we return @err instead of @ret.
> 
> Thus even we failed to load tree root, we still return 0 for
> open_ctree(), thus later btrfs_iget() would fail.


  There are many child functions in open_ctree() that rely on the default
  value of @err, which is -EINVAL, to return an error instead of ret.

  The decoupling of @ret and the actual error returned by open_ctree()
  is intentional. IMO.

  However, the bug is that btrfs_init_btree_inode() return value is
  assigned to @err instead of @ret.

  ret = btrfs_init_btree_inode(sb);

  And the regression is caused by the following commit:

  commit 097421116e288dd3f5baaf1dd7e86035db60336f
   btrfs: move all btree inode initialization into btrfs_init_btree_inode

  This commit is still in misc-next. We can fold a fix as below:


diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 48368d4bc331..0e0c30fe6df6 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3360,9 +3360,11 @@ int __cold open_ctree(struct super_block *sb, 
struct btrfs_fs_devices *fs_device
                 goto fail;
         }

-       err = btrfs_init_btree_inode(sb);
-       if (err)
+       ret = btrfs_init_btree_inode(sb);
+       if (ret) {
+               err = ret;
                 goto fail;
+       }

         invalidate_bdev(fs_devices->latest_dev->bdev);
Qu Wenruo Feb. 27, 2023, 6:34 a.m. UTC | #2
On 2023/2/27 14:14, Anand Jain wrote:
> On 2/25/23 16:44, Qu Wenruo wrote:
>> [BUG]
>> Btrfs mount can lead to crash if the fs has critical trees corrupted:
>>
>>   # mkfs.btrfs  -f /dev/test/scratch1
>>   # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is 
>> tree root
>>   mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1
>>   mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1
>>   # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" 
>> /dev/test/scratch1
>>   # mount /dev/test/scratch1 /mnt/btrfs
>>
>> And the above mount would crash with the following dmesg:
>>
>>   BTRFS warning (device dm-4): checksum verify failed on logical 
>> 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0
>>   BTRFS warning (device dm-4): checksum verify failed on logical 
>> 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0
>>   BTRFS warning (device dm-4): couldn't read tree root
>>   ==================================================================
>>   BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs]
>>   Read of size 8 at addr 00000000000001f7 by task mount/4040
>>
>> [CAUSE]
>> In open_ctree(), we have two variables to indicates errors: @ret and
>> @err.
>>
>> Unfortunately such confusion leads to the above crash, as in the error
>> handling of load_super_root(), we just goto fail_tree_roots label, but
>> at the end of error handling, we return @err instead of @ret.
>>
>> Thus even we failed to load tree root, we still return 0 for
>> open_ctree(), thus later btrfs_iget() would fail.
> 
> 
>   There are many child functions in open_ctree() that rely on the default
>   value of @err, which is -EINVAL, to return an error instead of ret.

That's even more bug prone.

After the first call site assigning @err, there is no more default value 
for @err.

And this is not the first bug involving two error indicating numbers.
Thus unless definitely needed, I strongly recommend not to use two error 
values.

Thanks,
Qu

> 
>   The decoupling of @ret and the actual error returned by open_ctree()
>   is intentional. IMO.
> 
>   However, the bug is that btrfs_init_btree_inode() return value is
>   assigned to @err instead of @ret.
> 
>   ret = btrfs_init_btree_inode(sb);
> 
>   And the regression is caused by the following commit:
> 
>   commit 097421116e288dd3f5baaf1dd7e86035db60336f
>    btrfs: move all btree inode initialization into btrfs_init_btree_inode
> 
>   This commit is still in misc-next. We can fold a fix as below:
> 
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 48368d4bc331..0e0c30fe6df6 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3360,9 +3360,11 @@ int __cold open_ctree(struct super_block *sb, 
> struct btrfs_fs_devices *fs_device
>                  goto fail;
>          }
> 
> -       err = btrfs_init_btree_inode(sb);
> -       if (err)
> +       ret = btrfs_init_btree_inode(sb);
> +       if (ret) {
> +               err = ret;
>                  goto fail;
> +       }
> 
>          invalidate_bdev(fs_devices->latest_dev->bdev);
> 
>
David Sterba Feb. 27, 2023, 9:56 p.m. UTC | #3
On Mon, Feb 27, 2023 at 02:34:25PM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/2/27 14:14, Anand Jain wrote:
> > On 2/25/23 16:44, Qu Wenruo wrote:
> >> [BUG]
> >> Btrfs mount can lead to crash if the fs has critical trees corrupted:
> >>
> >>   # mkfs.btrfs  -f /dev/test/scratch1
> >>   # btrfs-map-logical -l 30588928 /dev/test/scratch1 # The bytenr is 
> >> tree root
> >>   mirror 1 logical 30588928 physical 38977536 device /dev/test/scratch1
> >>   mirror 2 logical 30588928 physical 307412992 device /dev/test/scratch1
> >>   # xfs_io -f -c "pwrite 38977536 4" -c "pwrite 307412992 4" 
> >> /dev/test/scratch1
> >>   # mount /dev/test/scratch1 /mnt/btrfs
> >>
> >> And the above mount would crash with the following dmesg:
> >>
> >>   BTRFS warning (device dm-4): checksum verify failed on logical 
> >> 30588928 mirror 1 wanted 0xcdcdcdcd found 0x6ca45898 level 0
> >>   BTRFS warning (device dm-4): checksum verify failed on logical 
> >> 30588928 mirror 2 wanted 0xcdcdcdcd found 0x6ca45898 level 0
> >>   BTRFS warning (device dm-4): couldn't read tree root
> >>   ==================================================================
> >>   BUG: KASAN: null-ptr-deref in btrfs_iget+0x74/0x160 [btrfs]
> >>   Read of size 8 at addr 00000000000001f7 by task mount/4040
> >>
> >> [CAUSE]
> >> In open_ctree(), we have two variables to indicates errors: @ret and
> >> @err.
> >>
> >> Unfortunately such confusion leads to the above crash, as in the error
> >> handling of load_super_root(), we just goto fail_tree_roots label, but
> >> at the end of error handling, we return @err instead of @ret.
> >>
> >> Thus even we failed to load tree root, we still return 0 for
> >> open_ctree(), thus later btrfs_iget() would fail.
> > 
> > 
> >   There are many child functions in open_ctree() that rely on the default
> >   value of @err, which is -EINVAL, to return an error instead of ret.
> 
> That's even more bug prone.
> 
> After the first call site assigning @err, there is no more default value 
> for @err.
> 
> And this is not the first bug involving two error indicating numbers.
> Thus unless definitely needed, I strongly recommend not to use two error 
> values.

Agreed, the error handling styles in open_ctree are mixed and we should
switch it to using 'ret' with error value set before gotos. Basically
what you did in this patch, please resend it as a cleanup, I've merged
toe fixup from Anand to the original patch.
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 48368d4bc331..e11aca6353a5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3339,14 +3339,11 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	struct btrfs_root *tree_root;
 	struct btrfs_root *chunk_root;
 	int ret;
-	int err = -EINVAL;
 	int level;
 
 	ret = init_mount_fs_info(fs_info, sb);
-	if (ret) {
-		err = ret;
+	if (ret)
 		goto fail;
-	}
 
 	/* These need to be init'ed before we start creating inodes and such. */
 	tree_root = btrfs_alloc_root(fs_info, BTRFS_ROOT_TREE_OBJECTID,
@@ -3356,12 +3353,12 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 				      GFP_KERNEL);
 	fs_info->chunk_root = chunk_root;
 	if (!tree_root || !chunk_root) {
-		err = -ENOMEM;
+		ret = -ENOMEM;
 		goto fail;
 	}
 
-	err = btrfs_init_btree_inode(sb);
-	if (err)
+	ret = btrfs_init_btree_inode(sb);
+	if (ret)
 		goto fail;
 
 	invalidate_bdev(fs_devices->latest_dev->bdev);
@@ -3371,7 +3368,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	 */
 	disk_super = btrfs_read_dev_super(fs_devices->latest_dev->bdev);
 	if (IS_ERR(disk_super)) {
-		err = PTR_ERR(disk_super);
+		ret = PTR_ERR(disk_super);
 		goto fail_alloc;
 	}
 
@@ -3383,7 +3380,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	if (!btrfs_supported_super_csum(csum_type)) {
 		btrfs_err(fs_info, "unsupported checksum algorithm: %u",
 			  csum_type);
-		err = -EINVAL;
+		ret = -EINVAL;
 		btrfs_release_disk_super(disk_super);
 		goto fail_alloc;
 	}
@@ -3392,7 +3389,6 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	ret = btrfs_init_csum_hash(fs_info, csum_type);
 	if (ret) {
-		err = ret;
 		btrfs_release_disk_super(disk_super);
 		goto fail_alloc;
 	}
@@ -3403,7 +3399,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	 */
 	if (btrfs_check_super_csum(fs_info, disk_super)) {
 		btrfs_err(fs_info, "superblock checksum mismatch");
-		err = -EINVAL;
+		ret = -EINVAL;
 		btrfs_release_disk_super(disk_super);
 		goto fail_alloc;
 	}
@@ -3433,7 +3429,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	ret = btrfs_validate_mount_super(fs_info);
 	if (ret) {
 		btrfs_err(fs_info, "superblock contains fatal errors");
-		err = -EINVAL;
+		ret = -EINVAL;
 		goto fail_alloc;
 	}
 
@@ -3465,16 +3461,12 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	fs_info->stripesize = stripesize;
 
 	ret = btrfs_parse_options(fs_info, options, sb->s_flags);
-	if (ret) {
-		err = ret;
+	if (ret)
 		goto fail_alloc;
-	}
 
 	ret = btrfs_check_features(fs_info, !sb_rdonly(sb));
-	if (ret < 0) {
-		err = ret;
+	if (ret < 0)
 		goto fail_alloc;
-	}
 
 	if (sectorsize < PAGE_SIZE) {
 		struct btrfs_subpage_info *subpage_info;
@@ -3501,10 +3493,8 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	}
 
 	ret = btrfs_init_workqueues(fs_info);
-	if (ret) {
-		err = ret;
+	if (ret)
 		goto fail_sb_buffer;
-	}
 
 	sb->s_bdi->ra_pages *= btrfs_super_num_devices(disk_super);
 	sb->s_bdi->ra_pages = max(sb->s_bdi->ra_pages, SZ_4M / PAGE_SIZE);
@@ -3702,16 +3692,14 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	    !btrfs_test_opt(fs_info, NOLOGREPLAY)) {
 		btrfs_info(fs_info, "start tree-log replay");
 		ret = btrfs_replay_log(fs_info, fs_devices);
-		if (ret) {
-			err = ret;
+		if (ret)
 			goto fail_qgroup;
-		}
 	}
 
 	fs_info->fs_root = btrfs_get_fs_root(fs_info, BTRFS_FS_TREE_OBJECTID, true);
 	if (IS_ERR(fs_info->fs_root)) {
-		err = PTR_ERR(fs_info->fs_root);
-		btrfs_warn(fs_info, "failed to read fs tree: %d", err);
+		ret = PTR_ERR(fs_info->fs_root);
+		btrfs_warn(fs_info, "failed to read fs tree: %d", ret);
 		fs_info->fs_root = NULL;
 		goto fail_qgroup;
 	}
@@ -3788,7 +3776,7 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	iput(fs_info->btree_inode);
 fail:
 	btrfs_close_devices(fs_info->fs_devices);
-	return err;
+	return ret;
 }
 ALLOW_ERROR_INJECTION(open_ctree, ERRNO);