diff mbox series

[v2] btrfs: free device without BTRFS_MAGIC

Message ID 1ee9b318e3bb851aaec9c1efd1eadb117ad46638.1600741332.git.anand.jain@oracle.com
State New, archived
Headers show
Series [v2] btrfs: free device without BTRFS_MAGIC | expand

Commit Message

Anand Jain Sept. 22, 2020, 3:13 a.m. UTC
Many things can happen after the device is scanned and before the device
is mounted.

One such thing is losing the BTRFS_MAGIC on the device.

If it happens we still won't free that device from the memory and causes
the userland to confuse.

For example: As the BTRFS_IOC_DEV_INFO still carries the device path which
does not have the BTRFS_MAGIC, the btrfs fi show still shows device
which does not belong. As shown below.

mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb

wipefs -a /dev/sdb
mount -o degraded /dev/sda /btrfs
btrfs fi show -m

/dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
btrfs.
Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
        Total devices 2 FS bytes used 128.00KiB
        devid    1 size 3.00GiB used 571.19MiB path /dev/sda
        devid    2 size 3.00GiB used 571.19MiB path /dev/sdb

Fix is to return -ENODATA error code in btrfs_read_dev_one_super()
when BTRFS_MAGIC check fails, and its parent open_fs_devices() to
free the device in the mount-thread.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
v2: Do not return ENODATA on `btrfs_super_bytenr(super) != bytenr`

 fs/btrfs/disk-io.c |  8 ++++++--
 fs/btrfs/volumes.c | 19 +++++++++++++------
 2 files changed, 19 insertions(+), 8 deletions(-)

Comments

Nikolay Borisov Sept. 23, 2020, 11:09 a.m. UTC | #1
On 22.09.20 г. 6:13 ч., Anand Jain wrote:
> Many things can happen after the device is scanned and before the device
> is mounted.
> 
> One such thing is losing the BTRFS_MAGIC on the device.
> 
> If it happens we still won't free that device from the memory and causes
> the userland to confuse.
> 
> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which
> does not have the BTRFS_MAGIC, the btrfs fi show still shows device
> which does not belong. As shown below.
> 
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
> 
> wipefs -a /dev/sdb
> mount -o degraded /dev/sda /btrfs
> btrfs fi show -m
> 
> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
> btrfs.
> Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
>         Total devices 2 FS bytes used 128.00KiB
>         devid    1 size 3.00GiB used 571.19MiB path /dev/sda
>         devid    2 size 3.00GiB used 571.19MiB path /dev/sdb
> 
> Fix is to return -ENODATA error code in btrfs_read_dev_one_super()
> when BTRFS_MAGIC check fails, and its parent open_fs_devices() to
> free the device in the mount-thread.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

Has an fstest for this been submitted ?

<snip>
Anand Jain Sept. 24, 2020, 11:55 a.m. UTC | #2
On 23/9/20 7:09 pm, Nikolay Borisov wrote:
> 
> 
> On 22.09.20 г. 6:13 ч., Anand Jain wrote:
>> Many things can happen after the device is scanned and before the device
>> is mounted.
>>
>> One such thing is losing the BTRFS_MAGIC on the device.
>>
>> If it happens we still won't free that device from the memory and causes
>> the userland to confuse.
>>
>> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which
>> does not have the BTRFS_MAGIC, the btrfs fi show still shows device
>> which does not belong. As shown below.
>>
>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
>>
>> wipefs -a /dev/sdb
>> mount -o degraded /dev/sda /btrfs
>> btrfs fi show -m
>>
>> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
>> btrfs.
>> Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
>>          Total devices 2 FS bytes used 128.00KiB
>>          devid    1 size 3.00GiB used 571.19MiB path /dev/sda
>>          devid    2 size 3.00GiB used 571.19MiB path /dev/sdb
>>
>> Fix is to return -ENODATA error code in btrfs_read_dev_one_super()
>> when BTRFS_MAGIC check fails, and its parent open_fs_devices() to
>> free the device in the mount-thread.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> Has an fstest for this been submitted ?
> 

  This is fix for btrfs/198.


> <snip>
>
Josef Bacik Sept. 28, 2020, 6:14 p.m. UTC | #3
On 9/21/20 11:13 PM, Anand Jain wrote:
> Many things can happen after the device is scanned and before the device
> is mounted.
> 
> One such thing is losing the BTRFS_MAGIC on the device.
> 
> If it happens we still won't free that device from the memory and causes
> the userland to confuse.
> 
> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which
> does not have the BTRFS_MAGIC, the btrfs fi show still shows device
> which does not belong. As shown below.
> 
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
> 
> wipefs -a /dev/sdb
> mount -o degraded /dev/sda /btrfs
> btrfs fi show -m
> 
> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
> btrfs.
> Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
>          Total devices 2 FS bytes used 128.00KiB
>          devid    1 size 3.00GiB used 571.19MiB path /dev/sda
>          devid    2 size 3.00GiB used 571.19MiB path /dev/sdb
> 

Wouldn't this also happen if the bytenrs didn't match?  In that case you aren't 
freeing anything, and we'd still show the improper device correct?  So why not 
deal with that case in a similar way?  Thanks,

Josef
Anand Jain Sept. 30, 2020, 7:21 a.m. UTC | #4
On 29/9/20 2:14 am, Josef Bacik wrote:
> On 9/21/20 11:13 PM, Anand Jain wrote:
>> Many things can happen after the device is scanned and before the device
>> is mounted.
>>
>> One such thing is losing the BTRFS_MAGIC on the device.
>>
>> If it happens we still won't free that device from the memory and causes
>> the userland to confuse.
>>
>> For example: As the BTRFS_IOC_DEV_INFO still carries the device path 
>> which
>> does not have the BTRFS_MAGIC, the btrfs fi show still shows device
>> which does not belong. As shown below.
>>
>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
>>
>> wipefs -a /dev/sdb
>> mount -o degraded /dev/sda /btrfs
>> btrfs fi show -m
>>
>> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
>> btrfs.
>> Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
>>          Total devices 2 FS bytes used 128.00KiB
>>          devid    1 size 3.00GiB used 571.19MiB path /dev/sda
>>          devid    2 size 3.00GiB used 571.19MiB path /dev/sdb
>>
> 
> Wouldn't this also happen if the bytenrs didn't match?  In that case you 
> aren't freeing anything, and we'd still show the improper device 
> correct?  So why not deal with that case in a similar way?  Thanks,

Freeing the device without the BTRFS_MAGIC is mandatory because the
device does not belong to btrfs even though we could notice from the
sysfs that there is missing flag on this devid.

I think I should check for the BTRFS_MAGIC first before bytenr check,
I shall swap them in v2 if there are no other comments. We need this
patch as a fix for the test case btrfs/198.

However bytenrs mismatch indicates corruption. If the degraded mount
option is not provided we would fail the mount. The user shall have the
opportunity to fix the corrupted superblock. We don't automatically
recover the corrupted superblock from the backup superblock copies. If
the degraded mount option is provided the corrupted device still be in
the device_list but with the missing flag set. Just by looking at btrfs
fi show the user won't know that one of the devices is not part of the
volume however when he looks into the /sys/fs/btrfs/fsid/devinfo/<devid>
/missing it shall show 1. Our serviceability part of the degraded volume
has some unfinished business when we evaluate it against the standard
RAS features, but we are slowly getting there.

Thanks, Anand

> 
> Josef
Josef Bacik Sept. 30, 2020, 12:41 p.m. UTC | #5
On 9/30/20 3:21 AM, Anand Jain wrote:
> 
> 
> On 29/9/20 2:14 am, Josef Bacik wrote:
>> On 9/21/20 11:13 PM, Anand Jain wrote:
>>> Many things can happen after the device is scanned and before the device
>>> is mounted.
>>>
>>> One such thing is losing the BTRFS_MAGIC on the device.
>>>
>>> If it happens we still won't free that device from the memory and causes
>>> the userland to confuse.
>>>
>>> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which
>>> does not have the BTRFS_MAGIC, the btrfs fi show still shows device
>>> which does not belong. As shown below.
>>>
>>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb
>>>
>>> wipefs -a /dev/sdb
>>> mount -o degraded /dev/sda /btrfs
>>> btrfs fi show -m
>>>
>>> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of
>>> btrfs.
>>> Label: none  uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd
>>>          Total devices 2 FS bytes used 128.00KiB
>>>          devid    1 size 3.00GiB used 571.19MiB path /dev/sda
>>>          devid    2 size 3.00GiB used 571.19MiB path /dev/sdb
>>>
>>
>> Wouldn't this also happen if the bytenrs didn't match?  In that case you 
>> aren't freeing anything, and we'd still show the improper device correct?  So 
>> why not deal with that case in a similar way?  Thanks,
> 
> Freeing the device without the BTRFS_MAGIC is mandatory because the
> device does not belong to btrfs even though we could notice from the
> sysfs that there is missing flag on this devid.
> 
> I think I should check for the BTRFS_MAGIC first before bytenr check,
> I shall swap them in v2 if there are no other comments. We need this
> patch as a fix for the test case btrfs/198.
> 
> However bytenrs mismatch indicates corruption. If the degraded mount
> option is not provided we would fail the mount. The user shall have the
> opportunity to fix the corrupted superblock. We don't automatically
> recover the corrupted superblock from the backup superblock copies. If
> the degraded mount option is provided the corrupted device still be in
> the device_list but with the missing flag set. Just by looking at btrfs
> fi show the user won't know that one of the devices is not part of the
> volume however when he looks into the /sys/fs/btrfs/fsid/devinfo/<devid>
> /missing it shall show 1. Our serviceability part of the degraded volume
> has some unfinished business when we evaluate it against the standard
> RAS features, but we are slowly getting there.
> 

Alright that's fair.  You can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 160b485d2cc0..d84a49fe9639 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3429,12 +3429,16 @@  struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev,
 		return ERR_CAST(page);
 
 	super = page_address(page);
-	if (btrfs_super_bytenr(super) != bytenr ||
-		    btrfs_super_magic(super) != BTRFS_MAGIC) {
+	if (btrfs_super_bytenr(super) != bytenr) {
 		btrfs_release_disk_super(super);
 		return ERR_PTR(-EINVAL);
 	}
 
+	if (btrfs_super_magic(super) != BTRFS_MAGIC) {
+		btrfs_release_disk_super(super);
+		return ERR_PTR(-ENODATA);
+	}
+
 	return super;
 }
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 7cc677a7e544..ec9dac40b4f1 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1198,17 +1198,24 @@  static int open_fs_devices(struct btrfs_fs_devices *fs_devices,
 {
 	struct btrfs_device *device;
 	struct btrfs_device *latest_dev = NULL;
+	struct btrfs_device *tmp_device;
 
 	flags |= FMODE_EXCL;
 
-	list_for_each_entry(device, &fs_devices->devices, dev_list) {
-		/* Just open everything we can; ignore failures here */
-		if (btrfs_open_one_device(fs_devices, device, flags, holder))
-			continue;
+	list_for_each_entry_safe(device, tmp_device, &fs_devices->devices,
+				 dev_list) {
+		int ret;
 
-		if (!latest_dev ||
-		    device->generation > latest_dev->generation)
+		/* Just open everything we can; ignore failures here */
+		ret = btrfs_open_one_device(fs_devices, device, flags, holder);
+		if (ret == 0 && (!latest_dev ||
+		    device->generation > latest_dev->generation)) {
 			latest_dev = device;
+		} else if (ret == -ENODATA) {
+			fs_devices->num_devices--;
+			list_del(&device->dev_list);
+			btrfs_free_device(device);
+		}
 	}
 	if (fs_devices->open_devices == 0)
 		return -EINVAL;