Message ID | 1ee9b318e3bb851aaec9c1efd1eadb117ad46638.1600741332.git.anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: free device without BTRFS_MAGIC | expand |
On 22.09.20 г. 6:13 ч., Anand Jain wrote: > Many things can happen after the device is scanned and before the device > is mounted. > > One such thing is losing the BTRFS_MAGIC on the device. > > If it happens we still won't free that device from the memory and causes > the userland to confuse. > > For example: As the BTRFS_IOC_DEV_INFO still carries the device path which > does not have the BTRFS_MAGIC, the btrfs fi show still shows device > which does not belong. As shown below. > > mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb > > wipefs -a /dev/sdb > mount -o degraded /dev/sda /btrfs > btrfs fi show -m > > /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of > btrfs. > Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd > Total devices 2 FS bytes used 128.00KiB > devid 1 size 3.00GiB used 571.19MiB path /dev/sda > devid 2 size 3.00GiB used 571.19MiB path /dev/sdb > > Fix is to return -ENODATA error code in btrfs_read_dev_one_super() > when BTRFS_MAGIC check fails, and its parent open_fs_devices() to > free the device in the mount-thread. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> Has an fstest for this been submitted ? <snip>
On 23/9/20 7:09 pm, Nikolay Borisov wrote: > > > On 22.09.20 г. 6:13 ч., Anand Jain wrote: >> Many things can happen after the device is scanned and before the device >> is mounted. >> >> One such thing is losing the BTRFS_MAGIC on the device. >> >> If it happens we still won't free that device from the memory and causes >> the userland to confuse. >> >> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which >> does not have the BTRFS_MAGIC, the btrfs fi show still shows device >> which does not belong. As shown below. >> >> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb >> >> wipefs -a /dev/sdb >> mount -o degraded /dev/sda /btrfs >> btrfs fi show -m >> >> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of >> btrfs. >> Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd >> Total devices 2 FS bytes used 128.00KiB >> devid 1 size 3.00GiB used 571.19MiB path /dev/sda >> devid 2 size 3.00GiB used 571.19MiB path /dev/sdb >> >> Fix is to return -ENODATA error code in btrfs_read_dev_one_super() >> when BTRFS_MAGIC check fails, and its parent open_fs_devices() to >> free the device in the mount-thread. >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> > > Has an fstest for this been submitted ? > This is fix for btrfs/198. > <snip> >
On 9/21/20 11:13 PM, Anand Jain wrote: > Many things can happen after the device is scanned and before the device > is mounted. > > One such thing is losing the BTRFS_MAGIC on the device. > > If it happens we still won't free that device from the memory and causes > the userland to confuse. > > For example: As the BTRFS_IOC_DEV_INFO still carries the device path which > does not have the BTRFS_MAGIC, the btrfs fi show still shows device > which does not belong. As shown below. > > mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb > > wipefs -a /dev/sdb > mount -o degraded /dev/sda /btrfs > btrfs fi show -m > > /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of > btrfs. > Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd > Total devices 2 FS bytes used 128.00KiB > devid 1 size 3.00GiB used 571.19MiB path /dev/sda > devid 2 size 3.00GiB used 571.19MiB path /dev/sdb > Wouldn't this also happen if the bytenrs didn't match? In that case you aren't freeing anything, and we'd still show the improper device correct? So why not deal with that case in a similar way? Thanks, Josef
On 29/9/20 2:14 am, Josef Bacik wrote: > On 9/21/20 11:13 PM, Anand Jain wrote: >> Many things can happen after the device is scanned and before the device >> is mounted. >> >> One such thing is losing the BTRFS_MAGIC on the device. >> >> If it happens we still won't free that device from the memory and causes >> the userland to confuse. >> >> For example: As the BTRFS_IOC_DEV_INFO still carries the device path >> which >> does not have the BTRFS_MAGIC, the btrfs fi show still shows device >> which does not belong. As shown below. >> >> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb >> >> wipefs -a /dev/sdb >> mount -o degraded /dev/sda /btrfs >> btrfs fi show -m >> >> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of >> btrfs. >> Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd >> Total devices 2 FS bytes used 128.00KiB >> devid 1 size 3.00GiB used 571.19MiB path /dev/sda >> devid 2 size 3.00GiB used 571.19MiB path /dev/sdb >> > > Wouldn't this also happen if the bytenrs didn't match? In that case you > aren't freeing anything, and we'd still show the improper device > correct? So why not deal with that case in a similar way? Thanks, Freeing the device without the BTRFS_MAGIC is mandatory because the device does not belong to btrfs even though we could notice from the sysfs that there is missing flag on this devid. I think I should check for the BTRFS_MAGIC first before bytenr check, I shall swap them in v2 if there are no other comments. We need this patch as a fix for the test case btrfs/198. However bytenrs mismatch indicates corruption. If the degraded mount option is not provided we would fail the mount. The user shall have the opportunity to fix the corrupted superblock. We don't automatically recover the corrupted superblock from the backup superblock copies. If the degraded mount option is provided the corrupted device still be in the device_list but with the missing flag set. Just by looking at btrfs fi show the user won't know that one of the devices is not part of the volume however when he looks into the /sys/fs/btrfs/fsid/devinfo/<devid> /missing it shall show 1. Our serviceability part of the degraded volume has some unfinished business when we evaluate it against the standard RAS features, but we are slowly getting there. Thanks, Anand > > Josef
On 9/30/20 3:21 AM, Anand Jain wrote: > > > On 29/9/20 2:14 am, Josef Bacik wrote: >> On 9/21/20 11:13 PM, Anand Jain wrote: >>> Many things can happen after the device is scanned and before the device >>> is mounted. >>> >>> One such thing is losing the BTRFS_MAGIC on the device. >>> >>> If it happens we still won't free that device from the memory and causes >>> the userland to confuse. >>> >>> For example: As the BTRFS_IOC_DEV_INFO still carries the device path which >>> does not have the BTRFS_MAGIC, the btrfs fi show still shows device >>> which does not belong. As shown below. >>> >>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb >>> >>> wipefs -a /dev/sdb >>> mount -o degraded /dev/sda /btrfs >>> btrfs fi show -m >>> >>> /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of >>> btrfs. >>> Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd >>> Total devices 2 FS bytes used 128.00KiB >>> devid 1 size 3.00GiB used 571.19MiB path /dev/sda >>> devid 2 size 3.00GiB used 571.19MiB path /dev/sdb >>> >> >> Wouldn't this also happen if the bytenrs didn't match? In that case you >> aren't freeing anything, and we'd still show the improper device correct? So >> why not deal with that case in a similar way? Thanks, > > Freeing the device without the BTRFS_MAGIC is mandatory because the > device does not belong to btrfs even though we could notice from the > sysfs that there is missing flag on this devid. > > I think I should check for the BTRFS_MAGIC first before bytenr check, > I shall swap them in v2 if there are no other comments. We need this > patch as a fix for the test case btrfs/198. > > However bytenrs mismatch indicates corruption. If the degraded mount > option is not provided we would fail the mount. The user shall have the > opportunity to fix the corrupted superblock. We don't automatically > recover the corrupted superblock from the backup superblock copies. If > the degraded mount option is provided the corrupted device still be in > the device_list but with the missing flag set. Just by looking at btrfs > fi show the user won't know that one of the devices is not part of the > volume however when he looks into the /sys/fs/btrfs/fsid/devinfo/<devid> > /missing it shall show 1. Our serviceability part of the degraded volume > has some unfinished business when we evaluate it against the standard > RAS features, but we are slowly getting there. > Alright that's fair. You can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 160b485d2cc0..d84a49fe9639 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3429,12 +3429,16 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, return ERR_CAST(page); super = page_address(page); - if (btrfs_super_bytenr(super) != bytenr || - btrfs_super_magic(super) != BTRFS_MAGIC) { + if (btrfs_super_bytenr(super) != bytenr) { btrfs_release_disk_super(super); return ERR_PTR(-EINVAL); } + if (btrfs_super_magic(super) != BTRFS_MAGIC) { + btrfs_release_disk_super(super); + return ERR_PTR(-ENODATA); + } + return super; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 7cc677a7e544..ec9dac40b4f1 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1198,17 +1198,24 @@ static int open_fs_devices(struct btrfs_fs_devices *fs_devices, { struct btrfs_device *device; struct btrfs_device *latest_dev = NULL; + struct btrfs_device *tmp_device; flags |= FMODE_EXCL; - list_for_each_entry(device, &fs_devices->devices, dev_list) { - /* Just open everything we can; ignore failures here */ - if (btrfs_open_one_device(fs_devices, device, flags, holder)) - continue; + list_for_each_entry_safe(device, tmp_device, &fs_devices->devices, + dev_list) { + int ret; - if (!latest_dev || - device->generation > latest_dev->generation) + /* Just open everything we can; ignore failures here */ + ret = btrfs_open_one_device(fs_devices, device, flags, holder); + if (ret == 0 && (!latest_dev || + device->generation > latest_dev->generation)) { latest_dev = device; + } else if (ret == -ENODATA) { + fs_devices->num_devices--; + list_del(&device->dev_list); + btrfs_free_device(device); + } } if (fs_devices->open_devices == 0) return -EINVAL;
Many things can happen after the device is scanned and before the device is mounted. One such thing is losing the BTRFS_MAGIC on the device. If it happens we still won't free that device from the memory and causes the userland to confuse. For example: As the BTRFS_IOC_DEV_INFO still carries the device path which does not have the BTRFS_MAGIC, the btrfs fi show still shows device which does not belong. As shown below. mkfs.btrfs -fq -draid1 -mraid1 /dev/sda /dev/sdb wipefs -a /dev/sdb mount -o degraded /dev/sda /btrfs btrfs fi show -m /dev/sdb does not contain BTRFS_MAGIC and we still show it as part of btrfs. Label: none uuid: 470ec6fb-646b-4464-b3cb-df1b26c527bd Total devices 2 FS bytes used 128.00KiB devid 1 size 3.00GiB used 571.19MiB path /dev/sda devid 2 size 3.00GiB used 571.19MiB path /dev/sdb Fix is to return -ENODATA error code in btrfs_read_dev_one_super() when BTRFS_MAGIC check fails, and its parent open_fs_devices() to free the device in the mount-thread. Signed-off-by: Anand Jain <anand.jain@oracle.com> --- v2: Do not return ENODATA on `btrfs_super_bytenr(super) != bytenr` fs/btrfs/disk-io.c | 8 ++++++-- fs/btrfs/volumes.c | 19 +++++++++++++------ 2 files changed, 19 insertions(+), 8 deletions(-)