Message ID | 20191007094515.925-5-anand.jain@oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix issues due to alien device | expand |
On 7.10.19 г. 12:45 ч., Anand Jain wrote: > Following test case explains it all, even though the degraded mount is > successful the btrfs-progs fails to report the missing device. > > mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ > wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ > btrfs fi show -m /btrfs > > Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 > Total devices 2 FS bytes used 128.00KiB > devid 1 size 1.09TiB used 2.01GiB path /dev/sdc > devid 2 size 1.09TiB used 2.01GiB path /dev/sdd > > This is because btrfs-progs does it fundamentally wrong way that > it deduces the missing device status in the user land instead of > refuting from the kernel. > > At the same time in the kernel when we know that there is device > with non-btrfs magic, then remove that device from the list so > that btrfs-progs or someother userland utility won't be confused. > > Signed-off-by: Anand Jain <anand.jain@oracle.com> > --- > fs/btrfs/disk-io.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index 326d5281ad93..e05856432456 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, > if (btrfs_super_bytenr(super) != bytenr || > btrfs_super_magic(super) != BTRFS_MAGIC) { > brelse(bh); > - return -EINVAL; > + return -EUCLEAN; This is really non-obvious and you are propagating the special-meaning of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this patch does is make the following call chain return EUCLAN: btrfs_open_one_device <-- finally removing the device in this function btrfs_get_bdev_and_sb <-- propagating it to here btrfs_read_dev_super btrfs_read_dev_one_super <-- you return the EUCLEAN And your commit log doesn't mention anything about that. EUCLEAN warrants a comment in this case since it changes behavior in higher-level layers. > } > > *bh_ret = bh; >
On 2019/10/7 下午9:30, Nikolay Borisov wrote: > > > On 7.10.19 г. 12:45 ч., Anand Jain wrote: >> Following test case explains it all, even though the degraded mount is >> successful the btrfs-progs fails to report the missing device. >> >> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ >> wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ >> btrfs fi show -m /btrfs >> >> Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 >> Total devices 2 FS bytes used 128.00KiB >> devid 1 size 1.09TiB used 2.01GiB path /dev/sdc >> devid 2 size 1.09TiB used 2.01GiB path /dev/sdd >> >> This is because btrfs-progs does it fundamentally wrong way that >> it deduces the missing device status in the user land instead of >> refuting from the kernel. >> >> At the same time in the kernel when we know that there is device >> with non-btrfs magic, then remove that device from the list so >> that btrfs-progs or someother userland utility won't be confused. >> >> Signed-off-by: Anand Jain <anand.jain@oracle.com> >> --- >> fs/btrfs/disk-io.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index 326d5281ad93..e05856432456 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, >> if (btrfs_super_bytenr(super) != bytenr || >> btrfs_super_magic(super) != BTRFS_MAGIC) { >> brelse(bh); >> - return -EINVAL; >> + return -EUCLEAN; > > This is really non-obvious and you are propagating the special-meaning > of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this > patch does is make the following call chain return EUCLAN: > > btrfs_open_one_device <-- finally removing the device in this function > btrfs_get_bdev_and_sb <-- propagating it to here > btrfs_read_dev_super > btrfs_read_dev_one_super <-- you return the EUCLEAN > > > And your commit log doesn't mention anything about that. EUCLEAN > warrants a comment in this case since it changes behavior in > higher-level layers. And, for most case, EUCLEAN should have a proper kernel message for what's going wrong, the value we hit and the value we expect. And for EUCLEAN, it's more like graceful error out for impossible thing. This is definitely not the case, and I believe the original EINVAL makes more sense than EUCLEAN. Thanks, Qu > >> } >> >> *bh_ret = bh; >>
On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote: > > > On 2019/10/7 下午9:30, Nikolay Borisov wrote: > > > > > > On 7.10.19 г. 12:45 ч., Anand Jain wrote: > >> Following test case explains it all, even though the degraded mount is > >> successful the btrfs-progs fails to report the missing device. > >> > >> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ > >> wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ > >> btrfs fi show -m /btrfs > >> > >> Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 > >> Total devices 2 FS bytes used 128.00KiB > >> devid 1 size 1.09TiB used 2.01GiB path /dev/sdc > >> devid 2 size 1.09TiB used 2.01GiB path /dev/sdd > >> > >> This is because btrfs-progs does it fundamentally wrong way that > >> it deduces the missing device status in the user land instead of > >> refuting from the kernel. > >> > >> At the same time in the kernel when we know that there is device > >> with non-btrfs magic, then remove that device from the list so > >> that btrfs-progs or someother userland utility won't be confused. > >> > >> Signed-off-by: Anand Jain <anand.jain@oracle.com> > >> --- > >> fs/btrfs/disk-io.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >> index 326d5281ad93..e05856432456 100644 > >> --- a/fs/btrfs/disk-io.c > >> +++ b/fs/btrfs/disk-io.c > >> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, > >> if (btrfs_super_bytenr(super) != bytenr || > >> btrfs_super_magic(super) != BTRFS_MAGIC) { > >> brelse(bh); > >> - return -EINVAL; > >> + return -EUCLEAN; > > > > This is really non-obvious and you are propagating the special-meaning > > of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this > > patch does is make the following call chain return EUCLAN: > > > > btrfs_open_one_device <-- finally removing the device in this function > > btrfs_get_bdev_and_sb <-- propagating it to here > > btrfs_read_dev_super > > btrfs_read_dev_one_super <-- you return the EUCLEAN > > > > > > And your commit log doesn't mention anything about that. EUCLEAN > > warrants a comment in this case since it changes behavior in > > higher-level layers. > > > And, for most case, EUCLEAN should have a proper kernel message for > what's going wrong, the value we hit and the value we expect. > > And for EUCLEAN, it's more like graceful error out for impossible thing. > This is definitely not the case, and I believe the original EINVAL makes > more sense than EUCLEAN. I agree, EUCLEAN is wrong here.
On 10/8/19 1:03 AM, David Sterba wrote: > On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote: >> >> >> On 2019/10/7 下午9:30, Nikolay Borisov wrote: >>> >>> >>> On 7.10.19 г. 12:45 ч., Anand Jain wrote: >>>> Following test case explains it all, even though the degraded mount is >>>> successful the btrfs-progs fails to report the missing device. >>>> >>>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ >>>> wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ >>>> btrfs fi show -m /btrfs >>>> >>>> Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 >>>> Total devices 2 FS bytes used 128.00KiB >>>> devid 1 size 1.09TiB used 2.01GiB path /dev/sdc >>>> devid 2 size 1.09TiB used 2.01GiB path /dev/sdd >>>> >>>> This is because btrfs-progs does it fundamentally wrong way that >>>> it deduces the missing device status in the user land instead of >>>> refuting from the kernel. >>>> >>>> At the same time in the kernel when we know that there is device >>>> with non-btrfs magic, then remove that device from the list so >>>> that btrfs-progs or someother userland utility won't be confused. >>>> >>>> Signed-off-by: Anand Jain <anand.jain@oracle.com> >>>> --- >>>> fs/btrfs/disk-io.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>> index 326d5281ad93..e05856432456 100644 >>>> --- a/fs/btrfs/disk-io.c >>>> +++ b/fs/btrfs/disk-io.c >>>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, >>>> if (btrfs_super_bytenr(super) != bytenr || >>>> btrfs_super_magic(super) != BTRFS_MAGIC) { >>>> brelse(bh); >>>> - return -EINVAL; >>>> + return -EUCLEAN; >>> >>> This is really non-obvious and you are propagating the special-meaning >>> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this >>> patch does is make the following call chain return EUCLAN: >>> >>> btrfs_open_one_device <-- finally removing the device in this function >>> btrfs_get_bdev_and_sb <-- propagating it to here >>> btrfs_read_dev_super >>> btrfs_read_dev_one_super <-- you return the EUCLEAN >>> >>> >>> And your commit log doesn't mention anything about that. EUCLEAN >>> warrants a comment in this case since it changes behavior in >>> higher-level layers. Ok. Which means the commit log needs to be fixed. >> >> And, for most case, EUCLEAN should have a proper kernel message for >> what's going wrong, the value we hit and the value we expect. If its a kernel message for EUCLEAN in the context of mounted state, I would say yes, as it indicates a corruption. But here we are still in the unmounted context and moving towards mounted context. Having an alien device in the fs_devices is not something logical bug nor a corruption, it just about a disk whose disk superblock got overwritten by the user operation. And its not a good idea to log the kernel messages arising from the user operation especially in the unmounted context. Otherwise we shall endup cluttering the kernel messages. Moreover if there is an alien device, the user must use -o degraded and we do have the missing kernel messages, and this will suffice to explain the state about the fsid being mounted. And the alien fsid, its a stale we don't worry about it. So no kernel messages. >> And for EUCLEAN, it's more like graceful error out for impossible thing. >> This is definitely not the case, and I believe the original EINVAL makes >> more sense than EUCLEAN. > > I agree, EUCLEAN is wrong here. > I am ok with any other suitable. It does not matter. And the other choice I have is ESTALE. But EINVAL is no go because we still want to keep the messed up device unless there is a confirmation that its alien. In the same function we use EINVAL if the device/partition size changed to smaller size after we have scanned the device. Which means we still keep the device in the list. Its the user choice, no need to meddle with it. bytenr = btrfs_sb_offset(copy_num); if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode)) return -EINVAL; But, EUCLEAN /* structure needs cleaning */ is errno which exactly says whats needed here. Because an alien device is a kind of corruption but it can easily happen due to unplanned device operations in the userland. But as it happened before we assembled the volume so its safe to clean and is not a road block when there is redundancy with degraded option. Thanks, Anand
On 8/10/19 11:26 AM, Anand Jain wrote: > On 10/8/19 1:03 AM, David Sterba wrote: >> On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote: >>> >>> >>> On 2019/10/7 下午9:30, Nikolay Borisov wrote: >>>> >>>> >>>> On 7.10.19 г. 12:45 ч., Anand Jain wrote: >>>>> Following test case explains it all, even though the degraded mount is >>>>> successful the btrfs-progs fails to report the missing device. >>>>> >>>>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ >>>>> wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ >>>>> btrfs fi show -m /btrfs >>>>> >>>>> Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 >>>>> Total devices 2 FS bytes used 128.00KiB >>>>> devid 1 size 1.09TiB used 2.01GiB path /dev/sdc >>>>> devid 2 size 1.09TiB used 2.01GiB path /dev/sdd >>>>> >>>>> This is because btrfs-progs does it fundamentally wrong way that >>>>> it deduces the missing device status in the user land instead of >>>>> refuting from the kernel. >>>>> >>>>> At the same time in the kernel when we know that there is device >>>>> with non-btrfs magic, then remove that device from the list so >>>>> that btrfs-progs or someother userland utility won't be confused. >>>>> >>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com> >>>>> --- >>>>> fs/btrfs/disk-io.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>>> index 326d5281ad93..e05856432456 100644 >>>>> --- a/fs/btrfs/disk-io.c >>>>> +++ b/fs/btrfs/disk-io.c >>>>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct >>>>> block_device *bdev, int copy_num, >>>>> if (btrfs_super_bytenr(super) != bytenr || >>>>> btrfs_super_magic(super) != BTRFS_MAGIC) { >>>>> brelse(bh); >>>>> - return -EINVAL; >>>>> + return -EUCLEAN; >>>> >>>> This is really non-obvious and you are propagating the special-meaning >>>> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this >>>> patch does is make the following call chain return EUCLAN: >>>> >>>> btrfs_open_one_device <-- finally removing the device in this function >>>> btrfs_get_bdev_and_sb <-- propagating it to here >>>> btrfs_read_dev_super >>>> btrfs_read_dev_one_super <-- you return the EUCLEAN >>>> >>>> >>>> And your commit log doesn't mention anything about that. EUCLEAN >>>> warrants a comment in this case since it changes behavior in >>>> higher-level layers. > > Ok. Which means the commit log needs to be fixed. > >>> >>> And, for most case, EUCLEAN should have a proper kernel message for >>> what's going wrong, the value we hit and the value we expect. > > If its a kernel message for EUCLEAN in the context of mounted state, > I would say yes, as it indicates a corruption. > But here we are still in the unmounted context and moving towards > mounted context. Having an alien device in the fs_devices is not > something logical bug nor a corruption, it just about a disk whose > disk superblock got overwritten by the user operation. And its not > a good idea to log the kernel messages arising from the user > operation especially in the unmounted context. Otherwise we shall > endup cluttering the kernel messages. Moreover if there is an alien > device, the user must use -o degraded and we do have the missing > kernel messages, and this will suffice to explain the state about the > fsid being mounted. And the alien fsid, its a stale we don't worry > about it. So no kernel messages. > >>> And for EUCLEAN, it's more like graceful error out for impossible thing. >>> This is definitely not the case, and I believe the original EINVAL makes >>> more sense than EUCLEAN. >> >> I agree, EUCLEAN is wrong here. >> > > I am ok with any other suitable. It does not matter. And the other > choice I have is ESTALE. > > But EINVAL is no go because we still want to keep the messed up device > unless there is a confirmation that its alien. > > In the same function we use EINVAL if the device/partition size > changed to smaller size after we have scanned the device. Which > means we still keep the device in the list. Its the user choice, > no need to meddle with it. > > bytenr = btrfs_sb_offset(copy_num); > if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode)) > return -EINVAL; I didn't know that btrfs/197 btrfs/198 is failing. I found that we need this patch series so that it passes these tests. Logs for failing test cases are here [1]. Any feedback if errno -ESTALE is better instead of -EUCLEAN? As mentioned I can't use -EINVAL because its already been used. [1] btrfs/197 - output mismatch (see /xfstests-dev/results//btrfs/197.out.bad) --- tests/btrfs/197.out 2020-01-08 18:08:44.040258593 +0800 +++ /xfstests-dev/results//btrfs/197.out.bad 2020-01-15 16:32:23.870187397 +0800 @@ -3,23 +3,19 @@ Label: none uuid: <UUID> Total devices <NUM> FS bytes used <SIZE> devid <DEVID> size <SIZE> used <SIZE> path SCRATCH_DEV - *** Some devices missing raid5 Label: none uuid: <UUID> ... (Run 'diff -u /xfstests-dev/tests/btrfs/197.out /xfstests-dev/results//btrfs/197.out.bad' to see the entire diff) btrfs/198 - output mismatch (see /xfstests-dev/results//btrfs/198.out.bad) --- tests/btrfs/198.out 2020-01-08 18:08:44.040258593 +0800 +++ /xfstests-dev/results//btrfs/198.out.bad 2020-01-15 16:32:28.882282030 +0800 @@ -3,23 +3,19 @@ Label: none uuid: <UUID> Total devices <NUM> FS bytes used <SIZE> devid <DEVID> size <SIZE> used <SIZE> path SCRATCH_DEV - *** Some devices missing raid5 Label: none uuid: <UUID> ... (Run 'diff -u /xfstests-dev/tests/btrfs/198.out /xfstests-dev/results//btrfs/198.out.bad' to see the entire diff) Thanks, Anand > But, EUCLEAN /* structure needs cleaning */ is errno which exactly > says whats needed here. Because an alien device is a kind of > corruption but it can easily happen due to unplanned device operations > in the userland. But as it happened before we assembled the volume so > its safe to clean and is not a road block when there is redundancy > with degraded option. > > Thanks, Anand >
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 326d5281ad93..e05856432456 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num, if (btrfs_super_bytenr(super) != bytenr || btrfs_super_magic(super) != BTRFS_MAGIC) { brelse(bh); - return -EINVAL; + return -EUCLEAN; } *bh_ret = bh;
Following test case explains it all, even though the degraded mount is successful the btrfs-progs fails to report the missing device. mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \ wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \ btrfs fi show -m /btrfs Label: none uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495 Total devices 2 FS bytes used 128.00KiB devid 1 size 1.09TiB used 2.01GiB path /dev/sdc devid 2 size 1.09TiB used 2.01GiB path /dev/sdd This is because btrfs-progs does it fundamentally wrong way that it deduces the missing device status in the user land instead of refuting from the kernel. At the same time in the kernel when we know that there is device with non-btrfs magic, then remove that device from the list so that btrfs-progs or someother userland utility won't be confused. Signed-off-by: Anand Jain <anand.jain@oracle.com> --- fs/btrfs/disk-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)