diff mbox series

[4/5] btrfs: remove identified alien device in open_fs_devices

Message ID 20191007094515.925-5-anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix issues due to alien device | expand

Commit Message

Anand Jain Oct. 7, 2019, 9:45 a.m. UTC
Following test case explains it all, even though the degraded mount is
successful the btrfs-progs fails to report the missing device.

 mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
 wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
 btrfs fi show -m /btrfs

 Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
	Total devices 2 FS bytes used 128.00KiB
	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd

This is because btrfs-progs does it fundamentally wrong way that
it deduces the missing device status in the user land instead of
refuting from the kernel.

At the same time in the kernel when we know that there is device
with non-btrfs magic, then remove that device from the list so
that btrfs-progs or someother userland utility won't be confused.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/disk-io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nikolay Borisov Oct. 7, 2019, 1:30 p.m. UTC | #1
On 7.10.19 г. 12:45 ч., Anand Jain wrote:
> Following test case explains it all, even though the degraded mount is
> successful the btrfs-progs fails to report the missing device.
> 
>  mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
>  wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
>  btrfs fi show -m /btrfs
> 
>  Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
> 	Total devices 2 FS bytes used 128.00KiB
> 	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
> 	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
> 
> This is because btrfs-progs does it fundamentally wrong way that
> it deduces the missing device status in the user land instead of
> refuting from the kernel.
> 
> At the same time in the kernel when we know that there is device
> with non-btrfs magic, then remove that device from the list so
> that btrfs-progs or someother userland utility won't be confused.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/disk-io.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 326d5281ad93..e05856432456 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>  	if (btrfs_super_bytenr(super) != bytenr ||
>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
>  		brelse(bh);
> -		return -EINVAL;
> +		return -EUCLEAN;

This is really non-obvious and you are propagating the special-meaning
of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
patch does is make the following call chain return EUCLAN:

btrfs_open_one_device <-- finally removing the device in this function
 btrfs_get_bdev_and_sb <-- propagating it to here
  btrfs_read_dev_super
    btrfs_read_dev_one_super <-- you return the EUCLEAN


And your commit log doesn't mention anything about that. EUCLEAN
warrants a comment in this case since it changes behavior in
higher-level layers.

>  	}
>  
>  	*bh_ret = bh;
>
Qu Wenruo Oct. 7, 2019, 1:37 p.m. UTC | #2
On 2019/10/7 下午9:30, Nikolay Borisov wrote:
>
>
> On 7.10.19 г. 12:45 ч., Anand Jain wrote:
>> Following test case explains it all, even though the degraded mount is
>> successful the btrfs-progs fails to report the missing device.
>>
>>  mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
>>  wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
>>  btrfs fi show -m /btrfs
>>
>>  Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
>> 	Total devices 2 FS bytes used 128.00KiB
>> 	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
>> 	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
>>
>> This is because btrfs-progs does it fundamentally wrong way that
>> it deduces the missing device status in the user land instead of
>> refuting from the kernel.
>>
>> At the same time in the kernel when we know that there is device
>> with non-btrfs magic, then remove that device from the list so
>> that btrfs-progs or someother userland utility won't be confused.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>  fs/btrfs/disk-io.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 326d5281ad93..e05856432456 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>>  	if (btrfs_super_bytenr(super) != bytenr ||
>>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
>>  		brelse(bh);
>> -		return -EINVAL;
>> +		return -EUCLEAN;
>
> This is really non-obvious and you are propagating the special-meaning
> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
> patch does is make the following call chain return EUCLAN:
>
> btrfs_open_one_device <-- finally removing the device in this function
>  btrfs_get_bdev_and_sb <-- propagating it to here
>   btrfs_read_dev_super
>     btrfs_read_dev_one_super <-- you return the EUCLEAN
>
>
> And your commit log doesn't mention anything about that. EUCLEAN
> warrants a comment in this case since it changes behavior in
> higher-level layers.


And, for most case, EUCLEAN should have a proper kernel message for
what's going wrong, the value we hit and the value we expect.

And for EUCLEAN, it's more like graceful error out for impossible thing.
This is definitely not the case, and I believe the original EINVAL makes
more sense than EUCLEAN.

Thanks,
Qu

>
>>  	}
>>
>>  	*bh_ret = bh;
>>
David Sterba Oct. 7, 2019, 5:03 p.m. UTC | #3
On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/10/7 下午9:30, Nikolay Borisov wrote:
> >
> >
> > On 7.10.19 г. 12:45 ч., Anand Jain wrote:
> >> Following test case explains it all, even though the degraded mount is
> >> successful the btrfs-progs fails to report the missing device.
> >>
> >>  mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
> >>  wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
> >>  btrfs fi show -m /btrfs
> >>
> >>  Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
> >> 	Total devices 2 FS bytes used 128.00KiB
> >> 	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
> >> 	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
> >>
> >> This is because btrfs-progs does it fundamentally wrong way that
> >> it deduces the missing device status in the user land instead of
> >> refuting from the kernel.
> >>
> >> At the same time in the kernel when we know that there is device
> >> with non-btrfs magic, then remove that device from the list so
> >> that btrfs-progs or someother userland utility won't be confused.
> >>
> >> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> >> ---
> >>  fs/btrfs/disk-io.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >> index 326d5281ad93..e05856432456 100644
> >> --- a/fs/btrfs/disk-io.c
> >> +++ b/fs/btrfs/disk-io.c
> >> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> >>  	if (btrfs_super_bytenr(super) != bytenr ||
> >>  		    btrfs_super_magic(super) != BTRFS_MAGIC) {
> >>  		brelse(bh);
> >> -		return -EINVAL;
> >> +		return -EUCLEAN;
> >
> > This is really non-obvious and you are propagating the special-meaning
> > of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
> > patch does is make the following call chain return EUCLAN:
> >
> > btrfs_open_one_device <-- finally removing the device in this function
> >  btrfs_get_bdev_and_sb <-- propagating it to here
> >   btrfs_read_dev_super
> >     btrfs_read_dev_one_super <-- you return the EUCLEAN
> >
> >
> > And your commit log doesn't mention anything about that. EUCLEAN
> > warrants a comment in this case since it changes behavior in
> > higher-level layers.
> 
> 
> And, for most case, EUCLEAN should have a proper kernel message for
> what's going wrong, the value we hit and the value we expect.
> 
> And for EUCLEAN, it's more like graceful error out for impossible thing.
> This is definitely not the case, and I believe the original EINVAL makes
> more sense than EUCLEAN.

I agree, EUCLEAN is wrong here.
Anand Jain Oct. 8, 2019, 3:26 a.m. UTC | #4
On 10/8/19 1:03 AM, David Sterba wrote:
> On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/10/7 下午9:30, Nikolay Borisov wrote:
>>>
>>>
>>> On 7.10.19 г. 12:45 ч., Anand Jain wrote:
>>>> Following test case explains it all, even though the degraded mount is
>>>> successful the btrfs-progs fails to report the missing device.
>>>>
>>>>   mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
>>>>   wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
>>>>   btrfs fi show -m /btrfs
>>>>
>>>>   Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
>>>> 	Total devices 2 FS bytes used 128.00KiB
>>>> 	devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
>>>> 	devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
>>>>
>>>> This is because btrfs-progs does it fundamentally wrong way that
>>>> it deduces the missing device status in the user land instead of
>>>> refuting from the kernel.
>>>>
>>>> At the same time in the kernel when we know that there is device
>>>> with non-btrfs magic, then remove that device from the list so
>>>> that btrfs-progs or someother userland utility won't be confused.
>>>>
>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>> ---
>>>>   fs/btrfs/disk-io.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>> index 326d5281ad93..e05856432456 100644
>>>> --- a/fs/btrfs/disk-io.c
>>>> +++ b/fs/btrfs/disk-io.c
>>>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
>>>>   	if (btrfs_super_bytenr(super) != bytenr ||
>>>>   		    btrfs_super_magic(super) != BTRFS_MAGIC) {
>>>>   		brelse(bh);
>>>> -		return -EINVAL;
>>>> +		return -EUCLEAN;
>>>
>>> This is really non-obvious and you are propagating the special-meaning
>>> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
>>> patch does is make the following call chain return EUCLAN:
>>>
>>> btrfs_open_one_device <-- finally removing the device in this function
>>>   btrfs_get_bdev_and_sb <-- propagating it to here
>>>    btrfs_read_dev_super
>>>      btrfs_read_dev_one_super <-- you return the EUCLEAN
>>>
>>>
>>> And your commit log doesn't mention anything about that. EUCLEAN
>>> warrants a comment in this case since it changes behavior in
>>> higher-level layers.

  Ok. Which means the commit log needs to be fixed.

>>
>> And, for most case, EUCLEAN should have a proper kernel message for
>> what's going wrong, the value we hit and the value we expect.

  If its a kernel message for EUCLEAN in the context of mounted state,
  I would say yes, as it indicates a corruption.
  But here we are still in the unmounted context and moving towards
  mounted context. Having an alien device in the fs_devices is not
  something logical bug nor a corruption, it just about a disk whose
  disk superblock got overwritten by the user operation. And its not
  a good idea to log the kernel messages arising from the user
  operation especially in the unmounted context. Otherwise we shall
  endup cluttering the kernel messages. Moreover if there is an alien
  device, the user must use -o degraded and we do have the missing
  kernel messages, and this will suffice to explain the state about the
  fsid being mounted. And the alien fsid, its a stale we don't worry
  about it. So no kernel messages.

>> And for EUCLEAN, it's more like graceful error out for impossible thing.
>> This is definitely not the case, and I believe the original EINVAL makes
>> more sense than EUCLEAN.
> 
> I agree, EUCLEAN is wrong here.
>

   I am ok with any other suitable. It does not matter. And the other
   choice I have is ESTALE.

   But EINVAL is no go because we still want to keep the messed up device
   unless there is a confirmation that its alien.

   In the same function we use EINVAL if the device/partition size
   changed to smaller size after we have scanned the device. Which
   means we still keep the device in the list. Its the user choice,
   no need to meddle with it.

    bytenr = btrfs_sb_offset(copy_num);
    if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
           return -EINVAL;

   But, EUCLEAN /* structure needs cleaning */ is errno which exactly
   says whats needed here. Because an alien device is a kind of
   corruption but it can easily happen due to unplanned device operations
   in the userland. But as it happened before we assembled the volume so
   its safe to clean and is not a road block when there is redundancy
   with degraded option.

Thanks, Anand
Anand Jain Jan. 15, 2020, 8:56 a.m. UTC | #5
On 8/10/19 11:26 AM, Anand Jain wrote:
> On 10/8/19 1:03 AM, David Sterba wrote:
>> On Mon, Oct 07, 2019 at 09:37:49PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2019/10/7 下午9:30, Nikolay Borisov wrote:
>>>>
>>>>
>>>> On 7.10.19 г. 12:45 ч., Anand Jain wrote:
>>>>> Following test case explains it all, even though the degraded mount is
>>>>> successful the btrfs-progs fails to report the missing device.
>>>>>
>>>>>   mkfs.btrfs -fq -draid1 -mraid1 /dev/sdc /dev/sdd && \
>>>>>   wipefs -a /dev/sdd && mount -o degraded /dev/sdc /btrfs && \
>>>>>   btrfs fi show -m /btrfs
>>>>>
>>>>>   Label: none  uuid: 2b3b8d92-572b-4d37-b4ee-046d3a538495
>>>>>     Total devices 2 FS bytes used 128.00KiB
>>>>>     devid    1 size 1.09TiB used 2.01GiB path /dev/sdc
>>>>>     devid    2 size 1.09TiB used 2.01GiB path /dev/sdd
>>>>>
>>>>> This is because btrfs-progs does it fundamentally wrong way that
>>>>> it deduces the missing device status in the user land instead of
>>>>> refuting from the kernel.
>>>>>
>>>>> At the same time in the kernel when we know that there is device
>>>>> with non-btrfs magic, then remove that device from the list so
>>>>> that btrfs-progs or someother userland utility won't be confused.
>>>>>
>>>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>>>> ---
>>>>>   fs/btrfs/disk-io.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 326d5281ad93..e05856432456 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -3417,7 +3417,7 @@ int btrfs_read_dev_one_super(struct 
>>>>> block_device *bdev, int copy_num,
>>>>>       if (btrfs_super_bytenr(super) != bytenr ||
>>>>>               btrfs_super_magic(super) != BTRFS_MAGIC) {
>>>>>           brelse(bh);
>>>>> -        return -EINVAL;
>>>>> +        return -EUCLEAN;
>>>>
>>>> This is really non-obvious and you are propagating the special-meaning
>>>> of EUCLEAN waaaaaaaay beyond btrfs_open_one_device. In fact what this
>>>> patch does is make the following call chain return EUCLAN:
>>>>
>>>> btrfs_open_one_device <-- finally removing the device in this function
>>>>   btrfs_get_bdev_and_sb <-- propagating it to here
>>>>    btrfs_read_dev_super
>>>>      btrfs_read_dev_one_super <-- you return the EUCLEAN
>>>>
>>>>
>>>> And your commit log doesn't mention anything about that. EUCLEAN
>>>> warrants a comment in this case since it changes behavior in
>>>> higher-level layers.
> 
>   Ok. Which means the commit log needs to be fixed.
> 
>>>
>>> And, for most case, EUCLEAN should have a proper kernel message for
>>> what's going wrong, the value we hit and the value we expect.
> 
>   If its a kernel message for EUCLEAN in the context of mounted state,
>   I would say yes, as it indicates a corruption.
>   But here we are still in the unmounted context and moving towards
>   mounted context. Having an alien device in the fs_devices is not
>   something logical bug nor a corruption, it just about a disk whose
>   disk superblock got overwritten by the user operation. And its not
>   a good idea to log the kernel messages arising from the user
>   operation especially in the unmounted context. Otherwise we shall
>   endup cluttering the kernel messages. Moreover if there is an alien
>   device, the user must use -o degraded and we do have the missing
>   kernel messages, and this will suffice to explain the state about the
>   fsid being mounted. And the alien fsid, its a stale we don't worry
>   about it. So no kernel messages.
> 
>>> And for EUCLEAN, it's more like graceful error out for impossible thing.
>>> This is definitely not the case, and I believe the original EINVAL makes
>>> more sense than EUCLEAN.
>>
>> I agree, EUCLEAN is wrong here.
>>
> 
>    I am ok with any other suitable. It does not matter. And the other
>    choice I have is ESTALE.
> 
>    But EINVAL is no go because we still want to keep the messed up device
>    unless there is a confirmation that its alien.
> 
>    In the same function we use EINVAL if the device/partition size
>    changed to smaller size after we have scanned the device. Which
>    means we still keep the device in the list. Its the user choice,
>    no need to meddle with it.
> 
>     bytenr = btrfs_sb_offset(copy_num);
>     if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>            return -EINVAL;


    I didn't know that btrfs/197 btrfs/198 is failing.
    I found that we need this patch series so that it
    passes these tests. Logs for failing test cases are here [1].

    Any feedback if errno -ESTALE is better instead of -EUCLEAN? As
    mentioned I can't use -EINVAL because its already been used.

[1]
btrfs/197	- output mismatch (see /xfstests-dev/results//btrfs/197.out.bad)
     --- tests/btrfs/197.out	2020-01-08 18:08:44.040258593 +0800
     +++ /xfstests-dev/results//btrfs/197.out.bad	2020-01-15 
16:32:23.870187397 +0800
     @@ -3,23 +3,19 @@
      Label: none  uuid: <UUID>
      	Total devices <NUM> FS bytes used <SIZE>
      	devid <DEVID> size <SIZE> used <SIZE> path SCRATCH_DEV
     -	*** Some devices missing

      raid5
      Label: none  uuid: <UUID>
     ...
     (Run 'diff -u /xfstests-dev/tests/btrfs/197.out 
/xfstests-dev/results//btrfs/197.out.bad'  to see the entire diff)
btrfs/198	- output mismatch (see /xfstests-dev/results//btrfs/198.out.bad)
     --- tests/btrfs/198.out	2020-01-08 18:08:44.040258593 +0800
     +++ /xfstests-dev/results//btrfs/198.out.bad	2020-01-15 
16:32:28.882282030 +0800
     @@ -3,23 +3,19 @@
      Label: none  uuid: <UUID>
      	Total devices <NUM> FS bytes used <SIZE>
      	devid <DEVID> size <SIZE> used <SIZE> path SCRATCH_DEV
     -	*** Some devices missing

      raid5
      Label: none  uuid: <UUID>
     ...
     (Run 'diff -u /xfstests-dev/tests/btrfs/198.out 
/xfstests-dev/results//btrfs/198.out.bad'  to see the entire diff)

Thanks, Anand



>    But, EUCLEAN /* structure needs cleaning */ is errno which exactly
>    says whats needed here. Because an alien device is a kind of
>    corruption but it can easily happen due to unplanned device operations
>    in the userland. But as it happened before we assembled the volume so
>    its safe to clean and is not a road block when there is redundancy
>    with degraded option.
> 
> Thanks, Anand
>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 326d5281ad93..e05856432456 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3417,7 +3417,7 @@  int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
 	if (btrfs_super_bytenr(super) != bytenr ||
 		    btrfs_super_magic(super) != BTRFS_MAGIC) {
 		brelse(bh);
-		return -EINVAL;
+		return -EUCLEAN;
 	}
 
 	*bh_ret = bh;