diff mbox series

[3/4] btrfs: include non-missing as a qualifier for the latest_bdev

Message ID 1570175403-4073-4-git-send-email-anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix issues due to alien device | expand

Commit Message

Anand Jain Oct. 4, 2019, 7:50 a.m. UTC
btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
to point to the bdev with greatest device::generation number.
For a typical-missing device the generation number is zero so
fs_devices::latest_bdev will never point to it.

But if the missing device is due to alienating [1], then
device::generation is not-zero and if it is >= to rest of
device::generation in the list, then fs_devices::latest_bdev
ends up pointing to the missing device and reports the error
like this [2]

[1]
mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
sleep 3 # avoid racing with udev's useless scans if needed
btrfs dev add -f /dev/sdb /btrfs

mount -o degraded /dev/sdc /btrfs1

[2]
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

kernel: BTRFS warning (device sdc): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
kernel: BTRFS error (device sdc): failed to read devices
kernel: BTRFS error (device sdc): open_ctree failed

Fix the root of the issue, by checking if the the device is not
missing before it can be a contender for the fs_devices::latest_bdev
title.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Nikolay Borisov Oct. 4, 2019, 8:11 a.m. UTC | #1
On 4.10.19 г. 10:50 ч., Anand Jain wrote:
> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
> to point to the bdev with greatest device::generation number.
> For a typical-missing device the generation number is zero so
> fs_devices::latest_bdev will never point to it.
> 
> But if the missing device is due to alienating [1], then
> device::generation is not-zero and if it is >= to rest of
> device::generation in the list, then fs_devices::latest_bdev
> ends up pointing to the missing device and reports the error
> like this [2]
> 
> [1]
> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
> sleep 3 # avoid racing with udev's useless scans if needed
> btrfs dev add -f /dev/sdb /btrfs

Hm, here I think the correct way is to refuse adding /dev/sdb to an
existing fs if it's detected to be part of a different one. I.e it
should require wipefs to be done.

> 
> mount -o degraded /dev/sdc /btrfs1
> 
> [2]
> mount: wrong fs type, bad option, bad superblock on /dev/sdc,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> 
> kernel: BTRFS warning (device sdc): devid 1 uuid 072a0192-675b-4d5a-8640-a5cf2b2c704d is missing
> kernel: BTRFS error (device sdc): failed to read devices
> kernel: BTRFS error (device sdc): open_ctree failed
> 
> Fix the root of the issue, by checking if the the device is not
> missing before it can be a contender for the fs_devices::latest_bdev
> title.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 05ade8c7342b..6c42048ec099 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1186,6 +1186,8 @@ void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
>  							&device->dev_state)) {
>  			if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
>  			     &device->dev_state) &&
> +			    !test_bit(BTRFS_DEV_STATE_MISSING,
> +			     &device->dev_state) &&
>  			     (!latest_dev ||
>  			      device->generation > latest_dev->generation)) {
>  				latest_dev = device;
>
Graham Cobb Oct. 4, 2019, 9:08 a.m. UTC | #2
On 04/10/2019 09:11, Nikolay Borisov wrote:
> 
> 
> On 4.10.19 г. 10:50 ч., Anand Jain wrote:
>> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
>> to point to the bdev with greatest device::generation number.
>> For a typical-missing device the generation number is zero so
>> fs_devices::latest_bdev will never point to it.
>>
>> But if the missing device is due to alienating [1], then
>> device::generation is not-zero and if it is >= to rest of
>> device::generation in the list, then fs_devices::latest_bdev
>> ends up pointing to the missing device and reports the error
>> like this [2]
>>
>> [1]
>> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
>> sleep 3 # avoid racing with udev's useless scans if needed
>> btrfs dev add -f /dev/sdb /btrfs
> 
> Hm, here I think the correct way is to refuse adding /dev/sdb to an
> existing fs if it's detected to be part of a different one. I.e it
> should require wipefs to be done.

I disagree. -f means "force overwrite of existing filesystem on the
given disk(s)". It shouldn't be any different whether the existing fs is
btrfs or something else.

Graham
Nikolay Borisov Oct. 4, 2019, 9:20 a.m. UTC | #3
On 4.10.19 г. 12:08 ч., Graham Cobb wrote:
> On 04/10/2019 09:11, Nikolay Borisov wrote:
>>
>>
>> On 4.10.19 г. 10:50 ч., Anand Jain wrote:
>>> btrfs_free_extra_devids() reorgs fs_devices::latest_bdev
>>> to point to the bdev with greatest device::generation number.
>>> For a typical-missing device the generation number is zero so
>>> fs_devices::latest_bdev will never point to it.
>>>
>>> But if the missing device is due to alienating [1], then
>>> device::generation is not-zero and if it is >= to rest of
>>> device::generation in the list, then fs_devices::latest_bdev
>>> ends up pointing to the missing device and reports the error
>>> like this [2]
>>>
>>> [1]
>>> mkfs.btrfs -fq /dev/sdd && mount /dev/sdd /btrfs
>>> mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc
>>> sleep 3 # avoid racing with udev's useless scans if needed
>>> btrfs dev add -f /dev/sdb /btrfs
>>
>> Hm, here I think the correct way is to refuse adding /dev/sdb to an
>> existing fs if it's detected to be part of a different one. I.e it
>> should require wipefs to be done.
> 
> I disagree. -f means "force overwrite of existing filesystem on the
> given disk(s)". It shouldn't be any different whether the existing fs is
> btrfs or something else.

You are right, looking at btrfs device add implementation though it
seems that wipefs is being done on the device to be added:

cmd_device_add
 btrfs_prepare_device
  btrfs_wipe_existing_sb

So if the device to be added is formatted and then it goes through
btrfs_init_new_device which commits the transaction which in turns
rewrites the sb then why is the error happening in the first place?


> 
> Graham
>
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 05ade8c7342b..6c42048ec099 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1186,6 +1186,8 @@  void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices, int step)
 							&device->dev_state)) {
 			if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
 			     &device->dev_state) &&
+			    !test_bit(BTRFS_DEV_STATE_MISSING,
+			     &device->dev_state) &&
 			     (!latest_dev ||
 			      device->generation > latest_dev->generation)) {
 				latest_dev = device;