diff mbox series

[RFC,2/2] btrfs: fix replace of seed device

Message ID eb6040708e4f351ae668726862e3f112f64d8ab9.1598012410.git.anand.jain@oracle.com
State New, archived
Headers show
Series [1/2] btrfs: initialize sysfs devid and device link for seed device | expand

Commit Message

Anand Jain Aug. 21, 2020, 1:15 p.m. UTC
If you replace a seed device in a sprouted fs, it appears to have
successfully replaced the seed device, but if you look closely, it didn't.

Here is an example.

mkfs.btrfs -fq /dev/sda
btrfstune -S1 /dev/sda
mount /dev/sda /btrfs
btrfs dev add /dev/sdb /btrfs
umount /btrfs; btrfs dev scan --forget
mount -o device=/dev/sda /dev/sdb /btrfs
btrfs rep start -f /dev/sda /dev/sdc /btrfs; echo $?
0

  BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc started
  BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc finished

btrfs fi show
Label: none  uuid: ab2c88b7-be81-4a7e-9849-c3666e7f9f4f
	Total devices 2 FS bytes used 256.00KiB
	devid    1 size 3.00GiB used 520.00MiB path /dev/sdc
	devid    2 size 3.00GiB used 896.00MiB path /dev/sdb

Label: none  uuid: 10bd3202-0415-43af-96a8-d5409f310a7e
	Total devices 1 FS bytes used 128.00KiB
	devid    1 size 3.00GiB used 536.00MiB path /dev/sda

So as per the replace start command and kernel log replace was successful.

Now let's try to clean mount.

umount /btrfs;  btrfs dev scan --forget

mount -o device=/dev/sdc /dev/sdb /btrfs
mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error.

[  636.157517] BTRFS error (device sdc): failed to read chunk tree: -2
[  636.180177] BTRFS error (device sdc): open_ctree failed

That's because per dev items it is still looking for the original seed
device.

btrfs in dump-tree -d /dev/sdb

	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
		devid 1 total_bytes 3221225472 bytes_used 545259520
		io_align 4096 io_width 4096 sector_size 4096 type 0
		generation 6 start_offset 0 dev_group 0
		seek_speed 0 bandwidth 0
		uuid 59368f50-9af2-4b17-91da-8a783cc418d4  <--- seed uuid
		fsid 10bd3202-0415-43af-96a8-d5409f310a7e  <--- seed fsid
	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
		devid 2 total_bytes 3221225472 bytes_used 939524096
		io_align 4096 io_width 4096 sector_size 4096 type 0
		generation 0 start_offset 0 dev_group 0
		seek_speed 0 bandwidth 0
		uuid 56a0a6bc-4630-4998-8daf-3c3030c4256a  <- sprout uuid
		fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f <- sprout fsid

But the replaced target has the following uuid+fsid in its superblock
which doesn't match with the expected uuid+fsid in its devitem.

btrfs in dump-super /dev/sdc | egrep '^generation|dev_item.uuid|dev_item.fsid|devid'
generation	20
dev_item.uuid	59368f50-9af2-4b17-91da-8a783cc418d4
dev_item.fsid	ab2c88b7-be81-4a7e-9849-c3666e7f9f4f [match]
dev_item.devid	1

So if you provide the original seed device the mount shall be successful.
Which so long happening in the test case btrfs/163.

btrfs dev scan --forget
mount -o device=/dev/sda /dev/sdb /btrfs

Fix in this patch:
Make it as you can't replace a seed device, you can only add a new device
and then delete the seed device. If replace is attempted then returns -EINVAL.
As in the below changes.

Another possible fix:
If we want to keep the ability to replace for seed-device, then we could
update the fsid of the replace-target blocks. And after replacement, you
have seed device but with sprout fsid. But then I don't know what is the
point and if there is any such use case.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---

To test this patch you need the patch 1/2.
Tested with fstests group volume

 fs/btrfs/dev-replace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Josef Bacik Aug. 21, 2020, 2:38 p.m. UTC | #1
On 8/21/20 9:15 AM, Anand Jain wrote:
> If you replace a seed device in a sprouted fs, it appears to have
> successfully replaced the seed device, but if you look closely, it didn't.
> 
> Here is an example.
> 
> mkfs.btrfs -fq /dev/sda
> btrfstune -S1 /dev/sda
> mount /dev/sda /btrfs
> btrfs dev add /dev/sdb /btrfs
> umount /btrfs; btrfs dev scan --forget
> mount -o device=/dev/sda /dev/sdb /btrfs
> btrfs rep start -f /dev/sda /dev/sdc /btrfs; echo $?
> 0
> 
>    BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc started
>    BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to /dev/sdc finished
> 
> btrfs fi show
> Label: none  uuid: ab2c88b7-be81-4a7e-9849-c3666e7f9f4f
> 	Total devices 2 FS bytes used 256.00KiB
> 	devid    1 size 3.00GiB used 520.00MiB path /dev/sdc
> 	devid    2 size 3.00GiB used 896.00MiB path /dev/sdb
> 
> Label: none  uuid: 10bd3202-0415-43af-96a8-d5409f310a7e
> 	Total devices 1 FS bytes used 128.00KiB
> 	devid    1 size 3.00GiB used 536.00MiB path /dev/sda
> 
> So as per the replace start command and kernel log replace was successful.
> 
> Now let's try to clean mount.
> 
> umount /btrfs;  btrfs dev scan --forget
> 
> mount -o device=/dev/sdc /dev/sdb /btrfs
> mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/sdb, missing codepage or helper program, or other error.
> 
> [  636.157517] BTRFS error (device sdc): failed to read chunk tree: -2
> [  636.180177] BTRFS error (device sdc): open_ctree failed
> 
> That's because per dev items it is still looking for the original seed
> device.
> 
> btrfs in dump-tree -d /dev/sdb
> 
> 	item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
> 		devid 1 total_bytes 3221225472 bytes_used 545259520
> 		io_align 4096 io_width 4096 sector_size 4096 type 0
> 		generation 6 start_offset 0 dev_group 0
> 		seek_speed 0 bandwidth 0
> 		uuid 59368f50-9af2-4b17-91da-8a783cc418d4  <--- seed uuid
> 		fsid 10bd3202-0415-43af-96a8-d5409f310a7e  <--- seed fsid
> 	item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
> 		devid 2 total_bytes 3221225472 bytes_used 939524096
> 		io_align 4096 io_width 4096 sector_size 4096 type 0
> 		generation 0 start_offset 0 dev_group 0
> 		seek_speed 0 bandwidth 0
> 		uuid 56a0a6bc-4630-4998-8daf-3c3030c4256a  <- sprout uuid
> 		fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f <- sprout fsid
> 
> But the replaced target has the following uuid+fsid in its superblock
> which doesn't match with the expected uuid+fsid in its devitem.
> 
> btrfs in dump-super /dev/sdc | egrep '^generation|dev_item.uuid|dev_item.fsid|devid'
> generation	20
> dev_item.uuid	59368f50-9af2-4b17-91da-8a783cc418d4
> dev_item.fsid	ab2c88b7-be81-4a7e-9849-c3666e7f9f4f [match]
> dev_item.devid	1
> 
> So if you provide the original seed device the mount shall be successful.
> Which so long happening in the test case btrfs/163.
> 
> btrfs dev scan --forget
> mount -o device=/dev/sda /dev/sdb /btrfs
> 
> Fix in this patch:
> Make it as you can't replace a seed device, you can only add a new device
> and then delete the seed device. If replace is attempted then returns -EINVAL.
> As in the below changes.
> 
> Another possible fix:
> If we want to keep the ability to replace for seed-device, then we could
> update the fsid of the replace-target blocks. And after replacement, you
> have seed device but with sprout fsid. But then I don't know what is the
> point and if there is any such use case.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---

This is something Boris dug into until I told him to drop it.  I _think_ 
I'm ok with this, but really what I'd rather do is restripe the whole fs 
with the new UUID for this case.  Where did that redo the UUID work go? 
Thanks,

Josef
Anand Jain Aug. 23, 2020, 3:05 p.m. UTC | #2
On 21/8/20 10:38 pm, Josef Bacik wrote:
> On 8/21/20 9:15 AM, Anand Jain wrote:
>> If you replace a seed device in a sprouted fs, it appears to have
>> successfully replaced the seed device, but if you look closely, it 
>> didn't.
>>
>> Here is an example.
>>
>> mkfs.btrfs -fq /dev/sda
>> btrfstune -S1 /dev/sda
>> mount /dev/sda /btrfs
>> btrfs dev add /dev/sdb /btrfs
>> umount /btrfs; btrfs dev scan --forget
>> mount -o device=/dev/sda /dev/sdb /btrfs
>> btrfs rep start -f /dev/sda /dev/sdc /btrfs; echo $?
>> 0
>>
>>    BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to 
>> /dev/sdc started
>>    BTRFS info (device sdb): dev_replace from /dev/sda (devid 1) to 
>> /dev/sdc finished
>>
>> btrfs fi show
>> Label: none  uuid: ab2c88b7-be81-4a7e-9849-c3666e7f9f4f
>>     Total devices 2 FS bytes used 256.00KiB
>>     devid    1 size 3.00GiB used 520.00MiB path /dev/sdc
>>     devid    2 size 3.00GiB used 896.00MiB path /dev/sdb
>>
>> Label: none  uuid: 10bd3202-0415-43af-96a8-d5409f310a7e
>>     Total devices 1 FS bytes used 128.00KiB
>>     devid    1 size 3.00GiB used 536.00MiB path /dev/sda
>>
>> So as per the replace start command and kernel log replace was 
>> successful.
>>
>> Now let's try to clean mount.
>>
>> umount /btrfs;  btrfs dev scan --forget
>>
>> mount -o device=/dev/sdc /dev/sdb /btrfs
>> mount: /btrfs: wrong fs type, bad option, bad superblock on /dev/sdb, 
>> missing codepage or helper program, or other error.
>>
>> [  636.157517] BTRFS error (device sdc): failed to read chunk tree: -2
>> [  636.180177] BTRFS error (device sdc): open_ctree failed
>>
>> That's because per dev items it is still looking for the original seed
>> device.
>>
>> btrfs in dump-tree -d /dev/sdb
>>
>>     item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
>>         devid 1 total_bytes 3221225472 bytes_used 545259520
>>         io_align 4096 io_width 4096 sector_size 4096 type 0
>>         generation 6 start_offset 0 dev_group 0
>>         seek_speed 0 bandwidth 0
>>         uuid 59368f50-9af2-4b17-91da-8a783cc418d4  <--- seed uuid
>>         fsid 10bd3202-0415-43af-96a8-d5409f310a7e  <--- seed fsid
>>     item 1 key (DEV_ITEMS DEV_ITEM 2) itemoff 16087 itemsize 98
>>         devid 2 total_bytes 3221225472 bytes_used 939524096
>>         io_align 4096 io_width 4096 sector_size 4096 type 0
>>         generation 0 start_offset 0 dev_group 0
>>         seek_speed 0 bandwidth 0
>>         uuid 56a0a6bc-4630-4998-8daf-3c3030c4256a  <- sprout uuid
>>         fsid ab2c88b7-be81-4a7e-9849-c3666e7f9f4f <- sprout fsid
>>
>> But the replaced target has the following uuid+fsid in its superblock
>> which doesn't match with the expected uuid+fsid in its devitem.
>>
>> btrfs in dump-super /dev/sdc | egrep 
>> '^generation|dev_item.uuid|dev_item.fsid|devid'
>> generation    20
>> dev_item.uuid    59368f50-9af2-4b17-91da-8a783cc418d4
>> dev_item.fsid    ab2c88b7-be81-4a7e-9849-c3666e7f9f4f [match]
>> dev_item.devid    1
>>
>> So if you provide the original seed device the mount shall be successful.
>> Which so long happening in the test case btrfs/163.
>>
>> btrfs dev scan --forget
>> mount -o device=/dev/sda /dev/sdb /btrfs
>>
>> Fix in this patch:
>> Make it as you can't replace a seed device, you can only add a new device
>> and then delete the seed device. If replace is attempted then returns 
>> -EINVAL.
>> As in the below changes.
>>
>> Another possible fix:
>> If we want to keep the ability to replace for seed-device, then we could
>> update the fsid of the replace-target blocks. And after replacement, you
>> have seed device but with sprout fsid. But then I don't know what is the
>> point and if there is any such use case.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
> 
> This is something Boris dug into until I told him to drop it.  I _think_ 
> I'm ok with this, but really what I'd rather do is restripe the whole fs 
> with the new UUID for this case.  Where did that redo the UUID work go? 

Redo the UUID part is not yet implemented. Perhaps restripe the whole
fs- is another approach, both of these approaches have the same question
unanswered what use case does the replace of seed device solve and would
it be a redundant operation to device add and delete ops.

Thanks, Anand

> Thanks,
> 
> Josef
diff mbox series

Patch

diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 1d09bd3101e2..1aff10337b90 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -225,7 +225,7 @@  static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
 	int ret = 0;
 
 	*device_out = NULL;
-	if (fs_info->fs_devices->seeding) {
+	if (srcdev->fs_devices->seeding) {
 		btrfs_err(fs_info, "the filesystem is a seed filesystem!");
 		return -EINVAL;
 	}