diff mbox series

btrfs: reject device with CHANGING_FSID_V2

Message ID 02d59bdd7a8b778deb17e300354558498db59376.1692178060.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: reject device with CHANGING_FSID_V2 | expand

Commit Message

Anand Jain Aug. 16, 2023, 12:30 p.m. UTC
The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
where the device in the userspace btrfstune -m|-M operation failed to
complete changing the fsid.

This flag makes the kernel to automatically determine the other
partner devices to which a given device can be associated, based on the
fsid, metadata_uuid and generation values.

btrfstune -m|M feature is especially useful in virtual cloud setups, where
compute instances (disk images) are quickly copied, fsid changed, and
launched. Given numerous disk images with the same metadata_uuid but
different fsid, there's no clear way a device can be correctly assembled
with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
disk could be assembled incorrectly, as in the example below:

Before this patch:

Consider the following two filesystems:
   /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
operation fails.

In this scenario, as the /dev/loop0's fsid change is interrupted, and the
CHANGING_FSID_V2 flag is set as shown below.

  $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"

  $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
  superblock: bytenr=65536, device=/dev/loop0
  flags			0x1000000001
  fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
  metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
  generation		9
  num_devices		2
  incompat_flags	0x741
  dev_item.devid	1

  $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
  superblock: bytenr=65536, device=/dev/loop1
  flags			0x1
  fsid			11d2af4d-1b71-45a9-83f6-f2100766939d
  metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
  generation		10
  num_devices		2
  incompat_flags	0x741
  dev_item.devid	2

  $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
  superblock: bytenr=65536, device=/dev/loop2
  flags			0x1
  fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
  metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
  generation		8
  num_devices		2
  incompat_flags	0x741
  dev_item.devid	1

  $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
  superblock: bytenr=65536, device=/dev/loop3
  flags			0x1
  fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
  metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
  generation		8
  num_devices		2
  incompat_flags	0x741
  dev_item.devid	2


It is normal that some devices aren't instantly discovered during
system boot or iSCSI discovery. The controlled scan below demonstrates
this.

  $ btrfs device scan --forget
  $ btrfs device scan /dev/loop0
  Scanning for btrfs filesystems on '/dev/loop0'
  $ mount /dev/loop3 /btrfs
  $ btrfs filesystem show -m
  Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
	Total devices 2 FS bytes used 144.00KiB
	devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
	devid    2 size 300.00MiB used 40.00MiB path /dev/loop3

/dev/loop0 and /dev/loop3 are incorrectly partnered.

This kernel patch removes functions and code connected to the
CHANGING_FSID_V2 flag.

With this patch, now devices with the CHANGING_FSID_V2 flag are rejected.
And its partner will fail to mount with the extra -o degraded option.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
Moreover, a btrfs-progs patch (below) has eliminated the use of the
CHANGING_FSID_V2 flag entirely:

   [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit

And we solve the compatability concerns as below:

  New-kernel new-progs - has no CHANGING_FSID_V2 flag.
  Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused.
  Old-kernel old-progs - bug may occur.
  New-kernel old-progs - Should use host with the newer btrfs-progs to fix.

For legacy systems to help fix such a condition in the userspace instead
we have the below patchset which ports of kernel's CHANGING_FSID_V2 code.

   [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid

And if it couldn't fix in some cases, users can use manually reunite,
with the patchset:

   [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options

 fs/btrfs/disk-io.c |  10 ---
 fs/btrfs/volumes.c | 166 ++++-----------------------------------------
 fs/btrfs/volumes.h |   1 -
 3 files changed, 13 insertions(+), 164 deletions(-)

Comments

David Sterba Aug. 17, 2023, 12:04 p.m. UTC | #1
On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
> where the device in the userspace btrfstune -m|-M operation failed to
> complete changing the fsid.
> 
> This flag makes the kernel to automatically determine the other
> partner devices to which a given device can be associated, based on the
> fsid, metadata_uuid and generation values.
> 
> btrfstune -m|M feature is especially useful in virtual cloud setups, where
> compute instances (disk images) are quickly copied, fsid changed, and
> launched. Given numerous disk images with the same metadata_uuid but
> different fsid, there's no clear way a device can be correctly assembled
> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
> disk could be assembled incorrectly, as in the example below:
> 
> Before this patch:
> 
> Consider the following two filesystems:
>    /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
> operation fails.
> 
> In this scenario, as the /dev/loop0's fsid change is interrupted, and the
> CHANGING_FSID_V2 flag is set as shown below.
> 
>   $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
> 
>   $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop0
>   flags			0x1000000001
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		9
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	1
> 
>   $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop1
>   flags			0x1
>   fsid			11d2af4d-1b71-45a9-83f6-f2100766939d
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		10
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	2
> 
>   $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop2
>   flags			0x1
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		8
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	1
> 
>   $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop3
>   flags			0x1
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		8
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	2
> 
> 
> It is normal that some devices aren't instantly discovered during
> system boot or iSCSI discovery. The controlled scan below demonstrates
> this.
> 
>   $ btrfs device scan --forget
>   $ btrfs device scan /dev/loop0
>   Scanning for btrfs filesystems on '/dev/loop0'
>   $ mount /dev/loop3 /btrfs
>   $ btrfs filesystem show -m
>   Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
> 	Total devices 2 FS bytes used 144.00KiB
> 	devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
> 	devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
> 
> /dev/loop0 and /dev/loop3 are incorrectly partnered.
> 
> This kernel patch removes functions and code connected to the
> CHANGING_FSID_V2 flag.

I didn't have a closer look but it seems you're removing all the logic
to make the metadata uuid robust and usable in case of interrupted
conversion, while finding another case where it does not work as you
expect. With this it would be change in behaviour, we need to check
the original use case. IIRC as the metadata uuid change is lightweight
we want to try harder to deal with the easy errors instead of rejecting
the filesystem mount.
Anand Jain Aug. 17, 2023, 5:19 p.m. UTC | #2
On 17/8/23 20:04, David Sterba wrote:
> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
>> where the device in the userspace btrfstune -m|-M operation failed to
>> complete changing the fsid.
>>
>> This flag makes the kernel to automatically determine the other
>> partner devices to which a given device can be associated, based on the
>> fsid, metadata_uuid and generation values.
>>
>> btrfstune -m|M feature is especially useful in virtual cloud setups, where
>> compute instances (disk images) are quickly copied, fsid changed, and
>> launched. Given numerous disk images with the same metadata_uuid but
>> different fsid, there's no clear way a device can be correctly assembled
>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
>> disk could be assembled incorrectly, as in the example below:
>>
>> Before this patch:
>>
>> Consider the following two filesystems:
>>     /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
>> operation fails.
>>
>> In this scenario, as the /dev/loop0's fsid change is interrupted, and the
>> CHANGING_FSID_V2 flag is set as shown below.
>>
>>    $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
>>
>>    $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop0
>>    flags			0x1000000001
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		9
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	1
>>
>>    $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop1
>>    flags			0x1
>>    fsid			11d2af4d-1b71-45a9-83f6-f2100766939d
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		10
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	2
>>
>>    $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop2
>>    flags			0x1
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		8
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	1
>>
>>    $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop3
>>    flags			0x1
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		8
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	2
>>
>>
>> It is normal that some devices aren't instantly discovered during
>> system boot or iSCSI discovery. The controlled scan below demonstrates
>> this.
>>
>>    $ btrfs device scan --forget
>>    $ btrfs device scan /dev/loop0
>>    Scanning for btrfs filesystems on '/dev/loop0'
>>    $ mount /dev/loop3 /btrfs
>>    $ btrfs filesystem show -m
>>    Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>> 	Total devices 2 FS bytes used 144.00KiB
>> 	devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
>> 	devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
>>
>> /dev/loop0 and /dev/loop3 are incorrectly partnered.
>>
>> This kernel patch removes functions and code connected to the
>> CHANGING_FSID_V2 flag.
> 
> I didn't have a closer look but it seems you're removing all the logic
> to make the metadata uuid robust and usable in case of interrupted
> conversion, while finding another case where it does not work as you
> expect. With this it would be change in behaviour, we need to check
> the original use case. IIRC as the metadata uuid change is lightweight
> we want to try harder to deal with the easy errors instead of rejecting
> the filesystem mount.

Robust indeed. Silently assembling wrong devices-data loss risk?
Failing to assemble is still safe.

I think it is better to introduce a sub-command to clone btrfs
filesystem with a new device-uuid and same fsid (as it looks like
same fsid has some use case).

Thanks, Anand
Qu Wenruo Aug. 18, 2023, 12:21 a.m. UTC | #3
On 2023/8/18 01:19, Anand Jain wrote:
> 
> 
> On 17/8/23 20:04, David Sterba wrote:
>> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
>>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
>>> where the device in the userspace btrfstune -m|-M operation failed to
>>> complete changing the fsid.
>>>
>>> This flag makes the kernel to automatically determine the other
>>> partner devices to which a given device can be associated, based on the
>>> fsid, metadata_uuid and generation values.
>>>
>>> btrfstune -m|M feature is especially useful in virtual cloud setups, 
>>> where
>>> compute instances (disk images) are quickly copied, fsid changed, and
>>> launched. Given numerous disk images with the same metadata_uuid but
>>> different fsid, there's no clear way a device can be correctly assembled
>>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
>>> disk could be assembled incorrectly, as in the example below:
>>>
>>> Before this patch:
>>>
>>> Consider the following two filesystems:
>>>     /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
>>> operation fails.
>>>
>>> In this scenario, as the /dev/loop0's fsid change is interrupted, and 
>>> the
>>> CHANGING_FSID_V2 flag is set as shown below.
>>>
>>>    $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
>>>
>>>    $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>>>    superblock: bytenr=65536, device=/dev/loop0
>>>    flags            0x1000000001
>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>    generation        9
>>>    num_devices        2
>>>    incompat_flags    0x741
>>>    dev_item.devid    1
>>>
>>>    $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>>>    superblock: bytenr=65536, device=/dev/loop1
>>>    flags            0x1
>>>    fsid            11d2af4d-1b71-45a9-83f6-f2100766939d
>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>    generation        10
>>>    num_devices        2
>>>    incompat_flags    0x741
>>>    dev_item.devid    2
>>>
>>>    $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>>>    superblock: bytenr=65536, device=/dev/loop2
>>>    flags            0x1
>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>    generation        8
>>>    num_devices        2
>>>    incompat_flags    0x741
>>>    dev_item.devid    1
>>>
>>>    $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>>>    superblock: bytenr=65536, device=/dev/loop3
>>>    flags            0x1
>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>    generation        8
>>>    num_devices        2
>>>    incompat_flags    0x741
>>>    dev_item.devid    2
>>>
>>>
>>> It is normal that some devices aren't instantly discovered during
>>> system boot or iSCSI discovery. The controlled scan below demonstrates
>>> this.
>>>
>>>    $ btrfs device scan --forget
>>>    $ btrfs device scan /dev/loop0
>>>    Scanning for btrfs filesystems on '/dev/loop0'
>>>    $ mount /dev/loop3 /btrfs
>>>    $ btrfs filesystem show -m
>>>    Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>     Total devices 2 FS bytes used 144.00KiB
>>>     devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
>>>     devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
>>>
>>> /dev/loop0 and /dev/loop3 are incorrectly partnered.
>>>
>>> This kernel patch removes functions and code connected to the
>>> CHANGING_FSID_V2 flag.
>>
>> I didn't have a closer look but it seems you're removing all the logic
>> to make the metadata uuid robust and usable in case of interrupted
>> conversion, while finding another case where it does not work as you
>> expect. With this it would be change in behaviour, we need to check
>> the original use case. IIRC as the metadata uuid change is lightweight
>> we want to try harder to deal with the easy errors instead of rejecting
>> the filesystem mount.
> 
> Robust indeed. Silently assembling wrong devices-data loss risk?
> Failing to assemble is still safe.
> 
> I think it is better to introduce a sub-command to clone btrfs
> filesystem with a new device-uuid and same fsid (as it looks like
> same fsid has some use case).
> 
> Thanks, Anand

Oh, my memory comes back, the original design for the two stage 
commitment is to avoid split brain cases when one device is committed 
with the new flag, while the remaining one doesn't.

With the extra stage, even if at stage 1 or 2 the transaction is 
interrupted and only one device got the new flag, it can help us to 
locate the stage and recover.

Thanks,
Qu
Anand Jain Aug. 18, 2023, 8:27 a.m. UTC | #4
On 18/08/2023 08:21, Qu Wenruo wrote:
> 
> 
> On 2023/8/18 01:19, Anand Jain wrote:
>>
>>
>> On 17/8/23 20:04, David Sterba wrote:
>>> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
>>>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
>>>> where the device in the userspace btrfstune -m|-M operation failed to
>>>> complete changing the fsid.
>>>>
>>>> This flag makes the kernel to automatically determine the other
>>>> partner devices to which a given device can be associated, based on the
>>>> fsid, metadata_uuid and generation values.
>>>>
>>>> btrfstune -m|M feature is especially useful in virtual cloud setups, 
>>>> where
>>>> compute instances (disk images) are quickly copied, fsid changed, and
>>>> launched. Given numerous disk images with the same metadata_uuid but
>>>> different fsid, there's no clear way a device can be correctly 
>>>> assembled
>>>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
>>>> disk could be assembled incorrectly, as in the example below:
>>>>
>>>> Before this patch:
>>>>
>>>> Consider the following two filesystems:
>>>>     /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the 
>>>> btrsftune -m
>>>> operation fails.
>>>>
>>>> In this scenario, as the /dev/loop0's fsid change is interrupted, 
>>>> and the
>>>> CHANGING_FSID_V2 flag is set as shown below.
>>>>
>>>>    $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
>>>>
>>>>    $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>>>>    superblock: bytenr=65536, device=/dev/loop0
>>>>    flags            0x1000000001
>>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>>    generation        9
>>>>    num_devices        2
>>>>    incompat_flags    0x741
>>>>    dev_item.devid    1
>>>>
>>>>    $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>>>>    superblock: bytenr=65536, device=/dev/loop1
>>>>    flags            0x1
>>>>    fsid            11d2af4d-1b71-45a9-83f6-f2100766939d
>>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>>    generation        10
>>>>    num_devices        2
>>>>    incompat_flags    0x741
>>>>    dev_item.devid    2
>>>>
>>>>    $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>>>>    superblock: bytenr=65536, device=/dev/loop2
>>>>    flags            0x1
>>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>>    generation        8
>>>>    num_devices        2
>>>>    incompat_flags    0x741
>>>>    dev_item.devid    1
>>>>
>>>>    $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>>>>    superblock: bytenr=65536, device=/dev/loop3
>>>>    flags            0x1
>>>>    fsid            7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>>    metadata_uuid        bb040a9f-233a-4de2-ad84-49aa5a28059b
>>>>    generation        8
>>>>    num_devices        2
>>>>    incompat_flags    0x741
>>>>    dev_item.devid    2
>>>>
>>>>
>>>> It is normal that some devices aren't instantly discovered during
>>>> system boot or iSCSI discovery. The controlled scan below demonstrates
>>>> this.
>>>>
>>>>    $ btrfs device scan --forget
>>>>    $ btrfs device scan /dev/loop0
>>>>    Scanning for btrfs filesystems on '/dev/loop0'
>>>>    $ mount /dev/loop3 /btrfs
>>>>    $ btrfs filesystem show -m
>>>>    Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>>>     Total devices 2 FS bytes used 144.00KiB
>>>>     devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
>>>>     devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
>>>>
>>>> /dev/loop0 and /dev/loop3 are incorrectly partnered.
>>>>
>>>> This kernel patch removes functions and code connected to the
>>>> CHANGING_FSID_V2 flag.
>>>
>>> I didn't have a closer look but it seems you're removing all the logic
>>> to make the metadata uuid robust and usable in case of interrupted
>>> conversion, while finding another case where it does not work as you
>>> expect. With this it would be change in behaviour, we need to check
>>> the original use case. IIRC as the metadata uuid change is lightweight
>>> we want to try harder to deal with the easy errors instead of rejecting
>>> the filesystem mount.
>>
>> Robust indeed. Silently assembling wrong devices-data loss risk?
>> Failing to assemble is still safe.
>>
>> I think it is better to introduce a sub-command to clone btrfs
>> filesystem with a new device-uuid and same fsid (as it looks like
>> same fsid has some use case).
>>
>> Thanks, Anand
> 
> Oh, my memory comes back, the original design for the two stage 
> commitment is to avoid split brain cases when one device is committed 
> with the new flag, while the remaining one doesn't.
> 
> With the extra stage, even if at stage 1 or 2 the transaction is 
> interrupted and only one device got the new flag, it can help us to 
> locate the stage and recover.

As this comment is about the btrfstune patch

  [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit

Let's discuss it there.

Thanks.
David Sterba Sept. 18, 2023, 10:44 p.m. UTC | #5
On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
> where the device in the userspace btrfstune -m|-M operation failed to
> complete changing the fsid.
> 
> This flag makes the kernel to automatically determine the other
> partner devices to which a given device can be associated, based on the
> fsid, metadata_uuid and generation values.
> 
> btrfstune -m|M feature is especially useful in virtual cloud setups, where
> compute instances (disk images) are quickly copied, fsid changed, and
> launched. Given numerous disk images with the same metadata_uuid but
> different fsid, there's no clear way a device can be correctly assembled
> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
> disk could be assembled incorrectly, as in the example below:
> 
> Before this patch:
> 
> Consider the following two filesystems:
>    /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
> operation fails.
> 
> In this scenario, as the /dev/loop0's fsid change is interrupted, and the
> CHANGING_FSID_V2 flag is set as shown below.
> 
>   $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
> 
>   $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop0
>   flags			0x1000000001
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		9
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	1
> 
>   $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop1
>   flags			0x1
>   fsid			11d2af4d-1b71-45a9-83f6-f2100766939d
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		10
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	2
> 
>   $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop2
>   flags			0x1
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		8
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	1
> 
>   $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>   superblock: bytenr=65536, device=/dev/loop3
>   flags			0x1
>   fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>   metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>   generation		8
>   num_devices		2
>   incompat_flags	0x741
>   dev_item.devid	2
> 
> 
> It is normal that some devices aren't instantly discovered during
> system boot or iSCSI discovery. The controlled scan below demonstrates
> this.
> 
>   $ btrfs device scan --forget
>   $ btrfs device scan /dev/loop0
>   Scanning for btrfs filesystems on '/dev/loop0'
>   $ mount /dev/loop3 /btrfs
>   $ btrfs filesystem show -m
>   Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
> 	Total devices 2 FS bytes used 144.00KiB
> 	devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
> 	devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
> 
> /dev/loop0 and /dev/loop3 are incorrectly partnered.
> 
> This kernel patch removes functions and code connected to the
> CHANGING_FSID_V2 flag.
> 
> With this patch, now devices with the CHANGING_FSID_V2 flag are rejected.
> And its partner will fail to mount with the extra -o degraded option.
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> Moreover, a btrfs-progs patch (below) has eliminated the use of the
> CHANGING_FSID_V2 flag entirely:
> 
>    [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
> 
> And we solve the compatability concerns as below:
> 
>   New-kernel new-progs - has no CHANGING_FSID_V2 flag.
>   Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused.
>   Old-kernel old-progs - bug may occur.
>   New-kernel old-progs - Should use host with the newer btrfs-progs to fix.
> 
> For legacy systems to help fix such a condition in the userspace instead
> we have the below patchset which ports of kernel's CHANGING_FSID_V2 code.
> 
>    [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
> 
> And if it couldn't fix in some cases, users can use manually reunite,
> with the patchset:
> 
>    [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options
> 
>  fs/btrfs/disk-io.c |  10 ---
>  fs/btrfs/volumes.c | 166 ++++-----------------------------------------
>  fs/btrfs/volumes.h |   1 -
>  3 files changed, 13 insertions(+), 164 deletions(-)

Please split the kernel patch in two, one rejecting the CHANGING_FSID_V2
bit and then removing the unused code. I think the scanning code still
has to recognize the bit and skip the device, I haven't checked if
remains like that after this patch.
Anand Jain Sept. 19, 2023, 11:40 a.m. UTC | #6
On 19/09/2023 06:44, David Sterba wrote:
> On Wed, Aug 16, 2023 at 08:30:40PM +0800, Anand Jain wrote:
>> The BTRFS_SUPER_FLAG_CHANGING_FSID_V2 flag indicates a transient state
>> where the device in the userspace btrfstune -m|-M operation failed to
>> complete changing the fsid.
>>
>> This flag makes the kernel to automatically determine the other
>> partner devices to which a given device can be associated, based on the
>> fsid, metadata_uuid and generation values.
>>
>> btrfstune -m|M feature is especially useful in virtual cloud setups, where
>> compute instances (disk images) are quickly copied, fsid changed, and
>> launched. Given numerous disk images with the same metadata_uuid but
>> different fsid, there's no clear way a device can be correctly assembled
>> with the proper partners when the CHANGING_FSID_V2 flag is set. So, the
>> disk could be assembled incorrectly, as in the example below:
>>
>> Before this patch:
>>
>> Consider the following two filesystems:
>>     /dev/loop[2-3] are raw copies of /dev/loop[0-1] and the btrsftune -m
>> operation fails.
>>
>> In this scenario, as the /dev/loop0's fsid change is interrupted, and the
>> CHANGING_FSID_V2 flag is set as shown below.
>>
>>    $ p="device|devid|^metadata_uuid|^fsid|^incom|^generation|^flags"
>>
>>    $ btrfs inspect dump-super /dev/loop0 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop0
>>    flags			0x1000000001
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		9
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	1
>>
>>    $ btrfs inspect dump-super /dev/loop1 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop1
>>    flags			0x1
>>    fsid			11d2af4d-1b71-45a9-83f6-f2100766939d
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		10
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	2
>>
>>    $ btrfs inspect dump-super /dev/loop2 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop2
>>    flags			0x1
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		8
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	1
>>
>>    $ btrfs inspect dump-super /dev/loop3 | egrep '$p'
>>    superblock: bytenr=65536, device=/dev/loop3
>>    flags			0x1
>>    fsid			7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>>    metadata_uuid		bb040a9f-233a-4de2-ad84-49aa5a28059b
>>    generation		8
>>    num_devices		2
>>    incompat_flags	0x741
>>    dev_item.devid	2
>>
>>
>> It is normal that some devices aren't instantly discovered during
>> system boot or iSCSI discovery. The controlled scan below demonstrates
>> this.
>>
>>    $ btrfs device scan --forget
>>    $ btrfs device scan /dev/loop0
>>    Scanning for btrfs filesystems on '/dev/loop0'
>>    $ mount /dev/loop3 /btrfs
>>    $ btrfs filesystem show -m
>>    Label: none  uuid: 7d4b4b93-2b27-4432-b4e4-4be1fbccbd45
>> 	Total devices 2 FS bytes used 144.00KiB
>> 	devid    1 size 300.00MiB used 48.00MiB path /dev/loop0
>> 	devid    2 size 300.00MiB used 40.00MiB path /dev/loop3
>>
>> /dev/loop0 and /dev/loop3 are incorrectly partnered.
>>
>> This kernel patch removes functions and code connected to the
>> CHANGING_FSID_V2 flag.
>>
>> With this patch, now devices with the CHANGING_FSID_V2 flag are rejected.
>> And its partner will fail to mount with the extra -o degraded option.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> Moreover, a btrfs-progs patch (below) has eliminated the use of the
>> CHANGING_FSID_V2 flag entirely:
>>
>>     [PATCH RFC] btrfs-progs: btrfstune -m|M remove 2-stage commit
>>
>> And we solve the compatability concerns as below:
>>
>>    New-kernel new-progs - has no CHANGING_FSID_V2 flag.
>>    Old-kernel new-progs - has no CHANGING_FSID_V2 flag, kernel code unused.
>>    Old-kernel old-progs - bug may occur.
>>    New-kernel old-progs - Should use host with the newer btrfs-progs to fix.
>>
>> For legacy systems to help fix such a condition in the userspace instead
>> we have the below patchset which ports of kernel's CHANGING_FSID_V2 code.
>>
>>     [PATCH 00/16] btrfs-progs: recover from failed metadata_uuid
>>
>> And if it couldn't fix in some cases, users can use manually reunite,
>> with the patchset:
>>
>>     [PATCH 00/10] btrfs-progs: check and tune: add device and noscan options
>>
>>   fs/btrfs/disk-io.c |  10 ---
>>   fs/btrfs/volumes.c | 166 ++++-----------------------------------------
>>   fs/btrfs/volumes.h |   1 -
>>   3 files changed, 13 insertions(+), 164 deletions(-)
> 
> Please split the kernel patch in two, one rejecting the CHANGING_FSID_V2
> bit and then removing the unused code.

Yep. Will do.

> I think the scanning code still
> has to recognize the bit and skip the device, I haven't checked if
> remains like that after this patch.

Right. This patch does it. We only have to do it at device_list_add()
because mount, device-scan, and device-ready all rely on
device_list_add() to scan.

Thanks, Anand
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9858c77b99e6..98e73805f3fc 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3156,7 +3156,6 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 	u32 nodesize;
 	u32 stripesize;
 	u64 generation;
-	u64 features;
 	u16 csum_type;
 	struct btrfs_super_block *disk_super;
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
@@ -3238,15 +3237,6 @@  int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 
 	disk_super = fs_info->super_copy;
 
-
-	features = btrfs_super_flags(disk_super);
-	if (features & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
-		features &= ~BTRFS_SUPER_FLAG_CHANGING_FSID_V2;
-		btrfs_set_super_flags(disk_super, features);
-		btrfs_info(fs_info,
-			"found metadata UUID change in progress flag, clearing");
-	}
-
 	memcpy(fs_info->super_for_commit, fs_info->super_copy,
 	       sizeof(*fs_info->super_for_commit));
 
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 852590e06d76..e3a83f779558 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -453,58 +453,6 @@  static noinline struct btrfs_fs_devices *find_fsid(
 	return NULL;
 }
 
-/*
- * First check if the metadata_uuid is different from the fsid in the given
- * fs_devices. Then check if the given fsid is the same as the metadata_uuid
- * in the fs_devices. If it is, return true; otherwise, return false.
- */
-static inline bool check_fsid_changed(const struct btrfs_fs_devices *fs_devices,
-				      const u8 *fsid)
-{
-	return memcmp(fs_devices->fsid, fs_devices->metadata_uuid,
-		      BTRFS_FSID_SIZE) != 0 &&
-	       memcmp(fs_devices->metadata_uuid, fsid, BTRFS_FSID_SIZE) == 0;
-}
-
-static struct btrfs_fs_devices *find_fsid_with_metadata_uuid(
-				struct btrfs_super_block *disk_super)
-{
-
-	struct btrfs_fs_devices *fs_devices;
-
-	/*
-	 * Handle scanned device having completed its fsid change but
-	 * belonging to a fs_devices that was created by first scanning
-	 * a device which didn't have its fsid/metadata_uuid changed
-	 * at all and the CHANGING_FSID_V2 flag set.
-	 */
-	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (!fs_devices->fsid_change)
-			continue;
-
-		if (match_fsid_fs_devices(fs_devices, disk_super->metadata_uuid,
-					  fs_devices->fsid))
-			return fs_devices;
-	}
-
-	/*
-	 * Handle scanned device having completed its fsid change but
-	 * belonging to a fs_devices that was created by a device that
-	 * has an outdated pair of fsid/metadata_uuid and
-	 * CHANGING_FSID_V2 flag set.
-	 */
-	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (!fs_devices->fsid_change)
-			continue;
-
-		if (check_fsid_changed(fs_devices, disk_super->metadata_uuid))
-			return fs_devices;
-	}
-
-	return find_fsid(disk_super->fsid, disk_super->metadata_uuid);
-}
-
-
 static int
 btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
 		      int flush, struct block_device **bdev,
@@ -685,84 +633,6 @@  u8 *btrfs_sb_fsid_ptr(struct btrfs_super_block *sb)
 	return has_metadata_uuid ? sb->metadata_uuid : sb->fsid;
 }
 
-/*
- * Handle scanned device having its CHANGING_FSID_V2 flag set and the fs_devices
- * being created with a disk that has already completed its fsid change. Such
- * disk can belong to an fs which has its FSID changed or to one which doesn't.
- * Handle both cases here.
- */
-static struct btrfs_fs_devices *find_fsid_inprogress(
-					struct btrfs_super_block *disk_super)
-{
-	struct btrfs_fs_devices *fs_devices;
-
-	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (fs_devices->fsid_change)
-			continue;
-
-		if (check_fsid_changed(fs_devices,  disk_super->fsid))
-			return fs_devices;
-	}
-
-	return find_fsid(disk_super->fsid, NULL);
-}
-
-static struct btrfs_fs_devices *find_fsid_changed(
-					struct btrfs_super_block *disk_super)
-{
-	struct btrfs_fs_devices *fs_devices;
-
-	/*
-	 * Handles the case where scanned device is part of an fs that had
-	 * multiple successful changes of FSID but currently device didn't
-	 * observe it. Meaning our fsid will be different than theirs. We need
-	 * to handle two subcases :
-	 *  1 - The fs still continues to have different METADATA/FSID uuids.
-	 *  2 - The fs is switched back to its original FSID (METADATA/FSID
-	 *  are equal).
-	 */
-	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		/* Changed UUIDs */
-		if (check_fsid_changed(fs_devices, disk_super->metadata_uuid) &&
-		    memcmp(fs_devices->fsid, disk_super->fsid,
-			   BTRFS_FSID_SIZE) != 0)
-			return fs_devices;
-
-		/* Unchanged UUIDs */
-		if (memcmp(fs_devices->metadata_uuid, fs_devices->fsid,
-			   BTRFS_FSID_SIZE) == 0 &&
-		    memcmp(fs_devices->fsid, disk_super->metadata_uuid,
-			   BTRFS_FSID_SIZE) == 0)
-			return fs_devices;
-	}
-
-	return NULL;
-}
-
-static struct btrfs_fs_devices *find_fsid_reverted_metadata(
-				struct btrfs_super_block *disk_super)
-{
-	struct btrfs_fs_devices *fs_devices;
-
-	/*
-	 * Handle the case where the scanned device is part of an fs whose last
-	 * metadata UUID change reverted it to the original FSID. At the same
-	 * time fs_devices was first created by another constituent device
-	 * which didn't fully observe the operation. This results in an
-	 * btrfs_fs_devices created with metadata/fsid different AND
-	 * btrfs_fs_devices::fsid_change set AND the metadata_uuid of the
-	 * fs_devices equal to the FSID of the disk.
-	 */
-	list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
-		if (!fs_devices->fsid_change)
-			continue;
-
-		if (check_fsid_changed(fs_devices, disk_super->fsid))
-			return fs_devices;
-	}
-
-	return NULL;
-}
 /*
  * Add new device to list of registered devices
  *
@@ -783,8 +653,14 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 	int error;
 	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
-	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
-					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
+
+	if ((btrfs_super_flags(disk_super) &
+	    BTRFS_SUPER_FLAG_CHANGING_FSID_V2)) {
+		btrfs_err(NULL,
+"device %s has incomplete FSID changes please use btrfstune to complete",
+			  path);
+		return ERR_PTR(-EINVAL);
+	}
 
 	error = lookup_bdev(path, &path_devt);
 	if (error) {
@@ -793,20 +669,13 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		return ERR_PTR(error);
 	}
 
-	if (fsid_change_in_progress) {
-		if (!has_metadata_uuid)
-			fs_devices = find_fsid_inprogress(disk_super);
-		else
-			fs_devices = find_fsid_changed(disk_super);
-	} else if (has_metadata_uuid) {
-		fs_devices = find_fsid_with_metadata_uuid(disk_super);
+	if (has_metadata_uuid) {
+		fs_devices = find_fsid(disk_super->fsid,
+				       disk_super->metadata_uuid);
 	} else {
-		fs_devices = find_fsid_reverted_metadata(disk_super);
-		if (!fs_devices)
-			fs_devices = find_fsid(disk_super->fsid, NULL);
+		fs_devices = find_fsid(disk_super->fsid, NULL);
 	}
 
-
 	if (!fs_devices) {
 		fs_devices = alloc_fs_devices(disk_super->fsid);
 		if (has_metadata_uuid)
@@ -816,8 +685,6 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		if (IS_ERR(fs_devices))
 			return ERR_CAST(fs_devices);
 
-		fs_devices->fsid_change = fsid_change_in_progress;
-
 		mutex_lock(&fs_devices->device_list_mutex);
 		list_add(&fs_devices->fs_list, &fs_uuids);
 
@@ -831,18 +698,11 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		mutex_lock(&fs_devices->device_list_mutex);
 		device = btrfs_find_device(fs_devices, &args);
 
-		/*
-		 * If this disk has been pulled into an fs devices created by
-		 * a device which had the CHANGING_FSID_V2 flag then replace the
-		 * metadata_uuid/fsid values of the fs_devices.
-		 */
-		if (fs_devices->fsid_change &&
-		    found_transid > fs_devices->latest_generation) {
+		if (found_transid > fs_devices->latest_generation) {
 			memcpy(fs_devices->fsid, disk_super->fsid,
 					BTRFS_FSID_SIZE);
 			memcpy(fs_devices->metadata_uuid,
 			       btrfs_sb_fsid_ptr(disk_super), BTRFS_FSID_SIZE);
-			fs_devices->fsid_change = false;
 		}
 	}
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2128a032c3b7..6672ca13eda5 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -353,7 +353,6 @@  struct btrfs_fs_devices {
 	bool rotating;
 	/* Devices support TRIM/discard commands. */
 	bool discardable;
-	bool fsid_change;
 	/* The filesystem is a seed filesystem. */
 	bool seeding;