[0/2] btrfs: support cloned-device mount capability

Message ID cover.1695826320.git.anand.jain@oracle.com (mailing list archive)

Message

Anand Jain Sept. 28, 2023, 1:09 a.m. UTC
Guilherme's previous work [1] aimed at the mounting of cloned devices
using a superblock flag SINGLE_DEV during mkfs.
 [1] https://lore.kernel.org/linux-btrfs/20230831001544.3379273-1-gpiccoli@igalia.com/

Building upon this work, here is an in-memory-only approach. At mount time
we check whether the same fsid is already mounted; if it is, we generate a
random temp fsid to be used for the mount. It is kept in memory only and is
not written to disk. We distinguish devices by devt.
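
The mount-time decision described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel implementation: model_mount, new_temp_fsid, MOUNTED and MOUNT_FSID
are hypothetical names, devt is modeled as a plain "major:minor" string,
and only single-device filesystems are considered.

```shell
#!/bin/sh
# Toy model of the mount-time fsid choice; NOT kernel code.
# MOUNTED holds one "in_memory_fsid devt" pair per line (hypothetical state).
MOUNTED=""

# new_temp_fsid: print a random UUID (hypothetical helper).
new_temp_fsid() {
    if [ -r /proc/sys/kernel/random/uuid ]; then
        cat /proc/sys/kernel/random/uuid
    else
        uuidgen
    fi
}

# model_mount DISK_FSID DEVT
# Sets MOUNT_FSID to the fsid the modeled mount would use in memory.
model_mount() {
    disk_fsid=$1
    devt=$2
    in_use=no
    # Is this on-disk fsid already used by a mount of a different devt?
    while read -r fsid owner; do
        if [ "$fsid" = "$disk_fsid" ] && [ "$owner" != "$devt" ]; then
            in_use=yes
        fi
    done <<EOF
$MOUNTED
EOF
    if [ "$in_use" = yes ]; then
        # Clone detected: use a temp fsid; the on-disk fsid is untouched.
        MOUNT_FSID=$(new_temp_fsid)
    else
        MOUNT_FSID=$disk_fsid
    fi
    # Record the mount so later calls can see the fsid is taken.
    MOUNTED="$MOUNTED
$MOUNT_FSID $devt"
}
```

Calling model_mount twice with the same fsid but two different devt values
leaves MOUNT_FSID equal to the on-disk fsid after the first call and set to
a fresh random UUID after the second.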

Mount option / superblock flag:
-------------------------------
 These patches show that, from the btrfs internals point of view, we don't
have to gate the single-device / temp_fsid capability behind a mount option
or a superblock flag. However, if necessary from the user's perspective,
we can add them later on top of this patch. I've prepared a mount option
-o temp_fsid patch, but I have not included it at this time, as most of
the tests were run without it.

Compatible with other features that may be affected:
----------------------------------------------------
 Multi device:
    A btrfs filesystem on a single device can be copied using dd and
    mounted simultaneously. However, mounting a dd copy of a multi-device
    btrfs simultaneously with the original is forced to fail:

      mount: /btrfs1: mount(2) system call failed: File exists.

 Send and receive:
    Quick tests show that send and receive between two single devices with
    the same fsid mounted on the _same_ host work.
    (Also, the receive-mnt can receive from multiple senders as long as
    conflicts are managed externally. ;-))

 Replace: 
     Works fine.

btrfs-progs:
------------
 btrfs-progs needs to be updated to support commands such as

	btrfs filesystem show

 when devices are not mounted, so that the device list is no longer based
 on the fsid.

Testing:
-------
 This patch has been under testing for some time. The challenge is to get
 the fstests to test this reasonably well.

 As of now, this patch runs fine on a large set of fstests test cases
 using a custom-built mkfs.btrfs with the -U option and a new -P option
 to copy the FSID and device UUID from TEST_DEV to SCRATCH_DEV
 at scratch_mkfs time. For example:

  Config file:

     config_fsid=$(btrfs inspect-internal dump-super "$TEST_DEV" | \
				grep -E '^fsid' | awk '{print $2}')
     config_uuid=$(btrfs inspect-internal dump-super "$TEST_DEV" | \
				grep -E '^dev_item\.uuid' | awk '{print $2}')
     MKFS_OPTIONS="-U $config_fsid -P $config_uuid"

 This configuration option ensures that both TEST_DEV and SCRATCH_DEV will
 have the same FSID and device UUID while still applying test-specific
 scratch mkfs options.
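
 To double-check that the configuration took effect, the two superblocks
 can be compared after scratch_mkfs. The sketch below is not part of
 fstests; super_field and same_super_identity are hypothetical helper
 names, and it assumes the dump-super text has the key in the first column
 and the value in the second, as the grep/awk pipeline above implies.

```shell
#!/bin/sh
# super_field KEY: read dump-super text on stdin; print the value of the
# first line whose first column is exactly KEY.
super_field() {
    awk -v key="$1" '$1 == key { print $2; exit }'
}

# same_super_identity FILE_A FILE_B: succeed only if both saved dump-super
# outputs carry the same non-empty fsid and dev_item.uuid.
same_super_identity() {
    for key in fsid dev_item.uuid; do
        a=$(super_field "$key" < "$1")
        b=$(super_field "$key" < "$2")
        if [ -z "$a" ] || [ "$a" != "$b" ]; then
            return 1
        fi
    done
    return 0
}

# Example (needs root and real devices):
#   btrfs inspect-internal dump-super "$TEST_DEV" > test.super
#   btrfs inspect-internal dump-super "$SCRATCH_DEV" > scratch.super
#   same_super_identity test.super scratch.super && echo "same identity"
```

 Matching on the exact key avoids picking up dev_item.fsid when looking
 for fsid.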

mkfs.btrfs:
-----------
 mkfs.btrfs needs to be updated to support the -P option for testing only.

   btrfs-progs: allow duplicate fsid for single device
   btrfs-progs: add mkfs -P option for dev_uuid

Anand Jain (2):
  btrfs: add helper function find_fsid_by_disk
  btrfs: support cloned-device mount capability

 fs/btrfs/disk-io.c |  3 +-
 fs/btrfs/volumes.c | 75 +++++++++++++++++++++++++++++++++++++++++++---
 fs/btrfs/volumes.h |  2 ++
 3 files changed, 75 insertions(+), 5 deletions(-)

Comments

Anand Jain Oct. 2, 2023, 12:52 p.m. UTC | #1
Gentle ping: any comments?

Thanks, Anand


David Sterba Oct. 2, 2023, 1 p.m. UTC | #2
On Thu, Sep 28, 2023 at 09:09:45AM +0800, Anand Jain wrote:
>  Send and receive:
>     Quick tests shows send and receive between two single devices with
>     the same fsid mounted on the _same_ host works!.

Does it depend on whether the filesystem remains mounted the whole
time? So if there's an unmount and then a mount again with a new
temp-fsid, will the receive still work?


Added to misc-next, thanks.
Anand Jain Oct. 3, 2023, 1:13 a.m. UTC | #3
On 02/10/2023 21:00, David Sterba wrote:
> On Thu, Sep 28, 2023 at 09:09:45AM +0800, Anand Jain wrote:
>>   Send and receive:
>>      Quick tests shows send and receive between two single devices with
>>      the same fsid mounted on the _same_ host works!.
> 
> Does it depend if the filesystem remains mounted for the whole
> time? So if there's an unmount, mount again with a temp-fsid, will
> the receive still work?


Yes! Send-receive works even after a mount recycle with a new temp-fsid,
as shown below.

Cc-ing Filipe for any comments, or for any send-receive scenario that
might fail.


---------------------
mkfs a test device whose uuid and fsid will be duplicated

  $ mkfs.btrfs -fq /dev/sdc1
  $ blkid /dev/sdc1
  /dev/sdc1: UUID="99821bd4-322c-4a71-a88d-b9bb3e56223b" UUID_SUB="1881db58-1c2f-4639-bf87-5c0af24433d6" TYPE="btrfs" PARTUUID="a0de6580-01"
  $ mount /dev/sdc1 /btrfs


using the above fsid and device uuid, mkfs two more scratch devices

  $ mkfs.btrfs -fq -U 99821bd4-322c-4a71-a88d-b9bb3e56223b \
      -P 1881db58-1c2f-4639-bf87-5c0af24433d6 /dev/sdc2
  $ mkfs.btrfs -fq -U 99821bd4-322c-4a71-a88d-b9bb3e56223b \
      -P 1881db58-1c2f-4639-bf87-5c0af24433d6 /dev/sdc3

mount the scratch devices; they will mount using a temp-fsid

  $ mount /dev/sdc2 /btrfs1
  $ mount /dev/sdc3 /btrfs2
  $ btrfs filesystem show -m
  Label: none  uuid: 99821bd4-322c-4a71-a88d-b9bb3e56223b
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc1

  Label: none  uuid: d041437c-d12e-427c-b0c2-e2591b069feb
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc2

  Label: none  uuid: 91c7978f-342f-43d5-a88a-d131dd34962e
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc3


create first data and send-receive

  $ xfs_io -f -c 'pwrite -S 0x16 0 9000' /btrfs1/foo
  $ btrfs su snap -r /btrfs1 /btrfs1/snap1
  Create a readonly snapshot of '/btrfs1' in '/btrfs1/snap1'
  $ btrfs send /btrfs1/snap1 | btrfs receive /btrfs2
  At subvol /btrfs1/snap1
  At subvol snap1

  $ sha256sum /btrfs1/foo
  e856cd48942364eed9a205c64aa5e737ab52a73ba2800b07de9d4c331f88cb5b  /btrfs1/foo
  $ sha256sum /btrfs2/snap1/foo
  e856cd48942364eed9a205c64aa5e737ab52a73ba2800b07de9d4c331f88cb5b  /btrfs2/snap1/foo


mount recycle so that we get new temp-fsids

  $ umount /btrfs2
  $ umount /btrfs1
  $ mount /dev/sdc2 /btrfs1
  $ mount /dev/sdc3 /btrfs2
  $ btrfs filesystem show -m
  Label: none  uuid: 99821bd4-322c-4a71-a88d-b9bb3e56223b
	Total devices 1 FS bytes used 144.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc1

  Label: none  uuid: 34549411-c9cf-4118-8e42-58dbfd5c4964
	Total devices 1 FS bytes used 172.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc2

  Label: none  uuid: a9ec3b45-f809-49ad-bcb2-bd4b65b130d8
	Total devices 1 FS bytes used 172.00KiB
	devid    1 size 10.00GiB used 536.00MiB path /dev/sdc3


modify foo and send-receive

  $ xfs_io -f -c 'pwrite -S 0xdb 0 9000' /btrfs1/foo
  $ btrfs su snap -r /btrfs1 /btrfs1/snap2
  Create a readonly snapshot of '/btrfs1' in '/btrfs1/snap2'
  $ btrfs send -p /btrfs1/snap1 /btrfs1/snap2 | btrfs receive /btrfs2
  At snapshot snap2
  At subvol /btrfs1/snap2

  $ sha256sum /btrfs1/foo
  5a97ea23517b5f1255161345715f5831b59cbcd62f1fd57e40329980faa7dbd8  /btrfs1/foo
  $ sha256sum /btrfs2/snap2/foo
  5a97ea23517b5f1255161345715f5831b59cbcd62f1fd57e40329980faa7dbd8  /btrfs2/snap2/foo
-----------------------------------------------------
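
A quick way to confirm from userspace that a mount is using a temp-fsid is
to compare the uuid reported for the mounted device with the fsid in its
on-disk superblock. The helper below is a sketch: mounted_uuid_of is a
hypothetical name, and it assumes the 'btrfs filesystem show -m' layout
seen in the transcript above (a "Label: ... uuid: ..." line followed by
"devid ... path ..." lines).

```shell
#!/bin/sh
# mounted_uuid_of DEV: read `btrfs filesystem show -m` text on stdin and
# print the uuid of the filesystem entry that lists DEV as a device path.
mounted_uuid_of() {
    awk -v dev="$1" '
        # Remember the uuid of the entry currently being read.
        $1 == "Label:" {
            for (i = 1; i < NF; i++)
                if ($i == "uuid:")
                    uuid = $(i + 1)
        }
        # A devid line ends with the device path.
        $1 == "devid" && $NF == dev { print uuid; exit }
    '
}

# Example (needs root):
#   btrfs filesystem show -m | mounted_uuid_of /dev/sdc2   # in-memory fsid
#   blkid -s UUID -o value /dev/sdc2                       # on-disk fsid
# If the two values differ, the mount is presumably using a temp-fsid.
```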


> 
>>      (Also, the receive-mnt can receive from multiple senders as long as
>>      conflits are managed externally. ;-).)
>>

I mean multiple senders on temp-fsid mounts, as long as they all have the
same superblock::fsid.


Thanks, Anand



Guilherme G. Piccoli Oct. 6, 2023, 7:42 a.m. UTC | #4
Hi Anand / David, I was out at a conference and some holidays, so missed
this patch. Is this a replacement of the temp-fsid approach?

So, to clarify a bit the inner workings of this patch: we don't have the
temp-fsid superblock flag anymore? Also, we can mount multiple
partitions holding the same filesystem at the same time, given the
nature of the patch (that generates the random fsid based on devt as per
my superficial understanding) - right? And we don't use the
metadata_uuid field here anymore, i.e., we kinda "lose" the original fsid?

If that approach is considered better than mine and works fine for the
Steam Deck use case, I'm glad to have it! But I would like at least
to understand why it was preferred over the temp-fsid one, and what
differences we can expect (whether we need a flag at mkfs time or can use
btrfstune for that, for example).

Thanks in advance,


Guilherme
Anand Jain Oct. 7, 2023, 8:01 a.m. UTC | #5
On 10/6/23 15:42, Guilherme G. Piccoli wrote:
> Hi Anand / David, I was out at a conference and some holidays, so missed
> this patch. Is this a replacement of the temp-fsid approach?
> 
> So, to clarify a bit the inner workings of this patch: we don't have the
> temp-fsid superblock flag anymore? 

While btrfs doesn't need this superblock flag internally, we may
consider adding it for improved usability with other Btrfs features.

> Also, we can mount multiple
> partitions holding the same filesystem at the same time, given the
> nature of the patch (that generates the random fsid based on devt as per
> my superficial understanding) - right?

Indeed, devt remains unique to the partition; we've used it for a
similar purpose prior to this patch. Are there any devices lacking
a distinct devt value?


> And we don't use the
> metadata_uuid field here anymore,

Btrfs has always assigned fs_devices::metadata_uuid to either fsid
or metadata_uuid.


> i.e., we kinda "lose" the original fsid?

How? Have you tested to confirm?


> If that approaches is considered better than mine and works fine for the
> Steam Deck use case, 
> I'm glad in having that! 

As you have a use case to verify, can you indeed confirm whether
it works?

> But I would like at least
> to understand why it was preferred over the temp-fsid one, and what are
> the differences we can expect (need a flag to mkfs or can use btrfstune
> for that, for example).

The in-memory disk-super hack in the original patch is essentially a
workaround. It made it necessary to prevent devices using
metadata_uuid from being used as temp-fsid devices. A more appropriate
approach is to enhance device_list_add() to intelligently manage
duplicate disk-super entries by checking devt and permitting them
to mount if unique. This solution deviates from the original patch
and simultaneously addresses the subvol-mount corruption issue
observed in the original implementation.

The additional adjustments [1], such as sysfs interface, the constraints
on device additions, and the limitations on seed devices, are
supplementary patches essential to the comprehensive solution.

   [1] [PATCH 0/4] btrfs: sysfs and unsupported temp-fsid features for clones

However, the superblock temp-fsid flag isn't inherently necessary
within btrfs internals. Nevertheless, it can be considered for addition
if it makes other btrfs features work more seamlessly with temp-fsid.

Thanks, Anand
Guilherme G. Piccoli Oct. 18, 2023, 2:15 p.m. UTC | #6
On 07/10/2023 10:01, Anand Jain wrote:
> [...]

Hi Anand, thanks for your response and apologies for my delay.


>> Also, we can mount multiple
>> partitions holding the same filesystem at the same time, given the
>> nature of the patch (that generates the random fsid based on devt as per
>> my superficial understanding) - right?
> 
> Indeed, devt remains unique to the partition we've utilized for a
> similar purpose prior to this patch. Are there any devices lacking
> a distinct devt value?
> 

Not that I'm aware of, it was more of a curiosity.


>> i.e., we kinda "lose" the original fsid?
> 
> How? Have you tested to confirm?

Oh no, I didn't mean it literally. With the temp-fsid approach as
you implemented it, the kernel doesn't report the real fsid. But that's not
an issue at all, more of a curiosity...

I just tested misc-next and your approach seems to be working fine!


> 
>> If that approaches is considered better than mine and works fine for the
>> Steam Deck use case, 
>> I'm glad in having that! 
> 
> As you have a use case to verify, can you indeed confirm whether
> it works?

It does work! I'll test more on the Steam Deck, but so far it seems to
address our use case fine...


> [...] 
>> But I would like at least
>> to understand why it was preferred over the temp-fsid one, and what are
>> the differences we can expect (need a flag to mkfs or can use btrfstune
>> for that, for example).
> 
> The in-memory disk-super hack in the original patch is essentially a
> workaround. This led to the necessity of restricting devices using
> metadata_uuid from being used as temp-fsid device. A more appropriate
> approach is to enhance device_list_add() to intelligently manage
> duplicate disk-super entries by checking devt and permitting them
> to mount if unique. This solution deviates from the original patch
> and simultaneously addresses the subvol-mount corruption issue
> observed in the original implementation.
> 

OK, makes sense Anand.
Thanks,


Guilherme
Anand Jain Oct. 19, 2023, 2:13 a.m. UTC | #7
> 
> It does work! I'll test more in the Steam Deck, but so far seems to be
> addressing fine the use case we have...

Thanks for the use-case validation! Is there a way to turn
your use-case into a test-case?


One remaining challenge is with 'btrfs filesystem show' when
cloned devices are unmounted. Currently, it shows only one
cloned device.

We could consider porting the kernel changes to btrfs-progs to
display all devices (perhaps with a random fsid). Please go
ahead if you have some time to work on it, as I won't be able
to look into it for the next two weeks.
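
Until btrfs-progs handles this, unmounted clones can at least be spotted
from userspace by grouping devices on their on-disk fsid. The following is
only a sketch, not a btrfs-progs change: cloned_groups is a hypothetical
helper that reads "DEVICE FSID" pairs, and the blkid loop in the trailing
comment is one assumed way to produce them.

```shell
#!/bin/sh
# cloned_groups: read "DEVICE FSID" pairs on stdin; print one line per fsid
# found on more than one device, in the form "FSID DEV1 DEV2 ...".
cloned_groups() {
    awk '
        {
            if ($2 in devs)
                devs[$2] = devs[$2] " " $1
            else
                devs[$2] = $1
            count[$2]++
        }
        END {
            for (fsid in count)
                if (count[fsid] > 1)
                    print fsid, devs[fsid]
        }
    ' | sort
}

# Example (needs root; relies on blkid reporting the btrfs fsid as UUID):
#   for d in $(blkid -t TYPE=btrfs -o device); do
#       echo "$d $(blkid -s UUID -o value "$d")"
#   done | cloned_groups
```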

Thanks, Anand
Guilherme G. Piccoli Oct. 19, 2023, 8:15 a.m. UTC | #8
On 19/10/2023 04:13, Anand Jain wrote:
> [...]
> Thanks for the use-case validation!. Is there a way to turn
> your use-case into a test-case?
> 

The xfstests test that was submitted for the incompat TEMP_FSID flag covers
the use case - we just need to rewrite it, dropping the check for the
flag and checking the sysfs temp-fsid feature instead.

I can do that next week if such a wait is fine =)
Cheers,


Guilherme