Message ID | cover.1695826320.git.anand.jain@oracle.com (mailing list archive) |
---|---|
Series | btrfs: support cloned-device mount capability |
Gentle ping on any comments.

Thanks, Anand

On 28/09/2023 09:09, Anand Jain wrote:
> Guilherme's previous work [1] aimed at the mounting of cloned devices
> using a superblock flag SINGLE_DEV during mkfs.
>  [1] https://lore.kernel.org/linux-btrfs/20230831001544.3379273-1-gpiccoli@igalia.com/
>
> Building upon this work, here is an in-memory only approach. As it mounts,
> we determine if the same fsid is already mounted; if so, we generate a
> random temp fsid which shall be used for the mount, in memory only and
> not written to the disk. We distinguish devices by devt.
>
> Mount option / superblock flag:
> -------------------------------
> These patches show we don't have to limit the single-device / temp_fsid
> capability with a mount option or a superblock flag from the btrfs
> internals pov. However, if necessary from the user's perspective,
> we can add them later on top of this patch. I've prepared a mount option
> -o temp_fsid patch, but I've not included it at this time, as most of
> the tests were run without it.
>
> Compatible with other features that may be affected:
> ----------------------------------------------------
> Multi device:
>   A btrfs filesystem on a single device can be copied using dd and
>   mounted simultaneously. However, a multi device btrfs copied using
>   dd and trying to mount simultaneously is forced to fail:
>
>     mount: /btrfs1: mount(2) system call failed: File exists.
>
> Send and receive:
>   Quick tests show send and receive between two single devices with
>   the same fsid mounted on the _same_ host works.
>   (Also, the receive-mnt can receive from multiple senders as long as
>   conflicts are managed externally. ;-).)
>
> Replace:
>   Works fine.
>
> btrfs-progs:
> ------------
> btrfs-progs needs to be updated to support commands such as
>
>     btrfs filesystem show
>
> when devices are not mounted. So the device list is not based on
> the fsid any more.
>
> Testing:
> -------
> This patch has been under testing for some time. The challenge is to get
> the fstests to test this reasonably well.
>
> As of now, this patch runs fine on a large set of fstests test cases
> using a custom-built mkfs.btrfs with the -U option and a new -P option
> to copy the device FSID and UUID from the TEST_DEV to the SCRATCH_DEV
> at the scratch_mkfs time. For example:
>
> Config file:
>
>     config_fsid=$(btrfs in dump-super $TEST_DEV | grep -E ^fsid | \
>                     awk '{print $2}')
>     config_uuid=$(btrfs in dump-super $TEST_DEV | \
>                     grep -E ^dev_item.uuid | awk '{print $2}')
>     MKFS_OPTIONS="-U $config_fsid -P $config_uuid"
>
> This configuration option ensures that both TEST_DEV and SCRATCH_DEV will
> have the same FSID and device UUID while still applying test-specific
> scratch mkfs options.
>
> Mkfs.btrfs:
> -----------
> mkfs.btrfs needs to be updated to support the -P option for testing only.
>
>   btrfs-progs: allow duplicate fsid for single device
>   btrfs-progs: add mkfs -P option for dev_uuid
>
> Anand Jain (2):
>   btrfs: add helper function find_fsid_by_disk
>   btrfs: support cloned-device mount capability
>
>  fs/btrfs/disk-io.c |  3 +-
>  fs/btrfs/volumes.c | 75 +++++++++++++++++++++++++++++++++++++++++++---
>  fs/btrfs/volumes.h |  2 ++
>  3 files changed, 75 insertions(+), 5 deletions(-)
>
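The config fragment quoted above runs `dump-super` twice and filters with `grep`. A minimal sketch of the same idea as a reusable helper follows; it reads the text printed by `btrfs inspect-internal dump-super` from stdin and matches field names exactly. The function names `super_field` and `mkfs_options_from_super` are hypothetical, and `-P` is the test-only mkfs option proposed in this series, not an option assumed to exist in released btrfs-progs.

```shell
# Sketch only: helper names are made up for illustration.
# Input on stdin: the text printed by "btrfs inspect-internal dump-super <dev>".

# Print the second field of the first line whose first field is exactly $1.
super_field() {
	awk -v key="$1" '$1 == key { print $2; exit }'
}

# Build the "-U <fsid> -P <dev uuid>" option string from a dump-super listing.
# "-P" is the test-only option proposed in this thread (assumption).
mkfs_options_from_super() {
	local dump fsid dev_uuid
	dump=$(cat)
	fsid=$(printf '%s\n' "$dump" | super_field fsid)
	dev_uuid=$(printf '%s\n' "$dump" | super_field dev_item.uuid)
	# Refuse to build a half-empty option string.
	[ -n "$fsid" ] && [ -n "$dev_uuid" ] || return 1
	printf '%s %s %s %s\n' -U "$fsid" -P "$dev_uuid"
}

# Possible use in an fstests config file (not executed here):
#   MKFS_OPTIONS=$(btrfs inspect-internal dump-super "$TEST_DEV" | mkfs_options_from_super)
```

Matching the first field exactly keeps `fsid` from being confused with `dev_item.fsid`, and the helper returns non-zero when either value is missing.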
On Thu, Sep 28, 2023 at 09:09:45AM +0800, Anand Jain wrote:
> [...]
> Send and receive:
>   Quick tests show send and receive between two single devices with
>   the same fsid mounted on the _same_ host works.

Does it depend on whether the filesystem remains mounted for the whole time? So if there's an unmount and then a mount again with a temp-fsid, will the receive still work?

> [...]
> Anand Jain (2):
>   btrfs: add helper function find_fsid_by_disk
>   btrfs: support cloned-device mount capability

Added to misc-next, thanks.
On 02/10/2023 21:00, David Sterba wrote:
> On Thu, Sep 28, 2023 at 09:09:45AM +0800, Anand Jain wrote:
>> [...]
>> Send and receive:
>>   Quick tests show send and receive between two single devices with
>>   the same fsid mounted on the _same_ host works.
>
> Does it depend if the filesystem remains mounted for the whole
> time? So if there's an unmount, mount again with a temp-fsid, will
> the receive still work?

Yes! Send-receive works even after a mount-recycle with the new temp-fsid, as shown below. Cc-ing Filipe for any comments or any send-receive scenario that might fail.

---------------------
mkfs a test device whose uuid and fsid will be duplicated

  $ mkfs.btrfs -fq /dev/sdc1
  $ blkid /dev/sdc1
  /dev/sdc1: UUID="99821bd4-322c-4a71-a88d-b9bb3e56223b" UUID_SUB="1881db58-1c2f-4639-bf87-5c0af24433d6" TYPE="btrfs" PARTUUID="a0de6580-01"
  $ mount /dev/sdc1 /btrfs

using the above uuid and fsid, mkfs two more scratch devices

  $ mkfs.btrfs -fq -U 99821bd4-322c-4a71-a88d-b9bb3e56223b -P 1881db58-1c2f-4639-bf87-5c0af24433d6 /dev/sdc2
  $ mkfs.btrfs -fq -U 99821bd4-322c-4a71-a88d-b9bb3e56223b -P 1881db58-1c2f-4639-bf87-5c0af24433d6 /dev/sdc3

mount scratch devices; they will mount using a temp-fsid

  $ mount /dev/sdc2 /btrfs1
  $ mount /dev/sdc3 /btrfs2
  $ btrfs filesystem show -m
  Label: none  uuid: 99821bd4-322c-4a71-a88d-b9bb3e56223b
          Total devices 1 FS bytes used 144.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc1

  Label: none  uuid: d041437c-d12e-427c-b0c2-e2591b069feb
          Total devices 1 FS bytes used 144.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc2

  Label: none  uuid: 91c7978f-342f-43d5-a88a-d131dd34962e
          Total devices 1 FS bytes used 144.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc3

create first data and send-receive

  $ xfs_io -f -c 'pwrite -S 0x16 0 9000' /btrfs1/foo
  $ btrfs su snap -r /btrfs1 /btrfs1/snap1
  Create a readonly snapshot of '/btrfs1' in '/btrfs1/snap1'
  $ btrfs send /btrfs1/snap1 | btrfs receive /btrfs2
  At subvol /btrfs1/snap1
  At subvol snap1
  $ sha256sum /btrfs1/foo
  e856cd48942364eed9a205c64aa5e737ab52a73ba2800b07de9d4c331f88cb5b  /btrfs1/foo
  $ sha256sum /btrfs2/snap1/foo
  e856cd48942364eed9a205c64aa5e737ab52a73ba2800b07de9d4c331f88cb5b  /btrfs2/snap1/foo

mount recycle so that we have a new temp-fsid

  $ umount /btrfs2
  $ umount /btrfs1
  $ mount /dev/sdc2 /btrfs1
  $ mount /dev/sdc3 /btrfs2
  $ btrfs filesystem show -m
  Label: none  uuid: 99821bd4-322c-4a71-a88d-b9bb3e56223b
          Total devices 1 FS bytes used 144.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc1

  Label: none  uuid: 34549411-c9cf-4118-8e42-58dbfd5c4964
          Total devices 1 FS bytes used 172.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc2

  Label: none  uuid: a9ec3b45-f809-49ad-bcb2-bd4b65b130d8
          Total devices 1 FS bytes used 172.00KiB
          devid    1 size 10.00GiB used 536.00MiB path /dev/sdc3

modify foo and send-receive

  $ xfs_io -f -c 'pwrite -S 0xdb 0 9000' /btrfs1/foo
  $ btrfs su snap -r /btrfs1 /btrfs1/snap2
  Create a readonly snapshot of '/btrfs1' in '/btrfs1/snap2'
  $ btrfs send -p /btrfs1/snap1 /btrfs1/snap2 | btrfs receive /btrfs2
  At snapshot snap2
  At subvol /btrfs1/snap2
  $ sha256sum /btrfs1/foo
  5a97ea23517b5f1255161345715f5831b59cbcd62f1fd57e40329980faa7dbd8  /btrfs1/foo
  $ sha256sum /btrfs2/snap2/foo
  5a97ea23517b5f1255161345715f5831b59cbcd62f1fd57e40329980faa7dbd8  /btrfs2/snap2/foo
-----------------------------------------------------

>> (Also, the receive-mnt can receive from multiple senders as long as
>> conflicts are managed externally. ;-).)

I mean multiple senders on a temp-fsid mount, as long as they have the same superblock::fsid.

Thanks, Anand

>> [...]
> Added to misc-next, thanks.
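The transcript above verifies each receive by comparing `sha256sum` output by eye. A small sketch of how that check could be scripted is shown here; `same_sha256` is a hypothetical helper name, and it assumes the coreutils `sha256sum` tool is available.

```shell
# Sketch only: same_sha256 is a made-up helper, not part of btrfs-progs or fstests.
# Usage: same_sha256 FILE_A FILE_B
# Returns 0 if both files have the same SHA-256 digest, 1 if they differ,
# and 2 if either file cannot be read.
same_sha256() {
	local a b
	[ -r "$1" ] && [ -r "$2" ] || return 2
	# Read via stdin so the digest line does not depend on the file name.
	a=$(sha256sum < "$1" | awk '{ print $1 }')
	b=$(sha256sum < "$2" | awk '{ print $1 }')
	[ -n "$a" ] && [ "$a" = "$b" ]
}

# Possible use after a receive (not executed here):
#   same_sha256 /btrfs1/foo /btrfs2/snap2/foo && echo "receive matches source"
```

Comparing digests in a function makes it easy to repeat the check after every mount recycle.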
Hi Anand / David,

I was out at a conference and on some holidays, so I missed this patch. Is this a replacement for the temp-fsid approach?

So, to clarify the inner workings of this patch a bit: we don't have the temp-fsid superblock flag anymore? Also, we can mount multiple partitions holding the same filesystem at the same time, given the nature of the patch (which generates the random fsid based on devt, as per my superficial understanding) - right? And we don't use the metadata_uuid field here anymore, i.e., we kinda "lose" the original fsid?

If that approach is considered better than mine and works fine for the Steam Deck use case, I'm glad to have it! But I would like at least to understand why it was preferred over the temp-fsid one, and what differences we can expect (whether it needs a flag to mkfs or can use btrfstune for that, for example).

Thanks in advance,

Guilherme
On 10/6/23 15:42, Guilherme G. Piccoli wrote:
> Hi Anand / David, I was out at a conference and some holidays, so missed
> this patch. Is this a replacement of the temp-fsid approach?
>
> So, to clarify a bit the inner workings of this patch: we don't have the
> temp-fsid superblock flag anymore?

While btrfs doesn't need this superblock flag internally, we may consider adding it for improved usability with other Btrfs features.

> Also, we can mount multiple
> partitions holding the same filesystem at the same time, given the
> nature of the patch (that generates the random fsid based on devt as per
> my superficial understanding) - right?

Indeed, devt remains unique to the partition; we've utilized it for a similar purpose prior to this patch. Are there any devices lacking a distinct devt value?

> And we don't use the
> metadata_uuid field here anymore,

Btrfs has always set fs_devices::metadata_uuid to either the fsid or the metadata_uuid.

> i.e., we kinda "lose" the original fsid?

How? Have you tested to confirm?

> If that approaches is considered better than mine and works fine for the
> Steam Deck use case,
> I'm glad in having that!

As you have a use case to verify, can you confirm whether it works?

> But I would like at least
> to understand why it was preferred over the temp-fsid one, and what are
> the differences we can expect (need a flag to mkfs or can use btrfstune
> for that, for example).

The in-memory disk-super hack in the original patch is essentially a workaround. It made it necessary to restrict devices using metadata_uuid from being used as temp-fsid devices. A more appropriate approach is to enhance device_list_add() to manage duplicate disk-super entries by checking devt and permitting them to mount if the devt is unique. This solution deviates from the original patch and at the same time addresses the subvol-mount corruption issue observed in the original implementation.

The additional adjustments [1], such as the sysfs interface, the constraints on device additions, and the limitations on seed devices, are supplementary patches essential to the complete solution.

[1] [PATCH 0/4] btrfs: sysfs and unsupported temp-fsid features for clones

However, the superblock temp-fsid flag isn't inherently necessary within btrfs internals. Nevertheless, it can be considered for addition if it makes other btrfs features easier to use with temp-fsid.

Thanks, Anand
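The devt check described in this thread can be pictured with a toy model. The sketch below is not the kernel's device_list_add(); it only encodes the behaviour stated in the cover letter and the reply above: a single-device filesystem whose fsid is already registered under a different devt gets a temp fsid, and a multi-device clone is refused (the "File exists" failure). The function name `mount_decision`, its input format, and the result strings are all hypothetical.

```shell
# Toy model only: names, input format and result strings are made up.
# Usage: mount_decision FSID DEVT NUM_DEVICES
# stdin: one "fsid devt" pair per line for filesystems already registered.
# Prints one of:
#   on-disk-fsid  no other device carries this fsid
#   reuse         the same fsid and devt are already known (same device)
#   temp-fsid     single-device clone on another devt: use a random in-memory fsid
#   reject        multi-device clone: refused, as in "mount: ... File exists"
mount_decision() {
	awk -v fsid="$1" -v devt="$2" -v ndev="$3" '
		$1 == fsid && $2 == devt { same_dev = 1 }
		$1 == fsid && $2 != devt { clone = 1 }
		END {
			if (same_dev)       print "reuse"
			else if (!clone)    print "on-disk-fsid"
			else if (ndev == 1) print "temp-fsid"
			else                print "reject"
		}'
}

# Example (not executed here), with /dev/sdc1 registered as devt 8:33:
#   printf '%s\n' '99821bd4-322c-4a71-a88d-b9bb3e56223b 8:33' |
#       mount_decision 99821bd4-322c-4a71-a88d-b9bb3e56223b 8:34 1
```

Keying the decision on the (fsid, devt) pair rather than on the fsid alone is what lets identical superblocks coexist, which is the point made about devt being unique per partition.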
On 07/10/2023 10:01, Anand Jain wrote:
> [...]

Hi Anand, thanks for your response and apologies for my delay.

>> Also, we can mount multiple
>> partitions holding the same filesystem at the same time, given the
>> nature of the patch (that generates the random fsid based on devt as per
>> my superficial understanding) - right?
>
> Indeed, devt remains unique to the partition we've utilized for a
> similar purpose prior to this patch. Are there any devices lacking
> a distinct devt value?
>

Not that I'm aware of; it was more of a curiosity.

>> i.e., we kinda "lose" the original fsid?
>
> How? Have you tested to confirm?

Oh no, I didn't mean that literally. When we go with the temp-fsid approach as you implemented it, the kernel doesn't report the real fsid. But that's not an issue at all, more of a curiosity...

I just tested misc-next and your approach seems to be working fine!

>
>> If that approaches is considered better than mine and works fine for the
>> Steam Deck use case,
>> I'm glad in having that!
>
> As you have a use case to verify, can you indeed confirm whether
> it works?

It does work! I'll test more on the Steam Deck, but so far it seems to address our use case fine...

> [...]
>> But I would like at least
>> to understand why it was preferred over the temp-fsid one, and what are
>> the differences we can expect (need a flag to mkfs or can use btrfstune
>> for that, for example).
>
> The in-memory disk-super hack in the original patch is essentially a
> workaround. This led to the necessity of restricting devices using
> metadata_uuid from being used as temp-fsid device. A more appropriate
> approach is to enhance device_list_add() to intelligently manage
> duplicate disk-super entries by checking devt and permitting them
> to mount if unique. This solution deviates from the original patch
> and simultaneously addresses the subvol-mount corruption issue
> observed in the original implementation.
>

OK, makes sense, Anand.

Thanks,

Guilherme
> > It does work! I'll test more in the Steam Deck, but so far seems to be
> > addressing fine the use case we have...

Thanks for the use-case validation! Is there a way to turn your use case into a test case?

One remaining challenge is with 'btrfs filesystem show' when cloned devices are unmounted. Currently, it shows only one cloned device. We could consider porting the kernel changes to btrfs-progs to display all devices (perhaps with a random fsid).

Please go ahead if you have some time to work on it, as I won't be able to look into it for the next two weeks.

Thanks, Anand
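Until btrfs-progs lists every unmounted clone, duplicates can at least be spotted from `blkid`-style text. The sketch below groups btrfs devices by UUID and prints the UUIDs that appear on more than one device. The function name `cloned_fsids` is hypothetical, and the input is assumed to be lines in the `blkid /dev/sdc1` format shown earlier in the thread; how the per-device lines are collected is left to the caller.

```shell
# Sketch only: cloned_fsids is a made-up helper.
# stdin: lines in blkid's default format, for example
#   /dev/sdc1: UUID="..." UUID_SUB="..." TYPE="btrfs" PARTUUID="..."
# Prints "uuid dev1 dev2 ..." for each btrfs UUID seen on more than one device.
cloned_fsids() {
	awk '
		/ TYPE="btrfs"/ {
			dev = $1
			sub(/:$/, "", dev)
			uuid = ""
			for (i = 2; i <= NF; i++) {
				# UUID_SUB does not match because "=" must follow "UUID".
				if ($i ~ /^UUID="/) {
					uuid = $i
					sub(/^UUID="/, "", uuid)
					sub(/"$/, "", uuid)
				}
			}
			if (uuid != "") {
				count[uuid]++
				devs[uuid] = devs[uuid] " " dev
			}
		}
		END {
			for (u in count)
				if (count[u] > 1)
					print u devs[u]
		}
	' | sort
}

# Possible use (not executed here):
#   for d in /dev/sdc1 /dev/sdc2 /dev/sdc3; do blkid "$d"; done | cloned_fsids
```

This only reports which devices share an fsid; it does not replace the btrfs-progs change discussed above.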
On 19/10/2023 04:13, Anand Jain wrote:
> [...]
> Thanks for the use-case validation!. Is there a way to turn
> your use-case into a test-case?
>

The xfstests test that was submitted for the incompat TEMP_FSID flag covers the use case - we just need to rewrite it, dropping the check for the flag and checking the sysfs temp-fsid feature instead. I can do that next week, if such a wait is fine =)

Cheers,

Guilherme