Message ID | 20230504170708.787361-1-gpiccoli@igalia.com (mailing list archive) |
---|---|
Series | Supporting same fsid filesystems mounting on btrfs |
On 04/05/2023 19.07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.

Hi Guilherme,

did you try to run "btrfs dev scan --forget /dev/sd.." before
mounting the filesystem?

Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
filesystems with the same uuid, to mount /dev/sdA you should run

    btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
    mount /dev/sdA /mnt/target

and to mount /dev/sdB

    btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
    mount /dev/sdB /mnt/target

I made a quick test using two loop devices and it seems to work
reliably.

Another option would be to make a kernel change that "forgets" the device
before mounting *if* the filesystem is composed of only one device
(with a few other exceptions, such as the filesystem being already mounted).

This would avoid all the problems related to making a "temporary" uuid.

>
> The goal is to allow systems with an A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows a
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A few more details are provided in the following ML thread:
>
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>
> The mechanism used to achieve it is based on the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine and on the
> Steam Deck. Comments, suggestions and overall feedback are greatly
> appreciated - thanks in advance!
>
> Cheers,
>
>
> Guilherme
>
>
> Guilherme G. Piccoli (2):
>   btrfs: Introduce the virtual_fsid feature
>   btrfs: Add module parameter to enable non-mount scan skipping
>
>  fs/btrfs/disk-io.c |  22 +++++++--
>  fs/btrfs/ioctl.c   |  18 ++++++++
>  fs/btrfs/super.c   |  41 ++++++++++++-----
>  fs/btrfs/super.h   |   1 +
>  fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/volumes.h |  11 ++++-
>  6 files changed, 174 insertions(+), 30 deletions(-)
>

On 04/05/2023 16:28, Goffredo Baroncelli wrote:
> [...]
> Hi Guilherme,
>
> did you try to run "btrfs dev scan --forget /dev/sd.." before
> mounting the filesystem?
>
> Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
> filesystems with the same uuid, to mount /dev/sdA you should run
>
>     btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
>     mount /dev/sdA /mnt/target
>
> and to mount /dev/sdB
>
>     btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
>     mount /dev/sdB /mnt/target
>
> I made a quick test using two loop devices and it seems to work
> reliably.

Hi Goffredo, thanks for your suggestion!

This seems interesting with regard to the second patch here. Indeed, I can
mount either of the 2 filesystems if I use the forget option - interesting
option, I wasn't aware of it.

But unfortunately it seems limited to mounting one device at a time, and
we need to be able to mount *both* of them, due to an installation step.
If I try to forget the device that is mounted, I (obviously) get a
"busy device" error.

Is there any step I'm missing, or is mounting both devices
really a limitation when using the forget option?

>
> Another option would be to make a kernel change that "forgets" the device
> before mounting *if* the filesystem is composed of only one device
> (with a few other exceptions, such as the filesystem being already mounted).
>
> This would avoid all the problems related to making a "temporary" uuid.

I guess again this would be useful in the scope of the second patch
here. We could do the check you're proposing instead of having the
module parameter. In a way this is similar to the forget approach,
right? But it's kind of an "automatic" forget heh

How would btrfs know it is dealing with a single-device filesystem? In other
words: how would we distinguish between the cases where we want to auto-forget
before mounting, and the cases in which this behavior is undesired?

Thanks again for your feedback, it is much appreciated.
Cheers,


Guilherme

On 04/05/2023 22.10, Guilherme G. Piccoli wrote:
> On 04/05/2023 16:28, Goffredo Baroncelli wrote:
>> [...]
>> Hi Guilherme,
>>
>> did you try to run "btrfs dev scan --forget /dev/sd.." before
>> mounting the filesystem?
>>
>> Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
>> filesystems with the same uuid, to mount /dev/sdA you should run
>>
>>     btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
>>     mount /dev/sdA /mnt/target
>>
>> and to mount /dev/sdB
>>
>>     btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
>>     mount /dev/sdB /mnt/target
>>
>> I made a quick test using two loop devices and it seems to work
>> reliably.
>
> Hi Goffredo, thanks for your suggestion!
>
> This seems interesting with regard to the second patch here. Indeed, I can
> mount either of the 2 filesystems if I use the forget option - interesting
> option, I wasn't aware of it.
>
> But unfortunately it seems limited to mounting one device at a time, and
> we need to be able to mount *both* of them, due to an installation step.
> If I try to forget the device that is mounted, I (obviously) get a
> "busy device" error.

Ahh, I didn't realize that you want to mount the two filesystems at the same time.

>
> Is there any step I'm missing, or is mounting both devices
> really a limitation when using the forget option?
>

From my limited knowledge of BTRFS internals, I think that your patches
take the correct approach: using the "metadata_uuid" code to allow
two filesystems with the same uuid to exist at the same time.

>
>>
>> Another option would be to make a kernel change that "forgets" the device
>> before mounting *if* the filesystem is composed of only one device
>> (with a few other exceptions, such as the filesystem being already mounted).
>>
>> This would avoid all the problems related to making a "temporary" uuid.
>
> I guess again this would be useful in the scope of the second patch
> here. We could do the check you're proposing instead of having the
> module parameter. In a way this is similar to the forget approach,
> right? But it's kind of an "automatic" forget heh
>
> How would btrfs know it is dealing with a single-device filesystem? In other
> words: how would we distinguish between the cases where we want to auto-forget
> before mounting, and the cases in which this behavior is undesired?

If I remember correctly, the super-block stores the number of disks
that compose the filesystem. If the count is 1, it should be safe to
forget-before-mount the filesystem (or not to store it in the cache
after a scan).

>
> Thanks again for your feedback, it is much appreciated.
> Cheers,
>
>
> Guilherme

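As a rough illustration of the check Goffredo describes, the sketch below reads the device count from the primary btrfs superblock of an image file or block device. The offsets used (superblock at 64 KiB, magic at 0x40, num_devices at 0x88) are taken from the documented on-disk format as best recalled here and should be verified before use; the helper names are hypothetical, and this is user-space Python, not the kernel change being discussed.

```python
import struct

# Assumed layout of the primary btrfs superblock; verify against the
# on-disk format documentation before relying on these values.
SUPERBLOCK_OFFSET = 0x10000   # primary superblock is expected at 64 KiB
MAGIC_OFFSET = 0x40           # 8-byte magic within the superblock
NUM_DEVICES_OFFSET = 0x88     # little-endian u64 device count
BTRFS_MAGIC = b"_BHRfS_M"


def read_num_devices(path):
    """Return num_devices from the superblock at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(NUM_DEVICES_OFFSET + 8)
    if len(sb) < NUM_DEVICES_OFFSET + 8:
        raise ValueError("file too small to hold a btrfs superblock")
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError("btrfs magic not found")
    (num_devices,) = struct.unpack_from("<Q", sb, NUM_DEVICES_OFFSET)
    return num_devices


def is_single_device(path):
    """True when the superblock claims the filesystem has exactly one device."""
    return read_num_devices(path) == 1
```

A caller could use `is_single_device("/dev/sdA")` (reading a block device normally requires root) to decide whether a forget-before-mount step would be a candidate; whether that is actually safe is the open question in this thread.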
On 5/5/23 01:07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.
>
> The goal is to allow systems with an A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows a
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A few more details are provided in the following ML thread:
>
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/

I'm confused about your requirement: 2 identical filesystems mounted
simultaneously, or just one at a time? The latter works. Bugs were fixed.

Have you considered using the btrfs seed device feature to avoid
sacrificing 50% capacity? Create a read-only seed device as the golden image,
then add a writable device on top. Example:

    $ btrfstune -S1 /dev/rdonly-golden-img
    $ mount /dev/rdonly-golden-img /btrfs
    $ btrfs dev add /dev/rw-dev /btrfs
    $ mount -o remount,rw /dev/rw-dev /btrfs

To switch golden image:

    $ btrfs dev del /dev/rdonly-golden-img /btrfs
    $ umount /btrfs
    $ btrfstune -S1 /dev/rw-dev

Thanks, Anand

> The mechanism used to achieve it is based on the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine and on the
> Steam Deck. Comments, suggestions and overall feedback are greatly
> appreciated - thanks in advance!
>
> Cheers,
>
>
> Guilherme
>
>
> Guilherme G. Piccoli (2):
>   btrfs: Introduce the virtual_fsid feature
>   btrfs: Add module parameter to enable non-mount scan skipping
>
>  fs/btrfs/disk-io.c |  22 +++++++--
>  fs/btrfs/ioctl.c   |  18 ++++++++
>  fs/btrfs/super.c   |  41 ++++++++++++-----
>  fs/btrfs/super.h   |   1 +
>  fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
>  fs/btrfs/volumes.h |  11 ++++-
>  6 files changed, 174 insertions(+), 30 deletions(-)
>

On 04/05/2023 18:09, Goffredo Baroncelli wrote:
> [...]
>> Is there any step I'm missing, or is mounting both devices
>> really a limitation when using the forget option?
>>
>
> From my limited knowledge of BTRFS internals, I think that your patches
> take the correct approach: using the "metadata_uuid" code to allow
> two filesystems with the same uuid to exist at the same time.
>
> [...]
>> How would btrfs know it is dealing with a single-device filesystem? In other
>> words: how would we distinguish between the cases where we want to auto-forget
>> before mounting, and the cases in which this behavior is undesired?
>
> If I remember correctly, the super-block stores the number of disks
> that compose the filesystem. If the count is 1, it should be safe to
> forget-before-mount the filesystem (or not to store it in the cache
> after a scan).

Thanks, Goffredo, for the clarifications =)

On 05/05/2023 02:16, Anand Jain wrote:
> [...]
>>
>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>
> I'm confused about your requirement: 2 identical filesystems mounted
> simultaneously, or just one at a time? The latter works. Bugs were fixed.

Hi Anand, apologies - in fact, in this old-ish thread I mentioned we
need to mount one at a time, and this corresponds to the majority of
the use case. BUT... it seems that for the installation step we require
*both* to be mounted at the same time for a while, so the requirement
changed since the last analysis, and this is really what we
implemented here.

>
> Have you considered using the btrfs seed device feature to avoid
> sacrificing 50% capacity? Create a read-only seed device as the golden image,
> then add a writable device on top. Example:
>
>     $ btrfstune -S1 /dev/rdonly-golden-img
>     $ mount /dev/rdonly-golden-img /btrfs
>     $ btrfs dev add /dev/rw-dev /btrfs
>     $ mount -o remount,rw /dev/rw-dev /btrfs
>
> To switch golden image:
>
>     $ btrfs dev del /dev/rdonly-golden-img /btrfs
>     $ umount /btrfs
>     $ btrfstune -S1 /dev/rw-dev
>

Yeah, I'm aware that btrfs has some features that might fit and could
even save space, but due to some requirements on the Deck it's not possible
to use them.

I'll defer a more detailed response to John / Vivek / Ludovico, who
know the use case at a level of detail I don't, since they designed
the installation / update path from the ground up.

Cheers,


Guilherme

On 05/05/2023 18.27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:
>> [...]
>>>
>>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>>
>> I'm confused about your requirement: 2 identical filesystems mounted
>> simultaneously, or just one at a time? The latter works. Bugs were fixed.
>
> Hi Anand, apologies - in fact, in this old-ish thread I mentioned we
> need to mount one at a time, and this corresponds to the majority of
> the use case. BUT... it seems that for the installation step we require
> *both* to be mounted at the same time for a while, so the requirement
> changed since the last analysis, and this is really what we
> implemented here.

What if the different images had different uuids from the beginning?

On 05/05/2023 17:27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:
[cut]
> I'll defer a more detailed response to John / Vivek / Ludovico, who
> know the use case at a level of detail I don't, since they designed
> the installation / update path from the ground up.
>

The OS images are entirely independent. The goal is that you could
completely corrupt slot A and it would have no impact on the bootability
of slot B. So, yes, we sacrifice space, but as a trade-off we get
robustness, which is more important to us.

=========================================================================

When a new OS image is delivered, the normal flow is this (simplified):

 - While booted on slot A (for example) the update process is started.

 - Our client fetches the most recent image from the update server.
   This is delivered as a block level diff between the image you have
   and the image you want.

 - The partitions that are allocated to slot B have the new data written
   into them.

 - As a final step, the root fs of the new slot is mounted and a couple of
   initialisation steps are completed (mostly writing config into the
   common boot partition: the contents of the slot B partitions are not
   modified as a result of this).

 - The system is rebooted. If all goes well, slot B is booted and becomes
   the primary (current) image.

 - If it fails for some reason, the bootloader will (either automatically
   or by user intervention) go back to booting slot A.

Note that other than the final mount to update the common boot partition
with information about the new image, we don't care at all about the
contents or even the type of the filesystems we have delivered (and even
then all we care about is that we _can_ mount it, not what it is).

===========================================================================

Now normally this is not a problem: if the new image is not the same as
the current one, we will have written entirely new filesystems into the
B partitions and there is no conflict.

However, if the user wishes or needs to reinstall a fresh copy of the
_current_ image (for whatever reason: maybe the current image is damaged
in some way and they need to do a factory reset) then with btrfs in the
mix this breaks down:

Since btrfs won't (at present) tolerate a second fs with the same fsuuid,
we have to check that the user is not installing the same image on both
slots.

If the user has a broken image which is also the latest release and needs
to recover, we have to artificially select an _older_ image and put that on
slot B; the user then needs to boot into that and upgrade _again_ to get
a repaired A slot.

This sort of works but isn't a great user experience and introduces an
artificial restriction - suddenly the images _do_ affect one another.

If the user subverts our safety checks (or we mess up and put the same
image on both slots) then suddenly the whole system becomes unbootable,
which is less than ideal.

Hope that clarifies the situation and explains why we care.

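The "same image on both slots" safety check mentioned above could be approximated by comparing the fsid stored in each slot's superblock. The following is only a sketch under stated assumptions: the offsets (superblock at 64 KiB, fsid at 0x20, magic at 0x40) should be checked against the btrfs on-disk format documentation, the function names are hypothetical, and nothing here is claimed to match the actual Steam Deck updater.

```python
import uuid

# Assumed superblock layout; verify before relying on it.
SUPERBLOCK_OFFSET = 0x10000   # primary superblock is expected at 64 KiB
FSID_OFFSET = 0x20            # 16-byte filesystem UUID
MAGIC_OFFSET = 0x40           # 8-byte magic within the superblock
BTRFS_MAGIC = b"_BHRfS_M"


def read_fsid(path):
    """Return the fsid recorded in the superblock at `path` as a uuid.UUID."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        sb = f.read(MAGIC_OFFSET + 8)
    if len(sb) < MAGIC_OFFSET + 8:
        raise ValueError(f"{path}: too small to hold a btrfs superblock")
    if sb[MAGIC_OFFSET:MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        raise ValueError(f"{path}: btrfs magic not found")
    return uuid.UUID(bytes=sb[FSID_OFFSET:FSID_OFFSET + 16])


def same_image_in_both_slots(slot_a, slot_b):
    """True when both slots carry the same fsid (hypothetical pre-install check)."""
    return read_fsid(slot_a) == read_fsid(slot_b)
```

An updater could refuse to write the new image when `same_image_in_both_slots()` returns true; that refusal is exactly the artificial restriction the patch series aims to make unnecessary.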
On Thu, May 04, 2023 at 02:07:06PM -0300, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same fsid mounting on btrfs.
> Currently, we cannot reliably mount same fsid filesystems even one at
> a time in btrfs, but if users want to mount them at the same time, it's
> pretty much impossible. Other filesystems like ext4 are capable of that.
>
> The goal is to allow systems with an A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to be able to hold
> the same filesystem image in both partitions; it also allows a
> block device level check for filesystem integrity - this is used in the
> Steam Deck image installation, to check if the current read-only image
> is pristine. A few more details are provided in the following ML thread:
>
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>
> The mechanism used to achieve it is based on the metadata_uuid feature,
> leveraging such code infrastructure for that. The patches are based on
> kernel 6.3 and were tested both in a virtual machine and on the
> Steam Deck. Comments, suggestions and overall feedback are greatly
> appreciated - thanks in advance!

So how does this work if someone needs to mount 3 copies of the same
filesystem at the same time?

On XFS, we have the "nouuid" mount option which skips the duplicate
UUID checking done at mount time so that multiple snapshots or
images of the same filesystem can be mounted at the same time. This
means we don't get the same filesystem mounted by accident, but also
allows all the cases we know about where multiple versions of the
filesystem need to be mounted at the same time.

I know, fs UUIDs are used differently in btrfs vs XFS, but it would
be nice for users if filesystems shared the same interfaces for
doing the same sort of management operations...

Cheers,

Dave.

On 07/05/2023 20:10, Dave Chinner wrote:
> [...]
> So how does this work if someone needs to mount 3 copies of the same
> filesystem at the same time?
>
> On XFS, we have the "nouuid" mount option which skips the duplicate
> UUID checking done at mount time so that multiple snapshots or
> images of the same filesystem can be mounted at the same time. This
> means we don't get the same filesystem mounted by accident, but also
> allows all the cases we know about where multiple versions of the
> filesystem need to be mounted at the same time.
>
> I know, fs UUIDs are used differently in btrfs vs XFS, but it would
> be nice for users if filesystems shared the same interfaces for
> doing the same sort of management operations...
>
> Cheers,
>
> Dave.

Hi Dave, thanks for the information / suggestion. I see no reason for
virtual_fsid to fail with 3 or more devices; the idea is that it
creates a random fsid for every device that you mount with the
flag, so it shouldn't be a problem.

Of course renaming to "nouuid" would be completely fine (at least for
me) to keep consistency among filesystems; the only question that
remains is whether we should go with a mount option or the compat_ro flag,
as strongly suggested by Qu.

Cheers,


Guilherme

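To make the "random fsid per mount" idea concrete, here is a toy user-space model. It is not the kernel implementation from the patches: the class, the exception and the `virtual_fsid` argument are hypothetical names chosen for illustration, and the only behaviour modelled is that a filesystem mounted with the flag is tracked under a fresh random fsid instead of its on-disk one.

```python
import uuid


class DuplicateFsidError(Exception):
    """Raised when a second filesystem with an already-used fsid is mounted."""


class ToyFsidRegistry:
    """Toy stand-in for a table of mounted filesystems keyed by fsid."""

    def __init__(self):
        self.mounted = {}  # in-memory fsid -> device path

    def mount(self, device, disk_fsid, virtual_fsid=False):
        # With the flag set, track the filesystem under a fresh random
        # fsid, so identical on-disk fsids never collide in the table.
        fsid = uuid.uuid4() if virtual_fsid else disk_fsid
        if fsid in self.mounted:
            raise DuplicateFsidError(f"{device}: fsid {fsid} already in use")
        self.mounted[fsid] = device
        return fsid
```

In this model, mounting three copies of the same image with `virtual_fsid=True` yields three distinct in-memory fsids, while mounting a second copy without the flag raises `DuplicateFsidError`, mirroring the behaviour described in the reply above.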
On 04/05/2023 14:07, Guilherme G. Piccoli wrote:
> [...]
Hi folks, V2 sent here:
https://lore.kernel.org/linux-btrfs/20230803154453.1488248-1-gpiccoli@igalia.com/
Thanks,
Guilherme