
[0/2] Supporting same fsid filesystems mounting on btrfs

Message ID 20230504170708.787361-1-gpiccoli@igalia.com (mailing list archive)

Message

Guilherme G. Piccoli May 4, 2023, 5:07 p.m. UTC
Hi folks, this is an attempt at supporting same-fsid mounting on btrfs.
Currently, we cannot reliably mount same-fsid filesystems even one at
a time in btrfs, and mounting them at the same time is pretty much
impossible. Other filesystems like ext4 are capable of that.

The goal is to allow systems with an A/B partitioning scheme (like the
Steam Deck console or various mobile devices) to hold the same
filesystem image in both partitions; it also allows a block-device-level
check of filesystem integrity - this is used in the Steam Deck image
installation to check whether the current read-only image is pristine.
A few more details are provided in the following ML thread:

https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/

The mechanism used to achieve it is based on the metadata_uuid feature,
leveraging its code infrastructure. The patches are based on kernel 6.3
and were tested both in a virtual machine and on the Steam Deck.
Comments, suggestions and overall feedback are greatly appreciated -
thanks in advance!

Cheers,


Guilherme


Guilherme G. Piccoli (2):
  btrfs: Introduce the virtual_fsid feature
  btrfs: Add module parameter to enable non-mount scan skipping

 fs/btrfs/disk-io.c |  22 +++++++--
 fs/btrfs/ioctl.c   |  18 ++++++++
 fs/btrfs/super.c   |  41 ++++++++++++-----
 fs/btrfs/super.h   |   1 +
 fs/btrfs/volumes.c | 111 +++++++++++++++++++++++++++++++++++++++------
 fs/btrfs/volumes.h |  11 ++++-
 6 files changed, 174 insertions(+), 30 deletions(-)

Comments

Goffredo Baroncelli May 4, 2023, 7:28 p.m. UTC | #1
On 04/05/2023 19.07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same-fsid mounting on btrfs.
> Currently, we cannot reliably mount same-fsid filesystems even one at
> a time in btrfs, and mounting them at the same time is pretty much
> impossible. Other filesystems like ext4 are capable of that.

Hi Guilherme,

did you try running "btrfs dev scan --forget /dev/sd.." before
mounting the filesystem?

Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
filesystems with the same UUID, to mount /dev/sdA you should run

btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
mount /dev/sdA /mnt/target

and to mount /dev/sdB

btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
mount /dev/sdB /mnt/target

I made a quick test using two loop devices and it seems that it works
reliably.
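
The forget-then-mount sequence above can be wrapped in a small helper. This is only a sketch: the function names `run` and `mount_one_of_pair` are made up for illustration, the device paths are placeholders, and a real run needs root plus a btrfs-progs version that supports `btrfs device scan --forget`. Setting DRY_RUN=1 prints the commands instead of executing them.

```sh
#!/bin/sh
# Sketch only: helper names and paths are illustrative, not part of btrfs-progs.

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount one of two same-fsid devices: forget the other device first, as in
# the manual sequence above, then mount the wanted one.
mount_one_of_pair() {
    wanted=$1
    other=$2
    target=$3
    run btrfs device scan --forget "$other" &&
    run mount "$wanted" "$target"
}
```

For example, `DRY_RUN=1 mount_one_of_pair /dev/sdA /dev/sdB /mnt/target` prints the two commands for mounting /dev/sdA; swapping the first two arguments gives the sequence for /dev/sdB. Like the manual steps, this only covers mounting one of the two at a time.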

Another option would be a kernel change that "forgets" the device
before mounting *if* the filesystem is composed of only one device
(with a few other exceptions, e.g. when the filesystem is already mounted).

This would avoid all the problems related to creating a "temporary" UUID.

Guilherme G. Piccoli May 4, 2023, 8:10 p.m. UTC | #2
On 04/05/2023 16:28, Goffredo Baroncelli wrote:
> [...]
> Hi Guilherme,
> 
> did you try running "btrfs dev scan --forget /dev/sd.." before
> mounting the filesystem?
> 
> Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
> filesystems with the same UUID, to mount /dev/sdA you should run
> 
> btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
> mount /dev/sdA /mnt/target
> 
> and to mount /dev/sdB
> 
> btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
> mount /dev/sdB /mnt/target
> 
> I made a quick test using two loop devices and it seems that it works
> reliably.

Hi Goffredo, thanks for your suggestion!

This seems interesting with regard to the second patch here. Indeed, I can
mount either of the 2 filesystems if I use the forget option - interesting
option, I wasn't aware of it.

But unfortunately it seems limited to mounting one device at a time, and
we need to be able to mount *both* of them, due to an installation step.
If I try to forget the device that is mounted, I (obviously) get a
"busy device" error.

Is there any step missing on my side, or is mounting both devices
really a limitation when using the forget option?


> 
> Another option would be a kernel change that "forgets" the device
> before mounting *if* the filesystem is composed of only one device
> (with a few other exceptions, e.g. when the filesystem is already mounted).
> 
> This would avoid all the problems related to creating a "temporary" UUID.

I guess again this would be useful in the scope of the second patch
here... we could do the check the way you're proposing instead of having
the module parameter. In a way this is similar to the forget approach,
right? But it's kind of an "automatic" forget heh

How would btrfs know that it is dealing with a single-device filesystem?
In other words: how would we distinguish between the cases where we want
to auto-forget before mounting, and the cases in which this behavior is
undesired?

Thanks again for your feedback, it is much appreciated.
Cheers,


Guilherme

Goffredo Baroncelli May 4, 2023, 9:09 p.m. UTC | #3
On 04/05/2023 22.10, Guilherme G. Piccoli wrote:
> On 04/05/2023 16:28, Goffredo Baroncelli wrote:
>> [...]
>> Hi Guilherme,
>>
>> did you try running "btrfs dev scan --forget /dev/sd.." before
>> mounting the filesystem?
>>
>> Assuming that you have two devices /dev/sdA and /dev/sdB holding two btrfs
>> filesystems with the same UUID, to mount /dev/sdA you should run
>>
>> btrfs dev scan --forget /dev/sdB # you can even use /dev/sdA
>> mount /dev/sdA /mnt/target
>>
>> and to mount /dev/sdB
>>
>> btrfs dev scan --forget /dev/sdA # you can even use /dev/sdB
>> mount /dev/sdB /mnt/target
>>
>> I made a quick test using two loop devices and it seems that it works
>> reliably.
> 
> Hi Goffredo, thanks for your suggestion!
> 
> This seems interesting with regard to the second patch here. Indeed, I can
> mount either of the 2 filesystems if I use the forget option - interesting
> option, I wasn't aware of it.
> 
> But unfortunately it seems limited to mounting one device at a time, and
> we need to be able to mount *both* of them, due to an installation step.
> If I try to forget the device that is mounted, I (obviously) get a
> "busy device" error.

Ahh, I didn't realize that you want to mount the two filesystems at the
same time.

> 
> Is there any step missing on my side, or is mounting both devices
> really a limitation when using the forget option?
> 

From my limited knowledge of BTRFS internals, I think that your patches
take the correct approach: using the "metadata_uuid" code to allow
two filesystems with the same UUID to exist at the same time.


> 
>>
>> Another option would be a kernel change that "forgets" the device
>> before mounting *if* the filesystem is composed of only one device
>> (with a few other exceptions, e.g. when the filesystem is already mounted).
>>
>> This would avoid all the problems related to creating a "temporary" UUID.
> 
> I guess again this would be useful in the scope of the second patch
> here... we could do the check the way you're proposing instead of having
> the module parameter. In a way this is similar to the forget approach,
> right? But it's kind of an "automatic" forget heh
> 
> How would btrfs know that it is dealing with a single-device filesystem?
> In other words: how would we distinguish between the cases where we want
> to auto-forget before mounting, and the cases in which this behavior is
> undesired?

If I remember correctly, the super-block stores the number of disks
that compose the filesystem. If the count is 1, it should be safe to
forget-before-mount the filesystem (or to not store it in the cache
after a scan).
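
As a rough userspace illustration of that check, the device count can be read straight from the primary superblock. The sketch below assumes the usual on-disk layout (superblock at byte offset 0x10000, magic "_BHRfS_M" at +0x40, little-endian 64-bit num_devices at +0x88), GNU od, and a little-endian host; the function names are made up for this example. On a real system, `btrfs inspect-internal dump-super` should show the same num_devices field.

```sh
#!/bin/sh
# Sketch: read num_devices from the primary btrfs superblock of a device or
# image file. Offsets are assumptions taken from the btrfs on-disk format:
#   65536 (0x10000) superblock start, +0x40 magic, +0x88 num_devices (u64 LE).
# Needs GNU od (for -t u8) and a little-endian host.
btrfs_sb_num_devices() {
    dev=$1
    # "_BHRfS_M" as hex bytes is 5f42485266535f4d.
    magic=$(od -A n -t x1 -j 65600 -N 8 "$dev" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "5f42485266535f4d" ]; then
        echo "$dev: no btrfs superblock magic found" >&2
        return 1
    fi
    od -A n -t u8 -j 65672 -N 8 "$dev" | tr -d ' \n'
}

# Hypothetical policy check: only a single-device filesystem would be a
# candidate for forget-before-mount.
is_single_device_btrfs() {
    count=$(btrfs_sb_num_devices "$1") || return 1
    [ "$count" = "1" ]
}
```

`is_single_device_btrfs /dev/sdA` would then succeed only when the superblock reports exactly one device. The kernel would of course make this decision from its in-memory copy of the superblock rather than by re-reading the disk.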


> 
> Thanks again for your feedback, it is much appreciated.
> Cheers,
> 
> 
> Guilherme

Anand Jain May 5, 2023, 5:16 a.m. UTC | #4
On 5/5/23 01:07, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same-fsid mounting on btrfs.
> Currently, we cannot reliably mount same-fsid filesystems even one at
> a time in btrfs, and mounting them at the same time is pretty much
> impossible. Other filesystems like ext4 are capable of that.
> 


> The goal is to allow systems with an A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to hold the same
> filesystem image in both partitions; it also allows a block-device-level
> check of filesystem integrity - this is used in the Steam Deck image
> installation to check whether the current read-only image is pristine.
> A few more details are provided in the following ML thread:
> 
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/

Confused about your requirement: 2 identical filesystems mounted 
simultaneously or just one at a time? Latter works. Bugs were fixed.

Have you considered using the btrfs seed device feature to avoid 
sacrificing 50% capacity? Create read-only seed device as golden image, 
add writable device on top. Example:

   $ btrfstune -S1 /dev/rdonly-golden-img
   $ mount /dev/rdonly-golden-img /btrfs
   $ btrfs dev add /dev/rw-dev /btrfs
   $ mount -o remount,rw /dev/rw-dev /btrfs

To switch golden image:

   $ btrfs dev del /dev/rdonly-golden-img /btrfs
   $ umount /btrfs
   $ btrfstune -S1 /dev/rw-dev

Thanks, Anand


Guilherme G. Piccoli May 5, 2023, 4:21 p.m. UTC | #5
On 04/05/2023 18:09, Goffredo Baroncelli wrote:
> [...]
>> Is there any step missing on my side, or is mounting both devices
>> really a limitation when using the forget option?
>>
> 
> From my limited knowledge of BTRFS internals, I think that your patches
> take the correct approach: using the "metadata_uuid" code to allow
> two filesystems with the same UUID to exist at the same time.
> 
> [...] 
>> How would btrfs know that it is dealing with a single-device filesystem?
>> In other words: how would we distinguish between the cases where we want
>> to auto-forget before mounting, and the cases in which this behavior is
>> undesired?
> 
> If I remember correctly, the super-block stores the number of disks
> that compose the filesystem. If the count is 1, it should be safe to
> forget-before-mount the filesystem (or to not store it in the cache
> after a scan).

Thanks Goffredo, for the clarifications =)

Guilherme G. Piccoli May 5, 2023, 4:27 p.m. UTC | #6
On 05/05/2023 02:16, Anand Jain wrote:
> [...]
>>
>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
> 
> Confused about your requirement: 2 identical filesystems mounted 
> simultaneously or just one at a time? Latter works. Bugs were fixed.

Hi Anand, apologies - in fact, in that old-ish thread I mentioned that we
need to mount one at a time, and this covers the majority of the use
case. BUT... it seems that for the installation step we are required to
have *both* mounted at the same time for a while, so the requirement has
changed since the last analysis, and this is really what we
implemented here.


> 
> Have you considered using the btrfs seed device feature to avoid 
> sacrificing 50% capacity? Create read-only seed device as golden image, 
> add writable device on top. Example:
> 
>    $ btrfstune -S1 /dev/rdonly-golden-img
>    $ mount /dev/rdonly-golden-img /btrfs
>    $ btrfs dev add /dev/rw-dev /btrfs
>    $ mount -o remount,rw /dev/rw-dev /btrfs
> 
> To switch golden image:
> 
>    $ btrfs dev del /dev/rdonly-golden-img /btrfs
>    $ umount /btrfs
>    $ btrfstune -S1 /dev/rw-dev
> 

Yeah, I'm aware that btrfs has some features that might fit and could
even save space, but due to some requirements on the Deck it's not
possible to use them.

I'll defer a more detailed response to John / Vivek / Ludovico, who
know the use case at a level of detail I don't, since they designed
the installation / update path from the ground up.

Cheers,


Guilherme

Goffredo Baroncelli May 5, 2023, 5:37 p.m. UTC | #7
On 05/05/2023 18.27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:
>> [...]
>>>
>>> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
>>
>> Confused about your requirement: 2 identical filesystems mounted
>> simultaneously or just one at a time? Latter works. Bugs were fixed.
> 
> Hi Anand, apologies - in fact, in that old-ish thread I mentioned that we
> need to mount one at a time, and this covers the majority of the use
> case. BUT... it seems that for the installation step we are required to
> have *both* mounted at the same time for a while, so the requirement has
> changed since the last analysis, and this is really what we
> implemented here.

What if the different images have different UUIDs from the beginning?

Vivek Das Mohapatra May 5, 2023, 6:15 p.m. UTC | #8
On 05/05/2023 17:27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:

[cut]

> I'll defer a more detailed response to John / Vivek / Ludovico, who
> know the use case at a level of detail I don't, since they designed
> the installation / update path from the ground up.
> 

The OS images are entirely independent. The goal is that you could
completely corrupt slot A and it would have no impact on the bootability
of slot B.

So, yes, we sacrifice space but as a trade off we get robustness which
is more important to us.

=========================================================================

When a new OS image is delivered, the normal flow is this (simplified):

While booted on slot A (for example) the update process is started.

Our client fetches the most recent image from the update server.

This is delivered as a block level diff between the image you
have and the image you want.

The partitions that are allocated to slot B have the new data written
into them.

As a final step, the root fs of the new slot is mounted and a couple of
initialisation steps are completed (mostly writing config into the
common boot partition; the contents of the slot B partitions are not
modified as a result of this).

The system is rebooted. If all goes well slot B is booted and becomes
the primary (current) image.

If it fails for some reason, the bootloader will (either automatically
or by user intervention) go back to booting slot A.

Note that other than the final mount to update the common boot partition
with information about the new image we don't care at all about the
contents or even the type of the filesystems we have delivered (and even
then all we care about is that we _can_ mount it, not what it is).
===========================================================================

Now normally this is not a problem: If the new image is not the same as
the current one we will have written entirely new filesystems into
the B partitions and there is no conflict.

However, if the user wishes or needs to reinstall a fresh copy of the
_current_ image (for whatever reason: maybe the current image is damaged
in some way and they need to do a factory reset) then with btrfs in the
mix this breaks down:

Since btrfs won't (at present) tolerate a second fs with the same fsuuid
we have to check that the user is not installing the same image on both
slots.
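
To make the kind of check described here concrete, the sketch below compares the fsid stored in the primary btrfs superblock of two images or partitions. It is not our actual installer code: the function names are invented for illustration, the offsets (superblock at 0x10000, 16-byte fsid at +0x20) are assumptions taken from the btrfs on-disk format, and the superblock magic is not validated.

```sh
#!/bin/sh
# Sketch: print the 16-byte fsid field of the primary btrfs superblock as hex.
# Assumed layout: superblock at byte 65536 (0x10000), fsid at +0x20 (65568).
# Prints nothing if the file is too short or unreadable.
btrfs_sb_fsid() {
    od -A n -t x1 -j 65568 -N 16 "$1" 2>/dev/null | tr -d ' \n'
}

# Succeeds when both images/devices carry the same non-empty fsid, i.e. the
# situation that currently has to be refused.
slots_share_fsid() {
    fsid_a=$(btrfs_sb_fsid "$1")
    fsid_b=$(btrfs_sb_fsid "$2")
    [ -n "$fsid_a" ] && [ "$fsid_a" = "$fsid_b" ]
}
```

A caller could use it as `slots_share_fsid slot-a.img new-image.img && echo "same fsid, refusing"`, where both file names are placeholders.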

If the user has a broken image which is also the latest release and
needs to recover, we have to artificially select an _older_ image and put
that on slot B; the user then needs to boot into that and
upgrade _again_ to get a repaired A slot.

This sort of works but isn't a great user experience and introduces an
artificial restriction - suddenly the images _do_ affect one another.

If the user subverts our safety checks (or we mess up and put the same
image on both slots) then suddenly the whole system becomes unbootable
which is less than ideal.

Hope that clarifies the situation and explains why we care.

Dave Chinner May 7, 2023, 11:10 p.m. UTC | #9
On Thu, May 04, 2023 at 02:07:06PM -0300, Guilherme G. Piccoli wrote:
> Hi folks, this is an attempt at supporting same-fsid mounting on btrfs.
> Currently, we cannot reliably mount same-fsid filesystems even one at
> a time in btrfs, and mounting them at the same time is pretty much
> impossible. Other filesystems like ext4 are capable of that.
> 
> The goal is to allow systems with an A/B partitioning scheme (like the
> Steam Deck console or various mobile devices) to hold the same
> filesystem image in both partitions; it also allows a block-device-level
> check of filesystem integrity - this is used in the Steam Deck image
> installation to check whether the current read-only image is pristine.
> A few more details are provided in the following ML thread:
> 
> https://lore.kernel.org/linux-btrfs/c702fe27-8da9-505b-6e27-713edacf723a@igalia.com/
> 
> The mechanism used to achieve it is based on the metadata_uuid feature,
> leveraging its code infrastructure. The patches are based on kernel 6.3
> and were tested both in a virtual machine and on the Steam Deck.
> Comments, suggestions and overall feedback are greatly appreciated -
> thanks in advance!

So how does this work if someone needs to mount 3 copies of the same
filesystem at the same time?

On XFS, we have the "nouuid" mount option which skips the duplicate
UUID checking done at mount time so that multiple snapshots or
images of the same filesystem can be mounted at the same time. This
means we don't get the same filesystem mounted by accident, but also
allows all the cases we know about where multiple versions of the
filesystem need to be mounted at the same time.
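
For illustration, mounting several copies of one XFS filesystem with that option could look like the sketch below. The helper names, the mount-point layout and the device paths are hypothetical; only `-o nouuid` is the XFS option mentioned above. Setting DRY_RUN=1 prints the commands instead of running them (a real run needs root).

```sh
#!/bin/sh
# Sketch only: helper names and paths are illustrative.

# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount every given device under <base>/copy1, <base>/copy2, ... with the
# duplicate-UUID check disabled.
mount_xfs_copies() {
    base=$1
    shift
    n=0
    for dev in "$@"; do
        n=$((n + 1))
        run mkdir -p "$base/copy$n" || return 1
        run mount -t xfs -o nouuid "$dev" "$base/copy$n" || return 1
    done
}
```

`DRY_RUN=1 mount_xfs_copies /mnt /dev/sdb1 /dev/sdc1 /dev/sdd1` prints the commands for three copies; nothing in the interface limits the number of copies.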

I know, fs UUIDs are used differently in btrfs vs XFS, but it would
be nice for users if filesystems shared the same interfaces for
doing the same sort of management operations...

Cheers,

Dave.

Guilherme G. Piccoli May 8, 2023, 10:45 p.m. UTC | #10
On 07/05/2023 20:10, Dave Chinner wrote:
> [...]
> So how does this work if someone needs to mount 3 copies of the same
> filesystem at the same time?
> 
> On XFS, we have the "nouuid" mount option which skips the duplicate
> UUID checking done at mount time so that multiple snapshots or
> images of the same filesystem can be mounted at the same time. This
> means we don't get the same filesystem mounted by accident, but also
> allows all the cases we know about where multiple versions of the
> filesystem need to be mounted at the same time.
> 
> I know, fs UUIDs are used differently in btrfs vs XFS, but it would
> be nice for users if filesystems shared the same interfaces for
> doing the same sort of management operations...
> 
> Cheers,
> 
> Dave.

Hi Dave, thanks for the information / suggestion.

I see no reason for virtual_fsid to fail with 3 or more devices; the
idea is that it creates a random fsid for every device that you
mount with the flag, so it shouldn't be a problem.

Of course renaming to "nouuid" would be completely fine (at least for
me) to keep consistency among filesystems; the only question that
remains is whether we should go with a mount option or the compat_ro
flag, as strongly suggested by Qu.

Cheers,


Guilherme

Guilherme G. Piccoli Aug. 3, 2023, 3:47 p.m. UTC | #11
On 04/05/2023 14:07, Guilherme G. Piccoli wrote:
> [...]

Hi folks, V2 sent here:
https://lore.kernel.org/linux-btrfs/20230803154453.1488248-1-gpiccoli@igalia.com/

Thanks,


Guilherme