diff mbox series

[V3,2/2] btrfs: Introduce the single-dev feature

Message ID 20230831001544.3379273-3-gpiccoli@igalia.com (mailing list archive)
State New, archived
Headers show
Series Supporting same fsid mounting through the single-dev compat_ro feature | expand

Commit Message

Guilherme G. Piccoli Aug. 31, 2023, 12:12 a.m. UTC
Btrfs doesn't currently support to mount 2 different devices holding the
same filesystem - the fsid is exposed as a unique identifier by the
driver. This case is supported though in some other common filesystems,
like ext4.

Supporting the same-fsid mounts has the advantage of allowing btrfs to
be used in A/B partitioned devices, like mobile phones or the Steam Deck
for example. Without this support, it's not safe for users to keep the
same "image version" in both A and B partitions, a setup that is quite
common for development, for example. Also, as a big bonus, it allows fs
integrity check based on block devices for RO devices (whereas currently
it is required that both have different fsid, breaking the block device
hash comparison).

Such same-fsid mounting is hereby added through the usage of the
filesystem feature "single-dev" - when such feature is used, btrfs
generates a random fsid for the filesystem and leverages the long-term
present metadata_uuid infrastructure to enable the usage of this
secondary virtual fsid, effectively requiring few non-invasive changes
to the code and no new potential corner cases.

In order to prevent more code complexity and corner cases, given
the nature of this mechanism (single devices), the single-dev feature
is not allowed when the metadata_uuid flag is already present on the
fs, or if the device is on fsid-change state. Device removal/replace
is also disabled for devices presenting the single-dev feature.

Suggested-by: John Schoenick <johns@valvesoftware.com>
Suggested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
---
V3:

- Improve messaging on device replace with SINGLE_DEV devices;
it was a bad copy/paste for the removal case.

- Added single-dev feature to sysfs features array (caught through
the fstests work!). Thanks Anand for confirming that it's necessary.

- s/pr_info/btrfs_info and shifted to checking flags inside
functions instead of growing their argument list.
(Thanks Josef!)

- Changed memcmp() comparison to use "!=" ;
- Rebased against btrfs/for-next branch, adding the patch
"btrfs: simplify alloc_fs_devices() remove arg2" [0] on top.
(Thanks Anand!)

[0] https://lore.kernel.org/linux-btrfs/20230823145213.jfJYluPxXiX8zox086A3c8NeaQvvfYnJ43ZCpnE_KU0@z/

Anand: the distinction of fsid/metadata_uuid is indeed required on
btrfs_validate_super() - since we don't write the virtual/rand fsid to
the disk, and such function operates on the superblock that is read
from the disk, it fails for the single_dev case unless we condition check
there - thanks for noticing that though, was interesting to experiment
and validate =)


 fs/btrfs/disk-io.c         | 17 +++++++-
 fs/btrfs/fs.h              |  3 +-
 fs/btrfs/ioctl.c           | 18 ++++++++
 fs/btrfs/super.c           |  8 ++--
 fs/btrfs/sysfs.c           |  2 +
 fs/btrfs/volumes.c         | 84 ++++++++++++++++++++++++++++++++------
 fs/btrfs/volumes.h         |  3 +-
 include/uapi/linux/btrfs.h |  7 ++++
 8 files changed, 122 insertions(+), 20 deletions(-)

Comments

David Sterba Sept. 5, 2023, 4:50 p.m. UTC | #1
On Wed, Aug 30, 2023 at 09:12:34PM -0300, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is exposed as a unique identifier by the
> driver. This case is supported though in some other common filesystems,
> like ext4.
> 
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example. Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
> 
> Such same-fsid mounting is hereby added through the usage of the
> filesystem feature "single-dev" - when such feature is used, btrfs
> generates a random fsid for the filesystem and leverages the long-term
> present metadata_uuid infrastructure to enable the usage of this
> secondary virtual fsid, effectively requiring few non-invasive changes
> to the code and no new potential corner cases.
> 
> In order to prevent more code complexity and corner cases, given
> the nature of this mechanism (single devices), the single-dev feature
> is not allowed when the metadata_uuid flag is already present on the
> fs, or if the device is on fsid-change state. Device removal/replace
> is also disabled for devices presenting the single-dev feature.
> 
> Suggested-by: John Schoenick <johns@valvesoftware.com>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

I'd like to pick this as a feature for 6.7, it's extending code we
already have for metadata_uuid so this is a low risk feature. The only
problem I see for now is the name, using the word 'single'.

We have single as a block group profile name and a filesystem can exist
on a single device too, this is would be confusing when referring to it.
Single-dev can be a working name but for a final release we should
really try to pick something more unique. I don't have a suggestion for
now.

The plan for now is that I'll add the patch to a topic branch and add it
to for-next so it could be tested but there might be some updates still
needed. Either as changes to this patch or as separate patches, that
depends.
Guilherme G. Piccoli Sept. 5, 2023, 8:23 p.m. UTC | #2
On 05/09/2023 13:50, David Sterba wrote:
> [...]
> I'd like to pick this as a feature for 6.7, it's extending code we
> already have for metadata_uuid so this is a low risk feature. The only
> problem I see for now is the name, using the word 'single'.
> 
> We have single as a block group profile name and a filesystem can exist
> on a single device too, this is would be confusing when referring to it.
> Single-dev can be a working name but for a final release we should
> really try to pick something more unique. I don't have a suggestion for
> now.
> 
> The plan for now is that I'll add the patch to a topic branch and add it
> to for-next so it could be tested but there might be some updates still
> needed. Either as changes to this patch or as separate patches, that
> depends.
> 

Hi David, thanks for your feedback! I agree with you that this name is a
bit confusing, we can easily change that! How about virtual-fsid?
I confess I'm not the best (by far!) to name stuff, so I'll be glad to
follow a suggestion from anyone here heheh

I also agree we could have this merged in your -next tree, and once a
new (good) name is proposed, I can re-submit with that and you'd replace
the patch in your tree, if that makes sense to you. Of course an extra
patch changing the name is also valid, if it's your preference.

Cheers,


Guilherme
Anand Jain Sept. 6, 2023, 9:49 a.m. UTC | #3
On 9/6/23 04:23, Guilherme G. Piccoli wrote:
> On 05/09/2023 13:50, David Sterba wrote:
>> [...]
>> I'd like to pick this as a feature for 6.7, it's extending code we
>> already have for metadata_uuid so this is a low risk feature. The only
>> problem I see for now is the name, using the word 'single'.
>>
>> We have single as a block group profile name and a filesystem can exist
>> on a single device too, this is would be confusing when referring to it.
>> Single-dev can be a working name but for a final release we should
>> really try to pick something more unique. I don't have a suggestion for
>> now.
>>
>> The plan for now is that I'll add the patch to a topic branch and add it
>> to for-next so it could be tested but there might be some updates still
>> needed. Either as changes to this patch or as separate patches, that
>> depends.
>>
> 
> Hi David, thanks for your feedback! I agree with you that this name is a
> bit confusing, we can easily change that! How about virtual-fsid?
> I confess I'm not the best (by far!) to name stuff, so I'll be glad to
> follow a suggestion from anyone here heheh
> 

This feature might also be expanded to support multiple devices, so 
removing 'single' makes sense.

virtual-fsid is good.
or
random-fsid

Thanks, Anand

> I also agree we could have this merged in your -next tree, and once a
> new (good) name is proposed, I can re-submit with that and you'd replace
> the patch in your tree, if that makes sense to you. Of course an extra
> patch changing the name is also valid, if it's your preference.
> 
> Cheers,
> 
> 
> Guilherme
Anand Jain Sept. 6, 2023, 4:14 p.m. UTC | #4
On 31/08/2023 08:12, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is exposed as a unique identifier by the
> driver. This case is supported though in some other common filesystems,
> like ext4.
> 
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example. Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
> 
> Such same-fsid mounting is hereby added through the usage of the
> filesystem feature "single-dev" - when such feature is used, btrfs
> generates a random fsid for the filesystem and leverages the long-term
> present metadata_uuid infrastructure to enable the usage of this
> secondary virtual fsid, effectively requiring few non-invasive changes
> to the code and no new potential corner cases.
> 
> In order to prevent more code complexity and corner cases, given
> the nature of this mechanism (single devices), the single-dev feature
> is not allowed when the metadata_uuid flag is already present on the
> fs, or if the device is on fsid-change state. Device removal/replace
> is also disabled for devices presenting the single-dev feature.
> 
> Suggested-by: John Schoenick <johns@valvesoftware.com>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> V3:
> 
> - Improve messaging on device replace with SINGLE_DEV devices;
> it was a bad copy/paste for the removal case.
> 
> - Added single-dev feature to sysfs features array (caught through
> the fstests work!). Thanks Anand for confirming that it's necessary.
> 
> - s/pr_info/btrfs_info and shifted to checking flags inside
> functions instead of growing their argument list.
> (Thanks Josef!)
> 
> - Changed memcmp() comparison to use "!=" ;
> - Rebased against btrfs/for-next branch, adding the patch
> "btrfs: simplify alloc_fs_devices() remove arg2" [0] on top.
> (Thanks Anand!)
> 
> [0] https://lore.kernel.org/linux-btrfs/20230823145213.jfJYluPxXiX8zox086A3c8NeaQvvfYnJ43ZCpnE_KU0@z/
> 


> Anand: the distinction of fsid/metadata_uuid is indeed required on
> btrfs_validate_super() - since we don't write the virtual/rand fsid to
> the disk, and such function operates on the superblock that is read
> from the disk, it fails for the single_dev case unless we condition check
> there - thanks for noticing that though, was interesting to experiment
> and validate =)

Yep, that makes sense. Thanks. I have added cases 1 and 2 in an upcoming
patch, and as part of this patch, you could add case 3 as below. Case 4
is just for discussion.


1. Normally

     fs_devices->fsid == fs_devices->metadata_uuid == sb->fsid;
     sb->metadata_uuid == 0;

2. BTRFS_FEATURE_INCOMPAT_METADATA_UUID

     fs_devices->fsid == sb->fsid;
     fs_devices->metadata_uuid == sb->metadata_uuid;


3. BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV

     fs_devices->fsid == random();
     fs_devices->metadata_uuid = sb->fsid;
     sb->metadata_uuid == 0;



For future development: (ignore for now)

4. BTRFS_FEATURE_INCOMPAT_METADATA_UUID |\
     BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV

     fs_devices->fsid == random();
     sb->fsid == actual_fsid (unused);
     fs_devices->metadata_uuid == sb->metadata_uuid;




> @@ -2380,7 +2381,21 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
>   		ret = -EINVAL;
>   	}
>   
> -	if (memcmp(fs_info->fs_devices->fsid, sb->fsid, BTRFS_FSID_SIZE) != 0) {
> +	/*
> +	 * For SINGLE_DEV devices, btrfs creates a random fsid and makes
> +	 * use of the metadata_uuid infrastructure in order to allow, for
> +	 * example, two devices with same fsid getting mounted at the same
> +	 * time. But notice no changes happen at the disk level, the random
> +	 * generated fsid is a driver abstraction, not written to the disk.
> +	 * That's the reason we're required here to compare the fsid  with
> +	 * the metadata_uuid for such devices.
> +	 */
> +	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV))
> +		fsid = fs_info->fs_devices->metadata_uuid;
> +	else
> +		fsid = fs_info->fs_devices->fsid;
> +


> +	if (memcmp(fsid, sb->fsid, BTRFS_FSID_SIZE) != 0) {
>   		btrfs_err(fs_info,
>   		"superblock fsid doesn't match fsid of fs_devices: %pU != %pU",
>   			  sb->fsid, fs_info->fs_devices->fsid);
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index a523d64d5491..cad761f81932 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -198,7 +198,8 @@ enum {
>   	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
>   	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \
>   	 BTRFS_FEATURE_COMPAT_RO_VERITY |		\
> -	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
> +	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE |	\
> +	 BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV)
>   
>   #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
>   #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a895d105464b..23eb15869cb5 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2678,6 +2678,12 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>   
> +	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> +		btrfs_err(fs_info,
> +			  "device removal is unsupported on SINGLE_DEV devices\n");


No \n at the end. btrfs_err() already adds one.

> +		return -EINVAL;
> +	}
> +
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args))
>   		return PTR_ERR(vol_args);
> @@ -2744,6 +2750,12 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>   
> +	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> +		btrfs_err(fs_info,
> +			  "device removal is unsupported on SINGLE_DEV devices\n");

here too.

> +		return -EINVAL;
> +	}
> +
>   	vol_args = memdup_user(arg, sizeof(*vol_args));
>   	if (IS_ERR(vol_args))
>   		return PTR_ERR(vol_args);
> @@ -3268,6 +3280,12 @@ static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
>   	if (!capable(CAP_SYS_ADMIN))
>   		return -EPERM;
>   
> +	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> +		btrfs_err(fs_info,
> +			  "device replace is unsupported on SINGLE_DEV devices\n");

and here.

> +		return -EINVAL;
> +	}
> +
>   	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
>   		btrfs_err(fs_info, "device replace not supported on extent tree v2 yet");
>   		return -EINVAL;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index cffdd6f7f8e8..5e20e7337261 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -889,7 +889,7 @@ static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
>   				error = -ENOMEM;
>   				goto out;
>   			}
> -			device = btrfs_scan_one_device(device_name, flags);

> +			device = btrfs_scan_one_device(device_name, flags, true);

Why do we have to pass 'true' in btrfs_scan_one_device() here? It is
single device and I don't see any special handle for the seed device.


>   			kfree(device_name);
>   			if (IS_ERR(device)) {
>   				error = PTR_ERR(device);


> @@ -1484,7 +1484,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
>   		goto error_fs_info;
>   	}
>   
> -	device = btrfs_scan_one_device(device_name, mode);
> +	device = btrfs_scan_one_device(device_name, mode, true);
>   	if (IS_ERR(device)) {
>   		mutex_unlock(&uuid_mutex);
>   		error = PTR_ERR(device);



> @@ -2196,7 +2196,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>   	switch (cmd) {
>   	case BTRFS_IOC_SCAN_DEV:
>   		mutex_lock(&uuid_mutex);
> -		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
> +		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
>   		ret = PTR_ERR_OR_ZERO(device);
>   		mutex_unlock(&uuid_mutex);
>   		break;
> @@ -2210,7 +2210,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>   		break;
>   	case BTRFS_IOC_DEVICES_READY:
>   		mutex_lock(&uuid_mutex);
> -		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
> +		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
>   		if (IS_ERR(device)) {
>   			mutex_unlock(&uuid_mutex);
>   			ret = PTR_ERR(device);

With this patch, command 'btrfs device scan' and 'btrfs device ready'
returns -EINVAL for the single-device?  Some os distributions checks
the status using these commands during boot. Instead, it is ok to
just return success here.



> diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
> index b1d1ac25237b..5294064ffc64 100644
> --- a/fs/btrfs/sysfs.c
> +++ b/fs/btrfs/sysfs.c
> @@ -290,6 +290,7 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
>   BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
>   BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
>   BTRFS_FEAT_ATTR_COMPAT_RO(block_group_tree, BLOCK_GROUP_TREE);
> +BTRFS_FEAT_ATTR_COMPAT_RO(single_dev, SINGLE_DEV);
>   BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
>   #ifdef CONFIG_BLK_DEV_ZONED
>   BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
> @@ -322,6 +323,7 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
>   	BTRFS_FEAT_ATTR_PTR(free_space_tree),
>   	BTRFS_FEAT_ATTR_PTR(raid1c34),
>   	BTRFS_FEAT_ATTR_PTR(block_group_tree),
> +	BTRFS_FEAT_ATTR_PTR(single_dev),
>   #ifdef CONFIG_BLK_DEV_ZONED
>   	BTRFS_FEAT_ATTR_PTR(zoned),
>   #endif
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 999cb82dd288..b53318d32603 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -763,8 +763,37 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
>   
>   	return NULL;
>   }
> +
> +static void prepare_virtual_fsid(struct btrfs_super_block *disk_super,
> +				 const char *path)
> +{
> +	struct btrfs_fs_devices *fs_devices;
> +	u8 vfsid[BTRFS_FSID_SIZE];
> +	bool dup_fsid = true;
> +
> +	while (dup_fsid) {
> +		dup_fsid = false;
> +		generate_random_uuid(vfsid);
> +
> +		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> +			if (!memcmp(vfsid, fs_devices->fsid, BTRFS_FSID_SIZE) ||
> +			    !memcmp(vfsid, fs_devices->metadata_uuid,
> +				    BTRFS_FSID_SIZE))
> +				dup_fsid = true;
> +		}
> +	}
> +
> +	memcpy(disk_super->metadata_uuid, disk_super->fsid, BTRFS_FSID_SIZE);
> +	memcpy(disk_super->fsid, vfsid, BTRFS_FSID_SIZE);
> +
> +	btrfs_info(NULL,
> +		"virtual fsid (%pU) set for SINGLE_DEV device %s (real fsid %pU)\n",
> +		disk_super->fsid, path, disk_super->metadata_uuid);
> +}
> +
>   /*
> - * Add new device to list of registered devices
> + * Add new device to list of registered devices, or in case of a SINGLE_DEV
> + * device, also creates a virtual fsid to cope with same-fsid cases.
>    *
>    * Returns:
>    * device pointer which was just added or updated when successful
> @@ -785,6 +814,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>   		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>   	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>   					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
> +	bool single_dev = (btrfs_super_compat_ro_flags(disk_super) &
> +			BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV);
>   
>   	error = lookup_bdev(path, &path_devt);
>   	if (error) {
> @@ -793,23 +824,32 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>   		return ERR_PTR(error);
>   	}
>   
> -	if (fsid_change_in_progress) {
> -		if (!has_metadata_uuid)
> -			fs_devices = find_fsid_inprogress(disk_super);
> -		else
> -			fs_devices = find_fsid_changed(disk_super);
> -	} else if (has_metadata_uuid) {
> -		fs_devices = find_fsid_with_metadata_uuid(disk_super);
> +	if (single_dev) {
> +		if (has_metadata_uuid || fsid_change_in_progress) {
> +			btrfs_err(NULL,
> +		"SINGLE_DEV devices don't support the metadata_uuid feature\n");
> +			return ERR_PTR(-EINVAL);

It could right?

> +		}
> +		prepare_virtual_fsid(disk_super, path);
>   	} else {
> -		fs_devices = find_fsid_reverted_metadata(disk_super);
> -		if (!fs_devices)
> -			fs_devices = find_fsid(disk_super->fsid, NULL);
> +		if (fsid_change_in_progress) {
> +			if (!has_metadata_uuid)
> +				fs_devices = find_fsid_inprogress(disk_super);
> +			else
> +				fs_devices = find_fsid_changed(disk_super);
> +		} else if (has_metadata_uuid) {
> +			fs_devices = find_fsid_with_metadata_uuid(disk_super);
> +		} else {
> +			fs_devices = find_fsid_reverted_metadata(disk_super);
> +			if (!fs_devices)
> +				fs_devices = find_fsid(disk_super->fsid, NULL);
> +		}
>   	}
>   
>   
>   	if (!fs_devices) {
>   		fs_devices = alloc_fs_devices(disk_super->fsid);
> -		if (has_metadata_uuid)
> +		if (has_metadata_uuid || single_dev)
>   			memcpy(fs_devices->metadata_uuid,
>   			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>   
> @@ -1357,13 +1397,15 @@ int btrfs_forget_devices(dev_t devt)
>    * and we are not allowed to call set_blocksize during the scan. The superblock
>    * is read via pagecache
>    */
> -struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
> +struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> +					   bool mounting)
>   {
>   	struct btrfs_super_block *disk_super;
>   	bool new_device_added = false;
>   	struct btrfs_device *device = NULL;
>   	struct block_device *bdev;
>   	u64 bytenr, bytenr_orig;
> +	bool single_dev;
>   	int ret;
>   
>   	lockdep_assert_held(&uuid_mutex);
> @@ -1402,6 +1444,16 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
>   		goto error_bdev_put;
>   	}
>   
> +	single_dev = btrfs_super_compat_ro_flags(disk_super) &
> +			BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV;
> +
> +	if (!mounting && single_dev) {
> +		pr_info("BTRFS: skipped non-mount scan on SINGLE_DEV device %s\n",
> +			path);
> +		btrfs_release_disk_super(disk_super);

leaks bdev?

> +		return ERR_PTR(-EINVAL);

We need to let seed device scan even for the single device.

In fact we can make no-scan required for the any fs with the total_devs 
== 1.

I wrote a patch send it out for the review. So no special handling for
single-device will be required.

Thanks, Anand

> +	}
> +
>   	device = device_list_add(path, disk_super, &new_device_added);
>   	if (!IS_ERR(device) && new_device_added)
>   		btrfs_free_stale_devices(device->devt, device);
> @@ -2391,6 +2443,12 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
>   
>   	args->devid = btrfs_stack_device_id(&disk_super->dev_item);
>   	memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
> +
> +	/*
> +	 * Note that SINGLE_DEV devices are not handled in a special way here;
> +	 * device removal/replace is instead forbidden when such feature is
> +	 * present, this note is for future users/readers of this function.
> +	 */
>   	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
>   		memcpy(args->fsid, disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>   	else
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 2128a032c3b7..1ffeb333c55c 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -611,7 +611,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
>   void btrfs_mapping_tree_free(struct extent_map_tree *tree);
>   int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
>   		       blk_mode_t flags, void *holder);
> -struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags);
> +struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> +					   bool mounting);
>   int btrfs_forget_devices(dev_t devt);
>   void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
>   void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index dbb8b96da50d..cb7a7cfe1ea9 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -313,6 +313,13 @@ struct btrfs_ioctl_fs_info_args {
>    */
>   #define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 3)
>   
> +/*
> + * Single devices (as flagged by the corresponding compat_ro flag) only
> + * gets scanned during mount time; also, a random fsid is generated for
> + * them, in order to cope with same-fsid filesystem mounts.
> + */
> +#define BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV		(1ULL << 4)
> +
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
>   #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
>   #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)
David Sterba Sept. 7, 2023, 1:55 p.m. UTC | #5
On Wed, Sep 06, 2023 at 05:49:05PM +0800, Anand Jain wrote:
> On 9/6/23 04:23, Guilherme G. Piccoli wrote:
> > On 05/09/2023 13:50, David Sterba wrote:
> >> [...]
> >> I'd like to pick this as a feature for 6.7, it's extending code we
> >> already have for metadata_uuid so this is a low risk feature. The only
> >> problem I see for now is the name, using the word 'single'.
> >>
> >> We have single as a block group profile name and a filesystem can exist
> >> on a single device too, this is would be confusing when referring to it.
> >> Single-dev can be a working name but for a final release we should
> >> really try to pick something more unique. I don't have a suggestion for
> >> now.
> >>
> >> The plan for now is that I'll add the patch to a topic branch and add it
> >> to for-next so it could be tested but there might be some updates still
> >> needed. Either as changes to this patch or as separate patches, that
> >> depends.
> >>
> > 
> > Hi David, thanks for your feedback! I agree with you that this name is a
> > bit confusing, we can easily change that! How about virtual-fsid?
> > I confess I'm not the best (by far!) to name stuff, so I'll be glad to
> > follow a suggestion from anyone here heheh
> > 
> 
> This feature might also be expanded to support multiple devices, so 
> removing 'single' makes sense.

I'm not sure how this would work. In case of the single device we can be
sure which device belongs to the filesystem, just need the incompat bit
and internal uuid to distinguish it from the others.

With multiple devices how could we track which belong to the same
filesystem? This is the same problem we already have with scanning and
duplicating block devices.

The only thing I see is to specify the devices as mount options,
possibly with the bit marking the devices as seen but not
scanned/registered and never considered for the automatic mount.

> virtual-fsid is good.
> or
> random-fsid

I'm thinking about something that would be closer to how the devices'
uuids can be duplicated, so cloned_fsid or duplicate_fsid/dup_fsid.
Virtual can be anything, random sounds too random.
Guilherme G. Piccoli Sept. 7, 2023, 1:56 p.m. UTC | #6
On 06/09/2023 06:49, Anand Jain wrote:
> [...]
>> Hi David, thanks for your feedback! I agree with you that this name is a
>> bit confusing, we can easily change that! How about virtual-fsid?
>> I confess I'm not the best (by far!) to name stuff, so I'll be glad to
>> follow a suggestion from anyone here heheh
>>
> 
> This feature might also be expanded to support multiple devices, so 
> removing 'single' makes sense.
> 
> virtual-fsid is good.
> or
> random-fsid
> 
> Thanks, Anand
> 

Thanks Anand!

David / Josef, what do you think? 'virtual-fsid' makes sense for you?
Thanks,


Guilherme
Guilherme G. Piccoli Sept. 7, 2023, 3:06 p.m. UTC | #7
Hi Anand, thanks for the feedback / review, much appreciated!
You're definitely helping a lot with this patch =)


On 06/09/2023 13:14, Anand Jain wrote:
> [...]
>> Anand: the distinction of fsid/metadata_uuid is indeed required on
>> btrfs_validate_super() - since we don't write the virtual/rand fsid to
>> the disk, and such function operates on the superblock that is read
>> from the disk, it fails for the single_dev case unless we condition check
>> there - thanks for noticing that though, was interesting to experiment
>> and validate =)
> 
> Yep, that makes sense. Thanks. I have added cases 1 and 2 in an upcoming
> patch, and as part of this patch, you could add case 3 as below. Case 4
> is just for discussion.
> 
> 
> 1. Normally
> 
>      fs_devices->fsid == fs_devices->metadata_uuid == sb->fsid;
>      sb->metadata_uuid == 0;
> 
> 2. BTRFS_FEATURE_INCOMPAT_METADATA_UUID
> 
>      fs_devices->fsid == sb->fsid;
>      fs_devices->metadata_uuid == sb->metadata_uuid;
> 
> 
> 3. BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV
> 
>      fs_devices->fsid == random();
>      fs_devices->metadata_uuid = sb->fsid;
>      sb->metadata_uuid == 0;
> 
> 
> 
> For future development: (ignore for now)
> 
> 4. BTRFS_FEATURE_INCOMPAT_METADATA_UUID |\
>      BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV
> 
>      fs_devices->fsid == random();
>      sb->fsid == actual_fsid (unused);
>      fs_devices->metadata_uuid == sb->metadata_uuid;
> 
This is a very good way of expressing the differences, quite good as
documentation! Thanks for that, it makes sense for me.

What do you mean by "you could add case 3 as below"? I'm already doing
that in the code, correct? Or do you mean somehow document that? I guess
this could be kinda copy/paste as a comment or in the wiki, for example.

[Looking in the list I think I found a patch from you adding these
comments to volumes.h =) ]

> [...]
>> +		btrfs_err(fs_info,
>> +			  "device removal is unsupported on SINGLE_DEV devices\n");
> 
> No \n at the end. btrfs_err() already adds one.
> 
>> +		btrfs_err(fs_info,
>> +			  "device removal is unsupported on SINGLE_DEV devices\n");
> 
> here too.
>> +		btrfs_err(fs_info,
>> +			  "device replace is unsupported on SINGLE_DEV devices\n");
> 
> and here.
> 

Good catch, will fix that!


> [...]
>> @@ -889,7 +889,7 @@ static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
>>   				error = -ENOMEM;
>>   				goto out;
>>   			}
>> -			device = btrfs_scan_one_device(device_name, flags);
> 
>> +			device = btrfs_scan_one_device(device_name, flags, true);
> 
> Why do we have to pass 'true' in btrfs_scan_one_device() here? It is
> single device and I don't see any special handle for the seed device.
>>   	case BTRFS_IOC_SCAN_DEV:
>>   		mutex_lock(&uuid_mutex);
>> -		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
>> +		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
>>   		ret = PTR_ERR_OR_ZERO(device);
>>   		mutex_unlock(&uuid_mutex);
>>   		break;
>> @@ -2210,7 +2210,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
>>   		break;
>>   	case BTRFS_IOC_DEVICES_READY:
>>   		mutex_lock(&uuid_mutex);
>> -		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
>> +		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
>>   		if (IS_ERR(device)) {
>>   			mutex_unlock(&uuid_mutex);
>>   			ret = PTR_ERR(device);
> 
> With this patch, command 'btrfs device scan' and 'btrfs device ready'
> returns -EINVAL for the single-device?  Some os distributions checks
> the status using these commands during boot. Instead, it is ok to
> just return success here.

These are related things. So, regarding the
btrfs_parse_device_options(), as per my understanding this a mount
option that causes a device scan - so, it is a mount-time operation
somehow, correct? But I'm glad to s/true/false and prevent such
scanning, if you think it's more appropriate.

Now, about "just return success" on device scans, just return 0 then? OK
for me...


> [...] 
>> +	if (single_dev) {
>> +		if (has_metadata_uuid || fsid_change_in_progress) {
>> +			btrfs_err(NULL,
>> +		"SINGLE_DEV devices don't support the metadata_uuid feature\n");
>> +			return ERR_PTR(-EINVAL);
> 
> It could right?

In theory, yes. But notice that we have a special situation with
SINGLE_DEV - we make use of the metadata_uuid infrastructure
*partially*; the in-memory structures are affected, but the superblock
is not touched. Now, to support both metadata_uuid and SINGLE_DEV, it
means for example that the user wants to mount two identical devices
(with metadata_uuid enabled) at same time, which is not currently
supported. But then SINGLE_DEV can't use metadata_uuid for that, since
this infrastructure is in use already, we'd need to have like a third
fsid entity, IIUC.

I think this could be achieved but I'm not sure the value for that, and
specially in the first iteration of the patch. I'd vote to merge this as
simple as possible and maybe extend that in the future to support
co-existing metadata_uuid, if that makes sense for some users. OK for you?


> [...]
>> @@ -1402,6 +1444,16 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
>>   		goto error_bdev_put;
>>   	}
>>   
>> +	single_dev = btrfs_super_compat_ro_flags(disk_super) &
>> +			BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV;
>> +
>> +	if (!mounting && single_dev) {
>> +		pr_info("BTRFS: skipped non-mount scan on SINGLE_DEV device %s\n",
>> +			path);
>> +		btrfs_release_disk_super(disk_super);
> 
> leaks bdev?
> 

Ugh, apparently yes, thanks for noticing this!


>> +		return ERR_PTR(-EINVAL);
> 
> We need to let seed device scan even for the single device.
> 
> In fact we can make no-scan required for the any fs with the total_devs 
> == 1.
> 
> I wrote a patch send it out for the review. So no special handling for
> single-device will be required.

This one?
https://lore.kernel.org/linux-btrfs/b0e0240254557461c137cd9b943f00b0d5048083.1693959204.git.anand.jain@oracle.com/

OK, seems it does directly affect my patch, if yours is merged I can
remove part of the code I'm proposing, which is nice. I'll wait David /
Josef feedback on your patch to move on from here, if that's accepted,
I'll incorporate yours in my next iteration.
--

If possible, could you CC me in your patches related to metadata_uuid
and fsid for now, since I'm also working on that? This helps me to be
aware of the stuff.

For example, looking in the list, I just found:
https://lore.kernel.org/linux-btrfs/0b71460e3a52cf77cd0f7d533e28d2502e285c11.1693820430.git.anand.jain@oracle.com/

This is the perfect place to add the comment above, related to fsid's
right? I'll do that in my next version.

Thanks,


Guilherme
Guilherme G. Piccoli Sept. 7, 2023, 3:07 p.m. UTC | #8
On 07/09/2023 10:55, David Sterba wrote:
> [...]
>> virtual-fsid is good.
>> or
>> random-fsid
> 
> I'm thinking about something that would be closer to how the devices'
> uuids can be duplicated, so cloned_fsid or duplicate_fsid/dup_fsid.
> Virtual can be anything, random sounds too random.
> 

same-fsid maybe? I could go with any of them, up to you / Josef =)
Anand Jain Sept. 7, 2023, 4:01 p.m. UTC | #9
On 9/7/23 21:55, David Sterba wrote:
> On Wed, Sep 06, 2023 at 05:49:05PM +0800, Anand Jain wrote:
>> On 9/6/23 04:23, Guilherme G. Piccoli wrote:
>>> On 05/09/2023 13:50, David Sterba wrote:
>>>> [...]
>>>> I'd like to pick this as a feature for 6.7, it's extending code we
>>>> already have for metadata_uuid so this is a low risk feature. The only
>>>> problem I see for now is the name, using the word 'single'.
>>>>
>>>> We have single as a block group profile name and a filesystem can exist
>>>> on a single device too, this is would be confusing when referring to it.
>>>> Single-dev can be a working name but for a final release we should
>>>> really try to pick something more unique. I don't have a suggestion for
>>>> now.
>>>>
>>>> The plan for now is that I'll add the patch to a topic branch and add it
>>>> to for-next so it could be tested but there might be some updates still
>>>> needed. Either as changes to this patch or as separate patches, that
>>>> depends.
>>>>
>>>
>>> Hi David, thanks for your feedback! I agree with you that this name is a
>>> bit confusing, we can easily change that! How about virtual-fsid?
>>> I confess I'm not the best (by far!) to name stuff, so I'll be glad to
>>> follow a suggestion from anyone here heheh
>>>
>>
>> This feature might also be expanded to support multiple devices, so
>> removing 'single' makes sense.
> 
> I'm not sure how this would work. In case of the single device we can be
> sure which device belongs to the filesystem, just need the incompat bit
> and internal uuid to distinguish it from the others.
> 
> With multiple devices how could we track which belong to the same
> filesystem? This is the same problem we already have with scanning and
> duplicating block devices.
> 
> The only thing I see is to specify the devices as mount options,
> possibly with the bit marking the devices as seen but not
> scanned/registered and never considered for the automatic mount.
> 

Indeed, users have no means to accurately identify and group the devices
unless this information is maintained in a configuration file (similar
to md? I think). What I mean to convey is that it's possible through
alternative methods with certain trade-offs. Personally, I don't prefer
the configuration file method, but not all use cases share the same
preference.


>> virtual-fsid is good.
>> or
>> random-fsid
> 
> I'm thinking about something that would be closer to how the devices'
> uuids can be duplicated, so cloned_fsid or duplicate_fsid/dup_fsid.
> Virtual can be anything, random sounds too random.

At each mount, the fsid will be different and temporary; it may or may
not be a duplicate device. So, I believe temp_fsid, proxy_fsid, or 
virtual_fsid could represent these configurations.

Thanks, Anand
David Sterba Sept. 11, 2023, 6:28 p.m. UTC | #10
On Wed, Aug 30, 2023 at 09:12:34PM -0300, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is exposed as a unique identifier by the
> driver. This case is supported though in some other common filesystems,
> like ext4.
> 
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example. Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
> 
> Such same-fsid mounting is hereby added through the usage of the
> filesystem feature "single-dev" - when such feature is used, btrfs
> generates a random fsid for the filesystem and leverages the long-term
> present metadata_uuid infrastructure to enable the usage of this
> secondary virtual fsid, effectively requiring few non-invasive changes
> to the code and no new potential corner cases.
> 
> In order to prevent more code complexity and corner cases, given
> the nature of this mechanism (single devices), the single-dev feature
> is not allowed when the metadata_uuid flag is already present on the
> fs, or if the device is on fsid-change state. Device removal/replace
> is also disabled for devices presenting the single-dev feature.
> 
> Suggested-by: John Schoenick <johns@valvesoftware.com>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>

I've added Anand's patch
https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
to misc-next that implements subset of your patch, namely extending
btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
if the semantics is the same so I let you take a look.

As there were more comments to V3, please fix that and resend. Thanks.
Guilherme G. Piccoli Sept. 11, 2023, 6:53 p.m. UTC | #11
On 11/09/2023 15:28, David Sterba wrote:
> On Wed, Aug 30, 2023 at 09:12:34PM -0300, Guilherme G. Piccoli wrote:
> I've added Anand's patch
> https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
> to misc-next that implements subset of your patch, namely extending
> btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
> if the semantics is the same so I let you take a look.
> 
> As there were more comments to V3, please fix that and resend. Thanks.
> 

Thanks David, will do.
Did we agree about the name of the feature? temp_fsid maybe?
Anand Jain Sept. 12, 2023, 9:20 a.m. UTC | #12
On 12/09/2023 02:28, David Sterba wrote:
> On Wed, Aug 30, 2023 at 09:12:34PM -0300, Guilherme G. Piccoli wrote:
>> Btrfs doesn't currently support to mount 2 different devices holding the
>> same filesystem - the fsid is exposed as a unique identifier by the
>> driver. This case is supported though in some other common filesystems,
>> like ext4.
>>
>> Supporting the same-fsid mounts has the advantage of allowing btrfs to
>> be used in A/B partitioned devices, like mobile phones or the Steam Deck
>> for example. Without this support, it's not safe for users to keep the
>> same "image version" in both A and B partitions, a setup that is quite
>> common for development, for example. Also, as a big bonus, it allows fs
>> integrity check based on block devices for RO devices (whereas currently
>> it is required that both have different fsid, breaking the block device
>> hash comparison).
>>
>> Such same-fsid mounting is hereby added through the usage of the
>> filesystem feature "single-dev" - when such feature is used, btrfs
>> generates a random fsid for the filesystem and leverages the long-term
>> present metadata_uuid infrastructure to enable the usage of this
>> secondary virtual fsid, effectively requiring few non-invasive changes
>> to the code and no new potential corner cases.
>>
>> In order to prevent more code complexity and corner cases, given
>> the nature of this mechanism (single devices), the single-dev feature
>> is not allowed when the metadata_uuid flag is already present on the
>> fs, or if the device is on fsid-change state. Device removal/replace
>> is also disabled for devices presenting the single-dev feature.
>>
>> Suggested-by: John Schoenick <johns@valvesoftware.com>
>> Suggested-by: Qu Wenruo <wqu@suse.com>
>> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> 
> I've added Anand's patch
> https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
> to misc-next that implements subset of your patch, namely extending
> btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
> if the semantics is the same so I let you take a look.
> 
> As there were more comments to V3, please fix that and resend. Thanks.

Guliherme,

   Please also add the newly sent patch to the latest misc-next branch:
     [PATCH] btrfs: scan forget for no instance of dev

   The test case btrfs/254 fails without it.

Thanks.
Guilherme G. Piccoli Sept. 12, 2023, 9:26 p.m. UTC | #13
On 12/09/2023 06:20, Anand Jain wrote:
> [...]
>> I've added Anand's patch
>> https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
>> to misc-next that implements subset of your patch, namely extending
>> btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
>> if the semantics is the same so I let you take a look.
>>
>> As there were more comments to V3, please fix that and resend. Thanks.
> [...] 
>    Please also add the newly sent patch to the latest misc-next branch:
>      [PATCH] btrfs: scan forget for no instance of dev
> 
>    The test case btrfs/254 fails without it.
> 

Sure Anand, thanks for the heads-up!

Now, sorry for the silly question, but where is misc-next?! I'm looking
here: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/

I based my work in the branch "for-next", but I can't find misc-next.
Also, I couldn't find "btrfs: pseudo device-scan for single-device
filesystems" in the tree...probably some silly confusion from my side,
any advice is appreciated!

Thanks,


Guilherme
Anand Jain Sept. 13, 2023, 12:39 a.m. UTC | #14
On 13/09/2023 05:26, Guilherme G. Piccoli wrote:
> On 12/09/2023 06:20, Anand Jain wrote:
>> [...]
>>> I've added Anand's patch
>>> https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
>>> to misc-next that implements subset of your patch, namely extending
>>> btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
>>> if the semantics is the same so I let you take a look.
>>>
>>> As there were more comments to V3, please fix that and resend. Thanks.
>> [...]
>>     Please also add the newly sent patch to the latest misc-next branch:
>>       [PATCH] btrfs: scan forget for no instance of dev
>>
>>     The test case btrfs/254 fails without it.
>>
> 
> Sure Anand, thanks for the heads-up!
> 
> Now, sorry for the silly question, but where is misc-next?! I'm looking
> here: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/
> 
> I based my work in the branch "for-next", but I can't find misc-next.
> Also, I couldn't find "btrfs: pseudo device-scan for single-device
> filesystems" in the tree...probably some silly confusion from my side,
> any advice is appreciated!


David maintains the upcoming mainline merges in the branch "misc-next" here:

    https://github.com/kdave/btrfs-devel.git

Thanks.
Guilherme G. Piccoli Sept. 13, 2023, 1:15 p.m. UTC | #15
On 12/09/2023 21:39, Anand Jain wrote:
> [..] 
>> Now, sorry for the silly question, but where is misc-next?! I'm looking
>> here: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/
>>
>> I based my work in the branch "for-next", but I can't find misc-next.
>> Also, I couldn't find "btrfs: pseudo device-scan for single-device
>> filesystems" in the tree...probably some silly confusion from my side,
>> any advice is appreciated!
> 
> 
> David maintains the upcoming mainline merges in the branch "misc-next" here:
> 
>     https://github.com/kdave/btrfs-devel.git
> 
> Thanks.
> 

Thanks a lot Anand!

Checking now, I could also find it in the documentation - my bad, apologies
David Sterba Sept. 13, 2023, 5:24 p.m. UTC | #16
On Tue, Sep 12, 2023 at 05:20:42PM +0800, Anand Jain wrote:
> 
> 
> On 12/09/2023 02:28, David Sterba wrote:
> > On Wed, Aug 30, 2023 at 09:12:34PM -0300, Guilherme G. Piccoli wrote:
> >> Btrfs doesn't currently support to mount 2 different devices holding the
> >> same filesystem - the fsid is exposed as a unique identifier by the
> >> driver. This case is supported though in some other common filesystems,
> >> like ext4.
> >>
> >> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> >> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> >> for example. Without this support, it's not safe for users to keep the
> >> same "image version" in both A and B partitions, a setup that is quite
> >> common for development, for example. Also, as a big bonus, it allows fs
> >> integrity check based on block devices for RO devices (whereas currently
> >> it is required that both have different fsid, breaking the block device
> >> hash comparison).
> >>
> >> Such same-fsid mounting is hereby added through the usage of the
> >> filesystem feature "single-dev" - when such feature is used, btrfs
> >> generates a random fsid for the filesystem and leverages the long-term
> >> present metadata_uuid infrastructure to enable the usage of this
> >> secondary virtual fsid, effectively requiring few non-invasive changes
> >> to the code and no new potential corner cases.
> >>
> >> In order to prevent more code complexity and corner cases, given
> >> the nature of this mechanism (single devices), the single-dev feature
> >> is not allowed when the metadata_uuid flag is already present on the
> >> fs, or if the device is on fsid-change state. Device removal/replace
> >> is also disabled for devices presenting the single-dev feature.
> >>
> >> Suggested-by: John Schoenick <johns@valvesoftware.com>
> >> Suggested-by: Qu Wenruo <wqu@suse.com>
> >> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> > 
> > I've added Anand's patch
> > https://lore.kernel.org/linux-btrfs/de8d71b1b08f2c6ce75e3c45ee801659ecd4dc43.1694164368.git.anand.jain@oracle.com/
> > to misc-next that implements subset of your patch, namely extending
> > btrfs_scan_one_device() with the 'mounting' parameter. I haven't looked
> > if the semantics is the same so I let you take a look.
> > 
> > As there were more comments to V3, please fix that and resend. Thanks.
> 
> Guliherme,
> 
>    Please also add the newly sent patch to the latest misc-next branch:
>      [PATCH] btrfs: scan forget for no instance of dev
> 
>    The test case btrfs/254 fails without it.

The mention patch has been folded to the scanning/register patch so you
may now use misc-next as a base.
David Sterba Sept. 13, 2023, 5:32 p.m. UTC | #17
On Wed, Sep 13, 2023 at 10:15:13AM -0300, Guilherme G. Piccoli wrote:
> On 12/09/2023 21:39, Anand Jain wrote:
> > [..] 
> >> Now, sorry for the silly question, but where is misc-next?! I'm looking
> >> here: https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git/
> >>
> >> I based my work in the branch "for-next", but I can't find misc-next.
> >> Also, I couldn't find "btrfs: pseudo device-scan for single-device
> >> filesystems" in the tree...probably some silly confusion from my side,
> >> any advice is appreciated!
> > 
> > 
> > David maintains the upcoming mainline merges in the branch "misc-next" here:
> > 
> >     https://github.com/kdave/btrfs-devel.git
> > 
> > Thanks.
> > 
> 
> Thanks a lot Anand!
> 
> Checking now, I could also find it in the documentation - my bad, apologies

It's documented at https://btrfs.readthedocs.io/en/latest/Source-repositories.html
but it can be always improved. If the page contents was incomplete from
you POV please let me know or open an issue at
github.com/kdave/btrfs-progs .
Guilherme G. Piccoli Sept. 13, 2023, 5:56 p.m. UTC | #18
On 13/09/2023 14:24, David Sterba wrote:
> On Tue, Sep 12, 2023 at 05:20:42PM +0800, Anand Jain wrote:
>> [...]
> The mention patch has been folded to the scanning/register patch so you
> may now use misc-next as a base.
> 

Perfect, thanks David - I'm already working with misc-next here =)
Guilherme G. Piccoli Sept. 13, 2023, 5:58 p.m. UTC | #19
On 13/09/2023 14:32, David Sterba wrote:
> [...]
>> Checking now, I could also find it in the documentation - my bad, apologies
> 
> It's documented at https://btrfs.readthedocs.io/en/latest/Source-repositories.html
> but it can be always improved. If the page contents was incomplete from
> you POV please let me know or open an issue at
> github.com/kdave/btrfs-progs .
> 

Indeed, it's there and from my perspective, it's very fine as is, I just
missed that, silly me heh

Cheers!
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 9858c77b99e6..7d872107a8f7 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2300,6 +2300,7 @@  int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 {
 	u64 nodesize = btrfs_super_nodesize(sb);
 	u64 sectorsize = btrfs_super_sectorsize(sb);
+	u8 *fsid;
 	int ret = 0;
 
 	if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
@@ -2380,7 +2381,21 @@  int btrfs_validate_super(struct btrfs_fs_info *fs_info,
 		ret = -EINVAL;
 	}
 
-	if (memcmp(fs_info->fs_devices->fsid, sb->fsid, BTRFS_FSID_SIZE) != 0) {
+	/*
+	 * For SINGLE_DEV devices, btrfs creates a random fsid and makes
+	 * use of the metadata_uuid infrastructure in order to allow, for
+	 * example, two devices with same fsid getting mounted at the same
+	 * time. But notice no changes happen at the disk level, the random
+	 * generated fsid is a driver abstraction, not written to the disk.
+	 * That's the reason we're required here to compare the fsid  with
+	 * the metadata_uuid for such devices.
+	 */
+	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV))
+		fsid = fs_info->fs_devices->metadata_uuid;
+	else
+		fsid = fs_info->fs_devices->fsid;
+
+	if (memcmp(fsid, sb->fsid, BTRFS_FSID_SIZE) != 0) {
 		btrfs_err(fs_info,
 		"superblock fsid doesn't match fsid of fs_devices: %pU != %pU",
 			  sb->fsid, fs_info->fs_devices->fsid);
diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
index a523d64d5491..cad761f81932 100644
--- a/fs/btrfs/fs.h
+++ b/fs/btrfs/fs.h
@@ -198,7 +198,8 @@  enum {
 	(BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE |	\
 	 BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \
 	 BTRFS_FEATURE_COMPAT_RO_VERITY |		\
-	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
+	 BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE |	\
+	 BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV)
 
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET	0ULL
 #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR	0ULL
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a895d105464b..23eb15869cb5 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2678,6 +2678,12 @@  static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
+		btrfs_err(fs_info,
+			  "device removal is unsupported on SINGLE_DEV devices\n");
+		return -EINVAL;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -2744,6 +2750,12 @@  static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
+		btrfs_err(fs_info,
+			  "device removal is unsupported on SINGLE_DEV devices\n");
+		return -EINVAL;
+	}
+
 	vol_args = memdup_user(arg, sizeof(*vol_args));
 	if (IS_ERR(vol_args))
 		return PTR_ERR(vol_args);
@@ -3268,6 +3280,12 @@  static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
+	if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
+		btrfs_err(fs_info,
+			  "device replace is unsupported on SINGLE_DEV devices\n");
+		return -EINVAL;
+	}
+
 	if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
 		btrfs_err(fs_info, "device replace not supported on extent tree v2 yet");
 		return -EINVAL;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index cffdd6f7f8e8..5e20e7337261 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -889,7 +889,7 @@  static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
 				error = -ENOMEM;
 				goto out;
 			}
-			device = btrfs_scan_one_device(device_name, flags);
+			device = btrfs_scan_one_device(device_name, flags, true);
 			kfree(device_name);
 			if (IS_ERR(device)) {
 				error = PTR_ERR(device);
@@ -1484,7 +1484,7 @@  static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		goto error_fs_info;
 	}
 
-	device = btrfs_scan_one_device(device_name, mode);
+	device = btrfs_scan_one_device(device_name, mode, true);
 	if (IS_ERR(device)) {
 		mutex_unlock(&uuid_mutex);
 		error = PTR_ERR(device);
@@ -2196,7 +2196,7 @@  static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 	switch (cmd) {
 	case BTRFS_IOC_SCAN_DEV:
 		mutex_lock(&uuid_mutex);
-		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
+		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
 		ret = PTR_ERR_OR_ZERO(device);
 		mutex_unlock(&uuid_mutex);
 		break;
@@ -2210,7 +2210,7 @@  static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
 		break;
 	case BTRFS_IOC_DEVICES_READY:
 		mutex_lock(&uuid_mutex);
-		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
+		device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
 		if (IS_ERR(device)) {
 			mutex_unlock(&uuid_mutex);
 			ret = PTR_ERR(device);
diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index b1d1ac25237b..5294064ffc64 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -290,6 +290,7 @@  BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 BTRFS_FEAT_ATTR_COMPAT_RO(block_group_tree, BLOCK_GROUP_TREE);
+BTRFS_FEAT_ATTR_COMPAT_RO(single_dev, SINGLE_DEV);
 BTRFS_FEAT_ATTR_INCOMPAT(raid1c34, RAID1C34);
 #ifdef CONFIG_BLK_DEV_ZONED
 BTRFS_FEAT_ATTR_INCOMPAT(zoned, ZONED);
@@ -322,6 +323,7 @@  static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
 	BTRFS_FEAT_ATTR_PTR(raid1c34),
 	BTRFS_FEAT_ATTR_PTR(block_group_tree),
+	BTRFS_FEAT_ATTR_PTR(single_dev),
 #ifdef CONFIG_BLK_DEV_ZONED
 	BTRFS_FEAT_ATTR_PTR(zoned),
 #endif
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 999cb82dd288..b53318d32603 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -763,8 +763,37 @@  static struct btrfs_fs_devices *find_fsid_reverted_metadata(
 
 	return NULL;
 }
+
+static void prepare_virtual_fsid(struct btrfs_super_block *disk_super,
+				 const char *path)
+{
+	struct btrfs_fs_devices *fs_devices;
+	u8 vfsid[BTRFS_FSID_SIZE];
+	bool dup_fsid = true;
+
+	while (dup_fsid) {
+		dup_fsid = false;
+		generate_random_uuid(vfsid);
+
+		list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
+			if (!memcmp(vfsid, fs_devices->fsid, BTRFS_FSID_SIZE) ||
+			    !memcmp(vfsid, fs_devices->metadata_uuid,
+				    BTRFS_FSID_SIZE))
+				dup_fsid = true;
+		}
+	}
+
+	memcpy(disk_super->metadata_uuid, disk_super->fsid, BTRFS_FSID_SIZE);
+	memcpy(disk_super->fsid, vfsid, BTRFS_FSID_SIZE);
+
+	btrfs_info(NULL,
+		"virtual fsid (%pU) set for SINGLE_DEV device %s (real fsid %pU)\n",
+		disk_super->fsid, path, disk_super->metadata_uuid);
+}
+
 /*
- * Add new device to list of registered devices
+ * Add new device to list of registered devices, or in case of a SINGLE_DEV
+ * device, also creates a virtual fsid to cope with same-fsid cases.
  *
  * Returns:
  * device pointer which was just added or updated when successful
@@ -785,6 +814,8 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
 					BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
+	bool single_dev = (btrfs_super_compat_ro_flags(disk_super) &
+			BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV);
 
 	error = lookup_bdev(path, &path_devt);
 	if (error) {
@@ -793,23 +824,32 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		return ERR_PTR(error);
 	}
 
-	if (fsid_change_in_progress) {
-		if (!has_metadata_uuid)
-			fs_devices = find_fsid_inprogress(disk_super);
-		else
-			fs_devices = find_fsid_changed(disk_super);
-	} else if (has_metadata_uuid) {
-		fs_devices = find_fsid_with_metadata_uuid(disk_super);
+	if (single_dev) {
+		if (has_metadata_uuid || fsid_change_in_progress) {
+			btrfs_err(NULL,
+		"SINGLE_DEV devices don't support the metadata_uuid feature\n");
+			return ERR_PTR(-EINVAL);
+		}
+		prepare_virtual_fsid(disk_super, path);
 	} else {
-		fs_devices = find_fsid_reverted_metadata(disk_super);
-		if (!fs_devices)
-			fs_devices = find_fsid(disk_super->fsid, NULL);
+		if (fsid_change_in_progress) {
+			if (!has_metadata_uuid)
+				fs_devices = find_fsid_inprogress(disk_super);
+			else
+				fs_devices = find_fsid_changed(disk_super);
+		} else if (has_metadata_uuid) {
+			fs_devices = find_fsid_with_metadata_uuid(disk_super);
+		} else {
+			fs_devices = find_fsid_reverted_metadata(disk_super);
+			if (!fs_devices)
+				fs_devices = find_fsid(disk_super->fsid, NULL);
+		}
 	}
 
 
 	if (!fs_devices) {
 		fs_devices = alloc_fs_devices(disk_super->fsid);
-		if (has_metadata_uuid)
+		if (has_metadata_uuid || single_dev)
 			memcpy(fs_devices->metadata_uuid,
 			       disk_super->metadata_uuid, BTRFS_FSID_SIZE);
 
@@ -1357,13 +1397,15 @@  int btrfs_forget_devices(dev_t devt)
  * and we are not allowed to call set_blocksize during the scan. The superblock
  * is read via pagecache
  */
-struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
+struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
+					   bool mounting)
 {
 	struct btrfs_super_block *disk_super;
 	bool new_device_added = false;
 	struct btrfs_device *device = NULL;
 	struct block_device *bdev;
 	u64 bytenr, bytenr_orig;
+	bool single_dev;
 	int ret;
 
 	lockdep_assert_held(&uuid_mutex);
@@ -1402,6 +1444,16 @@  struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
 		goto error_bdev_put;
 	}
 
+	single_dev = btrfs_super_compat_ro_flags(disk_super) &
+			BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV;
+
+	if (!mounting && single_dev) {
+		pr_info("BTRFS: skipped non-mount scan on SINGLE_DEV device %s\n",
+			path);
+		btrfs_release_disk_super(disk_super);
+		return ERR_PTR(-EINVAL);
+	}
+
 	device = device_list_add(path, disk_super, &new_device_added);
 	if (!IS_ERR(device) && new_device_added)
 		btrfs_free_stale_devices(device->devt, device);
@@ -2391,6 +2443,12 @@  int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
 
 	args->devid = btrfs_stack_device_id(&disk_super->dev_item);
 	memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
+
+	/*
+	 * Note that SINGLE_DEV devices are not handled in a special way here;
+	 * device removal/replace is instead forbidden when such feature is
+	 * present, this note is for future users/readers of this function.
+	 */
 	if (btrfs_fs_incompat(fs_info, METADATA_UUID))
 		memcpy(args->fsid, disk_super->metadata_uuid, BTRFS_FSID_SIZE);
 	else
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 2128a032c3b7..1ffeb333c55c 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -611,7 +611,8 @@  struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
 		       blk_mode_t flags, void *holder);
-struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags);
+struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
+					   bool mounting);
 int btrfs_forget_devices(dev_t devt);
 void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
 void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
index dbb8b96da50d..cb7a7cfe1ea9 100644
--- a/include/uapi/linux/btrfs.h
+++ b/include/uapi/linux/btrfs.h
@@ -313,6 +313,13 @@  struct btrfs_ioctl_fs_info_args {
  */
 #define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE	(1ULL << 3)
 
+/*
+ * Single devices (as flagged by the corresponding compat_ro flag) only
+ * gets scanned during mount time; also, a random fsid is generated for
+ * them, in order to cope with same-fsid filesystem mounts.
+ */
+#define BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV		(1ULL << 4)
+
 #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF	(1ULL << 0)
 #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL	(1ULL << 1)
 #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS	(1ULL << 2)