
btrfs: allow single disk devices to mount with older generations

Message ID 6b1f037344cd8d24566f3d9873b820a73384242c.1598995167.git.josef@toxicpanda.com
State New, archived
Series btrfs: allow single disk devices to mount with older generations

Commit Message

Josef Bacik Sept. 1, 2020, 9:19 p.m. UTC
We have this check to prevent older devices, which may have disappeared
and re-appeared with an older generation, from being added to an
fs_devices.  This makes sense; we don't want stale disks in our file
system.  However, for single-disk file systems this doesn't really make
sense.  I've seen this in testing, and I was also provided a reproducer
from a project that builds btrfs images on loopback devices.  The
loopback device gets cached with the new generation, and then if it is
re-used to generate a new file system we'll fail to mount it because the
new fs is "older" than what we have in the cache.

Fix this by simply ignoring this check if we're a single-disk file
system, as we're not going to cause problems for the fs by allowing the
disk to be mounted with an older generation than what is in our cache.

I've also added a warning message for this case, as it was kind of
annoying to find originally.

Reported-by: Daan De Meyer <daandemeyer@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Nikolay Borisov Sept. 2, 2020, 8:58 a.m. UTC | #1
On 2.09.20 г. 0:19 ч., Josef Bacik wrote:
> [...] I've seen this in testing, but I was provided a reproducer
> from a project that builds btrfs images on loopback devices. [...]

Since you've got a reproducer, is it possible to turn this into an fstests test?
Anand Jain Sept. 2, 2020, 10:41 a.m. UTC | #2
On 2/9/20 5:19 am, Josef Bacik wrote:
> [...]

After the patch, if there are two identical images/disks with different
generations and systemd auto-scans both of them, the scans will race and
the last scanned disk/image will mount successfully. Before the patch, the
disk/image with the larger generation always won (even on a single-disk FS).

Are we OK with this? IMO letting the last scanned one get mounted is also
kind of fair.

Internally I had a similar report. I just told them to use
  btrfs device scan --forget
and try. It worked.


Reviewed-by: Anand Jain <anand.jain@oracle.com>


Thanks, Anand
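
The race Anand describes can be sketched with a one-entry cache that only tracks the registered path and generation of a single devid. This is a simplified, hypothetical model (cached_dev, scan and winner are illustrative names, not kernel API): it assumes the filesystem is not mounted and that the generation check is only reached when a known devid shows up under a different path (our reading of the code surrounding this hunk), and it ignores everything else device_list_add() handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the cached btrfs_device of one devid. */
struct cached_dev {
	const char *path;     /* NULL until the devid is first registered */
	uint64_t generation;
};

/*
 * Register one scanned image of a not-mounted filesystem.
 * Returns false where the kernel would return -EEXIST.
 */
static bool scan(struct cached_dev *c, const char *path, uint64_t gen,
		 uint64_t num_devices, bool patched)
{
	assert(c != NULL && path != NULL);

	if (c->path == NULL) {
		/* First sighting of this devid: just record it. */
		c->path = path;
		c->generation = gen;
		return true;
	}
	if (strcmp(c->path, path) != 0) {
		/* Same devid under a new path: the generation check applies. */
		bool checked = patched ? num_devices > 1 : true;

		if (checked && gen < c->generation)
			return false;   /* older copy ignored */
		c->path = path;
	}
	/* Not mounted, so the cached generation follows the latest scan. */
	c->generation = gen;
	return true;
}

/* Scan two single-device images with the same fsid/devid; return the cached path. */
static const char *winner(const char *first, uint64_t gen1,
			  const char *second, uint64_t gen2, bool patched)
{
	struct cached_dev c = { NULL, 0 };

	scan(&c, first, gen1, 1, patched);
	scan(&c, second, gen2, 1, patched);
	return c.path;
}
```

With an image of generation 10 on /dev/loop0 and one of generation 5 on /dev/loop1, the unpatched rule keeps /dev/loop0 in either scan order, while the patched rule keeps whichever image was scanned last.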
Josef Bacik Sept. 2, 2020, 5:14 p.m. UTC | #3
On 9/2/20 6:41 AM, Anand Jain wrote:
> After the patch - that means if there are two identical but different
> generation images/disks and if the systemd auto-scans both of them,
> the scan will race and the last scanned disk/image will mount
> successfully. Whereas before the patch- the disk/image with the larger
> generation always won (even in single disk FS).
> 
> Are we ok with this? IMO the last scanned gets mounted is also kind of fair.
> 
> Internally I had a similar reported. I just told them to use
>   btrfs device scan --forget
> and try. It worked.

Yeah, it's kind of wonky; really I just want to make the "surprise! this doesn't
work" aspect of this go away.  There are some cases where you really just need to
do btrfs device scan --forget, and that'll be OK.  But for the basic "I mounted
this, then blew everything away, then tried to mount the old version" case, I
want it to work without being weird.  Thanks,

Josef

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 77b7da42c651..eb2cc27ef602 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -786,6 +786,7 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+	bool multi_disk = btrfs_super_num_devices(disk_super) > 1;
 	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
@@ -914,7 +915,8 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 		 * tracking a problem where systems fail mount by subvolume id
 		 * when we reject replacement on a mounted FS.
 		 */
-		if (!fs_devices->opened && found_transid < device->generation) {
+		if (multi_disk && !fs_devices->opened &&
+		    found_transid < device->generation) {
 			/*
 			 * That is if the FS is _not_ mounted and if you
 			 * are here, that means there is more than one
@@ -922,6 +924,10 @@  static noinline struct btrfs_device *device_list_add(const char *path,
 			 * with larger generation number or the last-in if
 			 * generation are equal.
 			 */
+			btrfs_warn_in_rcu(device->fs_info,
+		  "old device %s not being added for fsid:devid for %pU:%llu",
+					  rcu_str_deref(device->name),
+					  disk_super->fsid, devid);
 			mutex_unlock(&fs_devices->device_list_mutex);
 			return ERR_PTR(-EEXIST);
 		}
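
When writing a reproducer, it can help to read the two superblock fields the patch relies on straight from an image. The sketch below assumes the byte offsets of the magic (0x40), generation (0x48) and num_devices (0x88) fields from the btrfs on-disk format documentation, and a primary superblock at byte offset 65536. It parses an already-read buffer and does not verify the checksum; sb_fields and get_le64 are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets inside the superblock (assumed from the on-disk format docs). */
#define SB_MAGIC_OFF       0x40
#define SB_GENERATION_OFF  0x48
#define SB_NUM_DEVICES_OFF 0x88
#define SB_MIN_LEN         (SB_NUM_DEVICES_OFF + 8)

/* Decode a little-endian u64 independently of host byte order. */
static uint64_t get_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Extract the fields behind found_transid and multi_disk from a raw
 * superblock buffer. Returns false if the buffer is too short or the
 * magic does not match.
 */
static bool sb_fields(const unsigned char *sb, size_t len,
		      uint64_t *generation, uint64_t *num_devices)
{
	assert(sb != NULL && generation != NULL && num_devices != NULL);
	if (len < SB_MIN_LEN || memcmp(sb + SB_MAGIC_OFF, "_BHRfS_M", 8) != 0)
		return false;
	*generation = get_le64(sb + SB_GENERATION_OFF);
	*num_devices = get_le64(sb + SB_NUM_DEVICES_OFF);
	return true;
}
```

In a test, the buffer would typically come from pread(fd, buf, 4096, 65536) on the image file; comparing the generations and num_devices of two images then indicates whether the unpatched kernel would reject the later scan.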