Message ID | 6b1f037344cd8d24566f3d9873b820a73384242c.1598995167.git.josef@toxicpanda.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: allow single disk devices to mount with older generations |
On 2.09.20 at 0:19, Josef Bacik wrote:
> We have this check to make sure we don't accidentally add older devices
> that may have disappeared and re-appeared with an older generation from
> being added to an fs_devices. This makes sense, we don't want stale
> disks in our file system. However for single disks this doesn't really
> make sense. I've seen this in testing, but I was provided a reproducer
> from a project that builds btrfs images on loopback devices. The
> loopback device gets cached with the new generation, and then if it is
> re-used to generate a new file system we'll fail to mount it because the
> new fs is "older" than what we have in cache.
>
> Fix this by simply ignoring this check if we're a single disk file
> system, as we're not going to cause problems for the fs by allowing the
> disk to be mounted with an older generation than what is in our cache.
>
> I've also added an error message for this case, as it was kind of
> annoying to find originally.
>
> Reported-by: Daan De Meyer <daandemeyer@fb.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---

Since you've got a reproducer, is it possible to turn this into an fstests test?

On 2/9/20 5:19 am, Josef Bacik wrote:
> We have this check to make sure we don't accidentally add older devices
> that may have disappeared and re-appeared with an older generation from
> being added to an fs_devices. This makes sense, we don't want stale
> disks in our file system. However for single disks this doesn't really
> make sense. I've seen this in testing, but I was provided a reproducer
> from a project that builds btrfs images on loopback devices. The
> loopback device gets cached with the new generation, and then if it is
> re-used to generate a new file system we'll fail to mount it because the
> new fs is "older" than what we have in cache.
>
> Fix this by simply ignoring this check if we're a single disk file
> system, as we're not going to cause problems for the fs by allowing the
> disk to be mounted with an older generation than what is in our cache.
>
> I've also added an error message for this case, as it was kind of
> annoying to find originally.
>
> Reported-by: Daan De Meyer <daandemeyer@fb.com>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/volumes.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 77b7da42c651..eb2cc27ef602 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -786,6 +786,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  	struct rcu_string *name;
>  	u64 found_transid = btrfs_super_generation(disk_super);
>  	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
> +	bool multi_disk = btrfs_super_num_devices(disk_super) > 1;
>  	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
>  		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>  	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
> @@ -914,7 +915,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  		 * tracking a problem where systems fail mount by subvolume id
>  		 * when we reject replacement on a mounted FS.
>  		 */
> -		if (!fs_devices->opened && found_transid < device->generation) {
> +		if (multi_disk && !fs_devices->opened &&
> +		    found_transid < device->generation) {
>  			/*
>  			 * That is if the FS is _not_ mounted and if you
>  			 * are here, that means there is more than one
> @@ -922,6 +924,10 @@ static noinline struct btrfs_device *device_list_add(const char *path,
>  			 * with larger generation number or the last-in if
>  			 * generation are equal.
>  			 */
> +			btrfs_warn_in_rcu(device->fs_info,
> +					  "old device %s not being added for fsid:devid for %pU:%llu",
> +					  rcu_str_deref(device->name),
> +					  disk_super->fsid, devid);
>  			mutex_unlock(&fs_devices->device_list_mutex);
>  			return ERR_PTR(-EEXIST);
>  		}
>

After the patch - that means if there are two identical but different
generation images/disks and if systemd auto-scans both of them,
the scan will race and the last scanned disk/image will mount
successfully. Whereas before the patch, the disk/image with the larger
generation always won (even in single disk FS).

Are we ok with this? IMO the last scanned gets mounted is also kind of fair.

Internally I had a similar report. I just told them to use
  btrfs device scan --forget
and try. It worked.

Reviewed-by: Anand Jain <anand.jain@oracle.com>

Thanks, Anand

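The ordering concern raised in this reply can be illustrated with a small userspace model. It is only a sketch under stated assumptions: `registered_generation` and its parameters are hypothetical names, the model covers an unmounted, single-device filesystem only, and it ignores everything else that `device_list_add()` does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code. Each element of gens[] is the
 * superblock generation of one image carrying the same fsid and devid,
 * listed in the order the images are scanned. The return value is the
 * generation left registered after every scan has run.
 */
static uint64_t registered_generation(const uint64_t *gens, size_t n,
				      bool patched)
{
	uint64_t cached = 0;
	bool have_cached = false;

	for (size_t i = 0; i < n; i++) {
		/* Unpatched: a scan with an older generation is rejected. */
		if (!patched && have_cached && gens[i] < cached)
			continue;
		/* Otherwise the newly scanned image replaces the cached one. */
		cached = gens[i];
		have_cached = true;
	}
	return cached;
}
```

With the scan order {7, 3}, the unpatched model keeps generation 7 while the patched model ends up with generation 3; with the order {3, 7}, both end up with 7. That matches the "last scanned wins" behaviour described above for single-disk filesystems.
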
On 9/2/20 6:41 AM, Anand Jain wrote:
> On 2/9/20 5:19 am, Josef Bacik wrote:
>> We have this check to make sure we don't accidentally add older devices
>> that may have disappeared and re-appeared with an older generation from
>> being added to an fs_devices. This makes sense, we don't want stale
>> disks in our file system. However for single disks this doesn't really
>> make sense. I've seen this in testing, but I was provided a reproducer
>> from a project that builds btrfs images on loopback devices. The
>> loopback device gets cached with the new generation, and then if it is
>> re-used to generate a new file system we'll fail to mount it because the
>> new fs is "older" than what we have in cache.
>>
>> Fix this by simply ignoring this check if we're a single disk file
>> system, as we're not going to cause problems for the fs by allowing the
>> disk to be mounted with an older generation than what is in our cache.
>>
>> I've also added an error message for this case, as it was kind of
>> annoying to find originally.
>>
>> Reported-by: Daan De Meyer <daandemeyer@fb.com>
>> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
>> ---
>>  fs/btrfs/volumes.c | 8 +++++++-
>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 77b7da42c651..eb2cc27ef602 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -786,6 +786,7 @@ static noinline struct btrfs_device *device_list_add(const
>> char *path,
>>  	struct rcu_string *name;
>>  	u64 found_transid = btrfs_super_generation(disk_super);
>>  	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
>> +	bool multi_disk = btrfs_super_num_devices(disk_super) > 1;
>>  	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
>>  		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>  	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>> @@ -914,7 +915,8 @@ static noinline struct btrfs_device *device_list_add(const
>> char *path,
>>  		 * tracking a problem where systems fail mount by subvolume id
>>  		 * when we reject replacement on a mounted FS.
>>  		 */
>> -		if (!fs_devices->opened && found_transid < device->generation) {
>> +		if (multi_disk && !fs_devices->opened &&
>> +		    found_transid < device->generation) {
>>  			/*
>>  			 * That is if the FS is _not_ mounted and if you
>>  			 * are here, that means there is more than one
>> @@ -922,6 +924,10 @@ static noinline struct btrfs_device
>> *device_list_add(const char *path,
>>  			 * with larger generation number or the last-in if
>>  			 * generation are equal.
>>  			 */
>> +			btrfs_warn_in_rcu(device->fs_info,
>> +					  "old device %s not being added for fsid:devid for %pU:%llu",
>> +					  rcu_str_deref(device->name),
>> +					  disk_super->fsid, devid);
>>  			mutex_unlock(&fs_devices->device_list_mutex);
>>  			return ERR_PTR(-EEXIST);
>>  		}
>>
>
> After the patch - that means if there are two identical but different
> generation images/disks and if systemd auto-scans both of them,
> the scan will race and the last scanned disk/image will mount
> successfully. Whereas before the patch, the disk/image with the larger
> generation always won (even in single disk FS).
>
> Are we ok with this? IMO the last scanned gets mounted is also kind of fair.
>
> Internally I had a similar report. I just told them to use
>   btrfs device scan --forget
> and try. It worked.
>
>
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
>

Yeah it's kind of wonky, really I just want to make the "surprise! this
doesn't work" aspect of this go away. There are some cases where you really
just need to do btrfs device scan --forget, and that'll be ok. But for the
basic "I mounted this, and then blew everything away, and then tried to mount
the old version", I want it to work without being weird. Thanks,

Josef

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 77b7da42c651..eb2cc27ef602 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -786,6 +786,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 	struct rcu_string *name;
 	u64 found_transid = btrfs_super_generation(disk_super);
 	u64 devid = btrfs_stack_device_id(&disk_super->dev_item);
+	bool multi_disk = btrfs_super_num_devices(disk_super) > 1;
 	bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
 		BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
 	bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
@@ -914,7 +915,8 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 		 * tracking a problem where systems fail mount by subvolume id
 		 * when we reject replacement on a mounted FS.
 		 */
-		if (!fs_devices->opened && found_transid < device->generation) {
+		if (multi_disk && !fs_devices->opened &&
+		    found_transid < device->generation) {
 			/*
 			 * That is if the FS is _not_ mounted and if you
 			 * are here, that means there is more than one
@@ -922,6 +924,10 @@ static noinline struct btrfs_device *device_list_add(const char *path,
 			 * with larger generation number or the last-in if
 			 * generation are equal.
 			 */
+			btrfs_warn_in_rcu(device->fs_info,
+					  "old device %s not being added for fsid:devid for %pU:%llu",
+					  rcu_str_deref(device->name),
+					  disk_super->fsid, devid);
 			mutex_unlock(&fs_devices->device_list_mutex);
 			return ERR_PTR(-EEXIST);
 		}

We have this check to make sure we don't accidentally add older devices
that may have disappeared and re-appeared with an older generation from
being added to an fs_devices. This makes sense, we don't want stale
disks in our file system. However for single disks this doesn't really
make sense. I've seen this in testing, but I was provided a reproducer
from a project that builds btrfs images on loopback devices. The
loopback device gets cached with the new generation, and then if it is
re-used to generate a new file system we'll fail to mount it because the
new fs is "older" than what we have in cache.

Fix this by simply ignoring this check if we're a single disk file
system, as we're not going to cause problems for the fs by allowing the
disk to be mounted with an older generation than what is in our cache.

I've also added an error message for this case, as it was kind of
annoying to find originally.

Reported-by: Daan De Meyer <daandemeyer@fb.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/volumes.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
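
The condition change described in the commit message can be sketched as a standalone userspace model. This is not the kernel code: `struct cached_device`, `struct scanned_super`, `scan_before_patch` and `scan_after_patch` are hypothetical names, and the assumption that the cached generation is only updated while the filesystem is unmounted is a simplification made for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached per-device state. */
struct cached_device {
	uint64_t generation;	/* generation recorded by the previous scan */
	bool opened;		/* true while the filesystem is mounted */
};

/* Hypothetical stand-in for the fields read from the scanned superblock. */
struct scanned_super {
	uint64_t generation;	/* generation found on disk */
	uint64_t num_devices;	/* device count recorded in the superblock */
};

/* Old rule: any older generation is rejected while unmounted. */
static int scan_before_patch(struct cached_device *dev,
			     const struct scanned_super *sb)
{
	if (!dev->opened && sb->generation < dev->generation)
		return -EEXIST;
	if (!dev->opened)
		dev->generation = sb->generation;
	return 0;
}

/* New rule: the rejection applies to multi-device filesystems only. */
static int scan_after_patch(struct cached_device *dev,
			    const struct scanned_super *sb)
{
	bool multi_disk = sb->num_devices > 1;

	if (multi_disk && !dev->opened && sb->generation < dev->generation)
		return -EEXIST;
	if (!dev->opened)
		dev->generation = sb->generation;
	return 0;
}
```

In this model, a cached generation of 10 followed by a scan of a freshly created single-device image at generation 5 returns -EEXIST under the old rule and 0 under the new one, which corresponds to the loopback reproducer described above. A superblock reporting more than one device is still rejected in both versions.
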