diff mbox series

[4/4] btrfs: validate device_list at scan for stray free

Message ID 87d75575e16637a84b82326d5c53cb78cdf9a7e0.1709991203.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: enhanced logic for stray single device | expand

Commit Message

Anand Jain March 9, 2024, 1:44 p.m. UTC
Tempfsid assumes all registered single devices in the fs_devicies list are
to be mounted; otherwise, they won't be in the btrfs_device list.

We recently fixed a related bug caused by leaving failed-open device in
the list. This triggered tempfsid activation upon subsequent mounts of the
same fsid wrongly.

To prevent this, scan the entire device list at mount for any stray
device and free them in btrfs_scan_one_device().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Boris Burkov March 14, 2024, 5:11 p.m. UTC | #1
On Sat, Mar 09, 2024 at 07:14:31PM +0530, Anand Jain wrote:
> Tempfsid assumes all registered single devices in the fs_devicies list are
> to be mounted; otherwise, they won't be in the btrfs_device list.
> 
> We recently fixed a related bug caused by leaving failed-open device in
> the list. This triggered tempfsid activation upon subsequent mounts of the
> same fsid wrongly.
> 
> To prevent this, scan the entire device list at mount for any stray
> device and free them in btrfs_scan_one_device().

Is this an additional precaution on top of maintaining an invariant on
every umount/failed mount that we have freed stale devices of single
device fs-es? Or is it fundamentally impossible for us to enforce that
invariant?

It feels like overkill to hack up free_stale_devices in this way,
compared to just ensuring that we manage cleaning up single devices
fs-es correctly when we are in a cleanup context. If this is practically
the best way to ensure we don't get caught with our pants down by a
random stale device, then I suppose it's fine.

A total aside I just thought of:
I think it might also make sense to consider adding logic to look for
single device fs-es with a device->bdev that is stale from the block
layer's perspective, and somehow marking those in a way that tempfsid
cares about. That would help with things that like that case where we
delete the block dev out from under a mounted fs and mount it a second
time with tempfsid after it's recreated. Not a huge deal, as we've
already discussed, though.

> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 60d848392cd0..bb0857cfbef2 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -1382,6 +1382,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>  
>  	lockdep_assert_held(&uuid_mutex);
>  
> +	btrfs_free_stale_devices(0, NULL, true);
> +
>  	/*
>  	 * we would like to check all the supers, but that would make
>  	 * a btrfs mount succeed after a mkfs from a different FS.
> -- 
> 2.38.1
>
Anand Jain March 16, 2024, 2:13 p.m. UTC | #2
On 3/14/24 22:41, Boris Burkov wrote:
> On Sat, Mar 09, 2024 at 07:14:31PM +0530, Anand Jain wrote:
>> Tempfsid assumes all registered single devices in the fs_devicies list are
>> to be mounted; otherwise, they won't be in the btrfs_device list.
>>
>> We recently fixed a related bug caused by leaving failed-open device in
>> the list. This triggered tempfsid activation upon subsequent mounts of the
>> same fsid wrongly.
>>
>> To prevent this, scan the entire device list at mount for any stray
>> device and free them in btrfs_scan_one_device().
> 
> Is this an additional precaution on top of maintaining an invariant on
> every umount/failed mount that we have freed stale devices of single
> device fs-es? Or is it fundamentally impossible for us to enforce that
> invariant?
> 

Hmm. That's the ultimate goal: maintaining such an invariant. However,
there are bugs. So, this is the place where we can detect whether we
are successful. If we aren't, then we can work around it by freeing
the stale device and avoiding bad consequences.
I think I should also include a warning message when we detect and
free, so that it can be reviewed for the proper fix.
Does that seem reasonable?

> It feels like overkill to hack up free_stale_devices in this way,
> compared to just ensuring that we manage cleaning up single devices
> fs-es correctly when we are in a cleanup context. If this is practically
> the best way to ensure we don't get caught with our pants down by a
> random stale device, then I suppose it's fine.
> 
> A total aside I just thought of:
> I think it might also make sense to consider adding logic to look for
> single device fs-es with a device->bdev that is stale from the block
> layer's perspective, and somehow marking those in a way that tempfsid
> cares about. 


How would we know if the block layer considers a certain device's
block device (device->bdev) as stale?

If you mention a Write IO failure, we already put the filesystem
in read-only mode if that happens. But, we can't close the device
due to the pending write. (Some operating systems have an option
to call panic, which dumps the memory to the coredump device and
reboots.).

> That would help with things that like that case where we
> delete the block dev out from under a mounted fs and mount it a second
> time with tempfsid after it's recreated. Not a huge deal, as we've
> already discussed, though.

Yeah, thanks for brainstorming. Basically, we need a way to distinguish 
between the same physical-device with multiple nodes and different 
physical-devices with the same filesystem.

Thanks, Anand

> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 60d848392cd0..bb0857cfbef2 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -1382,6 +1382,8 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>   
>>   	lockdep_assert_held(&uuid_mutex);
>>   
>> +	btrfs_free_stale_devices(0, NULL, true);
>> +
>>   	/*
>>   	 * we would like to check all the supers, but that would make
>>   	 * a btrfs mount succeed after a mkfs from a different FS.
>> -- 
>> 2.38.1
>>
Anand Jain March 21, 2024, 11:19 a.m. UTC | #3
Boris,

Gentle ping?

Thanks.

On 3/16/24 19:43, Anand Jain wrote:
> On 3/14/24 22:41, Boris Burkov wrote:
>> On Sat, Mar 09, 2024 at 07:14:31PM +0530, Anand Jain wrote:
>>> Tempfsid assumes all registered single devices in the fs_devicies 
>>> list are
>>> to be mounted; otherwise, they won't be in the btrfs_device list.
>>>
>>> We recently fixed a related bug caused by leaving failed-open device in
>>> the list. This triggered tempfsid activation upon subsequent mounts 
>>> of the
>>> same fsid wrongly.
>>>
>>> To prevent this, scan the entire device list at mount for any stray
>>> device and free them in btrfs_scan_one_device().
>>
>> Is this an additional precaution on top of maintaining an invariant on
>> every umount/failed mount that we have freed stale devices of single
>> device fs-es? Or is it fundamentally impossible for us to enforce that
>> invariant?
>>
> 
> Hmm. That's the ultimate goal: maintaining such an invariant. However,
> there are bugs. So, this is the place where we can detect whether we
> are successful. If we aren't, then we can work around it by freeing
> the stale device and avoiding bad consequences.
> I think I should also include a warning message when we detect and
> free, so that it can be reviewed for the proper fix.
> Does that seem reasonable?
> 
>> It feels like overkill to hack up free_stale_devices in this way,
>> compared to just ensuring that we manage cleaning up single devices
>> fs-es correctly when we are in a cleanup context. If this is practically
>> the best way to ensure we don't get caught with our pants down by a
>> random stale device, then I suppose it's fine.
>>
>> A total aside I just thought of:
>> I think it might also make sense to consider adding logic to look for
>> single device fs-es with a device->bdev that is stale from the block
>> layer's perspective, and somehow marking those in a way that tempfsid
>> cares about. 
> 
> 
> How would we know if the block layer considers a certain device's
> block device (device->bdev) as stale?
> 
> If you mention a Write IO failure, we already put the filesystem
> in read-only mode if that happens. But, we can't close the device
> due to the pending write. (Some operating systems have an option
> to call panic, which dumps the memory to the coredump device and
> reboots.).
> 
>> That would help with things that like that case where we
>> delete the block dev out from under a mounted fs and mount it a second
>> time with tempfsid after it's recreated. Not a huge deal, as we've
>> already discussed, though.
> 
> Yeah, thanks for brainstorming. Basically, we need a way to distinguish 
> between the same physical-device with multiple nodes and different 
> physical-devices with the same filesystem.
> 
> Thanks, Anand
> 
>>
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>>   fs/btrfs/volumes.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 60d848392cd0..bb0857cfbef2 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -1382,6 +1382,8 @@ struct btrfs_device 
>>> *btrfs_scan_one_device(const char *path, blk_mode_t flags,
>>>       lockdep_assert_held(&uuid_mutex);
>>> +    btrfs_free_stale_devices(0, NULL, true);
>>> +
>>>       /*
>>>        * we would like to check all the supers, but that would make
>>>        * a btrfs mount succeed after a mkfs from a different FS.
>>> -- 
>>> 2.38.1
>>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 60d848392cd0..bb0857cfbef2 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -1382,6 +1382,8 @@  struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
 
 	lockdep_assert_held(&uuid_mutex);
 
+	btrfs_free_stale_devices(0, NULL, true);
+
 	/*
 	 * we would like to check all the supers, but that would make
 	 * a btrfs mount succeed after a mkfs from a different FS.