diff mbox series

btrfs: validate device maj:min during open

Message ID 752b8526be21d984e0ee58c7f66d312664ff5ac5.1709256891.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Headers show
Series btrfs: validate device maj:min during open | expand

Commit Message

Anand Jain March 1, 2024, 1:51 a.m. UTC
Boris managed to create a device capable of changing its maj:min without
altering its device path.

Only multi-devices can be scanned. A device that gets scanned and remains
in the Btrfs kernel cache might end up with an incorrect maj:min.

Despite the tempfsid feature patch did not introduce this bug, it could
lead to issues if the above multi-device is converted to a single device
with a stale maj:min. Subsequently, attempting to mount the same device
with the correct maj:min might mistake it for another device with the same
fsid, potentially resulting in wrongly auto-enabling the tempfsid feature.

To address this, this patch validates the device's maj:min at the time of
device open and updates it if it has changed since the last scan.

Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
Reported-by: Boris Burkov <boris@bur.io>
Co-developed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Comments

Boris Burkov March 1, 2024, 3:34 p.m. UTC | #1
On Fri, Mar 01, 2024 at 07:21:32AM +0530, Anand Jain wrote:
> Boris managed to create a device capable of changing its maj:min without
> altering its device path.
> 
> Only multi-devices can be scanned. A device that gets scanned and remains
> in the Btrfs kernel cache might end up with an incorrect maj:min.
> 
> Despite the tempfsid feature patch did not introduce this bug, it could
> lead to issues if the above multi-device is converted to a single device
> with a stale maj:min. Subsequently, attempting to mount the same device
> with the correct maj:min might mistake it for another device with the same
> fsid, potentially resulting in wrongly auto-enabling the tempfsid feature.
> 
> To address this, this patch validates the device's maj:min at the time of
> device open and updates it if it has changed since the last scan.

I do believe this patch is correct for fixing my test case, but I don't
believe that it is the proper fix for this issue. This is just plugging
one more hole in a leaky dam.

> 
> Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
> Reported-by: Boris Burkov <boris@bur.io>
> Co-developed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index deb4f191730d..4c498f088302 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -644,6 +644,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  	struct bdev_handle *bdev_handle;
>  	struct btrfs_super_block *disk_super;
>  	u64 devid;
> +	dev_t devt;
>  	int ret;
>  
>  	if (device->bdev)
> @@ -692,6 +693,24 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>  	device->bdev = bdev_handle->bdev;
>  	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
>  
> +	ret = lookup_bdev(device->name->str, &devt);

It should be fine to just use the dev_t in bdev_handle->bdev->bd_dev.

> +	if (ret) {
> +		btrfs_err(NULL,
> +	"failed to validate (%d:%d) maj:min for device %s %d resetting to 0",
> +			  MAJOR(device->devt), MINOR(device->devt),
> +			  device->name->str, ret);
> +		device->devt = 0;
> +	} else {
> +		if (device->devt != devt) {
> +			btrfs_warn(NULL,
> +				"device %s maj:min changed from %d:%d to %d:%d",
> +				   device->name->str, MAJOR(device->devt),
> +				   MINOR(device->devt), MAJOR(devt),
> +				   MINOR(devt));
> +			device->devt = devt;
> +		}
> +	}
> +
>  	fs_devices->open_devices++;
>  	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
>  	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
> -- 
> 2.38.1
>
Anand Jain March 1, 2024, 4:47 p.m. UTC | #2
On 3/1/24 21:04, Boris Burkov wrote:
> On Fri, Mar 01, 2024 at 07:21:32AM +0530, Anand Jain wrote:
>> Boris managed to create a device capable of changing its maj:min without
>> altering its device path.
>>
>> Only multi-devices can be scanned. A device that gets scanned and remains
>> in the Btrfs kernel cache might end up with an incorrect maj:min.
>>
>> Despite the tempfsid feature patch did not introduce this bug, it could
>> lead to issues if the above multi-device is converted to a single device
>> with a stale maj:min. Subsequently, attempting to mount the same device
>> with the correct maj:min might mistake it for another device with the same
>> fsid, potentially resulting in wrongly auto-enabling the tempfsid feature.
>>
>> To address this, this patch validates the device's maj:min at the time of
>> device open and updates it if it has changed since the last scan.
> 

> I do believe this patch is correct for fixing my test case,

Right. It fixes the bug reported in the testcase only.

> but I don't
> believe that it is the proper fix for this issue.

Indeed, I anticipated that it might be confusing. As I've clarified
elsewhere, this isn't the solution for the already known multi-nodes
single-device issue. The resolution is currently being discussed.

I think I've come up with a better idea now. I'll send out the
proposal soon for discussions.

However, my first challenge is to simulate a multi-nodes
single-device environment for testing.

Thx, Anand

> This is just plugging
> one more hole in a leaky dam.

>> Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
>> Reported-by: Boris Burkov <boris@bur.io>
>> Co-developed-by: Boris Burkov <boris@bur.io>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>   fs/btrfs/volumes.c | 19 +++++++++++++++++++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index deb4f191730d..4c498f088302 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -644,6 +644,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>>   	struct bdev_handle *bdev_handle;
>>   	struct btrfs_super_block *disk_super;
>>   	u64 devid;
>> +	dev_t devt;
>>   	int ret;
>>   
>>   	if (device->bdev)
>> @@ -692,6 +693,24 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>>   	device->bdev = bdev_handle->bdev;
>>   	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
>>   
>> +	ret = lookup_bdev(device->name->str, &devt);
> 
> It should be fine to just use the dev_t in bdev_handle->bdev->bd_dev.
> 
>> +	if (ret) {
>> +		btrfs_err(NULL,
>> +	"failed to validate (%d:%d) maj:min for device %s %d resetting to 0",
>> +			  MAJOR(device->devt), MINOR(device->devt),
>> +			  device->name->str, ret);
>> +		device->devt = 0;
>> +	} else {
>> +		if (device->devt != devt) {
>> +			btrfs_warn(NULL,
>> +				"device %s maj:min changed from %d:%d to %d:%d",
>> +				   device->name->str, MAJOR(device->devt),
>> +				   MINOR(device->devt), MAJOR(devt),
>> +				   MINOR(devt));
>> +			device->devt = devt;
>> +		}
>> +	}
>> +
>>   	fs_devices->open_devices++;
>>   	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
>>   	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
>> -- 
>> 2.38.1
>>
Anand Jain March 1, 2024, 4:59 p.m. UTC | #3
I missed a comment.

>> @@ -692,6 +693,24 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
>>   	device->bdev = bdev_handle->bdev;
>>   	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
>>   
>> +	ret = lookup_bdev(device->name->str, &devt);
> 
> It should be fine to just use the dev_t in bdev_handle->bdev->bd_dev.

  Oh. That's much improved. I'll fix.

Thanks,
-Anand
Boris Burkov March 1, 2024, 7:47 p.m. UTC | #4
On Fri, Mar 01, 2024 at 10:17:41PM +0530, Anand Jain wrote:
> 
> 
> On 3/1/24 21:04, Boris Burkov wrote:
> > On Fri, Mar 01, 2024 at 07:21:32AM +0530, Anand Jain wrote:
> > > Boris managed to create a device capable of changing its maj:min without
> > > altering its device path.
> > > 
> > > Only multi-devices can be scanned. A device that gets scanned and remains
> > > in the Btrfs kernel cache might end up with an incorrect maj:min.
> > > 
> > > Despite the tempfsid feature patch did not introduce this bug, it could
> > > lead to issues if the above multi-device is converted to a single device
> > > with a stale maj:min. Subsequently, attempting to mount the same device
> > > with the correct maj:min might mistake it for another device with the same
> > > fsid, potentially resulting in wrongly auto-enabling the tempfsid feature.
> > > 
> > > To address this, this patch validates the device's maj:min at the time of
> > > device open and updates it if it has changed since the last scan.
> > 
> 
> > I do believe this patch is correct for fixing my test case,
> 
> Right. It fixes the bug reported in the testcase only.
> 
> > but I don't
> > believe that it is the proper fix for this issue.
> 
> Indeed, I anticipated that it might be confusing. As I've clarified
> elsewhere, this isn't the solution for the already known multi-nodes
> single-device issue. The resolution is currently being discussed.

I'm not familiar with the the multi-nodes single-device issue.

My point was that while I agree that this fix works for the issue I do
know about, an invalid devt in the btrfs device cache, I do not like it
as a matter of taste.

We have an invalid cache and instead of invalidating it in a general
way at the point of invalidation, you are proposing that we just fix it
up when we happen to notice. I believe that is a fragile solution that
will simply leave the door open for more bugs in the future. (if not
present).

If we do go this way, at the very least I think we should do something a
bit more general. Spitballing here, but perhaps we can get rid of all
redundant caching of random values like devt in btrfs_device and just
use the contents of the bdev struct as it is currently in the block
layer. Or we need a way to treat a btrfs_device as invalid unless
open_one_device has run on it.

> 
> I think I've come up with a better idea now. I'll send out the
> proposal soon for discussions.
> 
> However, my first challenge is to simulate a multi-nodes
> single-device environment for testing.
> 
> Thx, Anand
> 
> > This is just plugging
> > one more hole in a leaky dam.
> 
> > > Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
> > > Reported-by: Boris Burkov <boris@bur.io>
> > > Co-developed-by: Boris Burkov <boris@bur.io>
> > > Signed-off-by: Anand Jain <anand.jain@oracle.com>
> > > ---
> > >   fs/btrfs/volumes.c | 19 +++++++++++++++++++
> > >   1 file changed, 19 insertions(+)
> > > 
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index deb4f191730d..4c498f088302 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -644,6 +644,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
> > >   	struct bdev_handle *bdev_handle;
> > >   	struct btrfs_super_block *disk_super;
> > >   	u64 devid;
> > > +	dev_t devt;
> > >   	int ret;
> > >   	if (device->bdev)
> > > @@ -692,6 +693,24 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
> > >   	device->bdev = bdev_handle->bdev;
> > >   	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
> > > +	ret = lookup_bdev(device->name->str, &devt);
> > 
> > It should be fine to just use the dev_t in bdev_handle->bdev->bd_dev.
> > 
> > > +	if (ret) {
> > > +		btrfs_err(NULL,
> > > +	"failed to validate (%d:%d) maj:min for device %s %d resetting to 0",
> > > +			  MAJOR(device->devt), MINOR(device->devt),
> > > +			  device->name->str, ret);
> > > +		device->devt = 0;
> > > +	} else {
> > > +		if (device->devt != devt) {
> > > +			btrfs_warn(NULL,
> > > +				"device %s maj:min changed from %d:%d to %d:%d",
> > > +				   device->name->str, MAJOR(device->devt),
> > > +				   MINOR(device->devt), MAJOR(devt),
> > > +				   MINOR(devt));
> > > +			device->devt = devt;
> > > +		}
> > > +	}
> > > +
> > >   	fs_devices->open_devices++;
> > >   	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
> > >   	    device->devid != BTRFS_DEV_REPLACE_DEVID) {
> > > -- 
> > > 2.38.1
> > >
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index deb4f191730d..4c498f088302 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -644,6 +644,7 @@  static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	struct bdev_handle *bdev_handle;
 	struct btrfs_super_block *disk_super;
 	u64 devid;
+	dev_t devt;
 	int ret;
 
 	if (device->bdev)
@@ -692,6 +693,24 @@  static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
 	device->bdev = bdev_handle->bdev;
 	clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
 
+	ret = lookup_bdev(device->name->str, &devt);
+	if (ret) {
+		btrfs_err(NULL,
+	"failed to validate (%d:%d) maj:min for device %s %d resetting to 0",
+			  MAJOR(device->devt), MINOR(device->devt),
+			  device->name->str, ret);
+		device->devt = 0;
+	} else {
+		if (device->devt != devt) {
+			btrfs_warn(NULL,
+				"device %s maj:min changed from %d:%d to %d:%d",
+				   device->name->str, MAJOR(device->devt),
+				   MINOR(device->devt), MAJOR(devt),
+				   MINOR(devt));
+			device->devt = devt;
+		}
+	}
+
 	fs_devices->open_devices++;
 	if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
 	    device->devid != BTRFS_DEV_REPLACE_DEVID) {