[RFC,V5,2/2] btrfs: consolidate device_list_mutex in prepare_sprout to its parent

Message ID f00bad4ba0e8fd7f0c46c21118537fd49fd3c359.1630370459.git.anand.jain@oracle.com (mailing list archive)
State New, archived
Series btrfs: device_list_mutex fix lockdep warn and cleanup

Commit Message

Anand Jain Aug. 31, 2021, 1:21 a.m. UTC
btrfs_prepare_sprout() moves the seed devices into a separate struct
fs_devices, so that its caller btrfs_init_new_device() can add the new
sprout device to fs_info->fs_devices.

Both btrfs_prepare_sprout() and btrfs_init_new_device() need
device_list_mutex, but they take it one after the other, which leaves a
small window in which another thread could race. Close that window by
taking device_list_mutex once in btrfs_init_new_device() and holding it
across the call to btrfs_prepare_sprout().

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
RFC because IMO the cleanup of device_list_mutex makes sense even though
there is no other thread that could potentially race as of now.

Depends on
 [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
which removed the device_list_mutex from clone_fs_devices(); without it,
this patch would cause a double mutex error.
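
To illustrate the double mutex concern, here is a small userspace sketch, not kernel code. It assumes a POSIX error-checking mutex as a stand-in for device_list_mutex; the function name relock_same_thread() is hypothetical. Kernel mutexes are likewise not recursive, so a callee that locks a mutex its caller already holds would deadlock instead of returning an error code.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/*
 * Lock an error-checking mutex, then try to lock it again from the same
 * thread, as a callee would if it did not know the caller already held it.
 * Returns the result of the second lock attempt.
 */
static int relock_same_thread(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int ret;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&m);        /* "caller" takes the lock */
	ret = pthread_mutex_lock(&m);  /* "callee" tries to take it again */

	pthread_mutex_unlock(&m);      /* release the single successful lock */
	pthread_mutex_destroy(&m);
	return ret;
}
```

With an error-checking mutex the second attempt reports EDEADLK; with a default mutex the behavior would be undefined or a hang, which is closer to what the dependency patch avoids.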

v2: fix the missing mutex_unlock in the error return
v3: -
v4: -
v5: - (Except for the change in below SO comments)

 fs/btrfs/volumes.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Nikolay Borisov Aug. 31, 2021, 1:03 p.m. UTC | #1
On 31.08.21 г. 4:21, Anand Jain wrote:
> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
> so that its parent function btrfs_init_new_device() can add the new sprout
> device to fs_info->fs_devices.
> 
> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
> device_list_mutex. But they are holding it sequentially, thus creates a
> small window to an opportunity to race. Close this opportunity and hold
> device_list_mutex common to both btrfs_init_new_device() and
> btrfs_prepare_sprout().
> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>

That's a moot point. What's important is that btrfs_prepare_sprout
leaves the fs_devices in a consistent state, and btrfs_init_new_device
also takes the lock when it's about to modify the devices list, so
that's fine as well. While the patch itself won't do any harm, I think
it's irrelevant.

<snip>
Anand Jain Sept. 3, 2021, 3:08 a.m. UTC | #2
On 31/08/2021 21:03, Nikolay Borisov wrote:
> 
> 
> On 31.08.21 г. 4:21, Anand Jain wrote:
>> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
>> so that its parent function btrfs_init_new_device() can add the new sprout
>> device to fs_info->fs_devices.
>>
>> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
>> device_list_mutex. But they are holding it sequentially, thus creates a
>> small window to an opportunity to race. Close this opportunity and hold
>> device_list_mutex common to both btrfs_init_new_device() and
>> btrfs_prepare_sprout().
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> 
> That's a moot point, what's important is that btrfs_prepare_sprout
> leaves the fs_devices in a consistent state and btrfs_init_new_device
> also takes the lock when it's about to modify the devices list, so
> that's fine as well. While the patch itself won't do any harm I think
> it's irrelevant.


  This patch is for the cleanup of the device_list_mutex.

Thanks, Anand
David Sterba Sept. 17, 2021, 3:37 p.m. UTC | #3
On Tue, Aug 31, 2021 at 09:21:29AM +0800, Anand Jain wrote:
> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
> so that its parent function btrfs_init_new_device() can add the new sprout
> device to fs_info->fs_devices.
> 
> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
> device_list_mutex. But they are holding it sequentially, thus creates a
> small window to an opportunity to race. Close this opportunity and hold
> device_list_mutex common to both btrfs_init_new_device() and
> btrfs_prepare_sprout().

I don't see what exactly would go wrong with the separate device list
locking, but I see at least one potential problem with the new code.

> 
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
> RFC because IMO the cleanup of device_list_mutex makes sense even though
> there isn't another thread that could race potentially race as of now.
> 
> Depends on
>  [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
> which removed the device_list_mutex from clone_fs_devices() otherwise
> this patch will cause a double mutex error.
> 
> v2: fix the missing mutex_unlock in the error return
> v3: -
> v4: -
> v5: - (Except for the change in below SO comments)
> 
>  fs/btrfs/volumes.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index fa9fe47b5b68..53ead67b625c 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2369,6 +2369,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	u64 super_flags;
>  
>  	lockdep_assert_held(&uuid_mutex);
> +	lockdep_assert_held(&fs_devices->device_list_mutex);
> +
>  	if (!fs_devices->seeding)
>  		return -EINVAL;
>  
> @@ -2400,7 +2402,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	INIT_LIST_HEAD(&seed_devices->alloc_list);
>  	mutex_init(&seed_devices->device_list_mutex);

A few lines before this one there's alloc_fs_devices and
clone_fs_devices, both allocating memory. This would happen under a big
lock as device_list_mutex also protects superblock write. This is a
pattern to avoid.

A rough idea would be to split btrfs_prepare_sprout into parts where the
allocations are not done under the lock and the locked part. It could be
partially inlined to btrfs_init_new_device.
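
As an illustration of the split described above, here is a minimal userspace sketch, not kernel code and not the eventual btrfs implementation. It uses a pthread mutex in place of device_list_mutex, and struct dev, struct dev_set, dev_set_alloc(), sprout_finish_locked() and add_sprout_device() are all hypothetical names. Every allocation happens before the lock is taken, so the locked part only relinks lists and cannot fail.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Toy device: a node in a singly linked list. */
struct dev {
	int id;
	struct dev *next;
};

/* Toy stand-in for struct btrfs_fs_devices. */
struct dev_set {
	pthread_mutex_t list_mutex;	/* plays the role of device_list_mutex */
	struct dev *head;
};

/* Allocate an empty device set. No lock is held here. */
static struct dev_set *dev_set_alloc(void)
{
	struct dev_set *s = calloc(1, sizeof(*s));

	if (s)
		pthread_mutex_init(&s->list_mutex, NULL);
	return s;
}

/*
 * Locked part: the caller must hold fs->list_mutex.
 * Moves every device from fs to seed. No allocation, cannot fail.
 */
static void sprout_finish_locked(struct dev_set *fs, struct dev_set *seed)
{
	assert(seed->head == NULL);
	seed->head = fs->head;
	fs->head = NULL;
}

/* Allocate first, then do the list surgery in one critical section. */
static int add_sprout_device(struct dev_set *fs, struct dev_set **seed_out,
			     int new_id)
{
	struct dev *d = calloc(1, sizeof(*d));
	struct dev_set *seed = dev_set_alloc();

	if (!d || !seed) {
		/* Error path runs before the lock is taken: nothing to unlock. */
		if (seed) {
			pthread_mutex_destroy(&seed->list_mutex);
			free(seed);
		}
		free(d);
		return -1;
	}
	d->id = new_id;

	pthread_mutex_lock(&fs->list_mutex);
	sprout_finish_locked(fs, seed);	/* old devices go to the seed set */
	d->next = fs->head;		/* fs->head is NULL after the move */
	fs->head = d;			/* new device is the only one left */
	pthread_mutex_unlock(&fs->list_mutex);

	*seed_out = seed;
	return 0;
}
```

A side effect of this ordering is that the allocation-failure path needs no unlock, which is the kind of error-return mistake the v2 changelog entry mentions fixing.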

>  
> -	mutex_lock(&fs_devices->device_list_mutex);
>  	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
>  			      synchronize_rcu);
>  	list_for_each_entry(device, &seed_devices->devices, dev_list)
> @@ -2416,7 +2417,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>  	generate_random_uuid(fs_devices->fsid);
>  	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
>  	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
> -	mutex_unlock(&fs_devices->device_list_mutex);
>  
>  	super_flags = btrfs_super_flags(disk_super) &
>  		      ~BTRFS_SUPER_FLAG_SEEDING;
> @@ -2591,10 +2591,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  	device->dev_stats_valid = 1;
>  	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>  
> +	mutex_lock(&fs_devices->device_list_mutex);
>  	if (seeding_dev) {
>  		btrfs_clear_sb_rdonly(sb);
>  		ret = btrfs_prepare_sprout(fs_info);
>  		if (ret) {
> +			mutex_unlock(&fs_devices->device_list_mutex);
>  			btrfs_abort_transaction(trans, ret);
>  			goto error_trans;
>  		}
> @@ -2604,7 +2606,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>  
>  	device->fs_devices = fs_devices;
>  
> -	mutex_lock(&fs_devices->device_list_mutex);
>  	mutex_lock(&fs_info->chunk_mutex);
>  	list_add_rcu(&device->dev_list, &fs_devices->devices);
>  	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);
> -- 
> 2.31.1
Anand Jain Sept. 18, 2021, 12:10 a.m. UTC | #4
On 17/09/2021 23:37, David Sterba wrote:
> On Tue, Aug 31, 2021 at 09:21:29AM +0800, Anand Jain wrote:
>> btrfs_prepare_sprout() moves seed devices into its own struct fs_devices,
>> so that its parent function btrfs_init_new_device() can add the new sprout
>> device to fs_info->fs_devices.
>>
>> Both btrfs_prepare_sprout() and btrfs_init_new_device() needs
>> device_list_mutex. But they are holding it sequentially, thus creates a
>> small window to an opportunity to race. Close this opportunity and hold
>> device_list_mutex common to both btrfs_init_new_device() and
>> btrfs_prepare_sprout().
> 
> I don't se what exactly would go wrong with the separate device list
> locking, but I see at least one potential problem with the new code.
> 
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>> RFC because IMO the cleanup of device_list_mutex makes sense even though
>> there isn't another thread that could race potentially race as of now.
>>
>> Depends on
>>   [PATCH v2] btrfs: fix lockdep warning while mounting sprout fs
>> which removed the device_list_mutex from clone_fs_devices() otherwise
>> this patch will cause a double mutex error.
>>
>> v2: fix the missing mutex_unlock in the error return
>> v3: -
>> v4: -
>> v5: - (Except for the change in below SO comments)
>>
>>   fs/btrfs/volumes.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index fa9fe47b5b68..53ead67b625c 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2369,6 +2369,8 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	u64 super_flags;
>>   
>>   	lockdep_assert_held(&uuid_mutex);
>> +	lockdep_assert_held(&fs_devices->device_list_mutex);
>> +
>>   	if (!fs_devices->seeding)
>>   		return -EINVAL;
>>   
>> @@ -2400,7 +2402,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	INIT_LIST_HEAD(&seed_devices->alloc_list);


>>   	mutex_init(&seed_devices->device_list_mutex);

  BTW mutex_init here will go, as the sprout's private
  fs_devices::device_list_mutex is unused. It is a pending cleanup.

> A few lines before this one there's alloc_fs_devices and
> clone_fs_devices, both allocating memory. This would happen under a big
> lock as device_list_mutex also protects superblock write. This is a
> pattern to avoid.

  Oh. That's right. Thx. One way is to flag NOFS alloc.

> A rough idea would be to split btrfs_prepare_sprout into parts where the
> allocations are not done under the lock and the locked part. It could be
> partially inlined to btrfs_init_new_device.

  I think you mean something like this...

  btrfs_init_new_device()
  <snip>
    if seeding_dev
       alloc_prepare_sprout
    mutex_lock(&fs_devices->device_list_mutex);
    if seeding_dev
       finish_prepare_sprout
    <snip>
    mutex_unlock(&fs_devices->device_list_mutex);

  I am trying.

Thanks, Anand

>>   
>> -	mutex_lock(&fs_devices->device_list_mutex);
>>   	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
>>   			      synchronize_rcu);
>>   	list_for_each_entry(device, &seed_devices->devices, dev_list)
>> @@ -2416,7 +2417,6 @@ static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
>>   	generate_random_uuid(fs_devices->fsid);
>>   	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
>>   	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
>> -	mutex_unlock(&fs_devices->device_list_mutex);
>>   
>>   	super_flags = btrfs_super_flags(disk_super) &
>>   		      ~BTRFS_SUPER_FLAG_SEEDING;
>> @@ -2591,10 +2591,12 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   	device->dev_stats_valid = 1;
>>   	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>>   
>> +	mutex_lock(&fs_devices->device_list_mutex);
>>   	if (seeding_dev) {
>>   		btrfs_clear_sb_rdonly(sb);
>>   		ret = btrfs_prepare_sprout(fs_info);
>>   		if (ret) {
>> +			mutex_unlock(&fs_devices->device_list_mutex);
>>   			btrfs_abort_transaction(trans, ret);
>>   			goto error_trans;
>>   		}
>> @@ -2604,7 +2606,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>   
>>   	device->fs_devices = fs_devices;
>>   
>> -	mutex_lock(&fs_devices->device_list_mutex);
>>   	mutex_lock(&fs_info->chunk_mutex);
>>   	list_add_rcu(&device->dev_list, &fs_devices->devices);
>>   	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);
>> -- 
>> 2.31.1
Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fa9fe47b5b68..53ead67b625c 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2369,6 +2369,8 @@  static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	u64 super_flags;
 
 	lockdep_assert_held(&uuid_mutex);
+	lockdep_assert_held(&fs_devices->device_list_mutex);
+
 	if (!fs_devices->seeding)
 		return -EINVAL;
 
@@ -2400,7 +2402,6 @@  static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	INIT_LIST_HEAD(&seed_devices->alloc_list);
 	mutex_init(&seed_devices->device_list_mutex);
 
-	mutex_lock(&fs_devices->device_list_mutex);
 	list_splice_init_rcu(&fs_devices->devices, &seed_devices->devices,
 			      synchronize_rcu);
 	list_for_each_entry(device, &seed_devices->devices, dev_list)
@@ -2416,7 +2417,6 @@  static int btrfs_prepare_sprout(struct btrfs_fs_info *fs_info)
 	generate_random_uuid(fs_devices->fsid);
 	memcpy(fs_devices->metadata_uuid, fs_devices->fsid, BTRFS_FSID_SIZE);
 	memcpy(disk_super->fsid, fs_devices->fsid, BTRFS_FSID_SIZE);
-	mutex_unlock(&fs_devices->device_list_mutex);
 
 	super_flags = btrfs_super_flags(disk_super) &
 		      ~BTRFS_SUPER_FLAG_SEEDING;
@@ -2591,10 +2591,12 @@  int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 	device->dev_stats_valid = 1;
 	set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
 
+	mutex_lock(&fs_devices->device_list_mutex);
 	if (seeding_dev) {
 		btrfs_clear_sb_rdonly(sb);
 		ret = btrfs_prepare_sprout(fs_info);
 		if (ret) {
+			mutex_unlock(&fs_devices->device_list_mutex);
 			btrfs_abort_transaction(trans, ret);
 			goto error_trans;
 		}
@@ -2604,7 +2606,6 @@  int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
 
 	device->fs_devices = fs_devices;
 
-	mutex_lock(&fs_devices->device_list_mutex);
 	mutex_lock(&fs_info->chunk_mutex);
 	list_add_rcu(&device->dev_list, &fs_devices->devices);
 	list_add(&device->dev_alloc_list, &fs_devices->alloc_list);