diff mbox series

[v2,1/2] btrfs: zoned: use rcu list in btrfs_can_activate_zone

Message ID b1495af336d60ff11a0fac6c27f15533e9b82b31.1646649873.git.johannes.thumshirn@wdc.com (mailing list archive)
State New, archived
Headers show
Series btrfs: zoned: two independent fixes for zoned btrfs | expand

Commit Message

Johannes Thumshirn March 7, 2022, 10:47 a.m. UTC
btrfs_can_activate_zone() can be called with the device_list_mutex already
held, which will lead to a deadlock.

As we're only traversing the list for reads, we can switch from the
device_list_mutex to an RCU traversal of the list.

Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
Cc: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Anand Jain March 7, 2022, 12:34 p.m. UTC | #1
On 07/03/2022 18:47, Johannes Thumshirn wrote:
> btrfs_can_activate_zone() can be called with the device_list_mutex already
> held, which will lead to a deadlock.

Could you please show that call path? I tried to trace it back and lost it.

Thanks, Anand

> As we're only traversing the list for reads we can switch from the
> device_list_mutex to an rcu traversal of the list.
> 
> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
> Cc: Naohiro Aota <naohiro.aota@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>   fs/btrfs/zoned.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index b7b5fac1c779..cf6341d45689 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>   	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
>   
>   	/* Check if there is a device with active zones left */
> -	mutex_lock(&fs_devices->device_list_mutex);
> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
>   		struct btrfs_zoned_device_info *zinfo = device->zone_info;
>   
>   		if (!device->bdev)
> @@ -1999,7 +1999,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>   			break;
>   		}
>   	}
> -	mutex_unlock(&fs_devices->device_list_mutex);
> +	rcu_read_unlock();
>   
>   	return ret;
>   }
Johannes Thumshirn March 7, 2022, 12:54 p.m. UTC | #2
On 07/03/2022 13:34, Anand Jain wrote:
> On 07/03/2022 18:47, Johannes Thumshirn wrote:
>> btrfs_can_activate_zone() can be called with the device_list_mutex already
>> held, which will lead to a deadlock.
> 
> Could you please show that call path? I tried to trace it back and lost it.
> 


For debugging purposes I've added similar code to prepare_allocation() and
got a deadlock (see the lockdep splat at [1]).


Code inspection then showed that we're calling btrfs_can_activate_zone()
under the same circumstances from can_allocate_chunk(), so it is prone to
the same deadlock. I think the call chain is:

insert_dev_extents() // Takes device_list_mutex
`-> insert_dev_extent()
 `-> btrfs_insert_empty_item()
  `-> btrfs_insert_empty_items()
   `-> btrfs_search_slot()
    `-> btrfs_cow_block()
     `-> __btrfs_cow_block()
      `-> btrfs_alloc_tree_block()
       `-> btrfs_reserve_extent()
        `-> find_free_extent()
         `-> find_free_extent_update_loop()
          `-> can_allocate_chunk()
           `-> btrfs_can_activate_zone() // Takes device_list_mutex again


[1]
[   15.165897] 
[   15.166062] ============================================
[   15.166572] WARNING: possible recursive locking detected
[   15.167117] 5.17.0-rc6-dennis #79 Not tainted
[   15.167487] --------------------------------------------
[   15.167733] kworker/u8:3/146 is trying to acquire lock:
[   15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs]
[   15.167733] 
[   15.167733] but task is already holding lock:
[   15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
[   15.167733] 
[   15.167733] other info that might help us debug this:
[   15.167733]  Possible unsafe locking scenario:
[   15.167733] 
[   15.171834]        CPU0
[   15.171834]        ----
[   15.171834]   lock(&fs_devs->device_list_mutex);
[   15.171834]   lock(&fs_devs->device_list_mutex);
[   15.171834] 
[   15.171834]  *** DEADLOCK ***
[   15.171834] 
[   15.171834]  May be due to missing lock nesting notation
[   15.171834] 
[   15.171834] 5 locks held by kworker/u8:3/146:
[   15.171834]  #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
[   15.171834]  #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
[   15.176244]  #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs]
[   15.176244]  #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
[   15.176244]  #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs]
[   15.179641] 
[   15.179641] stack backtrace:
[   15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79
[   15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[   15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[   15.179641] Call Trace:
[   15.179641]  <TASK>
[   15.179641]  dump_stack_lvl+0x45/0x59
[   15.179641]  __lock_acquire.cold+0x217/0x2b2
[   15.179641]  lock_acquire+0xbf/0x2b0
[   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[   15.183838]  __mutex_lock+0x8e/0x970
[   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[   15.183838]  ? lock_is_held_type+0xd7/0x130
[   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[   15.183838]  find_free_extent+0x15a/0x14f0 [btrfs]
[   15.183838]  ? _raw_spin_unlock+0x24/0x40
[   15.183838]  ? btrfs_get_alloc_profile+0x106/0x230 [btrfs]
[   15.187601]  btrfs_reserve_extent+0x131/0x260 [btrfs]
[   15.187601]  btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs]
[   15.187601]  __btrfs_cow_block+0x138/0x600 [btrfs]
[   15.187601]  btrfs_cow_block+0x10f/0x230 [btrfs]
[   15.187601]  btrfs_search_slot+0x55f/0xbc0 [btrfs]
[   15.187601]  ? lock_is_held_type+0xd7/0x130
[   15.187601]  btrfs_insert_empty_items+0x2d/0x60 [btrfs]
[   15.187601]  btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs]
[   15.187601]  __btrfs_end_transaction+0x36/0x2a0 [btrfs]
[   15.192037]  flush_space+0x374/0x600 [btrfs]
[   15.192037]  ? find_held_lock+0x2b/0x80
[   15.192037]  ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs]
[   15.192037]  ? lock_release+0x131/0x2b0
[   15.192037]  btrfs_async_reclaim_data_space+0x70/0x180 [btrfs]
[   15.192037]  process_one_work+0x24c/0x5a0
[   15.192037]  worker_thread+0x4a/0x3d0
[   15.192037]  ? process_one_work+0x5a0/0x5a0
[   15.195630]  kthread+0xed/0x120
[   15.195630]  ? kthread_complete_and_exit+0x20/0x20
[   15.195630]  ret_from_fork+0x22/0x30
[   15.195630]  </TASK>
David Sterba March 7, 2022, 2:22 p.m. UTC | #3
On Mon, Mar 07, 2022 at 12:54:47PM +0000, Johannes Thumshirn wrote:
> On 07/03/2022 13:34, Anand Jain wrote:
> > On 07/03/2022 18:47, Johannes Thumshirn wrote:
> >> btrfs_can_activate_zone() can be called with the device_list_mutex already
> >> held, which will lead to a deadlock.
> > 
> > Could you please show that call path? I tried to trace it back and lost it.
> > 
> 
> 
> For debugging purposes I've added similar code to prepare_allocation() and
> got a deadlock (see the lockdep splat at [1]).
> 
> 
> Code inspection then showed that we're calling btrfs_can_activate_zone()
> under the same circumstances from can_allocate_chunk(), so it is prone to
> the same deadlock. I think the call chain is:
> 
> insert_dev_extents() // Takes device_list_mutex
> `-> insert_dev_extent()
>  `-> btrfs_insert_empty_item()
>   `-> btrfs_insert_empty_items()
>    `-> btrfs_search_slot()
>     `-> btrfs_cow_block()
>      `-> __btrfs_cow_block()
>       `-> btrfs_alloc_tree_block()
>        `-> btrfs_reserve_extent()
>         `-> find_free_extent()
>          `-> find_free_extent_update_loop()
>           `-> can_allocate_chunk()
>            `-> btrfs_can_activate_zone() // Takes device_list_mutex again
> 
> 
> [1]
> [   15.165897] 
> [   15.166062] ============================================
> [   15.166572] WARNING: possible recursive locking detected
> [   15.167117] 5.17.0-rc6-dennis #79 Not tainted
> [   15.167487] --------------------------------------------
> [   15.167733] kworker/u8:3/146 is trying to acquire lock:
> [   15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.167733] 
> [   15.167733] but task is already holding lock:
> [   15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
> [   15.167733] 
> [   15.167733] other info that might help us debug this:
> [   15.167733]  Possible unsafe locking scenario:
> [   15.167733] 
> [   15.171834]        CPU0
> [   15.171834]        ----
> [   15.171834]   lock(&fs_devs->device_list_mutex);
> [   15.171834]   lock(&fs_devs->device_list_mutex);
> [   15.171834] 
> [   15.171834]  *** DEADLOCK ***
> [   15.171834] 
> [   15.171834]  May be due to missing lock nesting notation
> [   15.171834] 
> [   15.171834] 5 locks held by kworker/u8:3/146:
> [   15.171834]  #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
> [   15.171834]  #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
> [   15.176244]  #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs]
> [   15.176244]  #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
> [   15.176244]  #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs]
> [   15.179641] 
> [   15.179641] stack backtrace:
> [   15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79
> [   15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
> [   15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
> [   15.179641] Call Trace:
> [   15.179641]  <TASK>
> [   15.179641]  dump_stack_lvl+0x45/0x59
> [   15.179641]  __lock_acquire.cold+0x217/0x2b2
> [   15.179641]  lock_acquire+0xbf/0x2b0
> [   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.183838]  __mutex_lock+0x8e/0x970
> [   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.183838]  ? lock_is_held_type+0xd7/0x130
> [   15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.183838]  find_free_extent+0x15a/0x14f0 [btrfs]
> [   15.183838]  ? _raw_spin_unlock+0x24/0x40
> [   15.183838]  ? btrfs_get_alloc_profile+0x106/0x230 [btrfs]
> [   15.187601]  btrfs_reserve_extent+0x131/0x260 [btrfs]
> [   15.187601]  btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs]
> [   15.187601]  __btrfs_cow_block+0x138/0x600 [btrfs]
> [   15.187601]  btrfs_cow_block+0x10f/0x230 [btrfs]
> [   15.187601]  btrfs_search_slot+0x55f/0xbc0 [btrfs]
> [   15.187601]  ? lock_is_held_type+0xd7/0x130
> [   15.187601]  btrfs_insert_empty_items+0x2d/0x60 [btrfs]
> [   15.187601]  btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs]
> [   15.187601]  __btrfs_end_transaction+0x36/0x2a0 [btrfs]
> [   15.192037]  flush_space+0x374/0x600 [btrfs]
> [   15.192037]  ? find_held_lock+0x2b/0x80
> [   15.192037]  ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs]
> [   15.192037]  ? lock_release+0x131/0x2b0
> [   15.192037]  btrfs_async_reclaim_data_space+0x70/0x180 [btrfs]
> [   15.192037]  process_one_work+0x24c/0x5a0
> [   15.192037]  worker_thread+0x4a/0x3d0
> [   15.192037]  ? process_one_work+0x5a0/0x5a0
> [   15.195630]  kthread+0xed/0x120
> [   15.195630]  ? kthread_complete_and_exit+0x20/0x20
> [   15.195630]  ret_from_fork+0x22/0x30

Thanks, pasted to the changelog of the patch.
Anand Jain March 9, 2022, 7:37 a.m. UTC | #4
On 07/03/2022 18:47, Johannes Thumshirn wrote:
> btrfs_can_activate_zone() can be called with the device_list_mutex already
> held, which will lead to a deadlock.
> 
> As we're only traversing the list for reads we can switch from the
> device_list_mutex to an rcu traversal of the list.
> 
> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
> Cc: Naohiro Aota <naohiro.aota@wdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>   fs/btrfs/zoned.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index b7b5fac1c779..cf6341d45689 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>   	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
>   
>   	/* Check if there is a device with active zones left */
> -	mutex_lock(&fs_devices->device_list_mutex);
> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {


  Sorry for the late comment.
  Why not use dev_alloc_list and chunk_mutex here?

Thanks, Anand

>   		struct btrfs_zoned_device_info *zinfo = device->zone_info;
>   
>   		if (!device->bdev)
> @@ -1999,7 +1999,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>   			break;
>   		}
>   	}
> -	mutex_unlock(&fs_devices->device_list_mutex);
> +	rcu_read_unlock();
>   
>   	return ret;
>   }
Johannes Thumshirn March 9, 2022, 7:44 a.m. UTC | #5
On 09/03/2022 08:37, Anand Jain wrote:
> On 07/03/2022 18:47, Johannes Thumshirn wrote:
>> btrfs_can_activate_zone() can be called with the device_list_mutex already
>> held, which will lead to a deadlock.
>>
>> As we're only traversing the list for reads we can switch from the
>> device_list_mutex to an rcu traversal of the list.
>>
>> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
>> Cc: Naohiro Aota <naohiro.aota@wdc.com>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>   fs/btrfs/zoned.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
>> index b7b5fac1c779..cf6341d45689 100644
>> --- a/fs/btrfs/zoned.c
>> +++ b/fs/btrfs/zoned.c
>> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>>   	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
>>   
>>   	/* Check if there is a device with active zones left */
>> -	mutex_lock(&fs_devices->device_list_mutex);
>> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
>> +	rcu_read_lock();
>> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
> 
> 
>   Sorry for the late comment.
> >   Why not use dev_alloc_list and chunk_mutex here?
> 

That's a good question indeed. Let me try it :)
David Sterba March 9, 2022, 2:47 p.m. UTC | #6
On Wed, Mar 09, 2022 at 07:44:11AM +0000, Johannes Thumshirn wrote:
> On 09/03/2022 08:37, Anand Jain wrote:
> > On 07/03/2022 18:47, Johannes Thumshirn wrote:
> >> btrfs_can_activate_zone() can be called with the device_list_mutex already
> >> held, which will lead to a deadlock.
> >>
> >> As we're only traversing the list for reads we can switch from the
> >> device_list_mutex to an rcu traversal of the list.
> >>
> >> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
> >> Cc: Naohiro Aota <naohiro.aota@wdc.com>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>   fs/btrfs/zoned.c | 6 +++---
> >>   1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> >> index b7b5fac1c779..cf6341d45689 100644
> >> --- a/fs/btrfs/zoned.c
> >> +++ b/fs/btrfs/zoned.c
> >> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
> >>   	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
> >>   
> >>   	/* Check if there is a device with active zones left */
> >> -	mutex_lock(&fs_devices->device_list_mutex);
> >> -	list_for_each_entry(device, &fs_devices->devices, dev_list) {
> >> +	rcu_read_lock();
> >> +	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
> > 
> > 
> >   Sorry for the late comment.
> > >   Why not use dev_alloc_list and chunk_mutex here?
> > 
> 
> That's a good question indeed. Let me try it :)

5.18 is now frozen, so please send it as an incremental fix, thanks.

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index b7b5fac1c779..cf6341d45689 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1986,8 +1986,8 @@  bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 	ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
 
 	/* Check if there is a device with active zones left */
-	mutex_lock(&fs_devices->device_list_mutex);
-	list_for_each_entry(device, &fs_devices->devices, dev_list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
 		struct btrfs_zoned_device_info *zinfo = device->zone_info;
 
 		if (!device->bdev)
@@ -1999,7 +1999,7 @@  bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
 			break;
 		}
 	}
-	mutex_unlock(&fs_devices->device_list_mutex);
+	rcu_read_unlock();
 
 	return ret;
 }