Message ID | b1495af336d60ff11a0fac6c27f15533e9b82b31.1646649873.git.johannes.thumshirn@wdc.com
---|---
State | New, archived
Series | btrfs: zoned: two independent fixes for zoned btrfs
On 07/03/2022 18:47, Johannes Thumshirn wrote:
> btrfs_can_activate_zone() can be called with the device_list_mutex already
> held, which will lead to a deadlock.

Could you please state that thread? I tried to trace it back and lost it.

Thanks, Anand

> As we're only traversing the list for reads we can switch from the
> device_list_mutex to an rcu traversal of the list.
>
> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
> Cc: Naohiro Aota <naohiro.aota@wwdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/zoned.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index b7b5fac1c779..cf6341d45689 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>          ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
>
>          /* Check if there is a device with active zones left */
> -        mutex_lock(&fs_devices->device_list_mutex);
> -        list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +        rcu_read_lock();
> +        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
>                  struct btrfs_zoned_device_info *zinfo = device->zone_info;
>
>                  if (!device->bdev)
> @@ -1999,7 +1999,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>                          break;
>                  }
>          }
> -        mutex_unlock(&fs_devices->device_list_mutex);
> +        rcu_read_unlock();
>
>          return ret;
> }
On 07/03/2022 13:34, Anand Jain wrote:
> On 07/03/2022 18:47, Johannes Thumshirn wrote:
>> btrfs_can_activate_zone() can be called with the device_list_mutex already
>> held, which will lead to a deadlock.
>
> Could you please state that thread? I tried to trace it back and lost it.

For debugging purposes I've added similar code to prepare_allocation() and
got a deadlock (see the lockdep splat at [1]).

So code inspection showed that we're calling btrfs_can_activate_zone()
under the same circumstances from can_allocate_chunk() and thus are prone
to this deadlock as well. I think the call chain is:

insert_dev_extents() // Takes device_list_mutex
 `-> insert_dev_extent()
   `-> btrfs_insert_empty_item()
     `-> btrfs_insert_empty_items()
       `-> btrfs_search_slot()
         `-> btrfs_cow_block()
           `-> __btrfs_cow_block()
             `-> btrfs_alloc_tree_block()
               `-> btrfs_reserve_extent()
                 `-> find_free_extent()
                   `-> find_free_extent_update_loop()
                     `-> can_allocate_chunk()
                       `-> btrfs_can_activate_zone() // Takes device_list_mutex again

[1]
[ 15.165897]
[ 15.166062] ============================================
[ 15.166572] WARNING: possible recursive locking detected
[ 15.167117] 5.17.0-rc6-dennis #79 Not tainted
[ 15.167487] --------------------------------------------
[ 15.167733] kworker/u8:3/146 is trying to acquire lock:
[ 15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.167733]
[ 15.167733] but task is already holding lock:
[ 15.167733] ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
[ 15.167733]
[ 15.167733] other info that might help us debug this:
[ 15.167733]  Possible unsafe locking scenario:
[ 15.167733]
[ 15.171834]        CPU0
[ 15.171834]        ----
[ 15.171834]   lock(&fs_devs->device_list_mutex);
[ 15.171834]   lock(&fs_devs->device_list_mutex);
[ 15.171834]
[ 15.171834]  *** DEADLOCK ***
[ 15.171834]
[ 15.171834]  May be due to missing lock nesting notation
[ 15.171834]
[ 15.171834] 5 locks held by kworker/u8:3/146:
[ 15.171834]  #0: ffff888100050938 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
[ 15.171834]  #1: ffffc9000067be80 ((work_completion)(&fs_info->async_data_reclaim_work)){+.+.}-{0:0}, at: process_one_work+0x1c3/0x5a0
[ 15.176244]  #2: ffff88810521e620 (sb_internal){.+.+}-{0:0}, at: flush_space+0x335/0x600 [btrfs]
[ 15.176244]  #3: ffff888102962ee0 (&fs_devs->device_list_mutex){+.+.}-{3:3}, at: btrfs_create_pending_block_groups+0x20a/0x560 [btrfs]
[ 15.176244]  #4: ffff8881152e4b78 (btrfs-dev-00){++++}-{3:3}, at: __btrfs_tree_lock+0x27/0x130 [btrfs]
[ 15.179641]
[ 15.179641] stack backtrace:
[ 15.179641] CPU: 1 PID: 146 Comm: kworker/u8:3 Not tainted 5.17.0-rc6-dennis #79
[ 15.179641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1.fc35 04/01/2014
[ 15.179641] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[ 15.179641] Call Trace:
[ 15.179641]  <TASK>
[ 15.179641]  dump_stack_lvl+0x45/0x59
[ 15.179641]  __lock_acquire.cold+0x217/0x2b2
[ 15.179641]  lock_acquire+0xbf/0x2b0
[ 15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.183838]  __mutex_lock+0x8e/0x970
[ 15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.183838]  ? lock_is_held_type+0xd7/0x130
[ 15.183838]  ? find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.183838]  find_free_extent+0x15a/0x14f0 [btrfs]
[ 15.183838]  ? _raw_spin_unlock+0x24/0x40
[ 15.183838]  ? btrfs_get_alloc_profile+0x106/0x230 [btrfs]
[ 15.187601]  btrfs_reserve_extent+0x131/0x260 [btrfs]
[ 15.187601]  btrfs_alloc_tree_block+0xb5/0x3b0 [btrfs]
[ 15.187601]  __btrfs_cow_block+0x138/0x600 [btrfs]
[ 15.187601]  btrfs_cow_block+0x10f/0x230 [btrfs]
[ 15.187601]  btrfs_search_slot+0x55f/0xbc0 [btrfs]
[ 15.187601]  ? lock_is_held_type+0xd7/0x130
[ 15.187601]  btrfs_insert_empty_items+0x2d/0x60 [btrfs]
[ 15.187601]  btrfs_create_pending_block_groups+0x2b3/0x560 [btrfs]
[ 15.187601]  __btrfs_end_transaction+0x36/0x2a0 [btrfs]
[ 15.192037]  flush_space+0x374/0x600 [btrfs]
[ 15.192037]  ? find_held_lock+0x2b/0x80
[ 15.192037]  ? btrfs_async_reclaim_data_space+0x49/0x180 [btrfs]
[ 15.192037]  ? lock_release+0x131/0x2b0
[ 15.192037]  btrfs_async_reclaim_data_space+0x70/0x180 [btrfs]
[ 15.192037]  process_one_work+0x24c/0x5a0
[ 15.192037]  worker_thread+0x4a/0x3d0
[ 15.192037]  ? process_one_work+0x5a0/0x5a0
[ 15.195630]  kthread+0xed/0x120
[ 15.195630]  ? kthread_complete_and_exit+0x20/0x20
[ 15.195630]  ret_from_fork+0x22/0x30
[ 15.195630]  </TASK>
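What the splat boils down to is a single thread taking a non-recursive mutex it already holds. The following userspace sketch is only an analogue (a pthread mutex standing in for device_list_mutex; none of this is kernel code), but it hangs in exactly the way lockdep is warning about:

```c
/*
 * Minimal userspace analogue of the recursive-lock pattern in the splat:
 * the outer function takes the lock, and further down the same call chain
 * a helper tries to take it again.
 *
 * Build: gcc -pthread deadlock-demo.c -o deadlock-demo
 * Note: this intentionally deadlocks (hangs) on a typical Linux/glibc
 * system; strictly speaking, relocking a default mutex is undefined.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t device_list_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stands in for btrfs_can_activate_zone(): takes the lock internally. */
static int can_activate_zone(void)
{
        pthread_mutex_lock(&device_list_mutex);   /* second acquisition */
        /* ... walk the device list ... */
        pthread_mutex_unlock(&device_list_mutex);
        return 1;
}

/* Stands in for insert_dev_extents(): calls down while holding the lock. */
static void insert_dev_extents(void)
{
        pthread_mutex_lock(&device_list_mutex);   /* first acquisition */
        can_activate_zone();                      /* never returns */
        pthread_mutex_unlock(&device_list_mutex);
}

int main(void)
{
        printf("taking device_list_mutex twice on one thread...\n");
        insert_dev_extents();
        printf("never reached\n");
        return 0;
}
```

In the kernel the same situation is caught and reported by lockdep rather than left to hang silently, which is the report quoted above.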
On Mon, Mar 07, 2022 at 12:54:47PM +0000, Johannes Thumshirn wrote:
> On 07/03/2022 13:34, Anand Jain wrote:
> > On 07/03/2022 18:47, Johannes Thumshirn wrote:
> >> btrfs_can_activate_zone() can be called with the device_list_mutex already
> >> held, which will lead to a deadlock.
> >
> > Could you please state that thread? I tried to trace it back and lost it.
>
> For debugging purposes I've added similar code to prepare_allocation() and
> got a deadlock (see the lockdep splat at [1]).
>
> So code inspection showed that we're calling btrfs_can_activate_zone()
> under the same circumstances from can_allocate_chunk() and thus are prone
> to this deadlock as well. I think the call chain is:
>
> insert_dev_extents() // Takes device_list_mutex
>  `-> insert_dev_extent()
>    `-> btrfs_insert_empty_item()
>      `-> btrfs_insert_empty_items()
>        `-> btrfs_search_slot()
>          `-> btrfs_cow_block()
>            `-> __btrfs_cow_block()
>              `-> btrfs_alloc_tree_block()
>                `-> btrfs_reserve_extent()
>                  `-> find_free_extent()
>                    `-> find_free_extent_update_loop()
>                      `-> can_allocate_chunk()
>                        `-> btrfs_can_activate_zone() // Takes device_list_mutex again
>
> [1]
> [...]

Thanks, pasted to the changelog of the patch.
On 07/03/2022 18:47, Johannes Thumshirn wrote:
> btrfs_can_activate_zone() can be called with the device_list_mutex already
> held, which will lead to a deadlock.
>
> As we're only traversing the list for reads we can switch from the
> device_list_mutex to an rcu traversal of the list.
>
> Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
> Cc: Naohiro Aota <naohiro.aota@wwdc.com>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/zoned.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
> index b7b5fac1c779..cf6341d45689 100644
> --- a/fs/btrfs/zoned.c
> +++ b/fs/btrfs/zoned.c
> @@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>          ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
>
>          /* Check if there is a device with active zones left */
> -        mutex_lock(&fs_devices->device_list_mutex);
> -        list_for_each_entry(device, &fs_devices->devices, dev_list) {
> +        rcu_read_lock();
> +        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {

Sorry for the late comment.
Why not use dev_alloc_list and chunk_mutex here?

Thanks, Anand

>                  struct btrfs_zoned_device_info *zinfo = device->zone_info;
>
>                  if (!device->bdev)
> @@ -1999,7 +1999,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
>                          break;
>                  }
>          }
> -        mutex_unlock(&fs_devices->device_list_mutex);
> +        rcu_read_unlock();
>
>          return ret;
> }
On 09/03/2022 08:37, Anand Jain wrote:
> On 07/03/2022 18:47, Johannes Thumshirn wrote:
>> btrfs_can_activate_zone() can be called with the device_list_mutex already
>> held, which will lead to a deadlock.
>>
>> As we're only traversing the list for reads we can switch from the
>> device_list_mutex to an rcu traversal of the list.
>>
>> [...]
>>
>> -        mutex_lock(&fs_devices->device_list_mutex);
>> -        list_for_each_entry(device, &fs_devices->devices, dev_list) {
>> +        rcu_read_lock();
>> +        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
>
> Sorry for the late comment.
> Why not use dev_alloc_list and chunk_mutex here?

That's a good question indeed. Let me try it :)
On Wed, Mar 09, 2022 at 07:44:11AM +0000, Johannes Thumshirn wrote:
> On 09/03/2022 08:37, Anand Jain wrote:
> > On 07/03/2022 18:47, Johannes Thumshirn wrote:
> >> btrfs_can_activate_zone() can be called with the device_list_mutex already
> >> held, which will lead to a deadlock.
> >>
> >> [...]
> >>
> >> -        mutex_lock(&fs_devices->device_list_mutex);
> >> -        list_for_each_entry(device, &fs_devices->devices, dev_list) {
> >> +        rcu_read_lock();
> >> +        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
> >
> > Sorry for the late comment.
> > Why not use dev_alloc_list and chunk_mutex here?
>
> That's a good question indeed. Let me try it :)

5.18 is now frozen, so please send it as an incremental fix, thanks.
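For illustration only, here is a userspace sketch of the shape that suggested alternative would take: rather than the helper re-taking a lock of its own, it relies on the caller already holding the lock that protects the allocation list, so the lock is only taken once per call chain (a kernel version would state that contract with lockdep_assert_held()). The names alloc_mutex, the simplified list, and the helper are hypothetical stand-ins, and whether chunk_mutex really is held on every path into btrfs_can_activate_zone() is exactly what would have to be verified before sending such a change:

```c
/*
 * Hypothetical userspace sketch: walk an allocation list under a lock the
 * caller already holds, instead of re-taking device_list_mutex inside.
 * Build: gcc -pthread alloc-list-demo.c -o alloc-list-demo
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct device {
        const char *name;
        int active_zones_left;
        struct device *next;           /* stand-in for dev_alloc_list linkage */
};

static pthread_mutex_t alloc_mutex = PTHREAD_MUTEX_INITIALIZER; /* ~chunk_mutex */
static struct device *dev_alloc_list;                           /* ~fs_devices->alloc_list */

/*
 * Caller must hold alloc_mutex; the kernel version would assert that with
 * lockdep_assert_held() instead of relying on a comment.
 */
static bool can_activate_zone(void)
{
        for (struct device *d = dev_alloc_list; d; d = d->next)
                if (d->active_zones_left > 0)
                        return true;
        return false;
}

static void allocation_path(void)
{
        bool ok;

        pthread_mutex_lock(&alloc_mutex);   /* taken once, up here */
        ok = can_activate_zone();           /* no second acquisition inside */
        pthread_mutex_unlock(&alloc_mutex);

        printf("can activate a zone: %s\n", ok ? "yes" : "no");
}

int main(void)
{
        static struct device sdb = { .name = "sdb", .active_zones_left = 0 };
        static struct device sda = { .name = "sda", .active_zones_left = 2, .next = &sdb };

        dev_alloc_list = &sda;
        allocation_path();
        return 0;
}
```

The trade-off is an API contract on the caller instead of self-contained locking inside the helper, which is why the "is the lock really held on every path" question has to be answered first.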
diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index b7b5fac1c779..cf6341d45689 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1986,8 +1986,8 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
         ASSERT((flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) == 0);
 
         /* Check if there is a device with active zones left */
-        mutex_lock(&fs_devices->device_list_mutex);
-        list_for_each_entry(device, &fs_devices->devices, dev_list) {
+        rcu_read_lock();
+        list_for_each_entry_rcu(device, &fs_devices->devices, dev_list) {
                 struct btrfs_zoned_device_info *zinfo = device->zone_info;
 
                 if (!device->bdev)
@@ -1999,7 +1999,7 @@ bool btrfs_can_activate_zone(struct btrfs_fs_devices *fs_devices, u64 flags)
                         break;
                 }
         }
-        mutex_unlock(&fs_devices->device_list_mutex);
+        rcu_read_unlock();
 
         return ret;
 }
btrfs_can_activate_zone() can be called with the device_list_mutex already
held, which will lead to a deadlock.

As we're only traversing the list for reads we can switch from the
device_list_mutex to an rcu traversal of the list.

Fixes: a85f05e59bc1 ("btrfs: zoned: avoid chunk allocation if active block group has enough space")
Cc: Naohiro Aota <naohiro.aota@wwdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/zoned.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
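The pattern the patch adopts, an RCU-protected read-side list traversal, can be sketched in userspace with liburcu. This is an illustrative analogue only: it assumes liburcu is installed, the struct fields and values are invented, and in the kernel writers still modify the device list under device_list_mutex while RCU only replaces the lock on the read-only walk.

```c
/*
 * Userspace analogue (liburcu) of the read-side pattern in the patch:
 * rcu_read_lock() plus an RCU list walk instead of taking a mutex.
 * Build (assuming liburcu is installed):
 *   gcc rcu-list-demo.c -o rcu-list-demo -lurcu
 * (the library name may differ by liburcu version/flavor)
 */
#include <stdbool.h>
#include <stdio.h>
#include <urcu.h>            /* rcu_read_lock(), rcu_register_thread(), ... */
#include <urcu/list.h>       /* struct cds_list_head, CDS_LIST_HEAD() */
#include <urcu/rculist.h>    /* cds_list_add_rcu(), cds_list_for_each_entry_rcu() */

struct device {
        const char *name;
        int active_zones_left;
        struct cds_list_head dev_list;
};

static CDS_LIST_HEAD(devices);   /* stand-in for fs_devices->devices */

/* Read side: mirrors the post-patch shape of btrfs_can_activate_zone(). */
static bool can_activate_zone(void)
{
        struct device *dev;
        bool ret = false;

        rcu_read_lock();
        cds_list_for_each_entry_rcu(dev, &devices, dev_list) {
                if (dev->active_zones_left > 0) {
                        ret = true;
                        break;
                }
        }
        rcu_read_unlock();

        return ret;
}

int main(void)
{
        static struct device sda = { .name = "sda", .active_zones_left = 3 };
        static struct device sdb = { .name = "sdb", .active_zones_left = 0 };

        rcu_register_thread();   /* every RCU reader thread registers itself */
        cds_list_add_rcu(&sda.dev_list, &devices);
        cds_list_add_rcu(&sdb.dev_list, &devices);

        printf("can activate a zone: %s\n", can_activate_zone() ? "yes" : "no");

        rcu_unregister_thread();
        return 0;
}
```

The rcu_read_lock()/cds_list_for_each_entry_rcu() pair here mirrors the rcu_read_lock()/list_for_each_entry_rcu() pair the patch introduces.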