Message ID | 20200102112746.145045-3-wqu@suse.com (mailing list archive)
---|---
State | New, archived
Series | Introduce per-profile available space array to avoid over-confident can_overcommit()
On 1/2/20 6:27 AM, Qu Wenruo wrote:
> There are 4 locations where device size or used space gets updated:
> - Chunk allocation
> - Chunk removal
> - Device grow
> - Device shrink
>
> Now also update per-profile available space at those timings.
>
> For __btrfs_alloc_chunk() we can't acquire device_list_mutex as in
> btrfs_finish_chunk_alloc() we could hold device_list_mutex and cause
> deadlock.
>

These are protecting two different things, though; holding the chunk_mutex doesn't keep things from being removed from the device list.

Looking at patch 1, can't we just do the device list traversal under RCU and then not have to worry about the locking at all? Thanks,

Josef
On 2020/1/3 12:17 AM, Josef Bacik wrote:
> On 1/2/20 6:27 AM, Qu Wenruo wrote:
>> There are 4 locations where device size or used space gets updated:
>> - Chunk allocation
>> - Chunk removal
>> - Device grow
>> - Device shrink
>>
>> Now also update per-profile available space at those timings.
>>
>> For __btrfs_alloc_chunk() we can't acquire device_list_mutex as in
>> btrfs_finish_chunk_alloc() we could hold device_list_mutex and cause
>> deadlock.
>>
>
> These are protecting two different things, though; holding the
> chunk_mutex doesn't keep things from being removed from the device list.
>
> Looking at patch 1, can't we just do the device list traversal under RCU
> and then not have to worry about the locking at all? Thanks,

That's a very interesting solution.

But the comment on btrfs_fs_devices::alloc_list says:

```
	/*
	 * Devices which can satisfy space allocation. Protected by
	 * chunk_mutex
	 */
	struct list_head alloc_list;
```

And __btrfs_alloc_chunk() is iterating alloc_list without extra protection, so it should be OK, I guess.

Thanks,
Qu

>
> Josef
On 2020/1/3 12:17 AM, Josef Bacik wrote:
> On 1/2/20 6:27 AM, Qu Wenruo wrote:
>> There are 4 locations where device size or used space gets updated:
>> - Chunk allocation
>> - Chunk removal
>> - Device grow
>> - Device shrink
>>
>> Now also update per-profile available space at those timings.
>>
>> For __btrfs_alloc_chunk() we can't acquire device_list_mutex as in
>> btrfs_finish_chunk_alloc() we could hold device_list_mutex and cause
>> deadlock.
>>
>
> These are protecting two different things, though; holding the
> chunk_mutex doesn't keep things from being removed from the device list.

Looking further into the locking scheme, Nikolay's comment is right: chunk_mutex protects alloc_list (the rw devices).

A device won't disappear from alloc_list, as in btrfs_rm_device() we take chunk_mutex before removing a writeable device.

In fact, device_list_mutex is unrelated to alloc_list.

So the other calc_per_profile_avail() call sites are in fact not safe; I shouldn't take device_list_mutex, but chunk_mutex, which is much easier to get.

> Looking at patch 1, can't we just do the device list traversal under RCU
> and then not have to worry about the locking at all? Thanks,

I guess we would need SRCU, since in the __btrfs_alloc_chunk() context we need to sleep to search the dev extent tree.

And I'm not confident enough about the corner cases; some RCU sync case might cause an unexpected performance degradation, as the RCU sync can be more expensive than a simple mutex lock.

Thanks,
Qu

>
> Josef
On 1/3/20 4:52 AM, Qu Wenruo wrote:
>
>
> On 2020/1/3 12:17 AM, Josef Bacik wrote:
>> On 1/2/20 6:27 AM, Qu Wenruo wrote:
>>> There are 4 locations where device size or used space gets updated:
>>> - Chunk allocation
>>> - Chunk removal
>>> - Device grow
>>> - Device shrink
>>>
>>> Now also update per-profile available space at those timings.
>>>
>>> For __btrfs_alloc_chunk() we can't acquire device_list_mutex as in
>>> btrfs_finish_chunk_alloc() we could hold device_list_mutex and cause
>>> deadlock.
>>>
>>
>> These are protecting two different things, though; holding the
>> chunk_mutex doesn't keep things from being removed from the device list.
>
> Looking further into the locking scheme, Nikolay's comment is right:
> chunk_mutex protects alloc_list (the rw devices).
>
> A device won't disappear from alloc_list, as in btrfs_rm_device() we
> take chunk_mutex before removing a writeable device.
>
> In fact, device_list_mutex is unrelated to alloc_list.
>
> So the other calc_per_profile_avail() call sites are in fact not safe;
> I shouldn't take device_list_mutex, but chunk_mutex, which is much
> easier to get.
>
>>
>> Looking at patch 1, can't we just do the device list traversal under RCU
>> and then not have to worry about the locking at all? Thanks,
>
> I guess we would need SRCU, since in the __btrfs_alloc_chunk() context
> we need to sleep to search the dev extent tree.
>
> And I'm not confident enough about the corner cases; some RCU sync case
> might cause an unexpected performance degradation, as the RCU sync can
> be more expensive than a simple mutex lock.
>

Sigh, sorry, I was reading it as if we were doing fs_devices->devices, which is all handled via RCU. But you're right, alloc_list is covered by the chunk mutex, so just make sure the chunk mutex is taken everywhere and you should be good to go. Thanks,

Josef
```diff
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index fa948f7eebc2..be8250332c60 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2804,6 +2804,7 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 	struct btrfs_super_block *super_copy = fs_info->super_copy;
 	u64 old_total;
 	u64 diff;
+	int ret;
 
 	if (!test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state))
 		return -EACCES;
@@ -2832,6 +2833,11 @@ int btrfs_grow_device(struct btrfs_trans_handle *trans,
 		      &trans->transaction->dev_update_list);
 	mutex_unlock(&fs_info->chunk_mutex);
 
+	mutex_lock(&fs_info->fs_devices->device_list_mutex);
+	ret = calc_per_profile_avail(fs_info);
+	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
+	if (ret < 0)
+		return ret;
 	return btrfs_update_device(trans, device);
 }
 
@@ -3010,7 +3016,12 @@ int btrfs_remove_chunk(struct btrfs_trans_handle *trans, u64 chunk_offset)
 			goto out;
 		}
 	}
+	ret = calc_per_profile_avail(fs_info);
 	mutex_unlock(&fs_devices->device_list_mutex);
+	if (ret < 0) {
+		btrfs_abort_transaction(trans, ret);
+		goto out;
+	}
 
 	ret = btrfs_free_chunk(trans, chunk_offset);
 	if (ret) {
@@ -4826,6 +4837,10 @@ int btrfs_shrink_device(struct btrfs_device *device, u64 new_size)
 		device->fs_devices->total_rw_bytes += diff;
 		atomic64_add(diff, &fs_info->free_chunk_space);
 		mutex_unlock(&fs_info->chunk_mutex);
+	} else {
+		mutex_lock(&fs_info->fs_devices->device_list_mutex);
+		ret = calc_per_profile_avail(fs_info);
+		mutex_unlock(&fs_info->fs_devices->device_list_mutex);
 	}
 	return ret;
 }
@@ -5143,8 +5158,14 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	check_raid56_incompat_flag(info, type);
 	check_raid1c34_incompat_flag(info, type);
 
+	/*
+	 * In this context we don't need device_list_mutex as chunk_mutex are
+	 * protecting us.
+	 */
+	ret = calc_per_profile_avail(info);
+
 	kfree(devices_info);
-	return 0;
+	return ret;
 
 error_del_extent:
 	write_lock(&em_tree->lock);
```
There are 4 locations where device size or used space gets updated:
- Chunk allocation
- Chunk removal
- Device grow
- Device shrink

Now also update per-profile available space at those timings.

For __btrfs_alloc_chunk() we can't acquire device_list_mutex as in
btrfs_finish_chunk_alloc() we could hold device_list_mutex and cause
deadlock.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)