[2/2] btrfs: zoned: skip reporting zone for new block group

Message ID: ec6b55668686f77593f12c579832886294fc7310.1741596325.git.naohiro.aota@wdc.com
State: New
Series: btrfs: zoned: skip reporting zone for new block group

Commit Message

Naohiro Aota March 12, 2025, 1:31 a.m. UTC
There is a potential deadlock if we do report zones in an IO context. When one
process issues a report zones operation and another process freezes the block
device, the report zones side cannot allocate a tag because the freeze has
already started. This can cause new block group creation to hang forever,
blocking the write path.

Thankfully, a new block group should be created on empty zones. So, reporting
the zones is not necessary: we can set the write pointer to 0 and load the
zone capacity from the block layer using the bdev_zone_capacity() helper.

CC: stable@vger.kernel.org # 6.13+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/zoned.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)
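
The fast path described in the commit message can be sketched in userspace C.
This is only an illustrative model under stated assumptions, not the kernel
code: struct model_zone_dev, struct model_zone_info, model_zone_capacity(),
model_report_zone() and model_load_zone_info() are hypothetical stand-ins
invented for this sketch, and report_calls merely counts how often the
simulated report zones path would have been taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors, as in the kernel */

/* Hypothetical stand-in for a zoned device; not a kernel type. */
struct model_zone_dev {
	uint64_t zone_capacity_sectors; /* usable sectors per zone */
	uint64_t wp_sectors;            /* write pointer within the zone */
	bool zone_is_empty;             /* cached "empty zone" state */
	unsigned int report_calls;      /* simulated report zones requests */
};

/* Models the fields of struct zone_info that the patch fills in. */
struct model_zone_info {
	uint64_t physical;     /* byte offset of the zone start */
	uint64_t alloc_offset; /* bytes already written in the zone */
	uint64_t capacity;     /* zone capacity in bytes */
};

/* Stand-in for bdev_zone_capacity(): answers from cached data, no I/O. */
static uint64_t model_zone_capacity(const struct model_zone_dev *dev,
				    uint64_t sector)
{
	(void)sector; /* all zones share one capacity in this model */
	return dev->zone_capacity_sectors;
}

/* Stand-in for the report zones path, which needs a tag in the kernel. */
static void model_report_zone(struct model_zone_dev *dev,
			      struct model_zone_info *info)
{
	dev->report_calls++;
	info->alloc_offset = dev->wp_sectors << SECTOR_SHIFT;
	info->capacity = dev->zone_capacity_sectors << SECTOR_SHIFT;
}

static int model_load_zone_info(struct model_zone_dev *dev,
				struct model_zone_info *info, bool new)
{
	if (new) {
		uint64_t capacity;

		/* A new block group must be placed on an empty zone. */
		assert(dev->zone_is_empty);
		dev->zone_is_empty = false;

		/* Sectors in, sectors out; convert to bytes afterwards. */
		capacity = model_zone_capacity(dev,
					       info->physical >> SECTOR_SHIFT);
		info->alloc_offset = 0;
		info->capacity = capacity << SECTOR_SHIFT;
		return 0;
	}

	/* Existing block group: the write pointer must be queried. */
	dev->zone_is_empty = false;
	model_report_zone(dev, info);
	return 0;
}
```

In this model, calling model_load_zone_info() with new set to true on an empty
zone leaves report_calls untouched and yields alloc_offset == 0, while the
call with new set to false goes through the simulated report path.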

Comments

Damien Le Moal March 12, 2025, 1:42 a.m. UTC | #1
On 3/12/25 10:31, Naohiro Aota wrote:
> There is a potential deadlock if we do report zones in an IO context. When one
> process issues a report zones operation and another process freezes the block
> device, the report zones side cannot allocate a tag because the freeze has
> already started. This can cause new block group creation to hang forever,
> blocking the write path.

+Shin'ichiro

blktests has a failing test case due to a lockdep splat triggered by this. It
would be good to add that information (with the splat) here.

> 
> Thankfully, a new block group should be created on empty zones. So, reporting
> the zones is not necessary: we can set the write pointer to 0 and load the
> zone capacity from the block layer using the bdev_zone_capacity() helper.
> 
> CC: stable@vger.kernel.org # 6.13+
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

With that fixed, looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Patch

diff --git a/fs/btrfs/zoned.c b/fs/btrfs/zoned.c
index 4956baf8220a..6c730f6bce10 100644
--- a/fs/btrfs/zoned.c
+++ b/fs/btrfs/zoned.c
@@ -1277,7 +1277,7 @@  struct zone_info {
 
 static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 				struct zone_info *info, unsigned long *active,
-				struct btrfs_chunk_map *map)
+				struct btrfs_chunk_map *map, bool new)
 {
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	struct btrfs_device *device;
@@ -1307,6 +1307,8 @@  static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 		return 0;
 	}
 
+	ASSERT(!new || btrfs_dev_is_empty_zone(device, info->physical));
+
 	/* This zone will be used for allocation, so mark this zone non-empty. */
 	btrfs_dev_clear_zone_empty(device, info->physical);
 
@@ -1319,6 +1321,18 @@  static int btrfs_load_zone_info(struct btrfs_fs_info *fs_info, int zone_idx,
 	 * to determine the allocation offset within the zone.
 	 */
 	WARN_ON(!IS_ALIGNED(info->physical, fs_info->zone_size));
+
+	if (new) {
+		sector_t capacity;
+
+		capacity = bdev_zone_capacity(device->bdev, info->physical >> SECTOR_SHIFT);
+		up_read(&dev_replace->rwsem);
+		info->alloc_offset = 0;
+		info->capacity = capacity << SECTOR_SHIFT;
+
+		return 0;
+	}
+
 	nofs_flag = memalloc_nofs_save();
 	ret = btrfs_get_dev_zone(device, info->physical, &zone);
 	memalloc_nofs_restore(nofs_flag);
@@ -1588,7 +1602,7 @@  int btrfs_load_block_group_zone_info(struct btrfs_block_group *cache, bool new)
 	}
 
 	for (i = 0; i < map->num_stripes; i++) {
-		ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map);
+		ret = btrfs_load_zone_info(fs_info, i, &zone_info[i], active, map, new);
 		if (ret)
 			goto out;