From patchwork Wed Feb 19 07:57:56 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Naohiro Aota
X-Patchwork-Id: 13981651
From: Naohiro Aota
To: linux-btrfs@vger.kernel.org
Cc: Naohiro Aota
Subject: [PATCH v2 12/12] btrfs-progs: zoned: fix alloc_offset calculation for partly conventional block groups
Date: Wed, 19 Feb 2025 16:57:56 +0900
X-Mailer: git-send-email 2.48.1
Precedence: bulk
X-Mailing-List: linux-btrfs@vger.kernel.org
MIME-Version: 1.0

When one of the two zones composing a DUP block group is a conventional
zone, its zone_info[i]->alloc_offset is set to WP_CONVENTIONAL. That, of
course, does not match the write pointer of the other (sequential) zone,
and loading that block group fails.
This commit solves that issue by properly recovering the emulated write
pointer from the end of the last allocated extent. The offset for the
SINGLE, DUP, and RAID1 profiles is straightforward: it is the same as the
end of the last allocated extent. RAID0 and RAID10 are a bit trickier
because we need to do the striping math.

Signed-off-by: Naohiro Aota
---
 kernel-shared/zoned.c | 65 +++++++++++++++++++++++++++++++++----------
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/kernel-shared/zoned.c b/kernel-shared/zoned.c
index 484bade1d2ed..d96311af70b2 100644
--- a/kernel-shared/zoned.c
+++ b/kernel-shared/zoned.c
@@ -981,7 +981,7 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
 				      struct btrfs_block_group *bg,
 				      struct map_lookup *map,
 				      struct zone_info *zone_info,
-				      unsigned long *active)
+				      unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data DUP profile needs raid-stripe-tree");
@@ -1002,6 +1002,12 @@ static int btrfs_load_block_group_dup(struct btrfs_fs_info *fs_info,
 			  zone_info[1].physical);
 		return -EIO;
 	}
+
+	if (zone_info[0].alloc_offset == WP_CONVENTIONAL)
+		zone_info[0].alloc_offset = last_alloc;
+	if (zone_info[1].alloc_offset == WP_CONVENTIONAL)
+		zone_info[1].alloc_offset = last_alloc;
+
 	if (zone_info[0].alloc_offset != zone_info[1].alloc_offset) {
 		btrfs_err(fs_info,
 			  "zoned: write pointer offset mismatch of zones in DUP profile");
@@ -1022,7 +1028,7 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
 					struct btrfs_block_group *bg,
 					struct map_lookup *map,
 					struct zone_info *zone_info,
-					unsigned long *active)
+					unsigned long *active, u64 last_alloc)
 {
 	int i;
@@ -1036,9 +1042,10 @@ static int btrfs_load_block_group_raid1(struct btrfs_fs_info *fs_info,
 	bg->zone_capacity = min_not_zero(zone_info[0].capacity, zone_info[1].capacity);
 
 	for (i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL)
+			zone_info[i].alloc_offset = last_alloc;
 
 		if (zone_info[0].alloc_offset != zone_info[i].alloc_offset) {
 			btrfs_err(fs_info,
@@ -1066,7 +1073,7 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
 					struct btrfs_block_group *bg,
 					struct map_lookup *map,
 					struct zone_info *zone_info,
-					unsigned long *active)
+					unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1075,9 +1082,24 @@ static int btrfs_load_block_group_raid0(struct btrfs_fs_info *fs_info,
 	}
 
 	for (int i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+			u64 stripe_nr, full_stripe_nr;
+			u64 stripe_offset;
+			int stripe_index;
+
+			stripe_nr = last_alloc / map->stripe_len;
+			stripe_offset = stripe_nr * map->stripe_len;
+			full_stripe_nr = stripe_nr / map->num_stripes;
+			stripe_index = stripe_nr % map->num_stripes;
+
+			zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+			if (stripe_index > i)
+				zone_info[i].alloc_offset += map->stripe_len;
+			else if (stripe_index == i)
+				zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+		}
 
 		if (test_bit(0, active) != test_bit(i, active)) {
 			return -EIO;
@@ -1096,7 +1118,7 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
 					 struct btrfs_block_group *bg,
 					 struct map_lookup *map,
 					 struct zone_info *zone_info,
-					 unsigned long *active)
+					 unsigned long *active, u64 last_alloc)
 {
 	if ((map->type & BTRFS_BLOCK_GROUP_DATA) && !fs_info->stripe_root) {
 		btrfs_err(fs_info, "zoned: data %s needs raid-stripe-tree",
@@ -1105,9 +1127,24 @@ static int btrfs_load_block_group_raid10(struct btrfs_fs_info *fs_info,
 	}
 
 	for (int i = 0; i < map->num_stripes; i++) {
-		if (zone_info[i].alloc_offset == WP_MISSING_DEV ||
-		    zone_info[i].alloc_offset == WP_CONVENTIONAL)
+		if (zone_info[i].alloc_offset == WP_MISSING_DEV)
 			continue;
+		if (zone_info[i].alloc_offset == WP_CONVENTIONAL) {
+			u64 stripe_nr, full_stripe_nr;
+			u64 stripe_offset;
+			int stripe_index;
+
+			stripe_nr = last_alloc / map->stripe_len;
+			stripe_offset = stripe_nr * map->stripe_len;
+			full_stripe_nr = stripe_nr / (map->num_stripes / map->sub_stripes);
+			stripe_index = stripe_nr % (map->num_stripes / map->sub_stripes);
+
+			zone_info[i].alloc_offset = full_stripe_nr * map->stripe_len;
+			if (stripe_index > (i / map->sub_stripes))
+				zone_info[i].alloc_offset += map->stripe_len;
+			else if (stripe_index == (i / map->sub_stripes))
+				zone_info[i].alloc_offset += (last_alloc - stripe_offset);
+		}
 
 		if (test_bit(0, active) != test_bit(i, active)) {
 			return -EIO;
@@ -1214,18 +1251,18 @@ int btrfs_load_block_group_zone_info(struct btrfs_fs_info *fs_info,
 		ret = btrfs_load_block_group_single(fs_info, cache, &zone_info[0], active);
 		break;
 	case BTRFS_BLOCK_GROUP_DUP:
-		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_dup(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID1:
 	case BTRFS_BLOCK_GROUP_RAID1C3:
 	case BTRFS_BLOCK_GROUP_RAID1C4:
-		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid1(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID0:
-		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid0(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID10:
-		ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active);
+		ret = btrfs_load_block_group_raid10(fs_info, cache, map, zone_info, active, last_alloc);
 		break;
 	case BTRFS_BLOCK_GROUP_RAID5:
 	case BTRFS_BLOCK_GROUP_RAID6: