
btrfs: stripe-tree: correctly truncate stripe extents on delete

Message ID 20240820143434.25332-1-jth@kernel.org (mailing list archive)
State New
Series btrfs: stripe-tree: correctly truncate stripe extents on delete

Commit Message

Johannes Thumshirn Aug. 20, 2024, 2:34 p.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

In our CI system, we're seeing the following ASSERT() trigger when
running RAID stripe-tree tests on non-zoned devices:

 assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64

This ASSERT() triggers because, for the initial design of the RAID stripe-tree,
I had the "one ordered-extent equals one bio" rule of zoned btrfs in mind.

But for a RAID stripe-tree based system that is hosted on a regular device
rather than on zoned storage, this rule doesn't apply.

So in case the range we want to delete starts in the middle of the
previous item, grab that item and "truncate" its length, that is, subtract
the deleted portion from the key's offset.

In case the range to delete ends in the middle of an item, we have to
adjust both the item's key and the physical addresses stored in the
stripe extent.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 50 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)
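
To make the two cases concrete, here is a standalone sketch of the key
arithmetic described above, using made-up example numbers and a stand-in
struct (objectid = logical start, offset = length, mirroring the RAID
stripe tree key layout); this is purely illustrative and not btrfs code:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Stand-in for the tree key: objectid = logical start, offset = length. */
struct stripe_key {
	uint64_t objectid;
	uint64_t offset;
};

int main(void)
{
	/*
	 * Case 1: the delete range [64K, 192K) starts in the middle of an
	 * item covering [0, 128K). Keep the front part and subtract the
	 * deleted portion from the key's offset (its length).
	 */
	struct stripe_key front = { .objectid = 0, .offset = 128 * 1024 };
	uint64_t del_start = 64 * 1024;
	uint64_t deleted = front.objectid + front.offset - del_start;

	front.offset -= deleted;
	printf("case 1: keep [%" PRIu64 ", %" PRIu64 ")\n",
	       front.objectid, front.objectid + front.offset);

	/*
	 * Case 2: the same delete range ends in the middle of an item
	 * covering [128K, 256K). Drop the overlapping front part: advance
	 * the key's objectid (and, in the real tree, each stride's physical
	 * address) by the dropped length and shrink the offset accordingly.
	 */
	struct stripe_key back = { .objectid = 128 * 1024, .offset = 128 * 1024 };
	uint64_t del_end = 192 * 1024;
	uint64_t dropped = del_end - back.objectid;

	back.objectid += dropped;
	back.offset -= dropped;
	printf("case 2: keep [%" PRIu64 ", %" PRIu64 ")\n",
	       back.objectid, back.objectid + back.offset);

	return 0;
}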

Comments

Josef Bacik Aug. 21, 2024, 5:15 p.m. UTC | #1
On Tue, Aug 20, 2024 at 04:34:33PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> In our CI system, we're seeing the following ASSERT() trigger when
> running RAID stripe-tree tests on non-zoned devices:
> 
>  assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64
> 
> This ASSERT() triggers because, for the initial design of the RAID stripe-tree,
> I had the "one ordered-extent equals one bio" rule of zoned btrfs in mind.
> 
> But for a RAID stripe-tree based system that is hosted on a regular device
> rather than on zoned storage, this rule doesn't apply.
> 
> So in case the range we want to delete starts in the middle of the
> previous item, grab that item and "truncate" its length, that is, subtract
> the deleted portion from the key's offset.
> 
> In case the range to delete ends in the middle of an item, we have to
> adjust both the item's key and the physical addresses stored in the
> stripe extent.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 50 ++++++++++++++++++++++++++++++++++++-
>  1 file changed, 49 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index 4c859b550f6c..c8365d14271f 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -61,7 +61,55 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>  		trace_btrfs_raid_extent_delete(fs_info, start, end,
>  					       found_start, found_end);
>  
> -		ASSERT(found_start >= start && found_end <= end);
> +		if (found_start < start) {
> +			struct btrfs_key prev;
> +			u64 diff = start - found_start;
> +
> +			ret = btrfs_previous_item(stripe_root, path, start,
> +						  BTRFS_RAID_STRIPE_KEY);

This is only safe if path->slots[0] != 0, otherwise we'll do
btrfs_prev_leaf(), which doesn't modify anything, and then we'll be in trouble.
If this is safe then add a comment indicating why we expect this to only back up
one slot, and maybe an

ASSERT(path->slots[0] > 0);

before the btrfs_previous_item() call to make sure we don't screw this up later.
Thanks,

Josef
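
For reference, a minimal sketch of how the suggested assertion could slot into
the patch's first branch; the comment text below is illustrative, not part of
the patch or the review:

		if (found_start < start) {
			struct btrfs_key prev;
			u64 diff = start - found_start;

			/*
			 * We expect the previous RAID stripe item to live on
			 * the same leaf, so backing up a single slot must be
			 * enough; slot 0 would send btrfs_previous_item()
			 * into btrfs_prev_leaf() instead.
			 */
			ASSERT(path->slots[0] > 0);

			ret = btrfs_previous_item(stripe_root, path, start,
						  BTRFS_RAID_STRIPE_KEY);
			/* ... rest of the branch as in the patch below ... */
		}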

Patch

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 4c859b550f6c..c8365d14271f 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -61,7 +61,55 @@  int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		trace_btrfs_raid_extent_delete(fs_info, start, end,
 					       found_start, found_end);
 
-		ASSERT(found_start >= start && found_end <= end);
+		if (found_start < start) {
+			struct btrfs_key prev;
+			u64 diff = start - found_start;
+
+			ret = btrfs_previous_item(stripe_root, path, start,
+						  BTRFS_RAID_STRIPE_KEY);
+			leaf = path->nodes[0];
+			slot = path->slots[0];
+			btrfs_item_key_to_cpu(leaf, &prev, slot);
+			prev.offset -= diff;
+
+			btrfs_mark_buffer_dirty(trans, leaf);
+
+			start += diff;
+			length -= diff;
+
+			btrfs_release_path(path);
+			continue;
+		}
+
+		if (end < found_end && found_end - end < key.offset) {
+			struct btrfs_stripe_extent *stripe_extent;
+			u64 diff = key.offset - length;
+			int num_stripes;
+
+			num_stripes = btrfs_num_raid_stripes(
+				btrfs_item_size(leaf, slot));
+			stripe_extent = btrfs_item_ptr(
+				leaf, slot, struct btrfs_stripe_extent);
+
+			for (int i = 0; i < num_stripes; i++) {
+				struct btrfs_raid_stride *stride =
+					&stripe_extent->strides[i];
+				u64 physical = btrfs_raid_stride_physical(
+					leaf, stride);
+
+				physical += diff;
+				btrfs_set_raid_stride_physical(leaf, stride,
+							       physical);
+			}
+
+			key.objectid += diff;
+			key.offset -= diff;
+
+			btrfs_mark_buffer_dirty(trans, leaf);
+			btrfs_release_path(path);
+			break;
+		}
+
 		ret = btrfs_del_item(trans, stripe_root, path);
 		if (ret)
 			break;