Message ID | 20240911095206.31060-1-jth@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] btrfs: stripe-tree: correctly truncate stripe extents on delete | expand |
On Wed, Sep 11, 2024 at 11:52:05AM +0200, Johannes Thumshirn wrote: > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > In our CI system, we're seeing the following ASSERT()ion to trigger when > running RAID stripe-tree tests on non-zoned devices: > > assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64 > > This ASSERT()ion triggers, because for the initial design of RAID stripe-tree, > I had the "one ordered-extent equals one bio" rule of zoned btrfs in mind. > > But for a RAID stripe-tree based system, that is not hosted on a zoned > storage device, but on a regular device this rule doesn't apply. > > So in case the range we want to delete starts in the middle of the > previous item, grab the item and "truncate" it's length. That is, subtract > the deleted portion from the key's offset. > > In case the range to delete ends in the middle of an item, we have to > adjust both the item's key as well as the stripe extents. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > > Changes to v1: > - ASSERT() that slot > 0 before calling btrfs_previous_item() > > fs/btrfs/raid-stripe-tree.c | 52 ++++++++++++++++++++++++++++++++++++- > 1 file changed, 51 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c > index 4c859b550f6c..075fecd08d87 100644 > --- a/fs/btrfs/raid-stripe-tree.c > +++ b/fs/btrfs/raid-stripe-tree.c > @@ -61,7 +61,57 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > trace_btrfs_raid_extent_delete(fs_info, start, end, > found_start, found_end); > > - ASSERT(found_start >= start && found_end <= end); > + if (found_start < start) { > + struct btrfs_key prev; > + u64 diff = start - found_start; > + > + ASSERT(slot > 0); > + > + ret = btrfs_previous_item(stripe_root, path, start, > + BTRFS_RAID_STRIPE_KEY); > + leaf = path->nodes[0]; > + slot = path->slots[0]; > + btrfs_item_key_to_cpu(leaf, &prev, slot); > + prev.offset -= diff; > + > + btrfs_mark_buffer_dirty(trans, leaf); > + > + start += diff; > + length -= diff; > + > + btrfs_release_path(path); > + continue; > + } > + > + if (end < found_end && found_end - end < key.offset) { > + struct btrfs_stripe_extent *stripe_extent; > + u64 diff = key.offset - length; > + int num_stripes; > + > + num_stripes = btrfs_num_raid_stripes( > + btrfs_item_size(leaf, slot)); > + stripe_extent = btrfs_item_ptr( > + leaf, slot, struct btrfs_stripe_extent); > + > + for (int i = 0; i < num_stripes; i++) { > + struct btrfs_raid_stride *stride = > + &stripe_extent->strides[i]; > + u64 physical = btrfs_raid_stride_physical( > + leaf, stride); > + > + physical += diff; > + btrfs_set_raid_stride_physical(leaf, stride, > + physical); > + } > + > + key.objectid += diff; > + key.offset -= diff; This part was confusing and isn't necessary, you can drop this bit and then add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
On Wed, Sep 11, 2024 at 10:52 AM Johannes Thumshirn <jth@kernel.org> wrote: > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > In our CI system, we're seeing the following ASSERT()ion to trigger when > running RAID stripe-tree tests on non-zoned devices: > > assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64 > > This ASSERT()ion triggers, because for the initial design of RAID stripe-tree, > I had the "one ordered-extent equals one bio" rule of zoned btrfs in mind. > > But for a RAID stripe-tree based system, that is not hosted on a zoned > storage device, but on a regular device this rule doesn't apply. > > So in case the range we want to delete starts in the middle of the > previous item, grab the item and "truncate" it's length. That is, subtract > the deleted portion from the key's offset. > > In case the range to delete ends in the middle of an item, we have to > adjust both the item's key as well as the stripe extents. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > > Changes to v1: > - ASSERT() that slot > 0 before calling btrfs_previous_item() > > fs/btrfs/raid-stripe-tree.c | 52 ++++++++++++++++++++++++++++++++++++- > 1 file changed, 51 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c > index 4c859b550f6c..075fecd08d87 100644 > --- a/fs/btrfs/raid-stripe-tree.c > +++ b/fs/btrfs/raid-stripe-tree.c > @@ -61,7 +61,57 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > trace_btrfs_raid_extent_delete(fs_info, start, end, > found_start, found_end); > > - ASSERT(found_start >= start && found_end <= end); I was looking at this in for-next, and several questions popped in my mind, see below. > + if (found_start < start) { > + struct btrfs_key prev; > + u64 diff = start - found_start; > + > + ASSERT(slot > 0); > + > + ret = btrfs_previous_item(stripe_root, path, start, > + BTRFS_RAID_STRIPE_KEY); This doesn't do anything, we ignore the return value and then the "continue" below makes us go to the start of the loop and do a btrfs_search_slot(), without using the leaf (see below as well). > + leaf = path->nodes[0]; > + slot = path->slots[0]; > + btrfs_item_key_to_cpu(leaf, &prev, slot); > + prev.offset -= diff; Why do we decrement prom prev.offset? It's a local variable and then we don't use it anymore. I.e. these 4 lines can be removed. > + > + btrfs_mark_buffer_dirty(trans, leaf); Why do we mark the leaf as dirty? We haven't changed it > + > + start += diff; > + length -= diff; > + > + btrfs_release_path(path); We don't use the path for anything, we extract the key, don't do anything with it and then release the path. Or did I miss something? Thanks. > + continue; > + } > + > + if (end < found_end && found_end - end < key.offset) { > + struct btrfs_stripe_extent *stripe_extent; > + u64 diff = key.offset - length; > + int num_stripes; > + > + num_stripes = btrfs_num_raid_stripes( > + btrfs_item_size(leaf, slot)); > + stripe_extent = btrfs_item_ptr( > + leaf, slot, struct btrfs_stripe_extent); > + > + for (int i = 0; i < num_stripes; i++) { > + struct btrfs_raid_stride *stride = > + &stripe_extent->strides[i]; > + u64 physical = btrfs_raid_stride_physical( > + leaf, stride); > + > + physical += diff; > + btrfs_set_raid_stride_physical(leaf, stride, > + physical); > + } > + > + key.objectid += diff; > + key.offset -= diff; > + > + btrfs_mark_buffer_dirty(trans, leaf); > + btrfs_release_path(path); > + break; > + } > + > ret = btrfs_del_item(trans, stripe_root, path); > if (ret) > break; > -- > 2.43.0 > >
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c index 4c859b550f6c..075fecd08d87 100644 --- a/fs/btrfs/raid-stripe-tree.c +++ b/fs/btrfs/raid-stripe-tree.c @@ -61,7 +61,57 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le trace_btrfs_raid_extent_delete(fs_info, start, end, found_start, found_end); - ASSERT(found_start >= start && found_end <= end); + if (found_start < start) { + struct btrfs_key prev; + u64 diff = start - found_start; + + ASSERT(slot > 0); + + ret = btrfs_previous_item(stripe_root, path, start, + BTRFS_RAID_STRIPE_KEY); + leaf = path->nodes[0]; + slot = path->slots[0]; + btrfs_item_key_to_cpu(leaf, &prev, slot); + prev.offset -= diff; + + btrfs_mark_buffer_dirty(trans, leaf); + + start += diff; + length -= diff; + + btrfs_release_path(path); + continue; + } + + if (end < found_end && found_end - end < key.offset) { + struct btrfs_stripe_extent *stripe_extent; + u64 diff = key.offset - length; + int num_stripes; + + num_stripes = btrfs_num_raid_stripes( + btrfs_item_size(leaf, slot)); + stripe_extent = btrfs_item_ptr( + leaf, slot, struct btrfs_stripe_extent); + + for (int i = 0; i < num_stripes; i++) { + struct btrfs_raid_stride *stride = + &stripe_extent->strides[i]; + u64 physical = btrfs_raid_stride_physical( + leaf, stride); + + physical += diff; + btrfs_set_raid_stride_physical(leaf, stride, + physical); + } + + key.objectid += diff; + key.offset -= diff; + + btrfs_mark_buffer_dirty(trans, leaf); + btrfs_release_path(path); + break; + } + ret = btrfs_del_item(trans, stripe_root, path); if (ret) break;