Message ID | 20250107-rst-delete-fixes-v2-6-0c7b14c0aac2@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: more RST delete fixes | expand |
On Tue, Jan 7, 2025 at 12:50 PM Johannes Thumshirn <jth@kernel.org> wrote: > > From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > > When a user requests the deletion of a range that spans multiple stripe > extents and btrfs_search_slot() returns us the second RAID stripe extent, > we need to pick the previous item and truncate it, if there's still a > range to delete left, move on to the next item. > > The following diagram illustrates the operation: > > |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| > |--- keep ---|--- drop ---| > > While at it, comment the trivial case of a whole item delete as well. > > Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > --- > fs/btrfs/raid-stripe-tree.c | 28 ++++++++++++++++++++++++++++ > 1 file changed, 28 insertions(+) > > diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c > index 79f8f692aaa8f6df2c9482fbd7777c2812528f65..893d963951315abfc734e1ca232b3087b7889431 100644 > --- a/fs/btrfs/raid-stripe-tree.c > +++ b/fs/btrfs/raid-stripe-tree.c > @@ -103,6 +103,31 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > found_end = found_start + key.offset; > ret = 0; > > + /* > + * The stripe extent starts before the range we want to delete, > + * but the range spans more than one stripe extent: > + * > + * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| > + * |--- keep ---|--- drop ---| > + * > + * This means we have to get the previous item, truncate its > + * length and then restart the search. > + */ > + if (found_start > start) { > + > + ret = btrfs_previous_item(stripe_root, path, start, > + BTRFS_RAID_STRIPE_KEY); > + if (ret < 0) > + break; > + ret = 0; > + > + leaf = path->nodes[0]; > + slot = path->slots[0]; > + btrfs_item_key_to_cpu(leaf, &key, slot); > + found_start = key.objectid; > + found_end = found_start + key.offset; Hum, this isn't safe, ignoring the case where btrfs_previous_item() returns 1, meaning there's no previous item. In that case previous_item() returns pointing to the same leaf and slot, and then below we delete the item instead of trimming it (increasing its range start and decreasing its length). Thanks. > + } > + > if (key.type != BTRFS_RAID_STRIPE_KEY) > break; > > @@ -156,6 +181,9 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > break; > } > > + /* > + * Finally we can delete the whole item, no more special cases. > + */ > ret = btrfs_del_item(trans, stripe_root, path); > if (ret) > break; > > -- > 2.43.0 > >
On 09.01.25 16:24, Filipe Manana wrote: > On Tue, Jan 7, 2025 at 12:50 PM Johannes Thumshirn <jth@kernel.org> wrote: >> >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> >> When a user requests the deletion of a range that spans multiple stripe >> extents and btrfs_search_slot() returns us the second RAID stripe extent, >> we need to pick the previous item and truncate it, if there's still a >> range to delete left, move on to the next item. >> >> The following diagram illustrates the operation: >> >> |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| >> |--- keep ---|--- drop ---| >> >> While at it, comment the trivial case of a whole item delete as well. >> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> >> --- >> fs/btrfs/raid-stripe-tree.c | 28 ++++++++++++++++++++++++++++ >> 1 file changed, 28 insertions(+) >> >> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c >> index 79f8f692aaa8f6df2c9482fbd7777c2812528f65..893d963951315abfc734e1ca232b3087b7889431 100644 >> --- a/fs/btrfs/raid-stripe-tree.c >> +++ b/fs/btrfs/raid-stripe-tree.c >> @@ -103,6 +103,31 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le >> found_end = found_start + key.offset; >> ret = 0; >> >> + /* >> + * The stripe extent starts before the range we want to delete, >> + * but the range spans more than one stripe extent: >> + * >> + * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| >> + * |--- keep ---|--- drop ---| >> + * >> + * This means we have to get the previous item, truncate its >> + * length and then restart the search. >> + */ >> + if (found_start > start) { >> + >> + ret = btrfs_previous_item(stripe_root, path, start, >> + BTRFS_RAID_STRIPE_KEY); >> + if (ret < 0) >> + break; >> + ret = 0; >> + >> + leaf = path->nodes[0]; >> + slot = path->slots[0]; >> + btrfs_item_key_to_cpu(leaf, &key, slot); >> + found_start = key.objectid; >> + found_end = found_start + key.offset; > > Hum, this isn't safe, ignoring the case where btrfs_previous_item() > returns 1, meaning there's no previous item. > > In that case previous_item() returns pointing to the same leaf and > slot, and then below we delete the item instead of trimming it > (increasing its range start and decreasing its length). Good catch! But what should we do when we end up in this situation? Doesn't that mean that either do_free_extent_accounting() passed in a bogus range or btrfs_previous_item() should've done one more call to btrfs_pref_leaf()?
On Fri, Jan 10, 2025 at 11:33 AM Johannes Thumshirn <Johannes.Thumshirn@wdc.com> wrote: > > On 09.01.25 16:24, Filipe Manana wrote: > > On Tue, Jan 7, 2025 at 12:50 PM Johannes Thumshirn <jth@kernel.org> wrote: > >> > >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com> > >> > >> When a user requests the deletion of a range that spans multiple stripe > >> extents and btrfs_search_slot() returns us the second RAID stripe extent, > >> we need to pick the previous item and truncate it, if there's still a > >> range to delete left, move on to the next item. > >> > >> The following diagram illustrates the operation: > >> > >> |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| > >> |--- keep ---|--- drop ---| > >> > >> While at it, comment the trivial case of a whole item delete as well. > >> > >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> > >> --- > >> fs/btrfs/raid-stripe-tree.c | 28 ++++++++++++++++++++++++++++ > >> 1 file changed, 28 insertions(+) > >> > >> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c > >> index 79f8f692aaa8f6df2c9482fbd7777c2812528f65..893d963951315abfc734e1ca232b3087b7889431 100644 > >> --- a/fs/btrfs/raid-stripe-tree.c > >> +++ b/fs/btrfs/raid-stripe-tree.c > >> @@ -103,6 +103,31 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le > >> found_end = found_start + key.offset; > >> ret = 0; > >> > >> + /* > >> + * The stripe extent starts before the range we want to delete, > >> + * but the range spans more than one stripe extent: > >> + * > >> + * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| > >> + * |--- keep ---|--- drop ---| > >> + * > >> + * This means we have to get the previous item, truncate its > >> + * length and then restart the search. > >> + */ > >> + if (found_start > start) { > >> + > >> + ret = btrfs_previous_item(stripe_root, path, start, > >> + BTRFS_RAID_STRIPE_KEY); > >> + if (ret < 0) > >> + break; > >> + ret = 0; > >> + > >> + leaf = path->nodes[0]; > >> + slot = path->slots[0]; > >> + btrfs_item_key_to_cpu(leaf, &key, slot); > >> + found_start = key.objectid; > >> + found_end = found_start + key.offset; > > > > Hum, this isn't safe, ignoring the case where btrfs_previous_item() > > returns 1, meaning there's no previous item. > > > > In that case previous_item() returns pointing to the same leaf and > > slot, and then below we delete the item instead of trimming it > > (increasing its range start and decreasing its length). > > Good catch! > > But what should we do when we end up in this situation? Doesn't that > mean that either do_free_extent_accounting() passed in a bogus range or > btrfs_previous_item() should've done one more call to btrfs_pref_leaf()? When it returns 1 it means there's no previous leaf - the item we processed was the first in the tree, so any future calls to btrfs_prev_leaf() will keep returning 1 (unless some other task inserts smaller keys). The right thing is probably to log an error telling the target range, dump the leaf, abort the transaction and return an error. >
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c index 79f8f692aaa8f6df2c9482fbd7777c2812528f65..893d963951315abfc734e1ca232b3087b7889431 100644 --- a/fs/btrfs/raid-stripe-tree.c +++ b/fs/btrfs/raid-stripe-tree.c @@ -103,6 +103,31 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le found_end = found_start + key.offset; ret = 0; + /* + * The stripe extent starts before the range we want to delete, + * but the range spans more than one stripe extent: + * + * |--- RAID Stripe Extent ---||--- RAID Stripe Extent ---| + * |--- keep ---|--- drop ---| + * + * This means we have to get the previous item, truncate its + * length and then restart the search. + */ + if (found_start > start) { + + ret = btrfs_previous_item(stripe_root, path, start, + BTRFS_RAID_STRIPE_KEY); + if (ret < 0) + break; + ret = 0; + + leaf = path->nodes[0]; + slot = path->slots[0]; + btrfs_item_key_to_cpu(leaf, &key, slot); + found_start = key.objectid; + found_end = found_start + key.offset; + } + if (key.type != BTRFS_RAID_STRIPE_KEY) break; @@ -156,6 +181,9 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le break; } + /* + * Finally we can delete the whole item, no more special cases. + */ ret = btrfs_del_item(trans, stripe_root, path); if (ret) break;