[v3,1/5] btrfs: replace stripe extents

Message ID 20240701-b4-rst-updates-v3-1-e0437e1e04a6@kernel.org (mailing list archive)
State New
Series btrfs: rst: updates for RAID stripe tree

Commit Message

Johannes Thumshirn July 1, 2024, 10:25 a.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

If we can't insert a stripe extent in the RAID stripe tree, because
the key that points to the specific position in the stripe tree is
already existing, we have to remove the item and then replace it by a
new item.

This can happen for example on device replace operations.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

Josef Bacik July 1, 2024, 1:57 p.m. UTC | #1
On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> If we can't insert a stripe extent in the RAID stripe tree, because
> the key that points to the specific position in the stripe tree is
> already existing, we have to remove the item and then replace it by a
> new item.
> 
> This can happen for example on device replace operations.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index e6f7a234b8f6..3020820dd6e2 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>  	return ret;
>  }
>  
> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_key *key,
> +				    struct btrfs_stripe_extent *stripe_extent,
> +				    const size_t item_size)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> +	if (ret)
> +		goto err;

This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
should be

if (ret) {
	ret = (ret == 1) ? -ENOENT : ret;
	goto err;
}

or whatever.  Thanks,

Josef
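
[Editor's sketch, not part of the thread] The review comment above relies on the btrfs_search_slot() return convention: 0 when the exact key is found, 1 when it is not, and a negative errno on failure. The userspace model below illustrates why the bare "if (ret) goto err;" lets the positive 1 escape to the caller, and how the suggested mapping to -ENOENT avoids that. All mock_* names and search_ret_to_errno() are hypothetical stand-ins invented for this illustration; none of them are btrfs functions.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in that only models the return convention:
 * 0 = exact key found, 1 = key not found, < 0 = negative errno.
 */
static int mock_search_slot(int key_exists, int io_error)
{
	if (io_error)
		return -EIO;
	return key_exists ? 0 : 1;
}

/* Map the positive "not found" result to a real errno before propagating. */
static int search_ret_to_errno(int ret)
{
	return (ret > 0) ? -ENOENT : ret;
}

/* Hypothetical caller mirroring the error path suggested in the review. */
static int mock_replace_lookup(int key_exists, int io_error)
{
	int ret = mock_search_slot(key_exists, io_error);

	if (ret)
		return search_ret_to_errno(ret);
	return 0;
}
```

With this mapping, a caller that passes the result on to an abort or logging path only ever sees 0 or a negative errno, never a bare 1.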
Johannes Thumshirn July 1, 2024, 3:08 p.m. UTC | #2
On 01.07.24 15:58, Josef Bacik wrote:
> On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
>> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>>
>> If we can't insert a stripe extent in the RAID stripe tree, because
>> the key that points to the specific position in the stripe tree is
>> already existing, we have to remove the item and then replace it by a
>> new item.
>>
>> This can happen for example on device replace operations.
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>> ---
>>   fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>>   1 file changed, 34 insertions(+)
>>
>> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
>> index e6f7a234b8f6..3020820dd6e2 100644
>> --- a/fs/btrfs/raid-stripe-tree.c
>> +++ b/fs/btrfs/raid-stripe-tree.c
>> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>>   	return ret;
>>   }
>>   
>> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
>> +				    struct btrfs_key *key,
>> +				    struct btrfs_stripe_extent *stripe_extent,
>> +				    const size_t item_size)
>> +{
>> +	struct btrfs_fs_info *fs_info = trans->fs_info;
>> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
>> +	struct btrfs_path *path;
>> +	int ret;
>> +
>> +	path = btrfs_alloc_path();
>> +	if (!path)
>> +		return -ENOMEM;
>> +
>> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
>> +	if (ret)
>> +		goto err;
> 
> This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
> should be
> 
> if (ret) {
> 	ret = (ret == 1) ? -ENOENT : ret;
> 	goto err;
> }
> 
> or whatever.  Thanks,

I wonder why I've never seen this in my testing. Could it be that, because
btrfs_insert_item() returns -EEXIST on the same key.objectid, we're more or
less guaranteed it'll exist?
Josef Bacik July 1, 2024, 8:34 p.m. UTC | #3
On Mon, Jul 01, 2024 at 03:08:22PM +0000, Johannes Thumshirn wrote:
> On 01.07.24 15:58, Josef Bacik wrote:
> > On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> >> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >>
> >> If we can't insert a stripe extent in the RAID stripe tree, because
> >> the key that points to the specific position in the stripe tree is
> >> already existing, we have to remove the item and then replace it by a
> >> new item.
> >>
> >> This can happen for example on device replace operations.
> >>
> >> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> >> ---
> >>   fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
> >>   1 file changed, 34 insertions(+)
> >>
> >> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> >> index e6f7a234b8f6..3020820dd6e2 100644
> >> --- a/fs/btrfs/raid-stripe-tree.c
> >> +++ b/fs/btrfs/raid-stripe-tree.c
> >> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
> >>   	return ret;
> >>   }
> >>   
> >> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> >> +				    struct btrfs_key *key,
> >> +				    struct btrfs_stripe_extent *stripe_extent,
> >> +				    const size_t item_size)
> >> +{
> >> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> >> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> >> +	struct btrfs_path *path;
> >> +	int ret;
> >> +
> >> +	path = btrfs_alloc_path();
> >> +	if (!path)
> >> +		return -ENOMEM;
> >> +
> >> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> >> +	if (ret)
> >> +		goto err;
> > 
> > This will leak 1 and we'll get an awkward btrfs_abort_transaction() call.  This
> > should be
> > 
> > if (ret) {
> > 	ret = (ret == 1) ? -ENOENT : ret;
> > 	goto err;
> > }
> > 
> > or whatever.  Thanks,
> 
> I wonder why I've never seen this in my testing. Could it be that, because
> btrfs_insert_item() returns -EEXIST on the same key.objectid, we're more or
> less guaranteed it'll exist?

Yeah, it's fine the way it is currently, but if anything changes in the future
we're going to find out the hard way and be super sad we didn't just handle it
right in the first place.  Thanks,

Josef
Josef Bacik July 1, 2024, 8:37 p.m. UTC | #4
On Mon, Jul 01, 2024 at 12:25:15PM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 
> If we can't insert a stripe extent in the RAID stripe tree, because
> the key that points to the specific position in the stripe tree is
> already existing, we have to remove the item and then replace it by a
> new item.
> 
> This can happen for example on device replace operations.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/btrfs/raid-stripe-tree.c | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
> index e6f7a234b8f6..3020820dd6e2 100644
> --- a/fs/btrfs/raid-stripe-tree.c
> +++ b/fs/btrfs/raid-stripe-tree.c
> @@ -73,6 +73,37 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
>  	return ret;
>  }
>  
> +static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
> +				    struct btrfs_key *key,
> +				    struct btrfs_stripe_extent *stripe_extent,
> +				    const size_t item_size)
> +{
> +	struct btrfs_fs_info *fs_info = trans->fs_info;
> +	struct btrfs_root *stripe_root = fs_info->stripe_root;
> +	struct btrfs_path *path;
> +	int ret;
> +
> +	path = btrfs_alloc_path();
> +	if (!path)
> +		return -ENOMEM;
> +
> +	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
> +	if (ret)
> +		goto err;
> +
> +	ret = btrfs_del_item(trans, stripe_root, path);
> +	if (ret)
> +		goto err;
> +
> +	btrfs_free_path(path);
> +
> +	return btrfs_insert_item(trans, stripe_root, key, stripe_extent,
> +				 item_size);
> + err:
> +	btrfs_free_path(path);
> +	return ret;
> +}
> +
>  static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
>  					struct btrfs_io_context *bioc)
>  {
> @@ -112,6 +143,9 @@ static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
>  
>  	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
>  				item_size);
> +	if (ret == -EEXIST)
> +		ret = replace_raid_extent_item(trans, &stripe_key,
> +					       stripe_extent, item_size);

I had another thought, how often is this particular thing happening?  Because
we're doing 3 path allocations here in the worst case.  If it happens more than
say 10% of the time then we need to allocate a path once in
btrfs_insert_one_raid_extent(), do the insert, and if it fails re-use that path
to do the delete and insert the new one.  Thanks,

Josef
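
[Editor's sketch, not part of the thread] The suggestion above is to allocate a path once and reuse it for the insert, the delete and the re-insert, instead of letting each helper allocate its own. The userspace model below only demonstrates the allocation pattern by counting allocations. Every name in it (mock_path, mock_tree, mock_insert_or_replace() and so on) is hypothetical, and the "tree" is a flat array rather than a B-tree. In the kernel, a reused struct btrfs_path would presumably also need to be released between operations, which this model does not represent.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MOCK_MAX_ITEMS 16

struct mock_path { int slot; };	/* hypothetical stand-in for a tree path */
struct mock_tree {
	unsigned long keys[MOCK_MAX_ITEMS];
	int vals[MOCK_MAX_ITEMS];
	int nr;
};

static int mock_path_allocs;	/* counts how many paths were allocated */

static struct mock_path *mock_alloc_path(void)
{
	struct mock_path *path = calloc(1, sizeof(*path));

	if (path)
		mock_path_allocs++;
	return path;
}

/* Returns 0 and records the slot if the key exists, 1 otherwise. */
static int mock_search(struct mock_tree *t, unsigned long key,
		       struct mock_path *path)
{
	for (int i = 0; i < t->nr; i++) {
		if (t->keys[i] == key) {
			path->slot = i;
			return 0;
		}
	}
	return 1;
}

/* Insert using a caller-provided path; -EEXIST leaves path->slot on the item. */
static int mock_insert_with_path(struct mock_tree *t, struct mock_path *path,
				 unsigned long key, int val)
{
	if (mock_search(t, key, path) == 0)
		return -EEXIST;
	if (t->nr == MOCK_MAX_ITEMS)
		return -ENOSPC;
	t->keys[t->nr] = key;
	t->vals[t->nr] = val;
	t->nr++;
	return 0;
}

/* Delete the item the path points at by moving the last item into its slot. */
static void mock_del_at(struct mock_tree *t, struct mock_path *path)
{
	t->nr--;
	t->keys[path->slot] = t->keys[t->nr];
	t->vals[path->slot] = t->vals[t->nr];
}

/* One allocation covers the insert and, on -EEXIST, the delete and re-insert. */
static int mock_insert_or_replace(struct mock_tree *t, unsigned long key,
				  int val)
{
	struct mock_path *path = mock_alloc_path();
	int ret;

	if (!path)
		return -ENOMEM;

	ret = mock_insert_with_path(t, path, key, val);
	if (ret == -EEXIST) {
		mock_del_at(t, path);
		ret = mock_insert_with_path(t, path, key, val);
	}
	free(path);
	return ret;
}
```

In this model each call to mock_insert_or_replace() allocates exactly one path, whether or not the key already exists, whereas the structure in the patch would allocate up to three in the replace case, as noted in the review.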
Johannes Thumshirn July 2, 2024, 5:41 a.m. UTC | #5
On 01.07.24 22:37, Josef Bacik wrote:
>> +	if (ret == -EEXIST)
>> +		ret = replace_raid_extent_item(trans, &stripe_key,
>> +					       stripe_extent, item_size);
> 
> I had another thought, how often is this particular thing happening?  Because
> we're doing 3 path allocations here in the worst case.  If it happens more than
> say 10% of the time then we need to allocate a path once in
> btrfs_insert_one_raid_extent(), do the insert, and if it fails re-use that path
> to do the delete and insert the new one.  Thanks,

That indeed is a good question. I'll add some tracepoints to see how 
often this is getting called.
Patch

diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index e6f7a234b8f6..3020820dd6e2 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -73,6 +73,37 @@  int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 	return ret;
 }
 
+static int replace_raid_extent_item(struct btrfs_trans_handle *trans,
+				    struct btrfs_key *key,
+				    struct btrfs_stripe_extent *stripe_extent,
+				    const size_t item_size)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_root *stripe_root = fs_info->stripe_root;
+	struct btrfs_path *path;
+	int ret;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(trans, stripe_root, key, path, -1, 1);
+	if (ret)
+		goto err;
+
+	ret = btrfs_del_item(trans, stripe_root, path);
+	if (ret)
+		goto err;
+
+	btrfs_free_path(path);
+
+	return btrfs_insert_item(trans, stripe_root, key, stripe_extent,
+				 item_size);
+ err:
+	btrfs_free_path(path);
+	return ret;
+}
+
 static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 					struct btrfs_io_context *bioc)
 {
@@ -112,6 +143,9 @@  static int btrfs_insert_one_raid_extent(struct btrfs_trans_handle *trans,
 
 	ret = btrfs_insert_item(trans, stripe_root, &stripe_key, stripe_extent,
 				item_size);
+	if (ret == -EEXIST)
+		ret = replace_raid_extent_item(trans, &stripe_key,
+					       stripe_extent, item_size);
 	if (ret)
 		btrfs_abort_transaction(trans, ret);