@@ -555,7 +555,8 @@ u64 add_new_free_space(struct btrfs_block_group *block_group, u64 start, u64 end
* Returns: 0 on success, 1 if the search didn't yield a useful item, negative
* error code on error.
*/
-static int sample_block_group_extent_item(struct btrfs_block_group *block_group,
+static int sample_block_group_extent_item(struct btrfs_caching_control *caching_ctl,
+ struct btrfs_block_group *block_group,
int index, int max_index,
struct btrfs_key *key)
{
@@ -563,17 +564,19 @@ static int sample_block_group_extent_item(struct btrfs_block_group *block_group,
struct btrfs_root *extent_root;
int ret = 0;
u64 search_offset;
+ u64 search_end = block_group->start + block_group->length;
struct btrfs_path *path;
ASSERT(index >= 0);
ASSERT(index <= max_index);
ASSERT(max_index > 0);
+ lockdep_assert_held(&caching_ctl->mutex);
+ lockdep_assert_held_read(&fs_info->commit_root_sem);
path = btrfs_alloc_path();
if (!path)
return -ENOMEM;
- down_read(&fs_info->commit_root_sem);
extent_root = btrfs_extent_root(fs_info, max_t(u64, block_group->start,
BTRFS_SUPER_INFO_OFFSET));
@@ -586,21 +589,36 @@ static int sample_block_group_extent_item(struct btrfs_block_group *block_group,
key->type = BTRFS_EXTENT_ITEM_KEY;
key->offset = 0;
- ret = btrfs_search_slot(NULL, extent_root, key, path, 0, 0);
- if (ret != 0)
- goto out;
- if (key->objectid < block_group->start ||
- key->objectid > block_group->start + block_group->length) {
- ret = 1;
- goto out;
- }
- if (key->type != BTRFS_EXTENT_ITEM_KEY) {
- ret = 1;
- goto out;
+ while (1) {
+ ret = btrfs_search_forward(extent_root, key, path, 0);
+ if (ret != 0)
+ goto out;
+ /* Success; sampled an extent item in the block group */
+ if (key->type == BTRFS_EXTENT_ITEM_KEY &&
+ key->objectid >= block_group->start &&
+ key->objectid + key->offset <= search_end)
+ goto out;
+
+ /* We can't possibly find a valid extent item anymore */
+ if (key->objectid >= search_end) {
+ ret = 1;
+ break;
+ }
+ if (key->type < BTRFS_EXTENT_ITEM_KEY)
+ key->type = BTRFS_EXTENT_ITEM_KEY;
+ else
+ key->objectid++;
+ btrfs_release_path(path);
+ up_read(&fs_info->commit_root_sem);
+ mutex_unlock(&caching_ctl->mutex);
+ cond_resched();
+ mutex_lock(&caching_ctl->mutex);
+ down_read(&fs_info->commit_root_sem);
}
out:
+ lockdep_assert_held(&caching_ctl->mutex);
+ lockdep_assert_held_read(&fs_info->commit_root_sem);
btrfs_free_path(path);
- up_read(&fs_info->commit_root_sem);
return ret;
}
@@ -638,7 +656,8 @@ static int sample_block_group_extent_item(struct btrfs_block_group *block_group,
*
* Returns: 0 on success, negative error code on error.
*/
-static int load_block_group_size_class(struct btrfs_block_group *block_group)
+static int load_block_group_size_class(struct btrfs_caching_control *caching_ctl,
+ struct btrfs_block_group *block_group)
{
struct btrfs_key key;
int i;
@@ -646,11 +665,11 @@ static int load_block_group_size_class(struct btrfs_block_group *block_group)
enum btrfs_block_group_size_class size_class = BTRFS_BG_SZ_NONE;
int ret;
- if (btrfs_block_group_should_use_size_class(block_group))
+ if (!btrfs_block_group_should_use_size_class(block_group))
return 0;
for (i = 0; i < 5; ++i) {
- ret = sample_block_group_extent_item(block_group, i, 5, &key);
+ ret = sample_block_group_extent_item(caching_ctl, block_group, i, 5, &key);
if (ret < 0)
goto out;
if (ret > 0)
@@ -812,6 +831,7 @@ static noinline void caching_thread(struct btrfs_work *work)
mutex_lock(&caching_ctl->mutex);
down_read(&fs_info->commit_root_sem);
+ load_block_group_size_class(caching_ctl, block_group);
if (btrfs_test_opt(fs_info, SPACE_CACHE)) {
ret = load_free_space_cache(block_group);
if (ret == 1) {
@@ -867,8 +887,6 @@ static noinline void caching_thread(struct btrfs_work *work)
wake_up(&caching_ctl->wait);
- load_block_group_size_class(block_group);
-
btrfs_put_caching_control(caching_ctl);
btrfs_put_block_group(block_group);
}
This is an incremental patch fixing bugs in: btrfs: load block group size class when caching The commit message should be: btrfs: load block group size class when caching Since the size class is an artifact of an arbitrary anti-fragmentation strategy, it doesn't really make sense to persist it. Furthermore, most of the size class logic assumes fresh block groups. That is of course not a reasonable assumption -- we will be upgrading kernels with existing filesystems whose block groups are not classified. To work around those issues, implement logic to compute the size class of the block groups as we cache them in. To perfectly assess the state of a block group, we would have to read the entire extent tree (since the free space cache mashes together contiguous extent items) which would be prohibitively expensive for larger filesystems with more extents. We can do it relatively cheaply by implementing a simple heuristic of sampling a handful of extents and picking the smallest one we see. In the happy case where the block group was classified, we will only see extents of the correct size. In the unhappy case, we will hopefully find one of the smaller extents, but there is no perfect answer anyway. Autorelocation will eventually churn up the block group if there is significant freeing anyway. There was no regression in mount performance at the end state of the fsperf test suite, and the delay until the block group is marked cached is minimized by the constant number of extent samples. Signed-off-by: Boris Burkov <boris@bur.io> --- v2: just commit message stuff to make it a nicer incremental fixup patch. Also, drop the sysfs patch since it isn't a fixup. fs/btrfs/block-group.c | 56 ++++++++++++++++++++++++++++-------------- 1 file changed, 37 insertions(+), 19 deletions(-)