diff mbox series

[v2] btrfs: scrub: avoid unnecessary extent tree search for striped profiles

Message ID c21b78ee8bcf22f373beeefb8ee47ee92dfe8f03.1692097289.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs: scrub: avoid unnecessary extent tree search for striped profiles | expand

Commit Message

Qu Wenruo Aug. 15, 2023, 11:07 a.m. UTC
[PROBLEM]
Since commit 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to
scrub simple-stripe based range"), the scrub speed of striped profiles
(RAID0/RAID10/RAID5/RAID6) are degraded, if the block group is mostly
empty or fragmented.

[CAUSE]
In scrub_simple_stripe(), which is the responsible for RAID0/RAID10
profiles, we just call scrub_simple_mirror() and increase our
@cur_logical and @cur_physical.

The problem is, if there are no more extents inside the block group, or
the next extent is far away from our current logical, we would call
scrub_simple_mirror() for the empty ranges again and again, until we
reach the next next.

This is completely a waste of CPU time, thus it greatly degrade the
scrub performance for stripped profiles.

This is also affecting RAID56, as we rely on scrub_simple_mirror() for
data stripes of RAID56.

[FIX]
- Introduce scrub_ctx::found_next to record the next extent we found
  This member would be updated by find_first_extent_item() calls inside
  scrub_find_fill_first_stripe().

- Skip to the next stripe directly in scrub_simple_stripe()
  If we detect sctx->found_next is beyond our current stripe, we just
  skip to the full stripe which covers the target bytenr.

- Skip to the next full stripe covering sctx->found_next
  Unlike RAID0/RAID10, we can not easily skip to the next stripe due to
  rotation.
  But we can still skip to the next full stripe, which can still save us
  a lot of time.

Fixes: 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to scrub simple-stripe based range")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Fix a u64/u32 division not using the div_u64() helper

- Slightly change the advancement of logical/physical for RAID0 and
  RAID56
  Now logical/physical is always increased first, this removes one
  if () branch.

This patch is based on the scrub_testing branch (which is misc-next +
scrub performance fixes).

Thus there would be quite some conflicts for stable branches and would
need manual backport.
---
 fs/btrfs/scrub.c | 82 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 8 deletions(-)

Comments

kernel test robot Aug. 15, 2023, 6:21 p.m. UTC | #1
Hi Qu,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on next-20230815]
[cannot apply to linus/master v6.5-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-avoid-unnecessary-extent-tree-search-for-striped-profiles/20230815-191040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/c21b78ee8bcf22f373beeefb8ee47ee92dfe8f03.1692097289.git.wqu%40suse.com
patch subject: [PATCH v2] btrfs: scrub: avoid unnecessary extent tree search for striped profiles
config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20230816/202308160248.ZzecEExL-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230816/202308160248.ZzecEExL-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308160248.ZzecEExL-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: fs/btrfs/scrub.o: in function `scrub_stripe':
>> scrub.c:(.text+0x3d74): undefined reference to `__umoddi3'
kernel test robot Aug. 15, 2023, 6:31 p.m. UTC | #2
Hi Qu,

kernel test robot noticed the following build errors:

[auto build test ERROR on kdave/for-next]
[also build test ERROR on next-20230815]
[cannot apply to linus/master v6.5-rc6]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-avoid-unnecessary-extent-tree-search-for-striped-profiles/20230815-191040
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
patch link:    https://lore.kernel.org/r/c21b78ee8bcf22f373beeefb8ee47ee92dfe8f03.1692097289.git.wqu%40suse.com
patch subject: [PATCH v2] btrfs: scrub: avoid unnecessary extent tree search for striped profiles
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230816/202308160235.tlWKcKbu-lkp@intel.com/config)
compiler: mips-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230816/202308160235.tlWKcKbu-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308160235.tlWKcKbu-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/mips/kernel/head.o: in function `kernel_entry':
   (.ref.text+0xac): relocation truncated to fit: R_MIPS_26 against `start_kernel'
   init/main.o: in function `set_reset_devices':
   main.c:(.init.text+0x20): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x30): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `debug_kernel':
   main.c:(.init.text+0xa4): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0xb4): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `quiet_kernel':
   main.c:(.init.text+0x128): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x138): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `warn_bootconfig':
   main.c:(.init.text+0x1ac): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x1bc): relocation truncated to fit: R_MIPS_26 against `__sanitizer_cov_trace_pc'
   init/main.o: in function `init_setup':
   main.c:(.init.text+0x230): relocation truncated to fit: R_MIPS_26 against `_mcount'
   main.c:(.init.text+0x254): additional relocation overflows omitted from the output
   mips-linux-ld: fs/btrfs/scrub.o: in function `scrub_stripe':
>> scrub.c:(.text.scrub_stripe+0x7a0): undefined reference to `__umoddi3'
Qu Wenruo Aug. 16, 2023, 2:35 a.m. UTC | #3
On 2023/8/16 02:21, kernel test robot wrote:
> Hi Qu,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on kdave/for-next]
> [also build test ERROR on next-20230815]
> [cannot apply to linus/master v6.5-rc6]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Qu-Wenruo/btrfs-scrub-avoid-unnecessary-extent-tree-search-for-striped-profiles/20230815-191040
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-next
> patch link:    https://lore.kernel.org/r/c21b78ee8bcf22f373beeefb8ee47ee92dfe8f03.1692097289.git.wqu%40suse.com
> patch subject: [PATCH v2] btrfs: scrub: avoid unnecessary extent tree search for striped profiles
> config: i386-allyesconfig (https://download.01.org/0day-ci/archive/20230816/202308160248.ZzecEExL-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20230816/202308160248.ZzecEExL-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202308160248.ZzecEExL-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
>     ld: fs/btrfs/scrub.o: in function `scrub_stripe':
>>> scrub.c:(.text+0x3d74): undefined reference to `__umoddi3'
>

This doesn't look sane to me.

The same file, scrub.c, already has some other call sites of div_u64()
in scrub_throttle_dev_io():

         if (delta) {
                 long timeout;

                 timeout = div_u64(delta * HZ, 1000);
                 schedule_timeout_interruptible(timeout);
         }

So how could this be a problem?

Thanks,
Qu
David Sterba Aug. 17, 2023, 11:47 a.m. UTC | #4
On Tue, Aug 15, 2023 at 07:07:19PM +0800, Qu Wenruo wrote:
> [PROBLEM]
> Since commit 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to
> scrub simple-stripe based range"), the scrub speed of striped profiles
> (RAID0/RAID10/RAID5/RAID6) are degraded, if the block group is mostly
> empty or fragmented.
> 
> [CAUSE]
> In scrub_simple_stripe(), which is the responsible for RAID0/RAID10
> profiles, we just call scrub_simple_mirror() and increase our
> @cur_logical and @cur_physical.
> 
> The problem is, if there are no more extents inside the block group, or
> the next extent is far away from our current logical, we would call
> scrub_simple_mirror() for the empty ranges again and again, until we
> reach the next next.
> 
> This is completely a waste of CPU time, thus it greatly degrade the
> scrub performance for stripped profiles.
> 
> This is also affecting RAID56, as we rely on scrub_simple_mirror() for
> data stripes of RAID56.
> 
> [FIX]
> - Introduce scrub_ctx::found_next to record the next extent we found
>   This member would be updated by find_first_extent_item() calls inside
>   scrub_find_fill_first_stripe().
> 
> - Skip to the next stripe directly in scrub_simple_stripe()
>   If we detect sctx->found_next is beyond our current stripe, we just
>   skip to the full stripe which covers the target bytenr.
> 
> - Skip to the next full stripe covering sctx->found_next
>   Unlike RAID0/RAID10, we can not easily skip to the next stripe due to
>   rotation.
>   But we can still skip to the next full stripe, which can still save us
>   a lot of time.
> 
> Fixes: 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to scrub simple-stripe based range")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Fix a u64/u32 division not using the div_u64() helper
> 
> - Slightly change the advancement of logical/physical for RAID0 and
>   RAID56
>   Now logical/physical is always increased first, this removes one
>   if () branch.
> 
> This patch is based on the scrub_testing branch (which is misc-next +
> scrub performance fixes).
> 
> Thus there would be quite some conflicts for stable branches and would
> need manual backport.

Added to misc-next, thanks.
Qu Wenruo Aug. 17, 2023, 11:08 p.m. UTC | #5
On 2023/8/17 19:47, David Sterba wrote:
> On Tue, Aug 15, 2023 at 07:07:19PM +0800, Qu Wenruo wrote:
>> [PROBLEM]
>> Since commit 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to
>> scrub simple-stripe based range"), the scrub speed of striped profiles
>> (RAID0/RAID10/RAID5/RAID6) are degraded, if the block group is mostly
>> empty or fragmented.
>>
>> [CAUSE]
>> In scrub_simple_stripe(), which is the responsible for RAID0/RAID10
>> profiles, we just call scrub_simple_mirror() and increase our
>> @cur_logical and @cur_physical.
>>
>> The problem is, if there are no more extents inside the block group, or
>> the next extent is far away from our current logical, we would call
>> scrub_simple_mirror() for the empty ranges again and again, until we
>> reach the next next.
>>
>> This is completely a waste of CPU time, thus it greatly degrade the
>> scrub performance for stripped profiles.
>>
>> This is also affecting RAID56, as we rely on scrub_simple_mirror() for
>> data stripes of RAID56.
>>
>> [FIX]
>> - Introduce scrub_ctx::found_next to record the next extent we found
>>    This member would be updated by find_first_extent_item() calls inside
>>    scrub_find_fill_first_stripe().
>>
>> - Skip to the next stripe directly in scrub_simple_stripe()
>>    If we detect sctx->found_next is beyond our current stripe, we just
>>    skip to the full stripe which covers the target bytenr.
>>
>> - Skip to the next full stripe covering sctx->found_next
>>    Unlike RAID0/RAID10, we can not easily skip to the next stripe due to
>>    rotation.
>>    But we can still skip to the next full stripe, which can still save us
>>    a lot of time.
>>
>> Fixes: 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to scrub simple-stripe based range")
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Fix a u64/u32 division not using the div_u64() helper
>>
>> - Slightly change the advancement of logical/physical for RAID0 and
>>    RAID56
>>    Now logical/physical is always increased first, this removes one
>>    if () branch.
>>
>> This patch is based on the scrub_testing branch (which is misc-next +
>> scrub performance fixes).
>>
>> Thus there would be quite some conflicts for stable branches and would
>> need manual backport.
> 
> Added to misc-next, thanks.

BTW, did you hit any compiling errors on 32bits systems?

Some bots reported linkage error due to div_u64(), but I see the same 
div_u64() calls inside scrub.c thus I'm not sure what's really going on 
here.

Thanks,
Qu
David Sterba Aug. 17, 2023, 11:16 p.m. UTC | #6
On Fri, Aug 18, 2023 at 07:08:56AM +0800, Qu Wenruo wrote:
> On 2023/8/17 19:47, David Sterba wrote:
> > On Tue, Aug 15, 2023 at 07:07:19PM +0800, Qu Wenruo wrote:
> >>
> >> Fixes: 8557635ed2b0 ("btrfs: scrub: introduce dedicated helper to scrub simple-stripe based range")
> >> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >> ---
> >> Changelog:
> >> v2:
> >> - Fix a u64/u32 division not using the div_u64() helper
> >>
> >> - Slightly change the advancement of logical/physical for RAID0 and
> >>    RAID56
> >>    Now logical/physical is always increased first, this removes one
> >>    if () branch.
> >>
> >> This patch is based on the scrub_testing branch (which is misc-next +
> >> scrub performance fixes).
> >>
> >> Thus there would be quite some conflicts for stable branches and would
> >> need manual backport.
> > 
> > Added to misc-next, thanks.
> 
> BTW, did you hit any compiling errors on 32bits systems?
> 
> Some bots reported linkage error due to div_u64(), but I see the same 
> div_u64() calls inside scrub.c thus I'm not sure what's really going on 
> here.

I did a build test now and it's clean. The bots sometimes lag behind
either git trees or mailinglist.
diff mbox series

Patch

diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 6d83f5ed1d93..749818bd9b8f 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -219,6 +219,14 @@  struct scrub_ctx {
 	 * doing the wakeup() call.
 	 */
 	refcount_t              refs;
+
+	/*
+	 * Indicate the next logical that is covered by an extent.
+	 *
+	 * This is for striped profiles to skip stripes which doesn't have
+	 * any extent.
+	 */
+	u64			found_next;
 };
 
 struct scrub_warning {
@@ -1365,7 +1373,8 @@  static int compare_extent_item_range(struct btrfs_path *path,
  */
 static int find_first_extent_item(struct btrfs_root *extent_root,
 				  struct btrfs_path *path,
-				  u64 search_start, u64 search_len)
+				  u64 search_start, u64 search_len,
+				  u64 *found_next_ret)
 {
 	struct btrfs_fs_info *fs_info = extent_root->fs_info;
 	struct btrfs_key key;
@@ -1401,8 +1410,11 @@  static int find_first_extent_item(struct btrfs_root *extent_root,
 search_forward:
 	while (true) {
 		btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-		if (key.objectid >= search_start + search_len)
+		if (key.objectid >= search_start + search_len) {
+			if (found_next_ret)
+				*found_next_ret = key.objectid;
 			break;
+		}
 		if (key.type != BTRFS_METADATA_ITEM_KEY &&
 		    key.type != BTRFS_EXTENT_ITEM_KEY)
 			goto next;
@@ -1410,13 +1422,18 @@  static int find_first_extent_item(struct btrfs_root *extent_root,
 		ret = compare_extent_item_range(path, search_start, search_len);
 		if (ret == 0)
 			return ret;
-		if (ret > 0)
+		if (ret > 0) {
+			if (found_next_ret)
+				*found_next_ret = key.objectid;
 			break;
+		}
 next:
 		path->slots[0]++;
 		if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
 			ret = btrfs_next_leaf(extent_root, path);
 			if (ret) {
+				if (ret > 0 && found_next_ret)
+					*found_next_ret = U64_MAX;
 				/* Either no more item or fatal error */
 				btrfs_release_path(path);
 				return ret;
@@ -1518,7 +1535,8 @@  static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 					struct btrfs_device *dev, u64 physical,
 					int mirror_num, u64 logical_start,
 					u32 logical_len,
-					struct scrub_stripe *stripe)
+					struct scrub_stripe *stripe,
+					u64 *found_next_ret)
 {
 	struct btrfs_fs_info *fs_info = bg->fs_info;
 	struct btrfs_root *extent_root = btrfs_extent_root(fs_info, bg->start);
@@ -1540,7 +1558,7 @@  static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 	ASSERT(logical_start >= bg->start && logical_end <= bg->start + bg->length);
 
 	ret = find_first_extent_item(extent_root, extent_path, logical_start,
-				     logical_len);
+				     logical_len, found_next_ret);
 	/* Either error or not found. */
 	if (ret)
 		goto out;
@@ -1574,7 +1592,8 @@  static int scrub_find_fill_first_stripe(struct btrfs_block_group *bg,
 	/* Fill the extent info for the remaining sectors. */
 	while (cur_logical <= stripe_end) {
 		ret = find_first_extent_item(extent_root, extent_path, cur_logical,
-					     stripe_end - cur_logical + 1);
+					     stripe_end - cur_logical + 1,
+					     found_next_ret);
 		if (ret < 0)
 			goto out;
 		if (ret > 0) {
@@ -1809,7 +1828,7 @@  static int queue_scrub_stripe(struct scrub_ctx *sctx, struct btrfs_block_group *
 	scrub_reset_stripe(stripe);
 	ret = scrub_find_fill_first_stripe(bg, &sctx->extent_path,
 			&sctx->csum_path, dev, physical, mirror_num, logical,
-			length, stripe);
+			length, stripe, &sctx->found_next);
 	/* Either >0 as no more extents or <0 for error. */
 	if (ret)
 		return ret;
@@ -1881,7 +1900,7 @@  static int scrub_raid56_parity_stripe(struct scrub_ctx *sctx,
 		ret = scrub_find_fill_first_stripe(bg, &extent_path, &csum_path,
 				map->stripes[stripe_index].dev, physical, 1,
 				full_stripe_start + btrfs_stripe_nr_to_offset(i),
-				BTRFS_STRIPE_LEN, stripe);
+				BTRFS_STRIPE_LEN, stripe, NULL);
 		if (ret < 0)
 			goto out;
 		/*
@@ -2124,10 +2143,32 @@  static int scrub_simple_stripe(struct scrub_ctx *sctx,
 					  mirror_num);
 		if (ret)
 			return ret;
+
 		/* Skip to next stripe which belongs to the target device */
 		cur_logical += logical_increment;
 		/* For physical offset, we just go to next stripe */
 		cur_physical += BTRFS_STRIPE_LEN;
+
+		/* No more extent item. all done. */
+		if (sctx->found_next >= bg->start + bg->length) {
+			sctx->stat.last_physical = orig_physical +
+				div_u64(bg->length, map->num_stripes /
+					map->sub_stripes);
+			return 0;
+		}
+		/*
+		 * The next found extent is already beyond our stripe.
+		 * Skip to the next extent.
+		 */
+		if (sctx->found_next >= cur_logical) {
+			unsigned int stripes_skipped;
+
+			/* Advance to the next stripe covering sctx->found_next. */
+			stripes_skipped = div_u64(sctx->found_next - cur_logical,
+						  logical_increment);
+			cur_logical += logical_increment * stripes_skipped;
+			cur_physical += BTRFS_STRIPE_LEN * stripes_skipped;
+		}
 	}
 	return ret;
 }
@@ -2158,6 +2199,7 @@  static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 
 	/* Extent_path should be probably released. */
 	ASSERT(sctx->extent_path.nodes[0] == NULL);
+	sctx->found_next = chunk_logical;
 
 	scrub_blocked_if_needed(fs_info);
 
@@ -2235,8 +2277,13 @@  static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 	 * using their physical offset.
 	 */
 	while (physical < physical_end) {
+		u64 full_stripe_start;
+		u32 full_stripe_len = increment;
+
 		ret = get_raid56_logic_offset(physical, stripe_index, map,
 					      &logical, &stripe_logical);
+		full_stripe_start = rounddown(logical, full_stripe_len) +
+				    chunk_logical;
 		logical += chunk_logical;
 		if (ret) {
 			/* it is parity strip */
@@ -2263,6 +2310,25 @@  static noinline_for_stack int scrub_stripe(struct scrub_ctx *sctx,
 next:
 		logical += increment;
 		physical += BTRFS_STRIPE_LEN;
+		full_stripe_start += full_stripe_len;
+
+		/* No more extent in the block group. */
+		if (sctx->found_next >= bg->start + bg->length) {
+			spin_lock(&sctx->stat_lock);
+			sctx->stat.last_physical = physical_end;
+			spin_unlock(&sctx->stat_lock);
+			goto out;
+		}
+
+		if (sctx->found_next >= full_stripe_start) {
+			unsigned int stripes_skipped;
+
+			stripes_skipped = div_u64(sctx->found_next - full_stripe_start,
+						  full_stripe_len);
+			logical += increment * stripes_skipped;
+			physical += BTRFS_STRIPE_LEN * stripes_skipped;
+		}
+
 		spin_lock(&sctx->stat_lock);
 		if (stop_loop)
 			sctx->stat.last_physical =