diff mbox series

btrfs: don't bother stripe length if the profile is not stripe based

Message ID 20211119061933.13560-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: don't bother stripe length if the profile is not stripe based | expand

Commit Message

Qu Wenruo Nov. 19, 2021, 6:19 a.m. UTC
[BUG]
When debugging calc_bio_boundaries(), I found that even for RAID1
metadata, we're following stripe length to calculate stripe boundary.

 # mkfs.btrfs -m raid1 -d raid1 /dev/test/scratch[12]
 # mount /dev/test/scratch /mnt/btrfs
 # xfs_io -f -c "pwrite 0 64K" /mnt/btrfs/file
 # umount

Above very basic operations will make calc_bio_boundaries() to report
the following result:

 submit_extent_page: r/i=1/1 file_offset=22036480 len_to_stripe_boundary=49152
 submit_extent_page: r/i=1/1 file_offset=30474240 len_to_stripe_boundary=65536
 ...
 submit_extent_page: r/i=1/1 file_offset=30523392 len_to_stripe_boundary=16384
 submit_extent_page: r/i=1/1 file_offset=30457856 len_to_stripe_boundary=16384
 submit_extent_page: r/i=5/257 file_offset=0 len_to_stripe_boundary=65536
 submit_extent_page: r/i=5/257 file_offset=65536 len_to_stripe_boundary=65536
 submit_extent_page: r/i=1/1 file_offset=30490624 len_to_stripe_boundary=49152
 submit_extent_page: r/i=1/1 file_offset=30507008 len_to_stripe_boundary=32768

Where "r/i" is the rootid and inode, 1/1 means they metadata.
The remaining names match the member used in kernel.

Even all data/metadata are using raid1, we're still following stripe
length.

[CAUSE]
This behavior is caused by a wrong condition in btrfs_get_io_geometry():

	if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
		/* Fill using stripe_len */
		len = min_t(u64, em->len - offset, max_len);
	} else {
		len = em->len - offset;
	}

This means, only for SINGLE we will not follow stripe_len.

However for profiles like RAID1*, DUP, they don't need to bother
stripe_len.

This can lead to unnecessry bio split for RAID1*/DUP profiles, and can
even be a blockage for future zoned RAID support.

[FIX]
Introduce one single-use macro, BTRFS_BLOCK_GROUP_STRIPE_MASK, and
change the condition to only calculate the length using stripe length
for stripe based profiles.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Johannes Thumshirn Nov. 19, 2021, 8:37 a.m. UTC | #1
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Anand Jain Nov. 19, 2021, 10:22 a.m. UTC | #2
LGTM.
Reviewed-by: Anand Jain <anand.jain@oracle.com>

Nit:

> +#define BTRFS_BLOCK_GROUP_STRIPE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |\
> +					 BTRFS_BLOCK_GROUP_RAID10 |\
> +					 BTRFS_BLOCK_GROUP_RAID56_MASK)

  How about moving this define to btrfs_tree.h at line number 911?
  We have other BG MASKs there.

Thanks, Anand
Qu Wenruo Nov. 19, 2021, 10:32 a.m. UTC | #3
On 2021/11/19 18:22, Anand Jain wrote:
> LGTM.
> Reviewed-by: Anand Jain <anand.jain@oracle.com>
>
> Nit:
>
>> +#define BTRFS_BLOCK_GROUP_STRIPE_MASK    (BTRFS_BLOCK_GROUP_RAID0 |\
>> +                     BTRFS_BLOCK_GROUP_RAID10 |\
>> +                     BTRFS_BLOCK_GROUP_RAID56_MASK)
>
>   How about moving this define to btrfs_tree.h at line number 911?
>   We have other BG MASKs there.

I don't think we need it in the long run.

There is already a plan to remove the stripe_len requirement completely
(all the bio splitting will happen at btrfs_map_bio() time instead).

Even this definition would be gone in the future.

Thanks,
Qu
>
> Thanks, Anand
David Sterba Nov. 22, 2021, 6:29 p.m. UTC | #4
On Fri, Nov 19, 2021 at 02:19:33PM +0800, Qu Wenruo wrote:
> [BUG]
> When debugging calc_bio_boundaries(), I found that even for RAID1
> metadata, we're following stripe length to calculate stripe boundary.
> 
>  # mkfs.btrfs -m raid1 -d raid1 /dev/test/scratch[12]
>  # mount /dev/test/scratch /mnt/btrfs
>  # xfs_io -f -c "pwrite 0 64K" /mnt/btrfs/file
>  # umount
> 
> Above very basic operations will make calc_bio_boundaries() to report
> the following result:
> 
>  submit_extent_page: r/i=1/1 file_offset=22036480 len_to_stripe_boundary=49152
>  submit_extent_page: r/i=1/1 file_offset=30474240 len_to_stripe_boundary=65536
>  ...
>  submit_extent_page: r/i=1/1 file_offset=30523392 len_to_stripe_boundary=16384
>  submit_extent_page: r/i=1/1 file_offset=30457856 len_to_stripe_boundary=16384
>  submit_extent_page: r/i=5/257 file_offset=0 len_to_stripe_boundary=65536
>  submit_extent_page: r/i=5/257 file_offset=65536 len_to_stripe_boundary=65536
>  submit_extent_page: r/i=1/1 file_offset=30490624 len_to_stripe_boundary=49152
>  submit_extent_page: r/i=1/1 file_offset=30507008 len_to_stripe_boundary=32768
> 
> Where "r/i" is the rootid and inode, 1/1 means they metadata.
> The remaining names match the member used in kernel.
> 
> Even all data/metadata are using raid1, we're still following stripe
> length.
> 
> [CAUSE]
> This behavior is caused by a wrong condition in btrfs_get_io_geometry():
> 
> 	if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
> 		/* Fill using stripe_len */
> 		len = min_t(u64, em->len - offset, max_len);
> 	} else {
> 		len = em->len - offset;
> 	}
> 
> This means, only for SINGLE we will not follow stripe_len.
> 
> However for profiles like RAID1*, DUP, they don't need to bother
> stripe_len.
> 
> This can lead to unnecessry bio split for RAID1*/DUP profiles, and can
> even be a blockage for future zoned RAID support.
> 
> [FIX]
> Introduce one single-use macro, BTRFS_BLOCK_GROUP_STRIPE_MASK, and
> change the condition to only calculate the length using stripe length
> for stripe based profiles.
> 
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Added to misc-next, thanks.

> @@ -6296,6 +6296,10 @@ static bool need_full_stripe(enum btrfs_map_op op)
>  	return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
>  }
>  
> +
> +#define BTRFS_BLOCK_GROUP_STRIPE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |\
> +					 BTRFS_BLOCK_GROUP_RAID10 |\
> +					 BTRFS_BLOCK_GROUP_RAID56_MASK)

I've moved the defintion to the beginning of the file, even if it's for
a single use, such definitions should go to somewhere we can find them
easily and not 6000 lines down the file. The nature of the bit mask is
generic and we could need to use it again for other purposes.
diff mbox series

Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index ae1a4f2a8af6..3dc1de376966 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6296,6 +6296,10 @@  static bool need_full_stripe(enum btrfs_map_op op)
 	return (op == BTRFS_MAP_WRITE || op == BTRFS_MAP_GET_READ_MIRRORS);
 }
 
+
+#define BTRFS_BLOCK_GROUP_STRIPE_MASK	(BTRFS_BLOCK_GROUP_RAID0 |\
+					 BTRFS_BLOCK_GROUP_RAID10 |\
+					 BTRFS_BLOCK_GROUP_RAID56_MASK)
 /*
  * Calculate the geometry of a particular (address, len) tuple. This
  * information is used to calculate how big a particular bio can get before it
@@ -6345,7 +6349,8 @@  int btrfs_get_io_geometry(struct btrfs_fs_info *fs_info, struct extent_map *em,
 	stripe_offset = offset - stripe_offset;
 	data_stripes = nr_data_stripes(map);
 
-	if (map->type & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	/* Only stripe based profiles needs to check against stripe length. */
+	if (map->type & BTRFS_BLOCK_GROUP_STRIPE_MASK) {
 		u64 max_len = stripe_len - stripe_offset;
 
 		/*