[v2,2/4] btrfs: small improvement for btrfs_io_context structure

Message ID 63c3c509ec42d83e8038b33e2f21e036c591fe0b.1675743217.git.wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: reduce the memory usage for btrfs_io_context, and reduce its variable sized members

Commit Message

Qu Wenruo Feb. 7, 2023, 4:26 a.m. UTC
This structure is the final product of all the __btrfs_map_block()
related functions.

It has some hard-to-understand members, like tgtdev_map, which have no
comments at all.

This patch will improve the situation by:

- Add extra comments for num_stripes, mirror_num, num_tgtdevs and
  tgtdev_map[].
  Especially for the last two members, add a dedicated (thus very long)
  comment, with an example to explain them.

- Shrink those int members to u16.
  In fact our on-disk format only uses u16 for num_stripes, thus
  there is no need for int at all.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/volumes.c | 16 ++++++++------
 fs/btrfs/volumes.h | 53 +++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 57 insertions(+), 12 deletions(-)

Comments

David Sterba Feb. 15, 2023, 8:02 p.m. UTC | #1
On Tue, Feb 07, 2023 at 12:26:13PM +0800, Qu Wenruo wrote:
> That structure is our ultimate objective for all __btrfs_map_block()
> related functions.
> 
> We have some hard to understand members, like tgtdev_map, but without
> any comments.
> 
> This patch will improve the situation by:
> 
> - Add extra comments for num_stripes, mirror_num, num_tgtdevs and
>   tgtdev_map[]
>   Especially for the last two members, add a dedicated (thus very long)
>   comments for them, with example to explain it.
> 
> - Shrink those int members to u16.
>   In fact our on-disk format is only using u16 for num_stripes, thus
>   no need to go int at all.

Note that u16 may be good for space saving in structures, but otherwise
it's a type that's a bit clumsy for the CPU to handle. Using int for
passing it around does not require masking or other sorts of conversions
when arithmetic is done. This can be cleaned up later with close
inspection of the effects; so far we have u16 for fs_info::csum_type and
some item lengths.
Qu Wenruo Feb. 17, 2023, 5:03 a.m. UTC | #2
On 2023/2/16 04:02, David Sterba wrote:
> On Tue, Feb 07, 2023 at 12:26:13PM +0800, Qu Wenruo wrote:
>> That structure is our ultimate objective for all __btrfs_map_block()
>> related functions.
>>
>> We have some hard to understand members, like tgtdev_map, but without
>> any comments.
>>
>> This patch will improve the situation by:
>>
>> - Add extra comments for num_stripes, mirror_num, num_tgtdevs and
>>    tgtdev_map[]
>>    Especially for the last two members, add a dedicated (thus very long)
>>    comments for them, with example to explain it.
>>
>> - Shrink those int members to u16.
>>    In fact our on-disk format is only using u16 for num_stripes, thus
>>    no need to go int at all.
> 
> Note that u16 is maybe good for space saving in structures but otherwise
> it's a type that's a bit clumsy for CPU to handle. Int for passing it
> around does not require masking or other sorts of conversions when
> there's arithmetic done. This can be cleaned up later with close
> inspection of the effects, so far we have u16 for fs_info::csum_type or
> some item lengths.

I'm not sure the extra masking is a big deal for the CPU.

Even if the compiler chooses to do extra padding and we don't save much
space, I'll still stick with u16.

From my recent learning of Rust, one of its best practices is the strict
size limits on variables.
For a lot of usages we don't really need 32-bit values, and in those
cases I strongly prefer narrower variables.

If there is a better way to let the compiler do 32-bit operations while
still enforcing a u16 size limit, I'd be happy to follow it.

Otherwise I think the extra CPU time (if any) is still worth it for a
more explicit size limit.


BTW, I'll refresh this series along with its dependency patchset "btrfs:
reduce div64 calls for __btrfs_map_block() and its variants".
I guess you didn't apply the dependency first because of quite a few
conflicts with HCH's io_geometry cleanup series.

And I'll use the patches from the for-next branch to keep all your fixes.

Thanks,
Qu
David Sterba Feb. 23, 2023, 8:55 p.m. UTC | #3
On Fri, Feb 17, 2023 at 01:03:28PM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/2/16 04:02, David Sterba wrote:
> > On Tue, Feb 07, 2023 at 12:26:13PM +0800, Qu Wenruo wrote:
> >> That structure is our ultimate objective for all __btrfs_map_block()
> >> related functions.
> >>
> >> We have some hard to understand members, like tgtdev_map, but without
> >> any comments.
> >>
> >> This patch will improve the situation by:
> >>
> >> - Add extra comments for num_stripes, mirror_num, num_tgtdevs and
> >>    tgtdev_map[]
> >>    Especially for the last two members, add a dedicated (thus very long)
> >>    comments for them, with example to explain it.
> >>
> >> - Shrink those int members to u16.
> >>    In fact our on-disk format is only using u16 for num_stripes, thus
> >>    no need to go int at all.
> > 
> > Note that u16 is maybe good for space saving in structures but otherwise
> > it's a type that's a bit clumsy for CPU to handle. Int for passing it
> > around does not require masking or other sorts of conversions when
> > there's arithmetic done. This can be cleaned up later with close
> > inspection of the effects, so far we have u16 for fs_info::csum_type or
> > some item lengths.
> 
> I'm not sure if the extra masking for CPU is a big deal.

It's a micro-optimization, so it's always a question whether it's worth
it or not.
More instructions to do the same thing fill the i-cache, and on hot
paths that counts. We have a lot of IO in btrfs code, so we are not
focused on such things, but in this case I don't want to let us use u16
freely, as it's not a native type and any operation on the data will
lead to asm code bloat here and there.

I measure the size of the final .ko object for many patches to see the
effects of various changes, and the size tendency is upward. Over time
this makes the code less efficient regarding CPU.

> If the compiler choose to do extra padding and not saving much space, 
> I'll still stick to u16.
> 
>  From my recent learning on Rust, one of the best practice is the strong 
> size limits on their variables.
> For a lot of usages, we shouldn't really need to 32bit values, and in 
> that case, I strongly prefer narrower variables.

I'd prefer to keep using native CPU types, so that the narrow types stay
in structures (either in-memory or on-disk) but the calculations are
done in natural types and then checked for eventual overflows, instead
of hoping that silent overflows won't hurt.

> If there is a better way to let compiler do 32bit operations while still 
> do a u16 size limits, I'm pretty happy to follow.

read u16 -> int/u32 -> do the calculations -> check/assert -> write back
to u16

We have to do that explicitly for validating input; that Rust does
something else won't help us.

> Otherwise I think the extra CPU time (if any) is still worthy for a more 
> explicit size limit.

Yeah, it's not directly measurable and would depend on other things like
cachelines and internal CPU state at the time of execution, but my point
is to think about such changes and not let inefficiencies silently creep
in.
Patch

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index b5a6262092d1..ebdb4261bf12 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5884,16 +5884,18 @@  static void sort_parity_stripes(struct btrfs_io_context *bioc, int num_stripes)
 }
 
 static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
-						       int total_stripes,
-						       int real_stripes)
+						       u16 total_stripes,
+						       u16 real_stripes)
 {
-	struct btrfs_io_context *bioc = kzalloc(
+	struct btrfs_io_context *bioc;
+
+	bioc = kzalloc(
 		 /* The size of btrfs_io_context */
 		sizeof(struct btrfs_io_context) +
 		/* Plus the variable array for the stripes */
 		sizeof(struct btrfs_io_stripe) * (total_stripes) +
 		/* Plus the variable array for the tgt dev */
-		sizeof(int) * (real_stripes) +
+		sizeof(u16) * (real_stripes) +
 		/*
 		 * Plus the raid_map, which includes both the tgt dev
 		 * and the stripes.
@@ -5907,7 +5909,7 @@  static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 	refcount_set(&bioc->refs, 1);
 
 	bioc->fs_info = fs_info;
-	bioc->tgtdev_map = (int *)(bioc->stripes + total_stripes);
+	bioc->tgtdev_map = (u16 *)(bioc->stripes + total_stripes);
 	bioc->raid_map = (u64 *)(bioc->tgtdev_map + real_stripes);
 
 	return bioc;
@@ -6377,12 +6379,12 @@  int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	int mirror_num = (mirror_num_ret ? *mirror_num_ret : 0);
 	int num_stripes;
 	int max_errors = 0;
-	int tgtdev_indexes = 0;
 	struct btrfs_io_context *bioc = NULL;
 	struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace;
 	int dev_replace_is_ongoing = 0;
-	int num_alloc_stripes;
 	int patch_the_first_stripe_for_dev_replace = 0;
+	u16 num_alloc_stripes;
+	u16 tgtdev_indexes = 0;
 	u64 physical_to_patch_in_first_stripe = 0;
 	u64 raid56_full_stripe_start = (u64)-1;
 	struct btrfs_io_geometry geom;
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 46f33b66d85e..bf7e7c08d8f7 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -422,11 +422,54 @@  struct btrfs_io_context {
 	u64 map_type; /* get from map_lookup->type */
 	struct bio *orig_bio;
 	atomic_t error;
-	int max_errors;
-	int num_stripes;
-	int mirror_num;
-	int num_tgtdevs;
-	int *tgtdev_map;
+	u16 max_errors;
+
+	/*
+	 * The total number of stripes, including the extra duplicated
+	 * stripe for replace.
+	 */
+	u16 num_stripes;
+
+	/*
+	 * The mirror_num of this bioc.
+	 *
+	 * This is for reads, which use 0 as mirror_num, thus we should
+	 * return a valid mirror_num (>0) to the reader.
+	 */
+	u16 mirror_num;
+
+	/*
+	 * The following two members are for dev-replace case only.
+	 *
+	 * @num_tgtdevs:	Number of duplicated stripes which need to be
+	 *			written to the replace target.
+	 *			Should be <= 2 (2 for DUP, otherwise <= 1).
+	 * @tgtdev_map:		The array indicates where the duplicated stripes
+	 *			are from. The size is the number of original
+	 *			stripes (num_stripes - num_tgtdevs).
+	 *
+	 * The @tgtdev_map[] array is mostly for RAID56 cases.
+	 * Non-RAID56 stripes share the same contents for the mapped range,
+	 * thus there is no need to bother where the duplicated ones are from.
+	 *
+	 * But for the RAID56 case, all stripes contain different contents,
+	 * thus we need a way to know the mapping.
+	 *
+	 * Here is an example for the two members, using a RAID5 write:
+	 * num_stripes:		4 (3 + 1 duplicated write)
+	 * stripes[0]:		dev = devid 1, physical = X
+	 * stripes[1]:		dev = devid 2, physical = Y
+	 * stripes[2]:		dev = devid 3, physical = Z
+	 * stripes[3]:		dev = devid 0, physical = Y
+	 *
+	 * num_tgtdevs = 1
+	 * tgtdev_map[0] = 0	<- Means stripes[0] is not involved in replace.
+	 * tgtdev_map[1] = 3	<- Means stripes[1] is involved in replace,
+	 *			   and it's duplicated to stripes[3].
+	 * tgtdev_map[2] = 0	<- Means stripes[2] is not involved in replace.
+	 */
+	u16 num_tgtdevs;
+	u16 *tgtdev_map;
 	/*
 	 * logical block numbers for the start of each stripe
 	 * The last one or two are p/q.  These are sorted,