[v2,4/4] btrfs: replace btrfs_io_context::raid_map[] with a fixed u64 value

Message ID f82eed6746d19cf3bea15631a120648eadf20852.1675743217.git.wqu@suse.com (mailing list archive)
State New, archived
Series btrfs: reduce the memory usage for btrfs_io_context, and reduce its variable sized members

Commit Message

Qu Wenruo Feb. 7, 2023, 4:26 a.m. UTC
In the btrfs_io_context structure, we have a pointer, raid_map, which
indicates the logical bytenr of each stripe.

But since we always call sort_parity_stripes(), the resulting raid_map[]
is always sorted, thus raid_map[0] is always the logical bytenr of the
full stripe.

So why waste the space and the sorting time on raid_map[]?

This patch will replace btrfs_io_context::raid_map with a single u64
number, full_stripe_logical, by:

- Replace btrfs_io_context::raid_map with full_stripe_logical

- Replace call sites using raid_map[0] with full_stripe_logical

- Replace call sites using raid_map[i] with index comparisons against
  nr_data_stripes (see the sketch below).
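
A minimal sketch (not from the patch itself) of the two replacement
patterns, assuming @i indexes bioc->stripes[] and nr_data_stripes
excludes P/Q; BTRFS_STRIPE_LEN_SHIFT comes from the prerequisite div64
series:

	/* Was: bioc->raid_map[i]. */
	static u64 data_stripe_start(const struct btrfs_io_context *bioc, int i)
	{
		return bioc->full_stripe_logical +
		       ((u64)i << BTRFS_STRIPE_LEN_SHIFT);
	}

	/* Was: raid_map[i] == RAID5_P_STRIPE || raid_map[i] == RAID6_Q_STRIPE. */
	static bool is_parity_stripe(int i, int nr_data_stripes)
	{
		return i >= nr_data_stripes;
	}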

The benefits are:

- Less memory wasted on raid_map
  It's sizeof(u64) * num_stripes vs sizeof(u64), so we always save at
  least one u64, and the saving grows with num_stripes (a rough
  illustration follows this list).

- No more weird alloc_btrfs_io_context() behavior
  As there is now only one fixed-size part plus one variable-length array.
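
As a rough illustration (my numbers, assuming a 6-device RAID6 chunk, so
num_stripes = 6), only the tail of the allocation changes:

	/* Old tail: the stripes[] array plus the raid_map[] array. */
	sizeof(struct btrfs_io_stripe) * 6 + sizeof(u64) * 6

	/* New tail: stripes[] only; full_stripe_logical is a fixed member. */
	sizeof(struct btrfs_io_stripe) * 6

That is 48 bytes saved per btrfs_io_context for such a chunk.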

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/raid56.c            | 32 ++++++-------
 fs/btrfs/scrub.c             | 26 ++++++-----
 fs/btrfs/volumes.c           | 87 +++++++++++++++---------------------
 fs/btrfs/volumes.h           | 18 ++++++--
 include/trace/events/btrfs.h |  2 +-
 5 files changed, 81 insertions(+), 84 deletions(-)

Comments

David Sterba Feb. 15, 2023, 8:07 p.m. UTC | #1
On Tue, Feb 07, 2023 at 12:26:15PM +0800, Qu Wenruo wrote:
> +		/* RAID5/6 */
> +		for (i = 0; i < nr_data_stripes; i++) {
> +			const u64 data_stripe_start = full_stripe_logical +
> +						(i << BTRFS_STRIPE_LEN_SHIFT);

BTRFS_STRIPE_LEN_SHIFT is from another patchset and there's no mention of
that dependency in the cover letter. I've converted that back to
(i * BTRFS_STRIPE_LEN).
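
The two forms are interchangeable since BTRFS_STRIPE_LEN is a power of
two (currently 64K, i.e. a shift of 16); a minimal sketch:

	/* Both compute the byte offset of stripe @i inside the full stripe;
	 * the shift form merely depends on BTRFS_STRIPE_LEN_SHIFT from the
	 * other patchset.
	 */
	const u64 off_mul   = (u64)i * BTRFS_STRIPE_LEN;
	const u64 off_shift = (u64)i << BTRFS_STRIPE_LEN_SHIFT;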
Qu Wenruo Feb. 16, 2023, 12:23 a.m. UTC | #2
On 2023/2/16 04:07, David Sterba wrote:
> On Tue, Feb 07, 2023 at 12:26:15PM +0800, Qu Wenruo wrote:
>> +		/* RAID5/6 */
>> +		for (i = 0; i < nr_data_stripes; i++) {
>> +			const u64 data_stripe_start = full_stripe_logical +
>> +						(i << BTRFS_STRIPE_LEN_SHIFT);
> 
> BTRFS_STRIPE_LEN_SHIFT is from another patchset and there's no mention of
> that dependency in the cover letter. I've converted that back to
> (i * BTRFS_STRIPE_LEN).

Forgot to mention that this series is based on the series "btrfs: reduce
div64 calls for __btrfs_map_block() and its variants":

https://patchwork.kernel.org/project/linux-btrfs/list/?series=719082

I strongly recommend merging them in the proper order.

If needed, I can re-send both series merged in the proper patch order.

Thanks,
Qu
Geert Uytterhoeven Feb. 20, 2023, 8:53 a.m. UTC | #3
Hi Qu,

On Tue, 7 Feb 2023, Qu Wenruo wrote:
> [...]

Thanks for your patch, which is now commit 4a8c6e8a6dc8ae4c ("btrfs:
replace btrfs_io_context::raid_map with a fixed u64 value") in
next-20230220.

noreply@ellerman.id.au reported several build failures when
building for 32-bit platforms:

     ERROR: modpost: "__umoddi3" [fs/btrfs/btrfs.ko] undefined!

> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6556,35 +6532,44 @@ int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
> [...]
> +		bioc->full_stripe_logical = em->start +
> +			((stripe_nr * data_stripes) << BTRFS_STRIPE_LEN_SHIFT);
> +		for (i = 0; i < num_stripes; i++) {
> +			set_io_stripe(&bioc->stripes[i], map,
> +				      (i + stripe_nr) % num_stripes,

As stripe_nr is u64, this is a 64-by-32 modulo operation, which
should be implemented using a helper from include/linux/math64.h
instead.
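
A 32-bit-safe sketch of the suggestion (assuming @stripe_nr stays u64;
div_u64_rem() is one of the math64.h helpers):

	u32 rem;

	/* rem = (stripe_nr + i) % num_stripes, without the 64-by-32 '%'
	 * that pulls in __umoddi3 on 32-bit targets.
	 */
	div_u64_rem(stripe_nr + i, num_stripes, &rem);
	set_io_stripe(&bioc->stripes[i], map, rem, stripe_offset, stripe_nr);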

> [...]

Gr{oetje,eeting}s,

 						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
 							    -- Linus Torvalds
Qu Wenruo Feb. 20, 2023, 11:50 a.m. UTC | #4
On 2023/2/20 16:53, Geert Uytterhoeven wrote:
> Hi Qu,
>
> [...]
>
>> +        for (i = 0; i < num_stripes; i++) {
>> +            set_io_stripe(&bioc->stripes[i], map,
>> +                      (i + stripe_nr) % num_stripes,
> 
> As stripe_nr is u64, this is a 64-by-32 modulo operation, which
> should be implemented using a helper from include/linux/math64.h
> instead.

This is an older version; in the latest version, the @stripe_nr variable
is also a u32. I tried compiling the latest branch for i686, and it no
longer causes any u64 division problems.
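
A minimal illustration of the difference (my example, not taken from the
branch):

	u64 nr64 = 4;
	u32 nr32 = 4, num_stripes = 3, i = 1, rot;

	rot = (nr64 + i) % num_stripes; /* 64-by-32 '%': emits a __umoddi3 call on 32-bit */
	rot = (nr32 + i) % num_stripes; /* plain 32-bit '%': no libgcc helper needed */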

You can find the latest branch on either GitHub or the mailing list:

https://github.com/adam900710/linux/tree/map_block_refactor

https://lore.kernel.org/linux-btrfs/cover.1676611535.git.wqu@suse.com/

Thanks,
Qu

Geert Uytterhoeven Feb. 20, 2023, 12:14 p.m. UTC | #5
Hi Qu,

On Mon, Feb 20, 2023 at 12:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2023/2/20 16:53, Geert Uytterhoeven wrote:
> > [...]
>
> This is an older version; in the latest version, the @stripe_nr variable
> is also a u32. I tried compiling the latest branch for i686, and it no
> longer causes any u64 division problems.
>
> You can find the latest branch on either GitHub or the mailing list:
>
> https://github.com/adam900710/linux/tree/map_block_refactor
>
> https://lore.kernel.org/linux-btrfs/cover.1676611535.git.wqu@suse.com/

So the older version was "v2", and the latest version has no version
indicator and no changelog, which makes it look like a v1?
No surprise people end up applying the wrong version...

Gr{oetje,eeting}s,

                        Geert
Qu Wenruo Feb. 21, 2023, 12:09 a.m. UTC | #6
On 2023/2/20 20:14, Geert Uytterhoeven wrote:
> Hi Qu,
> 
> On Mon, Feb 20, 2023 at 12:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> [...]
> 
> So the older version was "v2", and the latest version has no version
> indicator and no changelog, which makes it look like a v1?
> No surprise people end up applying the wrong version...

The previous version was two separate patchsets; the new one merges them.

I sent the merged version because of the dependency problem and the
conflicts, and since it is a merged series, there is no changelog
relative to the previous version.

Thanks,
Qu

Patch

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 4332a8e165a7..febde169b68a 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -202,7 +202,7 @@  static void cache_rbio_pages(struct btrfs_raid_bio *rbio)
  */
 static int rbio_bucket(struct btrfs_raid_bio *rbio)
 {
-	u64 num = rbio->bioc->raid_map[0];
+	u64 num = rbio->bioc->full_stripe_logical;
 
 	/*
 	 * we shift down quite a bit.  We're using byte
@@ -571,7 +571,7 @@  static int rbio_can_merge(struct btrfs_raid_bio *last,
 	    test_bit(RBIO_CACHE_BIT, &cur->flags))
 		return 0;
 
-	if (last->bioc->raid_map[0] != cur->bioc->raid_map[0])
+	if (last->bioc->full_stripe_logical != cur->bioc->full_stripe_logical)
 		return 0;
 
 	/* we can't merge with different operations */
@@ -666,7 +666,8 @@  static noinline int lock_stripe_add(struct btrfs_raid_bio *rbio)
 
 	spin_lock_irqsave(&h->lock, flags);
 	list_for_each_entry(cur, &h->hash_list, hash_list) {
-		if (cur->bioc->raid_map[0] != rbio->bioc->raid_map[0])
+		if (cur->bioc->full_stripe_logical !=
+		    rbio->bioc->full_stripe_logical)
 			continue;
 
 		spin_lock(&cur->bio_list_lock);
@@ -1119,7 +1120,7 @@  static void index_one_bio(struct btrfs_raid_bio *rbio, struct bio *bio)
 	struct bio_vec bvec;
 	struct bvec_iter iter;
 	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
-		     rbio->bioc->raid_map[0];
+		     rbio->bioc->full_stripe_logical;
 
 	bio_for_each_segment(bvec, bio, iter) {
 		u32 bvec_offset;
@@ -1338,7 +1339,7 @@  static void set_rbio_range_error(struct btrfs_raid_bio *rbio, struct bio *bio)
 {
 	struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
 	u32 offset = (bio->bi_iter.bi_sector << SECTOR_SHIFT) -
-		     rbio->bioc->raid_map[0];
+		     rbio->bioc->full_stripe_logical;
 	int total_nr_sector = offset >> fs_info->sectorsize_bits;
 
 	ASSERT(total_nr_sector < rbio->nr_data * rbio->stripe_nsectors);
@@ -1648,7 +1649,7 @@  static void rbio_add_bio(struct btrfs_raid_bio *rbio, struct bio *orig_bio)
 {
 	const struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
 	const u64 orig_logical = orig_bio->bi_iter.bi_sector << SECTOR_SHIFT;
-	const u64 full_stripe_start = rbio->bioc->raid_map[0];
+	const u64 full_stripe_start = rbio->bioc->full_stripe_logical;
 	const u32 orig_len = orig_bio->bi_iter.bi_size;
 	const u32 sectorsize = fs_info->sectorsize;
 	u64 cur_logical;
@@ -1842,9 +1843,8 @@  static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 		 * here due to a crc mismatch and we can't give them the
 		 * data they want.
 		 */
-		if (rbio->bioc->raid_map[failb] == RAID6_Q_STRIPE) {
-			if (rbio->bioc->raid_map[faila] ==
-			    RAID5_P_STRIPE)
+		if (failb == rbio->real_stripes - 1) {
+			if (faila == rbio->real_stripes - 2)
 				/*
 				 * Only P and Q are corrupted.
 				 * We only care about data stripes recovery,
@@ -1858,7 +1858,7 @@  static int recover_vertical(struct btrfs_raid_bio *rbio, int sector_nr,
 			goto pstripe;
 		}
 
-		if (rbio->bioc->raid_map[failb] == RAID5_P_STRIPE) {
+		if (failb == rbio->real_stripes - 2) {
 			raid6_datap_recov(rbio->real_stripes, sectorsize,
 					  faila, pointers);
 		} else {
@@ -2156,8 +2156,8 @@  static void fill_data_csums(struct btrfs_raid_bio *rbio)
 {
 	struct btrfs_fs_info *fs_info = rbio->bioc->fs_info;
 	struct btrfs_root *csum_root = btrfs_csum_root(fs_info,
-						       rbio->bioc->raid_map[0]);
-	const u64 start = rbio->bioc->raid_map[0];
+					rbio->bioc->full_stripe_logical);
+	const u64 start = rbio->bioc->full_stripe_logical;
 	const u32 len = (rbio->nr_data * rbio->stripe_nsectors) <<
 			fs_info->sectorsize_bits;
 	int ret;
@@ -2205,7 +2205,7 @@  static void fill_data_csums(struct btrfs_raid_bio *rbio)
 	 */
 	btrfs_warn_rl(fs_info,
 "sub-stripe write for full stripe %llu is not safe, failed to get csum: %d",
-			rbio->bioc->raid_map[0], ret);
+			rbio->bioc->full_stripe_logical, ret);
 no_csum:
 	kfree(rbio->csum_buf);
 	bitmap_free(rbio->csum_bitmap);
@@ -2467,10 +2467,10 @@  void raid56_add_scrub_pages(struct btrfs_raid_bio *rbio, struct page *page,
 	int stripe_offset;
 	int index;
 
-	ASSERT(logical >= rbio->bioc->raid_map[0]);
-	ASSERT(logical + sectorsize <= rbio->bioc->raid_map[0] +
+	ASSERT(logical >= rbio->bioc->full_stripe_logical);
+	ASSERT(logical + sectorsize <= rbio->bioc->full_stripe_logical +
 				       BTRFS_STRIPE_LEN * rbio->nr_data);
-	stripe_offset = (int)(logical - rbio->bioc->raid_map[0]);
+	stripe_offset = (int)(logical - rbio->bioc->full_stripe_logical);
 	index = stripe_offset / sectorsize;
 	rbio->bio_sectors[index].page = page;
 	rbio->bio_sectors[index].pgoff = pgoff;
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index 83de9fecc7b6..aacf1dd9297c 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -1430,7 +1430,7 @@  static inline int scrub_nr_raid_mirrors(struct btrfs_io_context *bioc)
 }
 
 static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
-						 u64 *raid_map,
+						 u64 full_stripe_logical,
 						 int nstripes, int mirror,
 						 int *stripe_index,
 						 u64 *stripe_offset)
@@ -1438,19 +1438,21 @@  static inline void scrub_stripe_index_and_offset(u64 logical, u64 map_type,
 	int i;
 
 	if (map_type & BTRFS_BLOCK_GROUP_RAID56_MASK) {
-		/* RAID5/6 */
-		for (i = 0; i < nstripes; i++) {
-			if (raid_map[i] == RAID6_Q_STRIPE ||
-			    raid_map[i] == RAID5_P_STRIPE)
-				continue;
+		const int nr_data_stripes = (map_type & BTRFS_BLOCK_GROUP_RAID5) ?
+					    nstripes - 1 : nstripes - 2;
 
-			if (logical >= raid_map[i] &&
-			    logical < raid_map[i] + BTRFS_STRIPE_LEN)
+		/* RAID5/6 */
+		for (i = 0; i < nr_data_stripes; i++) {
+			const u64 data_stripe_start = full_stripe_logical +
+						(i << BTRFS_STRIPE_LEN_SHIFT);
+
+			if (logical >= data_stripe_start &&
+			    logical < data_stripe_start + BTRFS_STRIPE_LEN)
 				break;
 		}
 
 		*stripe_index = i;
-		*stripe_offset = logical - raid_map[i];
+		*stripe_offset = logical - full_stripe_logical;
 	} else {
 		/* The other RAID type */
 		*stripe_index = mirror;
@@ -1538,7 +1540,7 @@  static int scrub_setup_recheck_block(struct scrub_block *original_sblock,
 
 			scrub_stripe_index_and_offset(logical,
 						      bioc->map_type,
-						      bioc->raid_map,
+						      bioc->full_stripe_logical,
 						      bioc->num_stripes -
 						      bioc->replace_nr_stripes,
 						      mirror_index,
@@ -2398,7 +2400,7 @@  static void scrub_missing_raid56_pages(struct scrub_block *sblock)
 	btrfs_bio_counter_inc_blocked(fs_info);
 	ret = btrfs_map_sblock(fs_info, BTRFS_MAP_GET_READ_MIRRORS, logical,
 			       &length, &bioc);
-	if (ret || !bioc || !bioc->raid_map)
+	if (ret || !bioc)
 		goto bioc_out;
 
 	if (WARN_ON(!sctx->is_dev_replace ||
@@ -3008,7 +3010,7 @@  static void scrub_parity_check_and_repair(struct scrub_parity *sparity)
 	btrfs_bio_counter_inc_blocked(fs_info);
 	ret = btrfs_map_sblock(fs_info, BTRFS_MAP_WRITE, sparity->logic_start,
 			       &length, &bioc);
-	if (ret || !bioc || !bioc->raid_map)
+	if (ret || !bioc)
 		goto bioc_out;
 
 	bio = bio_alloc(NULL, BIO_MAX_VECS, REQ_OP_READ, GFP_NOFS);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 860bb2709778..3a2a256fa9cd 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -5864,25 +5864,6 @@  static int find_live_mirror(struct btrfs_fs_info *fs_info,
 	return preferred_mirror;
 }
 
-/* Bubble-sort the stripe set to put the parity/syndrome stripes last */
-static void sort_parity_stripes(struct btrfs_io_context *bioc, int num_stripes)
-{
-	int i;
-	int again = 1;
-
-	while (again) {
-		again = 0;
-		for (i = 0; i < num_stripes - 1; i++) {
-			/* Swap if parity is on a smaller index */
-			if (bioc->raid_map[i] > bioc->raid_map[i + 1]) {
-				swap(bioc->stripes[i], bioc->stripes[i + 1]);
-				swap(bioc->raid_map[i], bioc->raid_map[i + 1]);
-				again = 1;
-			}
-		}
-	}
-}
-
 static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_info,
 						       u16 total_stripes)
 {
@@ -5892,12 +5873,7 @@  static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 		 /* The size of btrfs_io_context */
 		sizeof(struct btrfs_io_context) +
 		/* Plus the variable array for the stripes */
-		sizeof(struct btrfs_io_stripe) * (total_stripes) +
-		/*
-		 * Plus the raid_map, which includes both the tgt dev
-		 * and the stripes.
-		 */
-		sizeof(u64) * (total_stripes),
+		sizeof(struct btrfs_io_stripe) * (total_stripes),
 		GFP_NOFS);
 
 	if (!bioc)
@@ -5906,8 +5882,8 @@  static struct btrfs_io_context *alloc_btrfs_io_context(struct btrfs_fs_info *fs_
 	refcount_set(&bioc->refs, 1);
 
 	bioc->fs_info = fs_info;
-	bioc->raid_map = (u64 *)(bioc->stripes + total_stripes);
 	bioc->replace_stripe_src = -1;
+	bioc->full_stripe_logical = (u64)-1;
 
 	return bioc;
 }
@@ -6556,35 +6532,44 @@  int __btrfs_map_block(struct btrfs_fs_info *fs_info, enum btrfs_map_op op,
 	}
 	bioc->map_type = map->type;
 
-	for (i = 0; i < num_stripes; i++) {
-		set_io_stripe(&bioc->stripes[i], map, stripe_index, stripe_offset,
-			      stripe_nr);
-		stripe_index++;
-	}
-
-	/* Build raid_map */
+	/*
+	 * For a RAID56 full map, we need to make sure stripes[] follows the
+	 * rule that data stripes are all ordered, then followed by P and Q
+	 * (if present).
+	 *
+	 * It's still mostly the same as for other profiles, just with an
+	 * extra rotation.
+	 */
 	if (map->type & BTRFS_BLOCK_GROUP_RAID56_MASK && need_raid_map &&
 	    (need_full_stripe(op) || mirror_num > 1)) {
-		u64 tmp;
-		unsigned rot;
-
-		/* Work out the disk rotation on this stripe-set */
-		rot = stripe_nr % num_stripes;
-
-		/* Fill in the logical address of each stripe */
-		tmp = stripe_nr * data_stripes;
-		for (i = 0; i < data_stripes; i++)
-			bioc->raid_map[(i + rot) % num_stripes] =
-				em->start + ((tmp + i) << BTRFS_STRIPE_LEN_SHIFT);
-
-		bioc->raid_map[(i + rot) % map->num_stripes] = RAID5_P_STRIPE;
-		if (map->type & BTRFS_BLOCK_GROUP_RAID6)
-			bioc->raid_map[(i + rot + 1) % num_stripes] =
-				RAID6_Q_STRIPE;
-
-		sort_parity_stripes(bioc, num_stripes);
+		/*
+		 * For RAID56, @stripe_nr is already the number of full stripes
+		 * before us, which is also the rotation value (it just needs a
+		 * modulo with num_stripes).
+		 *
+		 * In this case, we simply add @stripe_nr to @i and then do the
+		 * modulo, saving one modulo call.
+		 */
+		bioc->full_stripe_logical = em->start +
+			((stripe_nr * data_stripes) << BTRFS_STRIPE_LEN_SHIFT);
+		for (i = 0; i < num_stripes; i++) {
+			set_io_stripe(&bioc->stripes[i], map,
+				      (i + stripe_nr) % num_stripes,
+				      stripe_offset, stripe_nr);
+		}
+	} else {
+		/*
+		 * For all other non-RAID56 profiles, just copy the target
+		 * stripe into the bioc.
+		 */
+		for (i = 0; i < num_stripes; i++) {
+			set_io_stripe(&bioc->stripes[i], map, stripe_index,
+				      stripe_offset, stripe_nr);
+			stripe_index++;
+		}
 	}
 
+
 	if (need_full_stripe(op))
 		max_errors = btrfs_chunk_max_errors(map);
 
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index f35db441ea9c..0fe21ece1491 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -473,11 +473,21 @@  struct btrfs_io_context {
 	u16 replace_nr_stripes;
 	s16 replace_stripe_src;
 	/*
-	 * logical block numbers for the start of each stripe
-	 * The last one or two are p/q.  These are sorted,
-	 * so raid_map[0] is the start of our full stripe
+	 * Logical bytenr of the full stripe start, only for RAID56 cases.
+	 *
+	 * When this value is set to anything other than (u64)-1, stripes[]
+	 * should follow this pattern:
+	 * (real_stripes = num_stripes - replace_nr_stripes)
+	 * (data_stripes = (is_raid6) ? (real_stripes - 2) : (real_stripes - 1))
+	 *
+	 * stripes[0]:			The first data stripe
+	 * stripes[1]:			The second data stripe
+	 * ...
+	 * stripes[data_stripes - 1]:	The last data stripe
+	 * stripes[data_stripes]:	The P stripe
+	 * stripes[data_stripes + 1]:	The Q stripe (only for RAID6).
 	 */
-	u64 *raid_map;
+	u64 full_stripe_logical;
 	struct btrfs_io_stripe stripes[];
 };
 
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 75d7d22c3a27..8ea9cea9bfeb 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -2422,7 +2422,7 @@  DECLARE_EVENT_CLASS(btrfs_raid56_bio,
 	),
 
 	TP_fast_assign_btrfs(rbio->bioc->fs_info,
-		__entry->full_stripe	= rbio->bioc->raid_map[0];
+		__entry->full_stripe	= rbio->bioc->full_stripe_logical;
 		__entry->physical	= bio->bi_iter.bi_sector << SECTOR_SHIFT;
 		__entry->len		= bio->bi_iter.bi_size;
 		__entry->opf		= bio_op(bio);