[v3,00/12] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()

Message ID: cover.1679278088.git.wqu@suse.com

Qu Wenruo March 20, 2023, 2:12 a.m. UTC
This series can be found in my github repo:

https://github.com/adam900710/linux/tree/scrub_stripe

It's recommended to fetch from the repo, as our misc-next seems to
change pretty rapidly.

[Changelog]
v3:
- Add a dedicated @fs_info member to btrfs_bio
  Unfortunately, although we have a 32-byte hole between @end_io_work and
  @bio, the compiler still chooses not to use that hole, for whatever
  reason.
  Thus this increases the size of btrfs_bio by 4 bytes.

- Rebased to latest misc-next

- Fix various whitespace errors not caught by btrfs-workflow

v2:
- Use batched scrub_stripe submission
  This allows much better performance compared to the old scrub code

- Add scrub specific bio layer helpers
  This makes the scrub code rely completely on logical bytenr +
  mirror_num.

[PROBLEMS OF OLD SCRUB]

- Too many delayed jumps, making it hard to read
  Even starting from scrub_simple_mirror(), we have the following
  functions:

  scrub_extent()
       |
       v
  scrub_sectors()
       |
       v
  scrub_add_sector_to_rd_bio()
       | endio function
       v
  scrub_bio_end_io()
       | delayed work
       v
  scrub_bio_end_io_worker()
       |
       v
  scrub_block_complete()
       |
       v
  scrub_handle_errored_blocks()
       |
       v
  scrub_recheck_block()
       |
       v
  scrub_repair_sector_from_good_copy()

  Not to mention the hidden jumps in certain branches.

- IOPS inefficient for fragmented extents

  The real block size of a scrub read is between 4K and 128K.
  If the extents are not adjacent, the block size drops to 4K, which
  makes it an IOPS disaster.

- Logical -> physical mapping all hardcoded in scrub itself
  No usage of any existing bio facilities; it even implements its own
  RAID56 recovery wrapper.

[NEW SCRUB_STRIPE BASED SOLUTION]

- Overall streamlined code base

  queue_scrub_stripe()
     |
     v
  scrub_find_fill_first_stripe()
     |
     v
  done

  Or

  queue_scrub_stripe()
     |
     v
  flush_scrub_stripes()
     |
     v
  scrub_submit_initial_read()
     | endio function
     v
  scrub_read_endio()
     | delayed work
     v
  scrub_stripe_read_repair_worker()
     |
     v
  scrub_verify_one_stripe()
     |
     v
  scrub_stripe_submit_repair_read()
     |
     v
  scrub_write_sectors()
     |
     v
  scrub_stripe_report_errors()

  Only one endio and one delayed work; all other work is done in a
  sequential workflow.

- Always read in 64KiB block size
  The real block size of a read starts at 64KiB and goes up to 512K.
  This already results in better performance, even in the worst case:

  With patchset:	404.81MiB/s
  Without patchset:	369.30MiB/s

  Around 10% performance improvement on a SATA SSD.

- All logical bytenr/mirror_num based read and write

  With the new single stripe fast path in btrfs_submit_bio(), scrub can
  reuse most of the bio layer code, resulting in much simpler scrub code.

[TODO]

- More testing on zoned devices
  Currently the patchset already passes all scrub/replace groups with
  regular devices.

- More cleanup on RAID56 path
  RAID56 still uses some of the old facilities, so things like
  scrub_sector and scrub_bio can not be fully cleaned up yet.

Qu Wenruo (12):
  btrfs: scrub: use dedicated super block verification function to scrub
    one super block
  btrfs: introduce a new helper to submit bio for scrub
  btrfs: introduce a new helper to submit write bio for scrub
  btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
    interface
  btrfs: scrub: introduce a helper to find and fill the sector info for
    a scrub_stripe
  btrfs: scrub: introduce a helper to verify one metadata
  btrfs: scrub: introduce a helper to verify one scrub_stripe
  btrfs: scrub: introduce the main read repair worker for scrub_stripe
  btrfs: scrub: introduce a writeback helper for scrub_stripe
  btrfs: scrub: introduce error reporting functionality for scrub_stripe
  btrfs: scrub: introduce the helper to queue a stripe for scrub
  btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
    infrastructure

 fs/btrfs/bio.c       |  144 +++-
 fs/btrfs/bio.h       |   20 +-
 fs/btrfs/file-item.c |    9 +-
 fs/btrfs/file-item.h |    3 +-
 fs/btrfs/raid56.c    |    2 +-
 fs/btrfs/scrub.c     | 1658 ++++++++++++++++++++++++++++++------------
 6 files changed, 1351 insertions(+), 485 deletions(-)

Comments

David Sterba March 21, 2023, 12:09 a.m. UTC | #1
On Mon, Mar 20, 2023 at 10:12:46AM +0800, Qu Wenruo wrote:
> [TODO]
> 
> - More testing on zoned devices
>   Now the patchset can already pass all scrub/replace groups with
>   regular devices.

I think I noticed some disparity in the old and new code for the zoned
devices. This should be found by testing, so I'd add this series to
for-next and see.

> - More cleanup on RAID56 path
>   Now RAID56 still uses some old facility, resulting things like
>   scrub_sector and scrub_bio can not be fully cleaned up.

We can do that incrementally. As long as the raid56 scrub works, the
cleanups can be done later; the series changes a lot of code already.

> Qu Wenruo (12):
>   btrfs: scrub: use dedicated super block verification function to scrub
>     one super block
>   btrfs: introduce a new helper to submit bio for scrub
>   btrfs: introduce a new helper to submit write bio for scrub
>   btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
>     interface
>   btrfs: scrub: introduce a helper to find and fill the sector info for
>     a scrub_stripe
>   btrfs: scrub: introduce a helper to verify one metadata
>   btrfs: scrub: introduce a helper to verify one scrub_stripe
>   btrfs: scrub: introduce the main read repair worker for scrub_stripe
>   btrfs: scrub: introduce a writeback helper for scrub_stripe
>   btrfs: scrub: introduce error reporting functionality for scrub_stripe
>   btrfs: scrub: introduce the helper to queue a stripe for scrub
>   btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
>     infrastructure

The whole series follows the pattern to first introduce the individual
helpers that can be reviewed separately and then does the switch in one
patch. As the whole scrub IO path is reworked I don't think we can do
much better so clearly separating the old and new logic sounds OK.

One comment I have: the functional switch in the last patch should not
be mixed with deleting the unused code. As the last patch activates
code from all the previous patches, it would show up in any bisection
as the cause, so it would help to narrow the focus to only the real
changes.

There are some minor coding style issues I'll point out under the
patches. I'm assuming that the scrub structure won't change again soon
(the old code is from the 3.x times), so let's use this opportunity to
make the style as up to date as possible.
Qu Wenruo March 23, 2023, 6:28 a.m. UTC | #2
On 2023/3/21 08:09, David Sterba wrote:
> On Mon, Mar 20, 2023 at 10:12:46AM +0800, Qu Wenruo wrote:
>> [TODO]
>>
>> - More testing on zoned devices
>>    Now the patchset can already pass all scrub/replace groups with
>>    regular devices.
> 
> I think I noticed some disparity in the old and new code for the zoned
> devices. This should be found by testing so I'd add this series to
> for-next and see.

Just want to be sure: if I want to further update the series (mostly
style and small cleanups), I should just base all my updates on your
for-next branch, right?

Thanks,
Qu

> 
>> - More cleanup on RAID56 path
>>    Now RAID56 still uses some old facility, resulting things like
>>    scrub_sector and scrub_bio can not be fully cleaned up.
> 
> We can do that incrementally, as long as the raid56 scrub works the
> cleanups can be done later, the series changes a lot of code already.
> 
>> Qu Wenruo (12):
>>    btrfs: scrub: use dedicated super block verification function to scrub
>>      one super block
>>    btrfs: introduce a new helper to submit bio for scrub
>>    btrfs: introduce a new helper to submit write bio for scrub
>>    btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
>>      interface
>>    btrfs: scrub: introduce a helper to find and fill the sector info for
>>      a scrub_stripe
>>    btrfs: scrub: introduce a helper to verify one metadata
>>    btrfs: scrub: introduce a helper to verify one scrub_stripe
>>    btrfs: scrub: introduce the main read repair worker for scrub_stripe
>>    btrfs: scrub: introduce a writeback helper for scrub_stripe
>>    btrfs: scrub: introduce error reporting functionality for scrub_stripe
>>    btrfs: scrub: introduce the helper to queue a stripe for scrub
>>    btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
>>      infrastructure
> 
> The whole series follows the pattern to first introduce the individual
> helpers that can be reviewed separately and then does the switch in one
> patch. As the whole scrub IO path is reworked I don't think we can do
> much better so clearly separating the old and new logic sounds OK.
> 
> One comment I have, the functional switch in the last patch should not
> be mixed with deleting of the unused code. As the last patch activates
> code from all the previous patches it would also show up in any
> bisection as the cause so it would help to narrow the focus only on the
> real changes.
> 
> There are some minor coding style issues I'll point out under the
> patches. I'm assuming that the scrub structure won't change again soon
> (the old code is from 3.x times) so let's use this opportunity to make
> the style most up to date.
David Sterba March 23, 2023, 5:51 p.m. UTC | #3
On Thu, Mar 23, 2023 at 02:28:16PM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/3/21 08:09, David Sterba wrote:
> > On Mon, Mar 20, 2023 at 10:12:46AM +0800, Qu Wenruo wrote:
> >> [TODO]
> >>
> >> - More testing on zoned devices
> >>    Now the patchset can already pass all scrub/replace groups with
> >>    regular devices.
> > 
> > I think I noticed some disparity in the old and new code for the zoned
> > devices. This should be found by testing so I'd add this series to
> > for-next and see.
> 
> Just want to be sure, if I want to further update the series (mostly 
> style and small cleanups), I should just base all my updates on your 
> for-next branch, right?

No, please base it on misc-next. For-next is for early testing, but
patchsets can be updated or completely dropped.
Qu Wenruo March 24, 2023, 12:42 a.m. UTC | #4
On 2023/3/24 01:51, David Sterba wrote:
> On Thu, Mar 23, 2023 at 02:28:16PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2023/3/21 08:09, David Sterba wrote:
>>> On Mon, Mar 20, 2023 at 10:12:46AM +0800, Qu Wenruo wrote:
>>>> [TODO]
>>>>
>>>> - More testing on zoned devices
>>>>     Now the patchset can already pass all scrub/replace groups with
>>>>     regular devices.
>>>
>>> I think I noticed some disparity in the old and new code for the zoned
>>> devices. This should be found by testing so I'd add this series to
>>> for-next and see.
>>
>> Just want to be sure, if I want to further update the series (mostly
>> style and small cleanups), I should just base all my updates on your
>> for-next branch, right?
> 
> No, please base it on misc-next. For-next is for an early testing but
> patchsets can be updated or completely dropped.

My bad, my question should have been: should I fetch all the patches
from for-next?

Because I thought there might be some modifications when you apply the
patches, but it turns out they are applied as is, so there is no need
for that.

All my patches are and will always be based on misc-next.
(Although sometimes it's an older misc-next.)

Thanks,
Qu
David Sterba March 27, 2023, 11:28 p.m. UTC | #5
On Fri, Mar 24, 2023 at 08:42:22AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/3/24 01:51, David Sterba wrote:
> > On Thu, Mar 23, 2023 at 02:28:16PM +0800, Qu Wenruo wrote:
> >>
> >>
> >> On 2023/3/21 08:09, David Sterba wrote:
> >>> On Mon, Mar 20, 2023 at 10:12:46AM +0800, Qu Wenruo wrote:
> >>>> [TODO]
> >>>>
> >>>> - More testing on zoned devices
> >>>>     Now the patchset can already pass all scrub/replace groups with
> >>>>     regular devices.
> >>>
> >>> I think I noticed some disparity in the old and new code for the zoned
> >>> devices. This should be found by testing so I'd add this series to
> >>> for-next and see.
> >>
> >> Just want to be sure, if I want to further update the series (mostly
> >> style and small cleanups), I should just base all my updates on your
> >> for-next branch, right?
> > 
> > No, please base it on misc-next. For-next is for an early testing but
> > patchsets can be updated or completely dropped.
> 
> My bad, my question should be: Should I fetch all the patches from for-next?
> 
> Because I thought there may be some modification when you apply the 
> patches, but it turns out it's really applied as is, so no need for that.

Ah, I see what you mean, yeah that's a valid question. For patches that
get feedback requiring non-trivial changes I don't do my edits until
the patchset is more or less finalized. This is implied by the status
in the mailing list and not reflected in the branch names; or, if
there's a tracking issue for the patchset, I add the 'update' label.