[v8,00/12] btrfs: scrub: use a more reader friendly code to implement scrub_simple_mirror()

Message ID cover.1680225140.git.wqu@suse.com

Message

Qu Wenruo March 31, 2023, 1:20 a.m. UTC
This series can be found in my github repo:

https://github.com/adam900710/linux/tree/scrub_stripe

It's recommended to fetch from the repo, as our misc-next branch
changes pretty rapidly.

[Changelog]
v8:
- Better bio layer interface
  * No special scrub read helper
    Read bios just go through btrfs_submit_bio()

  * Shared code base for repair writes
    A common helper, btrfs_map_repair_block(), is shared between
    read-repair and scrub (see the sketch at the end of this entry).

  * Some code style enhancement related to bio layer

- Full RAID56 code cleanup
  A huge -2500 lines cleanup, although not done in this series.

  The cleanup is done in a dedicated series: "btrfs: scrub: finish the
  switch to scrub_stripe and cleanup the old code".

  It can be fetched from the same github repo.
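
  As a rough sketch of the shared repair write path mentioned in the
  first entry above (the exact signature of btrfs_map_repair_block()
  is an assumption based on its name, not taken verbatim from the
  patches):

    /*
     * Both read-repair and scrub want "write these sectors back
     * to one specific mirror".  A shared mapping helper resolves
     * the logical range to one device + physical offset, so both
     * callers can build a plain write bio from the result.
     */
    struct btrfs_io_stripe smap = { 0 };
    int ret;

    ret = btrfs_map_repair_block(fs_info, &smap, logical, length,
                                 mirror_num);
    if (ret < 0)
            return ret;
    /* smap.dev + smap.physical now describe the write target. */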

v7:
- Fix a bug where scrub_stripe::extent_sector_bitmap is not cleared
  Exposed during my development of the new RAID56 scrub code.

  The extent_sector_bitmap indicates whether a sector is utilized
  by any extent.
  If that bitmap is not cleared before running the next stripe, we
  would treat unused sectors as NODATASUM data.

  This is not a big deal for non-RAID56 profiles, as they skip
  empty stripes through other methods.
  But it can be very problematic for RAID56, which needs to scrub
  the data stripes and then the P/Q stripes.

  Such a stale bitmap makes RAID56 scrub all those unused full
  stripes and greatly slows down the scrub.
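
  A minimal sketch of the fix, assuming the bitmap is a plain
  unsigned long (the sector count per stripe is small); only the
  extent_sector_bitmap name comes from this series, the rest is
  illustrative:

    /* Reset per-stripe state before the stripe is reused. */
    static void scrub_reset_stripe(struct scrub_stripe *stripe)
    {
            /* Drop extent usage info inherited from the last run. */
            stripe->extent_sector_bitmap = 0;
            stripe->state = 0;
    }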

v6:
- Fix a bug in zoned block group repair
  Exposed during my development of the new RAID56 scrub code.

  We used @stripe to determine whether we need to queue a block
  group for repair.

  But @stripe is only the last stripe we checked; it may not have
  any error.

  The correct way is to go through all the stripes and queue the
  repair if we find any error.
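
  A minimal sketch of the corrected check; init_error_bitmap and
  the queueing helper are assumptions for illustration:

    bool need_repair = false;

    /* Check every stripe of the batch, not just the last one. */
    for (int i = 0; i < nr_stripes; i++) {
            struct scrub_stripe *stripe = &stripes[i];

            if (!bitmap_empty(&stripe->init_error_bitmap,
                              stripe->nr_sectors)) {
                    need_repair = true;
                    break;
            }
    }
    if (need_repair)
            queue_zoned_bg_for_repair(bg);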

v5:
- Fix a bug that unconditionally repairs a zoned block group
  Only trigger the repair if we had any initial read failure

  Huge thanks to Johannes for the initial ZNS tests.

v4:
- Add a dedicated patch to add btrfs_bio::fs_info
  Along with a dedicated allocator for scrub btrfs bios.
  The dedicated allocator is due to the fact that scrub and regular
  btrfs bios have very different mandatory members (fs_info vs inode +
  file_offset).
  For now I believe a different allocator would be better.

- Some code style change
  * No more single-letter temporary structures copied from the old
    scrub code
  * Use "for (int i = 0; ...)" when possible
  * Some newline fixes
  * Extra brackets for ternary operators
  * A new macro for (BTRFS_STRIPE_LEN >> PAGE_SHIFT)
  * Use enum for scrub_stripe::state bit flags
  * Use extra brackets for double shifting
  * Use explicit != 0 or == 0 comparing memcmp() results
  * Remove unnecessary ASSERT()s after btrfs_bio allocation
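
  To illustrate a few of the rules above (the macro name, the
  csum_one_sector() helper and the surrounding code are made up for
  this example):

    /* One possible name for the (BTRFS_STRIPE_LEN >> PAGE_SHIFT) macro. */
    #define SCRUB_STRIPE_PAGES      (BTRFS_STRIPE_LEN >> PAGE_SHIFT)

    for (int i = 0; i < stripe->nr_sectors; i++) {
            /* Extra brackets around the inner part of a double shift. */
            const int page_nr = (i << sectorsize_bits) >> PAGE_SHIFT;
            struct page *page = stripe->pages[page_nr];
            /* Extra brackets for the ternary operator. */
            u8 *csum = (stripe->csums ?
                        stripe->csums + (i * csum_size) : NULL);
            u8 result[BTRFS_CSUM_SIZE];

            csum_one_sector(page, i, result);   /* hypothetical */
            /* Explicit comparison of the memcmp() result. */
            if (csum && memcmp(result, csum, csum_size) != 0)
                    set_bit(i, &stripe->csum_error_bitmap);
    }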

v3:
- Add a dedicated @fs_info member for btrfs_bio
  Unfortunately, although we have a 32-byte hole between
  @end_io_work and @bio, the compiler still chooses not to use that
  hole for whatever reason.
  Thus this would increase the size of btrfs_bio by 4 bytes.
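
  For reference, the layout in question looks roughly like this
  (member list abridged; only @fs_info, @end_io_work and @bio are
  taken from the discussion above):

    struct btrfs_bio {
            /* For data bios: inode and logical file offset. */
            struct btrfs_inode *inode;
            u64 file_offset;

            /* New member: for bios without an inode, e.g. scrub. */
            struct btrfs_fs_info *fs_info;

            btrfs_bio_end_io_t end_io;
            void *private;
            struct work_struct end_io_work;

            /* Must stay last, the bvecs follow the embedded bio. */
            struct bio bio;
    };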

- Rebased to the latest misc-next

- Fix various whitespace errors not caught by btrfs-workflow

v2:
- Use batched scrub_stripe submission
  This allows much better performance compared to the old scrub code

- Add scrub specific bio layer helpers
  This makes the scrub code rely completely on logical bytenr +
  mirror_num.

[PROBLEMS OF OLD SCRUB]

- Too many delayed jumps, making it hard to read
  Even starting from scrub_simple_mirror(), we have the following
  functions:

  scrub_extent()
       |
       v
  scrub_sectors()
       |
       v
  scrub_add_sector_to_rd_bio()
       | endio function
       v
  scrub_bio_end_io()
       | delayed work
       v
  scrub_bio_end_io_worker()
       |
       v
  scrub_block_complete()
       |
       v
  scrub_handle_errored_blocks()
       |
       v
  scrub_recheck_block()
       |
       v
  scrub_repair_sector_from_good_copy()

  Not to mention the hidden jumps in certain branches.

- IOPS inefficient for fragmented extents

  The real block size of a scrub read is between 4K and 128K.
  If the extents are not adjacent, the block size drops to 4K,
  which is an IOPS disaster: at e.g. 10K IOPS, 4K reads top out
  around 40MiB/s, while 128K reads could exceed 1GiB/s.

- All the logical -> physical mapping is hardcoded inside scrub itself
  No usage of any existing bio facilities, and scrub even
  implemented its own RAID56 recovery wrapper.

[NEW SCRUB_STRIPE BASED SOLUTION]

- Overall streamlined code base

  queue_scrub_stripe()
     |
     v
  scrub_find_fill_first_stripe()
     |
     v
  done

  Or

  queue_scrub_stripe()
     |
     v
  flush_scrub_stripes()
     |
     v
  scrub_submit_initial_read()
     | endio function
     v
  scrub_read_endio()
     | delayed work
     v
  scrub_stripe_read_repair_worker()
     |
     v
  scrub_verify_one_stripe()
     |
     v
  scrub_stripe_submit_repair_read()
     |
     v
  scrub_write_sectors()
     |
     v
  scrub_stripe_report_errors()

  Only one endio and one delayed work; all other work is done in a
  sequential workflow.

- Always read in at least 64KiB block size
  The real block size of a read starts at 64KiB and goes up to
  512KiB.
  This already results in better performance, even for the worst
  case:

  With patchset:	404.81MiB/s
  Without patchset:	369.30MiB/s

  Around 10% performance improvement on a SATA SSD.

- All logical bytenr/mirror_num based read and write

  With the new single-stripe fast path in btrfs_submit_bio(), scrub
  can reuse most of the bio layer code, resulting in much simpler
  scrub code.
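
  As a rough sketch of what that looks like from scrub's side (the
  allocator arguments and helper names are assumptions based on the
  patch titles, not the exact interface):

    /*
     * Scrub only fills in logical bytenr + mirror_num; the bio
     * layer performs the logical -> physical mapping.
     */
    struct btrfs_bio *bbio;

    bbio = btrfs_bio_alloc(SCRUB_STRIPE_PAGES, REQ_OP_READ, fs_info,
                           scrub_read_endio, stripe);
    bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT;
    /* ... add the stripe pages to the bio ... */
    btrfs_submit_bio(bbio, stripe->mirror_num);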

[TODO]

- More testing on zoned devices
  The patchset already passes all scrub/replace/raid/repair test
  groups on regular devices.
  Johannes helped with some zoned tests, but more zoned testing
  would still be very much appreciated.

Qu Wenruo (12):
  btrfs: scrub: use dedicated super block verification function to scrub
    one super block
  btrfs: introduce btrfs_bio::fs_info member
  btrfs: introduce a new helper to submit write bio for repair
  btrfs: scrub: introduce the structure for new BTRFS_STRIPE_LEN based
    interface
  btrfs: scrub: introduce a helper to find and fill the sector info for
    a scrub_stripe
  btrfs: scrub: introduce a helper to verify one metadata
  btrfs: scrub: introduce a helper to verify one scrub_stripe
  btrfs: scrub: introduce the main read repair worker for scrub_stripe
  btrfs: scrub: introduce a writeback helper for scrub_stripe
  btrfs: scrub: introduce error reporting functionality for scrub_stripe
  btrfs: scrub: introduce the helper to queue a stripe for scrub
  btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe
    infrastructure

 fs/btrfs/bio.c         |  138 ++--
 fs/btrfs/bio.h         |   14 +-
 fs/btrfs/compression.c |    3 +-
 fs/btrfs/extent_io.c   |    3 +-
 fs/btrfs/file-item.c   |    9 +-
 fs/btrfs/file-item.h   |    3 +-
 fs/btrfs/inode.c       |   13 +-
 fs/btrfs/raid56.c      |    2 +-
 fs/btrfs/raid56.h      |    5 +
 fs/btrfs/scrub.c       | 1668 ++++++++++++++++++++++++++++------------
 fs/btrfs/volumes.c     |   73 ++
 fs/btrfs/volumes.h     |    3 +
 fs/btrfs/zoned.c       |    4 +-
 13 files changed, 1386 insertions(+), 552 deletions(-)

Comments

David Sterba March 31, 2023, 4:17 p.m. UTC | #1
On Fri, Mar 31, 2023 at 09:20:03AM +0800, Qu Wenruo wrote:
> This series can be found in my github repo:
> 
> https://github.com/adam900710/linux/tree/scrub_stripe

This also includes the cleanup branch so I'll use this as a topic
branch in for-next.
Qu Wenruo April 4, 2023, 1:08 a.m. UTC | #2
On 2023/4/1 00:17, David Sterba wrote:
> On Fri, Mar 31, 2023 at 09:20:03AM +0800, Qu Wenruo wrote:
>> This series can be found in my github repo:
>>
>> https://github.com/adam900710/linux/tree/scrub_stripe
> 
> This also includes the cleanup branch so I'll use this as a topic
> branch in for-next.

Thanks for that.

Just some questions inspired by the series.

[WAY TO CLEANUP]
Just want to ask what's the proper way to do the cleanup.

Christoph mentioned that other subsystems accept huge cleanups as
long as they only delete code, while in my series I did the split
to try to keep each cleanup small.

But the split itself sometimes introduced dead code which is only
going to be removed later, and most of the time such new code makes
no sense other than for the patch split.

So I'm wondering what's the proper way to do such huge cleanups in btrfs.

[FUTURE SCRUB UPDATE]
There are still some things I may want to do to improve scrub.

One such objective is to enhance RAID56 scrubbing.

In that case, what branch should I base my code on?
Normally I would base it on misc-next, but the new scrub code is
only in for-next.

Thanks,
Qu
David Sterba April 5, 2023, 5:11 p.m. UTC | #3
On Tue, Apr 04, 2023 at 09:08:31AM +0800, Qu Wenruo wrote:
> 
> 
> On 2023/4/1 00:17, David Sterba wrote:
> > On Fri, Mar 31, 2023 at 09:20:03AM +0800, Qu Wenruo wrote:
> >> This series can be found in my github repo:
> >>
> >> https://github.com/adam900710/linux/tree/scrub_stripe
> > 
> > This also includes the cleanup branch so I'll use this as a topic
> > branch in for-next.
> 
> Thanks for that.
> 
> Just some questions inspired by the series.
> 
> [WAY TO CLEANUP]
> Just want to ask what's the proper way to do the cleanup.
> 
> Christoph mentioned that other subsystems accept huge cleanups as
> long as they only delete code, while in my series I did the split
> to try to keep each cleanup small.

I've just finished a final pass through the scrub series so I can
give you a fresh answer. The split was great and thank you for
taking the time to do it. I don't treat deleting code differently
than adding code.
Deleting also affects functionality; we've seen a recent example
where deleting code broke the thread_pool mount option. The review
question is "does it still work as expected when this code is
gone?". As an extreme example, imagine deleting a mutex: it will
make things faster but also wrong.

If there are other subsystems where deleting code is not taken as
seriously as new code, then I have my doubts about the review
quality there.

> But the split itself sometimes introduced dead code which is only
> going to be removed later, and most of the time such new code makes
> no sense other than for the patch split.

The reason I ask for a split is to let me maintain focus on what's
being changed; with too many deleted lines it's easy to overlook
something or simply fatigue, which would make the review useless.
The removal patches are over 3000 lines in total, separately a few
hundred each and logically grouped.

Adding a few lines to avoid warnings or to make it compile is a
small cost, and as you did it with the comments it's quite clear
where it is and why.

> So I'm wondering what's the proper way to do such huge cleanups in btrfs.

The way you did it in the scrub series is a great example to follow
next time.

> [FUTURE SCRUB UPDATE]
> There are still some things I may want to do to improve scrub.
> 
> One such objective is to enhance RAID56 scrubbing.
> 
> In that case, what branch should I base my code on?
> Normally I would base it on misc-next, but the new scrub code is
> only in for-next.

I'll move it to misc-next so you can start there.