mbox series

[RFC,0/5] btrfs: scrub: introduce SCRUB_LOGICAL flag

Message ID cover.1686977659.git.wqu@suse.com (mailing list archive)
Headers show
Series btrfs: scrub: introduce SCRUB_LOGICAL flag | expand

Message

Qu Wenruo June 17, 2023, 5:07 a.m. UTC
[BACKGROUND]
Scrub originally works in a per-device basis, that means if we want to
scrub the full fs, we would queue a scrub for each writeable device.

This is normally fine, but some behavior is not ideal like the
following:
		X	X+16K	X+32K
 Mirror 1	|XXXXXXX|       |
 Mirror 2	|       |XXXXXXX|

When scrubbing mirror 1, we found [X, X+16K) has corruption, then we
will try to read from mirror 2 and repair using the correct data.

This applies to range [X+16K, X+32K) of mirror 2, causing the good copy
to be read twice, which may or may not be a big deal.

But the problem can easily go crazy for RAID56, as its repair requires
the full stripe to be read out, so is its P/Q verification.


There are also some other minor problems, like due to the difference in
the disk read speed, we may be scrubbing different block groups on
different devices.
This can lead to reduced available space and higher chance of false
-ENOSPC.

[ENHANCEMENT]
Instead of the existing per-device scrub, this patchset introduce a new
flag, SCRUB_LOGICAL, to do logical address space based scrub.

Unlike per-device scrub, this new flag would make scrub a per-fs
operation.

This allows us to scrub the different mirrors in one go, and avoid
unnecessary read to repair the corruption.

This also makes user space handling much simpler, just one ioctl to
scrub the full fs, without the current multi-thread scrub code.

[TODO]
The long todo list is the main reason for RFC:

- Missing RAID56 handling
  Unlike pure mirror based profiles, RAID56 repair now needs to be
  handled inside scrub (or some new raid interfaces to handle fully
  cached stripes).

  This can be some extra investment, and add an exception for the
  elegant and simpler mirror based read-repair path.

- Better progs integration
  In theory we can silently try SCRUB_LOGICAL first, if the kernel
  doesn't support it, then fallback to the old multi-device scrub.

  But for current testing, a dedicated option is still assigned for
  scrub subcommand.

  And currently it only supports full report, no summary nor scrub file
  support.

- More testing
  Currently only very basic tests done, no stress tests yet.

Qu Wenruo (5):
  btrfs: scrub: make scrub_ctx::stripes dynamically allocated
  btrfs: scrub: introduce the skeleton for logical-scrub
  btrfs: scrub: extract the common preparation before scrubbing a block
    group
  btrfs: scrub: implement the basic extent iteration code
  btrfs: scrub: implement the repair part of scrub logical

 fs/btrfs/disk-io.c         |   1 +
 fs/btrfs/fs.h              |  12 +
 fs/btrfs/ioctl.c           |   6 +-
 fs/btrfs/scrub.c           | 757 ++++++++++++++++++++++++++++++-------
 fs/btrfs/scrub.h           |   2 +
 include/uapi/linux/btrfs.h |  11 +-
 6 files changed, 647 insertions(+), 142 deletions(-)

Comments

Qu Wenruo June 17, 2023, 5:12 a.m. UTC | #1
On 2023/6/17 13:07, Qu Wenruo wrote:
> [BACKGROUND]
> Scrub originally works in a per-device basis, that means if we want to
> scrub the full fs, we would queue a scrub for each writeable device.
>
> This is normally fine, but some behavior is not ideal like the
> following:
> 		X	X+16K	X+32K
>   Mirror 1	|XXXXXXX|       |
>   Mirror 2	|       |XXXXXXX|
>
> When scrubbing mirror 1, we found [X, X+16K) has corruption, then we
> will try to read from mirror 2 and repair using the correct data.
>
> This applies to range [X+16K, X+32K) of mirror 2, causing the good copy
> to be read twice, which may or may not be a big deal.
>
> But the problem can easily go crazy for RAID56, as its repair requires
> the full stripe to be read out, so is its P/Q verification.
>
>
> There are also some other minor problems, like due to the difference in
> the disk read speed, we may be scrubbing different block groups on
> different devices.
> This can lead to reduced available space and higher chance of false
> -ENOSPC.
>
> [ENHANCEMENT]
> Instead of the existing per-device scrub, this patchset introduce a new
> flag, SCRUB_LOGICAL, to do logical address space based scrub.
>
> Unlike per-device scrub, this new flag would make scrub a per-fs
> operation.
>
> This allows us to scrub the different mirrors in one go, and avoid
> unnecessary read to repair the corruption.
>
> This also makes user space handling much simpler, just one ioctl to
> scrub the full fs, without the current multi-thread scrub code.
>
> [TODO]
> The long todo list is the main reason for RFC:
>
> - Missing RAID56 handling
>    Unlike pure mirror based profiles, RAID56 repair now needs to be
>    handled inside scrub (or some new raid interfaces to handle fully
>    cached stripes).
>
>    This can be some extra investment, and add an exception for the
>    elegant and simpler mirror based read-repair path.
>
> - Better progs integration
>    In theory we can silently try SCRUB_LOGICAL first, if the kernel
>    doesn't support it, then fallback to the old multi-device scrub.
>
>    But for current testing, a dedicated option is still assigned for
>    scrub subcommand.
>
>    And currently it only supports full report, no summary nor scrub file
>    support.
>
> - More testing
>    Currently only very basic tests done, no stress tests yet.

Forgot one more feature missing:

- IO speed limit through sysfs interface
   The old interface is per-device one, which can not be integrated
   easily into the per-fs scrub.

Thanks,
Qu

>
> Qu Wenruo (5):
>    btrfs: scrub: make scrub_ctx::stripes dynamically allocated
>    btrfs: scrub: introduce the skeleton for logical-scrub
>    btrfs: scrub: extract the common preparation before scrubbing a block
>      group
>    btrfs: scrub: implement the basic extent iteration code
>    btrfs: scrub: implement the repair part of scrub logical
>
>   fs/btrfs/disk-io.c         |   1 +
>   fs/btrfs/fs.h              |  12 +
>   fs/btrfs/ioctl.c           |   6 +-
>   fs/btrfs/scrub.c           | 757 ++++++++++++++++++++++++++++++-------
>   fs/btrfs/scrub.h           |   2 +
>   include/uapi/linux/btrfs.h |  11 +-
>   6 files changed, 647 insertions(+), 142 deletions(-)
>