[v3,00/10] raid1 balancing methods

Message ID cover.1731076425.git.anand.jain@oracle.com

Message

Anand Jain Nov. 15, 2024, 2:54 p.m. UTC
v3:
1. Removed the latency-based RAID1 balancing patch. (Per David's review)
2. Renamed "rotation" to "round-robin" and set the per-set
   min_contiguous_read to 256k. (Per David's review)
3. Added raid1-balancing module configuration for fstests testing.
   raid1-balancing can now be configured through both module load
   parameters and sysfs.

The logic of individual methods remains unchanged, and performance metrics
are consistent with v2.

----- 
v2:
1. Move new features to CONFIG_BTRFS_EXPERIMENTAL instead of CONFIG_BTRFS_DEBUG.
2. Correct the typo from %est_wait to %best_wait.
3. Initialize %best_wait to U64_MAX and remove the check for 0.
4. Implement rotation with a minimum contiguous read threshold before
   switching to the next stripe. Configure this using:

        echo rotation:[min_contiguous_read] > /sys/fs/btrfs/<uuid>/read_policy

   The default value is the sector size, and the min_contiguous_read
   value must be a multiple of the sector size.

5. Tested FIO random read/write and defrag compression workloads with
   min_contiguous_read set to sector size, 192k, and 256k.

   The rotation RAID1 balancing method is better for multi-process workloads
   such as fio and also for single-process workloads such as defragmentation.

     $ fio --filename=/btrfs/foo --size=5Gi --direct=1 --rw=randrw --bs=4k \
        --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 \
        --time_based --group_reporting --name=iops-test-job --eta-newline=1


|         |            |            | Read I/O count  |
|         | Read       | Write      | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid     | 20.3MiB/s  | 20.5MiB/s  | 313895 | 313895 |
| rotation|            |            |        |        |
|     4096| 20.4MiB/s  | 20.5MiB/s  | 313895 | 313895 |
|   196608| 20.2MiB/s  | 20.2MiB/s  | 310152 | 310175 |
|   262144| 20.3MiB/s  | 20.4MiB/s  | 312180 | 312191 |
| latency | 18.4MiB/s  | 18.4MiB/s  | 272980 | 291683 |
| devid:1 | 14.8MiB/s  | 14.9MiB/s  | 456376 | 0      |

   The rotation RAID1 balancing technique performs more than 2x better for
   single-process defrag.

      $ time -p btrfs filesystem defrag -r -f -c /btrfs


|         | Time  | Read I/O Count  |
|         | Real  | devid1 | devid2 |
|---------|-------|--------|--------|
| pid     | 18.00s| 3800   | 0      |
| rotation|       |        |        |
|     4096|  8.95s| 1900   | 1901   |
|   196608|  8.50s| 1881   | 1919   |
|   262144|  8.80s| 1881   | 1919   |
| latency | 17.18s| 3800   | 0      |
| devid:2 | 17.48s| 0      | 3800   |
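
For illustration only, the min_contiguous_read behaviour from item 4 can be
modelled as a counter-based device pick. This is a simplified sketch and not
the code from the patches: it assumes every read is exactly one sector, and
the function name rr_pick_device and its arguments are made up for this
example.

```shell
#!/bin/sh
# Hypothetical model of round-robin read balancing with a minimum
# contiguous read threshold. Not the kernel implementation.
#
# rr_pick_device READ_NR MIN_CONTIG SECTORSIZE NUM_DEVS
#   READ_NR     zero-based sequence number of the read (one sector each)
#   MIN_CONTIG  min_contiguous_read in bytes, a multiple of SECTORSIZE
#   Prints the zero-based index of the mirror that serves the read.
rr_pick_device() {
	read_nr=$1 min_contig=$2 sectorsize=$3 num_devs=$4
	# Number of reads one device serves before the next device takes over.
	reads_per_dev=$((min_contig / sectorsize))
	echo $(( (read_nr / reads_per_dev) % num_devs ))
}

# With 256k and 4k sectors, 64 consecutive reads go to each mirror in turn.
for n in 0 63 64 127 128; do
	echo "read $n -> mirror $(rr_pick_device "$n" 262144 4096 2)"
done
```

When min_contiguous_read equals the sector size, this model alternates
mirrors on every read, which would be in line with the near-equal per-device
read counts in the tables above.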

Rotation keeps all devices active, and for now the Rotation RAID1
balancing method should be the default. More workload testing is needed
while the code is EXPERIMENTAL.

Latency is the better choice when the block layer transport is failing or
unstable.

Both techniques need further independent testing across different
workloads, with the long-term goal of merging them into a unified
heuristic.

Devid is a hands-on approach that provides manual or user-space script
control.

These RAID1 balancing methods are tunable via the sysfs knob.
The mount -o option and btrfs properties are under consideration.
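
As a rough sketch of how a script might prepare the value for the sysfs knob,
the helper below checks the "multiple of the sector size" rule from item 4
before composing the string. The function name policy_string is hypothetical.
The policy keyword is passed in as an argument because v3 renames rotation to
round-robin and the exact keyword accepted by v3 is not shown in this letter.

```shell
#!/bin/sh
# Hypothetical helper: compose "<policy>:<min_contiguous_read>" after
# checking that the value is a positive multiple of the sector size.
#
# policy_string POLICY MIN_CONTIG SECTORSIZE
policy_string() {
	policy=$1 min_contig=$2 sectorsize=$3
	if [ "$min_contig" -le 0 ] || [ $((min_contig % sectorsize)) -ne 0 ]; then
		echo "min_contiguous_read must be a multiple of $sectorsize" >&2
		return 1
	fi
	echo "$policy:$min_contig"
}

policy_string rotation 196608 4096
policy_string rotation 5000 4096 || echo "rejected"
```

The resulting string would then be written to
/sys/fs/btrfs/<uuid>/read_policy, in the same way as the echo command in
item 4.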

Thx.

--------- original v1 ------------

The RAID1-balancing methods help distribute read I/O across devices, and
this patch set introduces three balancing methods: rotation, latency, and
devid. These methods are enabled under the `CONFIG_BTRFS_DEBUG` config
option and build on the previously added
`/sys/fs/btrfs/<UUID>/read_policy` interface, which configures the desired
RAID1 read balancing method.

I've tested these patches using fio and filesystem defragmentation
workloads on a two-device RAID1 setup (with both data and metadata
mirrored across identical devices). I tracked device read counts by
extracting stats from `/sys/devices/<..>/stat` for each device. Below is
a summary of the results, with each result the average of three
iterations.
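
A minimal sketch of how the per-device read counts could be collected,
assuming the first field of a block device stat file is the number of
completed read I/Os (as in the kernel's block stat layout). The helper names
and the sample stat lines are made up for this example; it is not the exact
script used for the numbers below.

```shell
#!/bin/sh
# read_ios STAT_FILE
#   Print the first field of a block device stat file (read I/Os completed).
read_ios() {
	awk '{ print $1 }' "$1"
}

# read_ios_delta BEFORE AFTER
#   Print the number of reads issued between two samples.
read_ios_delta() {
	echo $(( $2 - $1 ))
}

# Demo on made-up stat lines, so the example runs without a real device.
before_file=$(mktemp)
after_file=$(mktemp)
echo "     100        0      800       10        0        0        0        0        0       10       10" > "$before_file"
echo "     250        0     2000       25        0        0        0        0        0       25       25" > "$after_file"
before=$(read_ios "$before_file")
after=$(read_ios "$after_file")
echo "reads during workload: $(read_ios_delta "$before" "$after")"
rm -f "$before_file" "$after_file"
```

In a real run the two samples would come from the same device's stat file,
taken before and after the workload.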

A typical generic random rw workload:

$ fio --filename=/btrfs/foo --size=10Gi --direct=1 --rw=randrw --bs=4k \
  --ioengine=libaio --iodepth=256 --runtime=120 --numjobs=4 --time_based \
  --group_reporting --name=iops-test-job --eta-newline=1

|         |            |            | Read I/O count  |
|         | Read       | Write      | devid1 | devid2 |
|---------|------------|------------|--------|--------|
| pid     | 29.4MiB/s  | 29.5MiB/s  | 456548 | 447975 |
| rotation| 29.3MiB/s  | 29.3MiB/s  | 450105 | 450055 |
| latency | 21.9MiB/s  | 21.9MiB/s  | 672387 | 0      |
| devid:1 | 22.0MiB/s  | 22.0MiB/s  | 674788 | 0      |

Defragmentation with compression workload:

$ xfs_io -f -d -c 'pwrite -S 0xab 0 1G' /btrfs/foo
$ sync
$ echo 3 > /proc/sys/vm/drop_caches
$ btrfs filesystem defrag -f -c /btrfs/foo

|         | Time  | Read I/O Count  |
|         | Real  | devid1 | devid2 |
|---------|-------|--------|--------|
| pid     | 21.61s| 3810   | 0      |
| rotation| 11.55s| 1905   | 1905   |
| latency | 20.99s| 0      | 3810   |
| devid:2 | 21.41s| 0      | 3810   |

. The PID-based balancing method works well for the generic random rw fio
  workload.
. The rotation method is ideal when you want to keep both devices active,
  and it boosts performance in sequential defragmentation scenarios.
. The latency-based method works well when we have mixed device types or
  when one device experiences intermittent I/O failures: as that device's
  latency increases, the method automatically picks the other device for
  further read I/Os.
. The devid method is a more hands-on approach, useful for diagnosing and
  testing RAID1 mirror synchronizations.
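
To make the difference between the methods above concrete, here is a small
model of two of the selection rules. It assumes the PID method picks a mirror
as pid modulo the number of mirrors, and that the latency method picks the
mirror with the lowest observed wait. Both functions are illustrative
stand-ins with made-up names, not the patch code.

```shell
#!/bin/sh
# pick_by_pid PID NUM_DEVS
#   Model of the PID method: the same process always gets the same mirror.
pick_by_pid() {
	echo $(( $1 % $2 ))
}

# pick_by_latency WAIT0 WAIT1 ...
#   Model of a latency method: print the index of the smallest wait value.
pick_by_latency() {
	best=0 best_wait="" i=0
	for w in "$@"; do
		if [ -z "$best_wait" ] || [ "$w" -lt "$best_wait" ]; then
			best=$i best_wait=$w
		fi
		i=$((i + 1))
	done
	echo "$best"
}

echo "pid 4242 on 2 mirrors -> mirror $(pick_by_pid 4242 2)"
echo "waits 900 and 300 -> mirror $(pick_by_latency 900 300)"
```

Under this model a single-process workload such as defrag sends every read to
one mirror with the PID method, which would be in line with the one-sided
read counts in the defragmentation table.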

Anand Jain (10):
  btrfs: initialize fs_devices->fs_info earlier
  btrfs: simplify output formatting in btrfs_read_policy_show
  btrfs: add btrfs_read_policy_to_enum helper and refactor read policy
    store
  btrfs: handle value associated with raid1 balancing parameter
  btrfs: introduce RAID1 round-robin read balancing
  btrfs: add RAID1 preferred read device
  btrfs: pr CONFIG_BTRFS_EXPERIMENTAL status
  btrfs: fix CONFIG_BTRFS_EXPERIMENTAL migration
  btrfs: enable RAID1 balancing configuration via modprobe parameter
  btrfs: modload to print RAID1 balancing status

 fs/btrfs/disk-io.c |   1 +
 fs/btrfs/super.c   |  22 +++++-
 fs/btrfs/sysfs.c   | 181 ++++++++++++++++++++++++++++++++++++++++-----
 fs/btrfs/sysfs.h   |   5 ++
 fs/btrfs/volumes.c |  86 ++++++++++++++++++++-
 fs/btrfs/volumes.h |  14 ++++
 6 files changed, 286 insertions(+), 23 deletions(-)

Comments

Anand Jain Nov. 15, 2024, 3:20 p.m. UTC | #1
These changes refine module reload control in testing. Patch 1 reloads
the module earlier in run_section, ensuring each section's first test
starts with a reloaded module. Patch 2 adds FS_MODULE_RELOAD_OPTIONS to
pass module options during reload, useful for the Btrfs pre-mount
configurations that aren’t available as mount options.

Anand Jain (2):
  fstests: move fs-module reload to earlier in the run_section function
  fstests: FS_MODULE_RELOAD_OPTIONS to control filesystem module reload
    options

 check | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

Filipe Manana Nov. 15, 2024, 7:16 p.m. UTC | #2
On Fri, Nov 15, 2024 at 2:55 PM Anand Jain <anand.jain@oracle.com> wrote:
>
> [...]
>
> Anand Jain (10):
>   btrfs: initialize fs_devices->fs_info earlier
>   btrfs: simplify output formatting in btrfs_read_policy_show
>   btrfs: add btrfs_read_policy_to_enum helper and refactor read policy
>     store
>   btrfs: handle value associated with raid1 balancing parameter
>   btrfs: introduce RAID1 round-robin read balancing
>   btrfs: add RAID1 preferred read device
>   btrfs: pr CONFIG_BTRFS_EXPERIMENTAL status
>   btrfs: fix CONFIG_BTRFS_EXPERIMENTAL migration

Why are these two patches, which are fixes unrelated to the raid
balancing feature, in the middle of this patchset?
These should go as a separate patchset...

Also for the second patch, there's already a fix from yesterday and in for-next:

https://lore.kernel.org/linux-btrfs/c7b550091f427a79ec5a9aa6c5ac6b5efbdb4e8f.1731605782.git.fdmanana@suse.com/

Thanks.

>   btrfs: enable RAID1 balancing configuration via modprobe parameter
>   btrfs: modload to print RAID1 balancing status
>
>  fs/btrfs/disk-io.c |   1 +
>  fs/btrfs/super.c   |  22 +++++-
>  fs/btrfs/sysfs.c   | 181 ++++++++++++++++++++++++++++++++++++++++-----
>  fs/btrfs/sysfs.h   |   5 ++
>  fs/btrfs/volumes.c |  86 ++++++++++++++++++++-
>  fs/btrfs/volumes.h |  14 ++++
>  6 files changed, 286 insertions(+), 23 deletions(-)
>
> --
> 2.46.1
>
>