
[v3,0/5] dm: empty flush optimization

Message ID 20240516040235.115651-1-yang.yang@vivo.com
Series dm: empty flush optimization

Message

YangYang May 16, 2024, 4:02 a.m. UTC
__send_empty_flush() sends empty flush bios to every target in the
dm_table. However, if num_targets exceeds the number of block devices
in the dm_table's device list, __send_duplicate_bios() may be invoked
multiple times for the same block device. Sending many empty flush
bios from a single thread to one block device is typically redundant,
as the flush state machine is likely to merge them. When num_targets
far exceeds the number of block devices, this behavior can cause a
significant performance drop.

This is a real-world scenario that we have encountered:
1) Call fallocate(file_fd, 0, 0, SZ_8G)
2) Call ioctl(file_fd, FS_IOC_FIEMAP, fiemap). In situations of severe
file system fragmentation, fiemap->fm_mapped_extents may exceed 1000.
3) Create a dm-linear device based on fiemap->fm_extents
4) Create a snapshot-cow device based on the dm-linear device 

Perf diff of fio test:
  fio --group_reporting --name=benchmark --filename=/dev/mapper/example \
      --ioengine=sync --invalidate=1 --numjobs=16 --rw=randwrite \
      --blocksize=4k --size=2G --time_based --runtime=30 --fdatasync=1

Scenario one:
  for i in {0..1023}; do
    echo $((8000*$i)) 8000 linear /dev/sda2 $((16384*$i))
  done | sudo dmsetup create example

  Before: bw=857KiB/s
  After:  bw=30.8MiB/s    +3580%

Scenario two:
  for i in {0..1023}; do
    if [[ $i -gt 511 ]]; then
      echo $((8000*$i)) 8000 linear /dev/nvme0n1p6 $((16384*$i))
    else
      echo $((8000*$i)) 8000 linear /dev/sda2 $((16384*$i))
    fi
  done | sudo dmsetup create example

  Before: bw=1470KiB/s
  After:  bw=33.9MiB/s    +2261%

Any comments are welcome!

V3:
-- Focus on targets with num_flush_bios equal to 1 to simplify the code
-- Use t->devices_lock to protect the dm_table's device list

V2:
-- Split into smaller pieces that are easier to review
-- Add flush_pass_around, suggested by Mikulas Patocka
-- Handle different target types separately

Yang Yang (5):
  dm: introduce flush_pass_around flag
  dm: add __send_empty_flush_bios() helper
  dm: support retrieving struct dm_target from struct dm_dev
  dm: Avoid sending redundant empty flush bios to the same block device
  dm linear: enable flush optimization function

 drivers/md/dm-core.h          |  3 +++
 drivers/md/dm-ioctl.c         |  4 ++++
 drivers/md/dm-linear.c        |  1 +
 drivers/md/dm-table.c         | 19 +++++++++++++++++++
 drivers/md/dm.c               | 34 +++++++++++++++++++++++++---------
 include/linux/device-mapper.h |  6 ++++++
 6 files changed, 58 insertions(+), 9 deletions(-)

Comments

YangYang May 16, 2024, 7:43 a.m. UTC
On 2024/5/16 12:02, Yang Yang wrote:
> V3:
> -- Focus on targets with num_flush_bios equal to 1 to simplify the code
> -- Use t->devices_lock to protect the dm_table's device list

Please ignore V3, which has a build warning. I will send V4 with the fix.

Thanks