[v3,00/25] backup performance: block_status + async

Message ID 20201026171815.13233-1-vsementsov@virtuozzo.com (mailing list archive)

Message

Vladimir Sementsov-Ogievskiy Oct. 26, 2020, 5:17 p.m. UTC
Hi all!

This series turns backup into a series of block_copy_async calls covering
the whole disk, so we get block-status-based parallel async requests
out of the box, which gives a performance gain:

All results are in seconds

-----------------  -----------  -------------  --------------  ---------------------  --------------------------------  ------------------------------------
                   A            B              C               D                      E                                 F
                   mirror(old)  backup(old)    backup(old)     backup(new)            backup(new)                       backup(new)
                                copy-range=on  copy-range=off                         copy-range=on                     copy-range=on
                                                                                                                        max-workers=1
hdd-ext4:hdd-ext4  19           20             21 ± 14%        19                     51 ± 12%                          22 ± 24%
                                  A+5%           A+12%  B+6%     A+3%  B-2%  C-8%       A+174%  B+161%  C+145%  D+165%    A+18%  B+12%  C+5%  D+14%  E-57%
hdd-ext4:ssd-ext4  8.7          9.4 ± 3%       9.6 ± 2%        8.8                    24 ± 2%                           8.9
                                  A+8%           A+10%  B+2%     A+1%  B-7%  C-9%       A+174%  B+155%  C+149%  D+173%    A+2%  B-5%  C-8%  D+1%  E-63%
ssd-ext4:hdd-ext4  9            12 ± 9%        11 ± 7%         9.7 ± 7%               11 ± 2%                           10 ± 3%
                                  A+36%          A+28%  B-6%     A+7%  B-21%  C-16%     A+21%  B-11%  C-5%  D+13%         A+16%  B-14%  C-9%  D+8%  E-4%
ssd-ext4:ssd-ext4  4.4          11 ± 4%        10 ± 3%         4.7                    5.7                               10 ± 5%
                                  A+143%         A+134%  B-4%    A+6%  B-56%  C-55%     A+30%  B-46%  C-45%  D+22%        A+133%  B-4%  C-1%  D+119%  E+79%
hdd-xfs:hdd-xfs    19           20 ± 3%        20              20                     45 ± 4%                           19
                                  A+3%           A+4%  B+1%      A+3%  B+0%  C-1%       A+131%  B+125%  C+122%  D+125%    A-1%  B-4%  C-4%  D-3%  E-57%
hdd-xfs:ssd-xfs    9.1          9.9 ± 4%       9.5             9.1 ± 3%               23 ± 2%                           9.2
                                  A+8%           A+4%  B-4%      A+0%  B-8%  C-4%       A+151%  B+132%  C+142%  D+151%    A+1%  B-7%  C-3%  D+1%  E-60%
ssd-xfs:hdd-xfs    9.1          11 ± 9%        11              9.5 ± 4%               12 ± 22%                          11 ± 3%
                                  A+16%          A+22%  B+6%     A+4%  B-10%  C-15%     A+32%  B+14%  C+8%  D+26%         A+18%  B+2%  C-4%  D+13%  E-10%
ssd-xfs:ssd-xfs    4.1          8.7 ± 7%       9.2 ± 5%        4.5 ± 2%               5.7 ± 3%                          9.7 ± 5%
                                  A+113%         A+126%  B+6%    A+11%  B-48%  C-51%    A+40%  B-34%  C-38%  D+27%        A+138%  B+12%  C+5%  D+115%  E+70%
ssd-ext4:nbd       9.1 ± 2%     37             37 ± 2%         11                     11 ± 3%                           19 ± 2%
                                  A+302%         A+304%  B+1%    A+18%  B-71%  C-71%    A+18%  B-71%  C-71%  D+0%         A+106%  B-49%  C-49%  D+74%  E+75%
nbd:ssd-ext4       9            30 ± 3%        31              9                      9                                 17
                                  A+237%         A+245%  B+2%    A+0%  B-70%  C-71%     A+0%  B-70%  C-71%  D+0%          A+93%  B-43%  C-44%  D+93%  E+93%
-----------------  -----------  -------------  --------------  ---------------------  --------------------------------  ------------------------------------

Here column B is the current backup and column D is the new backup with
default parameters.
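
The second line of each table row gives the relative difference against
the earlier columns. A minimal sketch of how such annotations can be
derived, assuming they are plain percentage differences of the mean times.
The helper name relative_to is hypothetical, and recomputing from the
rounded values printed in the table will not always reproduce the printed
percentages exactly:

```python
def relative_to(value, baselines):
    """Format value's difference against each labelled baseline, e.g. 'B-70%'."""
    parts = []
    for label, base in baselines.items():
        # Positive means slower than the baseline, negative means faster.
        diff = (value - base) / base * 100
        parts.append(f"{label}{diff:+.0f}%")
    return "  ".join(parts)


# nbd:ssd-ext4 row: column D (9 s) against A (9 s), B (30 s) and C (31 s)
print(relative_to(9, {"A": 9, "B": 30, "C": 31}))  # A+0%  B-70%  C-71%
```

So "B-70%" under column D reads as "about 70% less time than column B".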

Mirror is still faster, but we are very close to it.
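
To illustrate the general idea of block-status-based parallel async
copying with a worker limit, here is a toy model in Python. It is only a
sketch under stated assumptions, not the block-copy implementation from
this series: the "disk" is a byte string, the block status is a
hand-written list of allocated (offset, length) extents, and all names
(copy_chunk, backup, run_backup) are hypothetical.

```python
import asyncio


async def copy_chunk(src, dst, offset, length, sem):
    # The semaphore bounds how many copies are in flight (cf. max-workers).
    async with sem:
        await asyncio.sleep(0)  # stand-in for a real asynchronous I/O request
        dst[offset:offset + length] = src[offset:offset + length]


async def backup(src, dst, allocated, max_workers):
    sem = asyncio.Semaphore(max_workers)
    # Only extents reported as allocated are copied; holes are skipped.
    tasks = [copy_chunk(src, dst, off, ln, sem) for off, ln in allocated]
    await asyncio.gather(*tasks)


def run_backup(src, allocated, max_workers=4):
    dst = bytearray(len(src))  # target starts out zeroed
    asyncio.run(backup(src, dst, allocated, max_workers))
    return bytes(dst)


source = bytes(range(16))
target = run_backup(source, allocated=[(0, 4), (8, 4)], max_workers=2)
print(target[0:4] == source[0:4], target[4:8] == bytes(4))
```

With max_workers=1 the requests are effectively serialized, which is the
situation column F of the table measures for the real implementation.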

v3:
01: add Max's r-b
02: change to perf.use-copy-range
03: add Max's r-b
04: - more explicit finish status of async block_copy
    - block_copy_async always returns non-NULL
    - personal opaque for new cb
05: - new arguments added in this patch
    - no default value for arguments in block_copy_async()
06: new
07: - caller does _kick() by hand
    - grammar in commit msg
    - add new parameter in _this_ patch
    - switch to opposite ignore_ratelimit
08: cancel now is async
09,10: add Max's r-b
11: changed a lot
12: add timeout
14: rebase on x-perf, keep r-b
15: rebase on x-perf
16: rebase on x-perf, keep r-b
17,18: new
19: now only backup.c is changed in this patch, changed a lot
20,21: new
22: rebased, keep r-b
23: new, split from 24
24: drop unrelated change (now patch23), keep r-b
25: changed a lot, explicitly specify options for each env (test table column)


To run the benchmark, do the following:

Prepare images:
In each directory where you want to place source and target images,
prepare the images with:

for img in test-source test-target; do
 ./qemu-img create -f raw $img 1000M;
 ./qemu-img bench -c 1000 -d 1 -f raw -s 1M -w --pattern=0xff $img
done

Prepare a similar image for the NBD server, and start the server somewhere with:

 qemu-nbd --persistent --nocache -f raw IMAGE

Then run the benchmark, like this:
./bench-backup.py \
    --env old:/work/src/qemu/up-backup-block-copy-master/build/qemu-system-x86_64,mirror \
          old,copy-range=on old,copy-range=off \
          new:../../build/qemu-system-x86_64 \
          new,copy-range=on new,copy-range=on,max-workers=1 \
    --dir hdd-ext4:/test-a hdd-xfs:/test-b ssd-ext4:/ssd ssd-xfs:/ssd-xfs \
    --test $(for fs in ext4 xfs; do echo hdd-$fs:hdd-$fs hdd-$fs:ssd-$fs ssd-$fs:hdd-$fs ssd-$fs:ssd-$fs; done) \
    --nbd 192.168.100.5 \
    --test ssd-ext4:nbd nbd:ssd-ext4

(You can simply reduce the number of directories/test cases; use --help
 for help.)
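
For readers unsure what the $(for fs in ...) substitution in the command
above expands to, the same list of source:target test cases can be
sketched in Python. The function name test_cases is hypothetical and not
part of bench-backup.py:

```python
def test_cases(filesystems=("ext4", "xfs")):
    """Build the source:target pairs produced by the shell loop."""
    cases = []
    for fs in filesystems:
        cases += [
            f"hdd-{fs}:hdd-{fs}",
            f"hdd-{fs}:ssd-{fs}",
            f"ssd-{fs}:hdd-{fs}",
            f"ssd-{fs}:ssd-{fs}",
        ]
    return cases


print(" ".join(test_cases()))
```

This yields eight cases, in the same order as the first eight rows of the
results table; the two NBD cases are passed separately via the second
--test option.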

Vladimir Sementsov-Ogievskiy (25):
  iotests: 129 don't check backup "busy"
  qapi: backup: add perf.use-copy-range parameter
  block/block-copy: More explicit call_state
  block/block-copy: implement block_copy_async
  block/block-copy: add max_chunk and max_workers parameters
  block/block-copy: add list of all call-states
  block/block-copy: add ratelimit to block-copy
  block/block-copy: add block_copy_cancel
  blockjob: add set_speed to BlockJobDriver
  job: call job_enter from job_user_pause
  qapi: backup: add max-chunk and max-workers to x-perf struct
  iotests: 56: prepare for backup over block-copy
  iotests: 129: prepare for backup over block-copy
  iotests: 185: prepare for backup over block-copy
  iotests: 219: prepare for backup over block-copy
  iotests: 257: prepare for backup over block-copy
  block/block-copy: make progress_bytes_callback optional
  block/backup: drop extra gotos from backup_run()
  backup: move to block-copy
  qapi: backup: disable copy_range by default
  block/block-copy: drop unused block_copy_set_progress_callback()
  block/block-copy: drop unused argument of block_copy()
  simplebench/bench_block_job: use correct shebang line with python3
  simplebench: bench_block_job: add cmd_options argument
  simplebench: add bench-backup.py

 qapi/block-core.json                   |  26 ++-
 block/backup-top.h                     |   1 +
 include/block/block-copy.h             |  58 ++++-
 include/block/block_int.h              |   3 +
 include/block/blockjob_int.h           |   2 +
 block/backup-top.c                     |   6 +-
 block/backup.c                         | 233 ++++++++++++-------
 block/block-copy.c                     | 227 +++++++++++++++---
 block/replication.c                    |   2 +
 blockdev.c                             |  14 ++
 blockjob.c                             |   6 +
 job.c                                  |   1 +
 scripts/simplebench/bench-backup.py    | 165 +++++++++++++
 scripts/simplebench/bench-example.py   |   2 +-
 scripts/simplebench/bench_block_job.py |  13 +-
 tests/qemu-iotests/056                 |   9 +-
 tests/qemu-iotests/129                 |   3 +-
 tests/qemu-iotests/185                 |   3 +-
 tests/qemu-iotests/185.out             |   2 +-
 tests/qemu-iotests/219                 |  13 +-
 tests/qemu-iotests/257                 |   1 +
 tests/qemu-iotests/257.out             | 306 ++++++++++++-------------
 22 files changed, 798 insertions(+), 298 deletions(-)
 create mode 100755 scripts/simplebench/bench-backup.py

Comments

Vladimir Sementsov-Ogievskiy Jan. 9, 2021, 10:18 a.m. UTC | #1
ping
