Message ID | 20250411010732.358817-9-eblake@redhat.com
---|---
State | New
Series | Make blockdev-mirror dest sparse in more cases
On 11.04.25 04:04, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.  This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
>
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source.  Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
>
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero.  Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.  Iotest 194 is
> proof of this: instead of mirroring a completely sparse file, change
> it to pre-populate some data.  When run with './check -file 194', the
> full 1G is still allocated, but with './check -qcow2 194', only the 1M
> of pre-populated data is now mirrored; this in turn requires an
> additional log filter.
>
> Note that there are still BDS layers that do not quickly report
> reading as all zero; for example, the file-posix implementation of
> fast block status currently blindly reports the entire image as
> allocated and non-zero without even consulting lseek(SEEK_DATA); that
> will be addressed in later patches.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/mirror.c             | 10 ++++++++--
>  tests/qemu-iotests/194     | 15 +++++++++++++--
>  tests/qemu-iotests/194.out |  4 ++--
>  3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index a53582f17bb..2e1e14c8e7e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>      int64_t offset;
>      BlockDriverState *bs;
>      BlockDriverState *target_bs = blk_bs(s->target);
> -    int ret = -1;
> +    int ret;

I think it was there to avoid a Coverity false positive around code
further down

    WITH_GRAPH_RDLOCK_GUARD() {
        ret = bdrv_co_is_allocated_above(bs, s->base_overlay, true,
                                         offset, bytes, &count);
    }
    if (ret < 0) {
        return ret;
    }

which you don't touch here.  I think the "= -1;" should be kept.  Or I
missed a static-analysis revolution (if so, it should be mentioned in
the commit message).

>      int64_t count;
>
>      bdrv_graph_co_rdlock();
>      bs = s->mirror_top_bs->backing->bs;
> +    if (s->zero_target) {
> +        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
> +    }
>      bdrv_graph_co_rdunlock();
>
> -    if (s->zero_target) {
> +    if (s->zero_target && ret <= 0) {
> +        if (ret < 0) {
> +            return ret;
> +        }
>          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>              return 0;
> diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
> index c0ce82dd257..814c15dfe3b 100755
On 11.04.25 04:04, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.
[...]
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.  Iotest 194 is
> proof of this: instead of mirroring a completely sparse file, change
> it to pre-populate some data.  When run with './check -file 194', the
> full 1G is still allocated, but with './check -qcow2 194', only the 1M
> of pre-populated data is now mirrored; this in turn requires an
> additional log filter.

This makes me wonder.  Actually, what is the difference for the user
between "fast-zeroing an empty qcow2 (actually a no-op)" and "skipping
the zeroing of an empty qcow2 (a no-op)"?  And why should a visible
interface effect change after some internal optimization?
[...]
> @@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
[...]
> -    if (s->zero_target) {
> +    if (s->zero_target && ret <= 0) {
> +        if (ret < 0) {
> +            return ret;
> +        }
>          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);

So, without your patch we go into this "if" and set the whole dirty
bitmap even with qcow2?  That looks like a preexisting bug to me.  And
this is because we don't have discard=unmap set on the qcow2 node.
Probably we'd better have something like
bdrv_can_write_zeroes_fast(target_bs) here, which of course should be
true for qcow2.

Probably in the real world people always set discard=unmap for qcow2
nodes.
>              return 0;
> diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
> index c0ce82dd257..814c15dfe3b 100755
> --- a/tests/qemu-iotests/194
> +++ b/tests/qemu-iotests/194
> @@ -22,6 +22,14 @@
>
>  import iotests
>
> +def filter_job_event(event):
> +    '''Filter len and offset in a job event'''
> +    event = dict(event)
> +    if event['data'].get('len', 0) == event['data'].get('offset', 1):

I'd prefer an explicit check that both fields exist instead of the
magic 0 and 1.  But I don't care too much.

> +        event['data']['len'] = 'LEN'
> +        event['data']['offset'] = 'LEN'
> +    return event
> +
>  iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
>                            supported_platforms=['linux'])
>
> @@ -34,6 +42,7 @@ with iotests.FilePath('source.img') as source_img_path, \
>
>      img_size = '1G'
>      iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
> +    iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
>      iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)
>
>      iotests.log('Launching VMs...')
> @@ -61,7 +70,8 @@ with iotests.FilePath('source.img') as source_img_path, \
>
>      iotests.log('Waiting for `drive-mirror` to complete...')
>      iotests.log(source_vm.event_wait('BLOCK_JOB_READY'),
> -                filters=[iotests.filter_qmp_event])
> +                filters=[iotests.filter_qmp_event,
> +                         filter_job_event])
>
>      iotests.log('Starting migration...')
>      capabilities = [{'capability': 'events', 'state': True},
> @@ -87,7 +97,8 @@ with iotests.FilePath('source.img') as source_img_path, \
>
>      while True:
>          event2 = source_vm.event_wait('BLOCK_JOB_COMPLETED')
> -        iotests.log(event2, filters=[iotests.filter_qmp_event])
> +        iotests.log(event2, filters=[iotests.filter_qmp_event,
> +                                     filter_job_event])
>          if event2['event'] == 'BLOCK_JOB_COMPLETED':
>              iotests.log('Stopping the NBD server on destination...')
>              iotests.log(dest_vm.qmp('nbd-server-stop'))
> diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
> index 6940e809cde..79b961723d8
> 100644
> --- a/tests/qemu-iotests/194.out
> +++ b/tests/qemu-iotests/194.out
> @@ -7,7 +7,7 @@ Launching NBD server on destination...
>  Starting `drive-mirror` on source...
>  {"return": {}}
>  Waiting for `drive-mirror` to complete...
> -{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>  Starting migration...
>  {"return": {}}
>  {"execute": "migrate-start-postcopy", "arguments": {}}
> @@ -18,7 +18,7 @@ Starting migration...
>  {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>  Gracefully ending the `drive-mirror` job on source...
>  {"return": {}}
> -{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>  Stopping the NBD server on destination...
>  {"return": {}}
>  Wait for migration completion on target...

With "int ret = -1;" kept as is:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
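The explicit-membership check suggested above could look like the
following sketch.  The function name `filter_job_event_explicit` is
invented here for illustration; it is not part of the patch:

```python
def filter_job_event_explicit(event):
    '''Variant of the patch's iotest filter that tests key presence
    explicitly instead of relying on the 0/1 default sentinels.
    (Hypothetical alternative, not the code in the patch.)'''
    event = dict(event)
    data = event['data']
    if 'len' in data and 'offset' in data and data['len'] == data['offset']:
        # Build a fresh data dict so the caller's event is left untouched.
        event['data'] = dict(data, len='LEN', offset='LEN')
    return event
```

One side effect of this variant is that it copies `event['data']` before
rewriting it, whereas the filter in the patch mutates the inner dict it
shares with the caller.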
diff --git a/block/mirror.c b/block/mirror.c
index a53582f17bb..2e1e14c8e7e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     int64_t offset;
     BlockDriverState *bs;
     BlockDriverState *target_bs = blk_bs(s->target);
-    int ret = -1;
+    int ret;
     int64_t count;

     bdrv_graph_co_rdlock();
     bs = s->mirror_top_bs->backing->bs;
+    if (s->zero_target) {
+        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
+    }
     bdrv_graph_co_rdunlock();

-    if (s->zero_target) {
+    if (s->zero_target && ret <= 0) {
+        if (ret < 0) {
+            return ret;
+        }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
             return 0;
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index c0ce82dd257..814c15dfe3b 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -22,6 +22,14 @@

 import iotests

+def filter_job_event(event):
+    '''Filter len and offset in a job event'''
+    event = dict(event)
+    if event['data'].get('len', 0) == event['data'].get('offset', 1):
+        event['data']['len'] = 'LEN'
+        event['data']['offset'] = 'LEN'
+    return event
+
 iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
                           supported_platforms=['linux'])

@@ -34,6 +42,7 @@ with iotests.FilePath('source.img') as source_img_path, \

     img_size = '1G'
     iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
+    iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
     iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)

     iotests.log('Launching VMs...')
@@ -61,7 +70,8 @@ with iotests.FilePath('source.img') as source_img_path, \

     iotests.log('Waiting for `drive-mirror` to complete...')
     iotests.log(source_vm.event_wait('BLOCK_JOB_READY'),
-                filters=[iotests.filter_qmp_event])
+                filters=[iotests.filter_qmp_event,
+                         filter_job_event])

     iotests.log('Starting migration...')
     capabilities = [{'capability': 'events', 'state': True},
@@ -87,7 +97,8 @@ with iotests.FilePath('source.img') as source_img_path, \

     while True:
         event2 = source_vm.event_wait('BLOCK_JOB_COMPLETED')
-        iotests.log(event2, filters=[iotests.filter_qmp_event])
+        iotests.log(event2, filters=[iotests.filter_qmp_event,
+                                     filter_job_event])
         if event2['event'] == 'BLOCK_JOB_COMPLETED':
             iotests.log('Stopping the NBD server on destination...')
             iotests.log(dest_vm.qmp('nbd-server-stop'))
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index 6940e809cde..79b961723d8 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -7,7 +7,7 @@ Launching NBD server on destination...
 Starting `drive-mirror` on source...
 {"return": {}}
 Waiting for `drive-mirror` to complete...
-{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Starting migration...
 {"return": {}}
 {"execute": "migrate-start-postcopy", "arguments": {}}
@@ -18,7 +18,7 @@ Starting migration...
 {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Gracefully ending the `drive-mirror` job on source...
 {"return": {}}
-{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Stopping the NBD server on destination...
 {"return": {}}
 Wait for migration completion on target...
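For reference, the filter the diff adds to iotest 194 can be exercised
on its own.  This standalone sketch reproduces the function from the
diff and applies it to an event of the shape shown in 194.out; the
iotests module is not needed to see the effect:

```python
# Standalone copy of the filter added to iotest 194 by this patch.
def filter_job_event(event):
    '''Filter len and offset in a job event'''
    event = dict(event)
    if event['data'].get('len', 0) == event['data'].get('offset', 1):
        event['data']['len'] = 'LEN'
        event['data']['offset'] = 'LEN'
    return event

# A finished job (len == offset) is normalized, matching the "LEN"
# lines in the updated 194.out.  The 0/1 defaults make an event lacking
# both keys compare 0 == 1, i.e. unequal, so it passes through
# untouched; the same holds for an in-progress job with len != offset.
ready = {'event': 'BLOCK_JOB_READY',
         'data': {'device': 'mirror-job0', 'len': 1073741824,
                  'offset': 1073741824, 'speed': 0, 'type': 'mirror'}}
filtered = filter_job_event(ready)
```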
When doing a sync=full mirroring, QMP drive-mirror requests full
zeroing if it did not just create the destination, and blockdev-mirror
requests full zeroing unconditionally.  This is because during a full
sync, we must ensure that the portions of the disk that are not
otherwise touched by the source still read as zero upon completion.

However, in mirror_dirty_init(), we were blindly assuming that if the
destination allows punching holes, we should pre-zero the entire
image; and if it does not allow punching holes, then treat the entire
source as dirty rather than mirroring just the allocated portions of
the source.  Without the ability to punch holes, this results in the
destination file being fully allocated; and even when punching holes
is supported, it causes duplicate I/O to the portions of the
destination corresponding to chunks of the source that are allocated
but read as zero.

Smarter is to avoid the pre-zeroing pass over the destination if it
can be proved the destination already reads as zero.  Note that a
later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any BDS
that can quickly report that it already reads as zero.  Iotest 194 is
proof of this: instead of mirroring a completely sparse file, change
it to pre-populate some data.  When run with './check -file 194', the
full 1G is still allocated, but with './check -qcow2 194', only the 1M
of pre-populated data is now mirrored; this in turn requires an
additional log filter.

Note that there are still BDS layers that do not quickly report
reading as all zero; for example, the file-posix implementation of
fast block status currently blindly reports the entire image as
allocated and non-zero without even consulting lseek(SEEK_DATA); that
will be addressed in later patches.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c             | 10 ++++++++--
 tests/qemu-iotests/194     | 15 +++++++++++++--
 tests/qemu-iotests/194.out |  4 ++--
 3 files changed, 23 insertions(+), 6 deletions(-)
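The decision the patch gives mirror_dirty_init() can be summarized in a
small model.  This is a hypothetical Python sketch, not QEMU code: the
name plan_dirty_init and its return strings are invented, and the two
booleans/integers stand in for bdrv_can_write_zeroes_with_unmap() and
the result of bdrv_co_is_zero_fast() (>0 means the destination already
reads as zero, <0 an error):

```python
def plan_dirty_init(zero_target, target_is_zero, can_punch_holes):
    '''Model of mirror_dirty_init()'s control flow after this patch.'''
    if not zero_target:
        # No pre-zeroing requested: just scan and mirror the source.
        return 'scan source, mirror allocated parts'
    if target_is_zero < 0:
        # bdrv_co_is_zero_fast() failed; the patch returns the error.
        return 'error'
    if target_is_zero > 0:
        # New fast path: destination already reads as zero, so the
        # pre-zeroing pass is skipped entirely.
        return 'scan source, mirror allocated parts'
    if not can_punch_holes:
        # Pre-existing fallback: mark everything dirty, copy it all.
        return 'mark whole image dirty'
    return 'pre-zero destination, then scan source'
```

Under this model, a qcow2 destination that reports all-zero quickly
takes the same path as zero_target=false, which is what makes iotest
194 mirror only the 1M of pre-populated data.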