
[v2,08/33] block/backup: drop support for copy_range

Message ID 20210520142205.607501-9-vsementsov@virtuozzo.com
State New, archived
Series block: publish backup-top filter

Commit Message

Vladimir Sementsov-Ogievskiy May 20, 2021, 2:21 p.m. UTC
copy_range has not been the default behavior since 6a30f663d4c0b3c, and
it's now available only through the experimental x-perf argument, so
it's OK to drop it.

Even when backup is used to copy a disk to the same filesystem, and the
filesystem supports zero-copy copy_range, copy_range is probably not
what we want for backup: backup has the good property of making a copy
of the active disk with no impact on the active disk itself (unlike
creating a snapshot). If copy_range adds filesystem-level references
instead of copying data, then the next guest write triggers a COW
operation, and most likely the new block will be allocated for the
original VM disk, not for the backup disk. Thus, fragmentation of the
original disk will increase.

We can simply add support back on demand. Now we want to publish the
copy-before-write filter, and instead of thinking about how to pass the
use-copy-range argument to block-copy (create an x-block-copy parameter
for the new public filter driver, or maybe set it by hand after filter
node creation?), let's just drop copy-range support in backup for now.

After this patch, copy-range support in block-copy becomes unused.
Let's keep it for a while; it won't hurt:

1. If there is a request to support copy_range in backup (and/or in
   the new public copy-before-write filter), it will be easy to
   satisfy.

2. qemu-img convert will probably reuse block-copy, and qemu-img has
   an option to enable copy-range. qemu-img convert is not a backup,
   and copy_range may be more reasonable in some cases in the context
   of qemu-img convert.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
---
 block/copy-before-write.h | 1 -
 block/backup.c            | 3 +--
 block/copy-before-write.c | 4 +---
 3 files changed, 2 insertions(+), 6 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy May 28, 2021, 3:29 p.m. UTC | #1
20.05.2021 17:21, Vladimir Sementsov-Ogievskiy wrote:

Actually, I know one case where copy_range for a backup job may be reasonable:

using backup in the push-backup-with-fleecing scheme from

    [PATCH 0/6] push backup with fleecing

Of course, there is no real sense in using the push-backup-with-fleecing scheme with both the temp image and the final backup target on the same file system (fleecing gives no benefit there; we can use a simple backup without a temporary image).

But we absolutely don't care about fragmentation of the temp disk.

Still, it doesn't make sense, as the temp image and the real backup target should not be on the same file system.

Could it be some distributed filesystem, where it still makes sense to call copy_range? Theoretically, it could.


Another thought: I'm also going to implement a RAM-cache driver to optimize the push-backup-with-fleecing scheme. I'll need a way to copy data from the RAM-cache node to the final target. I can implement copy_range for RAM-cache, which will avoid creating an extra buffer and instead use the buffer that is already allocated and owned by RAM-cache. Still, this behavior is obviously good; it should work automatically, and there's no reason to make it optional.


Hmm, so to summarize:

- Actually, block-copy does copy_range. So it's probably good to change QEMU's copy_range() function to fall back to read+write.

And regarding copy_range itself, here is what we want:

1. We want to control whether it influences fragmentation of the source disk. When copying from a temporary image, we don't care. But when the source of block-copy is an active disk, we do care not to influence how the original disk is laid out in the filesystem. We probably even want an option for the copy_file_range() syscall to control this.

2. We want to be efficient with copy_size, i.e., the size of chunks to copy. We even have an existing issue in block-copy: write-zeroes is limited to BLOCK_COPY_MAX_BUFFER, which is obviously inefficient.

For copy_size, we should have some good defaults or automatic detection logic.

As for copy_range fragmentation:

If we have internal copy_range-like optimizations, such as zero-copy from a RAM-cache node, or maybe copying compressed data from one qcow2 node to another without decompression, they should be done anyway; they shouldn't be controlled by a user option. Still, for file-posix, we don't know whether the underlying filesystem's copy_range() implementation leads to fragmentation or not, and we don't know whether the user is OK with it. So we need an option, and it's probably better to keep x-perf.use-copy-range for now, until we have a good idea for the interface.

Patch

diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 5977b7aa31..e284dfb6a7 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -33,7 +33,6 @@  BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   uint64_t cluster_size,
-                                  BackupPerf *perf,
                                   BdrvRequestFlags write_flags,
                                   BlockCopyState **bcs,
                                   Error **errp);
diff --git a/block/backup.c b/block/backup.c
index ac91821b08..d41dd30e25 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -522,8 +522,7 @@  BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
                   (compress ? BDRV_REQ_WRITE_COMPRESSED : 0),
 
     cbw = bdrv_cbw_append(bs, target, filter_node_name,
-                                        cluster_size, perf,
-                                        write_flags, &bcs, errp);
+                          cluster_size, write_flags, &bcs, errp);
     if (!cbw) {
         goto error;
     }
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 0dc5a107cf..bc795adb87 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -170,7 +170,6 @@  BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
                                   BlockDriverState *target,
                                   const char *filter_node_name,
                                   uint64_t cluster_size,
-                                  BackupPerf *perf,
                                   BdrvRequestFlags write_flags,
                                   BlockCopyState **bcs,
                                   Error **errp)
@@ -217,8 +216,7 @@  BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
 
     state->cluster_size = cluster_size;
     state->bcs = block_copy_state_new(top->backing, state->target,
-                                      cluster_size, perf->use_copy_range,
-                                      write_flags, errp);
+                                      cluster_size, false, write_flags, errp);
     if (!state->bcs) {
         error_prepend(errp, "Cannot create block-copy-state: ");
         goto fail;