diff mbox

mirror: add target-zeroed flag

Message ID 1464962711-617992-1-git-send-email-vsementsov@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vladimir Sementsov-Ogievskiy June 3, 2016, 2:05 p.m. UTC
Add target-zeroed flag to allow user specify that target is already
zeroed. With this flag set zeroes which was in source before mirror
start will not be copyed.

Without this libvirt migration of empty disk takes too long time.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---

I've tested it with
time virsh migrate --live test qemu+ssh://other_node/system --copy-storage-all

Without 'target-zeroed' libvirt migration of vm with empty qcow2 disk of
400Mb to another node takes for me more than 5 minutes. Migration of 5Gb
disk was not finished in 28 minutes.

With new flag on, migration of 16Tb empty disk takes about a minute.

 block/mirror.c            | 16 +++++++++++-----
 blockdev.c                |  9 ++++++++-
 hmp.c                     |  2 +-
 include/block/block_int.h |  2 +-
 qapi/block-core.json      |  5 ++++-
 qmp-commands.hx           |  4 +++-
 6 files changed, 28 insertions(+), 10 deletions(-)

Comments

Eric Blake June 3, 2016, 3:06 p.m. UTC | #1
On 06/03/2016 08:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> Add target-zeroed flag to allow user specify that target is already
> zeroed. With this flag set zeroes which was in source before mirror
> start will not be copyed.

With this flag set, any runs of zeroes in the source before the mirror
starts will not be copied.

> 
> Without this libvirt migration of empty disk takes too long time.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> 
> I've tested it with
> time virsh migrate --live test qemu+ssh://other_node/system --copy-storage-all

Presumably with a libvirt patch to turn on the optional flag.

I'm not sure I like this patch.  Libvirt uses NBD to implement
--copy-storage-all, I think we're better off improving NBD to
automatically handle sparse writes, than we are to add a one-off hack
that requires libvirt to change.  That is, once NBD is smarter, the copy
will be faster without needing a tweak.  And we ARE working on making
NBD smarter (one of my goals for the 2.7 release is to get all the
sparse file additions to NBD implemented)

That said, I'll still review it.

> 
> Without 'target-zeroed' libvirt migration of vm with empty qcow2 disk of
> 400Mb to another node takes for me more than 5 minutes. Migration of 5Gb
> disk was not finished in 28 minutes.
> 
> With new flag on, migration of 16Tb empty disk takes about a minute.
> 
>  block/mirror.c            | 16 +++++++++++-----
>  blockdev.c                |  9 ++++++++-
>  hmp.c                     |  2 +-
>  include/block/block_int.h |  2 +-
>  qapi/block-core.json      |  5 ++++-
>  qmp-commands.hx           |  4 +++-
>  6 files changed, 28 insertions(+), 10 deletions(-)
> 

> +++ b/hmp.c
> @@ -1097,7 +1097,7 @@ void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
>                       false, NULL, false, NULL,
>                       full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
>                       true, mode, false, 0, false, 0, false, 0,
> -                     false, 0, false, 0, false, true, &err);
> +                     false, 0, false, 0, false, true, false, false, &err);

I'd like for my qapi 'box' patches to land, so that this can be simpler.

https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03569.html

> +++ b/qapi/block-core.json
> @@ -1154,6 +1154,9 @@
>  #         written. Both will result in identical contents.
>  #         Default is true. (Since 2.4)
>  #
> +# @target-zeroed: #optional Whether target is already zeroed, so most of zeroes
> +#                 should not be transferred. (Since 2.7)

Grammar suggestion:

#optional True if target already reads as all zeroes, so that holes in
the source need not be transferred. (Since 2.7)

> +++ b/qmp-commands.hx
> @@ -1632,7 +1632,7 @@ EQMP
>          .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
>                        "node-name:s?,replaces:s?,"
>                        "on-source-error:s?,on-target-error:s?,"
> -                      "unmap:b?,"
> +                      "unmap:b?,target-zeroed:b?"
>                        "granularity:i?,buf-size:i?",

I'm trying to get rid of .args_type (or rather, Marc-Andre's qapi
patches, that are stalled on my qapi patches, do the job); but for now
this part is correct.

>          .mhandler.cmd_new = qmp_marshal_drive_mirror,
>      },
> @@ -1674,6 +1674,8 @@ Arguments:
>    (BlockdevOnError, default 'report')
>  - "unmap": whether the target sectors should be discarded where source has only
>    zeroes. (json-bool, optional, default true)
> +- "target-zeroed": whether target is already zeroed, so most of zeroes should
> +  not be transferred. (json-bool, optional, default false)

Similar wording as above.
Denis V. Lunev June 3, 2016, 3:45 p.m. UTC | #2
On 06/03/2016 06:06 PM, Eric Blake wrote:
> On 06/03/2016 08:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Add target-zeroed flag to allow user specify that target is already
>> zeroed. With this flag set zeroes which was in source before mirror
>> start will not be copyed.
> With this flag set, any runs of zeroes in the source before the mirror
> starts will not be copied.
>
>> Without this libvirt migration of empty disk takes too long time.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>
>> I've tested it with
>> time virsh migrate --live test qemu+ssh://other_node/system --copy-storage-all
> Presumably with a libvirt patch to turn on the optional flag.
>
> I'm not sure I like this patch.  Libvirt uses NBD to implement
> --copy-storage-all, I think we're better off improving NBD to
> automatically handle sparse writes, than we are to add a one-off hack
> that requires libvirt to change.  That is, once NBD is smarter, the copy
> will be faster without needing a tweak.  And we ARE working on making
> NBD smarter (one of my goals for the 2.7 release is to get all the
> sparse file additions to NBD implemented)
>
> That said, I'll still review it.
this is not enough, definitely.

There is a problem that mirror_iteration code sleeps even for
not read zeroes (and this IS slow). Moreover, even sending sparcified
zeroes takes a lot of time for round trips.

We have started with that and spent a lot of time trying to improve
the situation.

Also, as a side note, the QCOW2 file on a source and a target will
be different without the flag - original image has empty blocks,
target image will have blocks explicitly marked with zeroes.

Though this is a matter of taste... For us this approach is the simplest.

Den
Vladimir Sementsov-Ogievskiy June 7, 2016, 4:30 p.m. UTC | #3
On 03.06.2016 18:45, Denis V. Lunev wrote:
> On 06/03/2016 06:06 PM, Eric Blake wrote:
>> On 06/03/2016 08:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Add target-zeroed flag to allow user specify that target is already
>>> zeroed. With this flag set zeroes which was in source before mirror
>>> start will not be copyed.
>> With this flag set, any runs of zeroes in the source before the mirror
>> starts will not be copied.
>>
>>> Without this libvirt migration of empty disk takes too long time.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> ---
>>>
>>> I've tested it with
>>> time virsh migrate --live test qemu+ssh://other_node/system 
>>> --copy-storage-all
>> Presumably with a libvirt patch to turn on the optional flag.
>>
>> I'm not sure I like this patch.  Libvirt uses NBD to implement
>> --copy-storage-all, I think we're better off improving NBD to
>> automatically handle sparse writes, than we are to add a one-off hack
>> that requires libvirt to change.  That is, once NBD is smarter, the copy
>> will be faster without needing a tweak.  And we ARE working on making
>> NBD smarter (one of my goals for the 2.7 release is to get all the
>> sparse file additions to NBD implemented)
>>
>> That said, I'll still review it.
> this is not enough, definitely.
>
> There is a problem that mirror_iteration code sleeps even for
> not read zeroes (and this IS slow). Moreover, even sending sparcified
> zeroes takes a lot of time for round trips.
>
> We have started with that and spent a lot of time trying to improve
> the situation.
>
> Also, as a side note, the QCOW2 file on a source and a target will
> be different without the flag - original image has empty blocks,
> target image will have blocks explicitly marked with zeroes.
>
> Though this is a matter of taste... For us this approach is the simplest.
>
> Den

Hey, what do think about it? Don't we forget somebody to be cc'ed?
Stefan Hajnoczi June 10, 2016, 4:59 p.m. UTC | #4
On Tue, Jun 07, 2016 at 07:30:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 03.06.2016 18:45, Denis V. Lunev wrote:
> > On 06/03/2016 06:06 PM, Eric Blake wrote:
> > > On 06/03/2016 08:05 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > > Add target-zeroed flag to allow user specify that target is already
> > > > zeroed. With this flag set zeroes which was in source before mirror
> > > > start will not be copyed.
> > > With this flag set, any runs of zeroes in the source before the mirror
> > > starts will not be copied.
> > > 
> > > > Without this libvirt migration of empty disk takes too long time.
> > > > 
> > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > > > ---
> > > > 
> > > > I've tested it with
> > > > time virsh migrate --live test qemu+ssh://other_node/system
> > > > --copy-storage-all
> > > Presumably with a libvirt patch to turn on the optional flag.
> > > 
> > > I'm not sure I like this patch.  Libvirt uses NBD to implement
> > > --copy-storage-all, I think we're better off improving NBD to
> > > automatically handle sparse writes, than we are to add a one-off hack
> > > that requires libvirt to change.  That is, once NBD is smarter, the copy
> > > will be faster without needing a tweak.  And we ARE working on making
> > > NBD smarter (one of my goals for the 2.7 release is to get all the
> > > sparse file additions to NBD implemented)
> > > 
> > > That said, I'll still review it.
> > this is not enough, definitely.
> > 
> > There is a problem that mirror_iteration code sleeps even for
> > not read zeroes (and this IS slow). Moreover, even sending sparcified
> > zeroes takes a lot of time for round trips.
> > 
> > We have started with that and spent a lot of time trying to improve
> > the situation.
> > 
> > Also, as a side note, the QCOW2 file on a source and a target will
> > be different without the flag - original image has empty blocks,
> > target image will have blocks explicitly marked with zeroes.
> > 
> > Though this is a matter of taste... For us this approach is the simplest.
> > 
> > Den
> 
> Hey, what do think about it? Don't we forget somebody to be cc'ed?

Jeff Cody <jcody@redhat.com> maintains block jobs.  This patch should go
through him.

Stefan
Denis V. Lunev June 11, 2016, 11:57 a.m. UTC | #5
On 06/10/2016 07:59 PM, Stefan Hajnoczi wrote:
> On Tue, Jun 07, 2016 at 07:30:19PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.06.2016 18:45, Denis V. Lunev wrote:
>>> On 06/03/2016 06:06 PM, Eric Blake wrote:
>>>> On 06/03/2016 08:05 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Add target-zeroed flag to allow user specify that target is already
>>>>> zeroed. With this flag set zeroes which was in source before mirror
>>>>> start will not be copyed.
>>>> With this flag set, any runs of zeroes in the source before the mirror
>>>> starts will not be copied.
>>>>
>>>>> Without this libvirt migration of empty disk takes too long time.
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> ---
>>>>>
>>>>> I've tested it with
>>>>> time virsh migrate --live test qemu+ssh://other_node/system
>>>>> --copy-storage-all
>>>> Presumably with a libvirt patch to turn on the optional flag.
>>>>
>>>> I'm not sure I like this patch.  Libvirt uses NBD to implement
>>>> --copy-storage-all, I think we're better off improving NBD to
>>>> automatically handle sparse writes, than we are to add a one-off hack
>>>> that requires libvirt to change.  That is, once NBD is smarter, the copy
>>>> will be faster without needing a tweak.  And we ARE working on making
>>>> NBD smarter (one of my goals for the 2.7 release is to get all the
>>>> sparse file additions to NBD implemented)
>>>>
>>>> That said, I'll still review it.
>>> this is not enough, definitely.
>>>
>>> There is a problem that mirror_iteration code sleeps even for
>>> not read zeroes (and this IS slow). Moreover, even sending sparcified
>>> zeroes takes a lot of time for round trips.
>>>
>>> We have started with that and spent a lot of time trying to improve
>>> the situation.
>>>
>>> Also, as a side note, the QCOW2 file on a source and a target will
>>> be different without the flag - original image has empty blocks,
>>> target image will have blocks explicitly marked with zeroes.
>>>
>>> Though this is a matter of taste... For us this approach is the simplest.
>>>
>>> Den
>> Hey, what do think about it? Don't we forget somebody to be cc'ed?
> Jeff Cody <jcody@redhat.com> maintains block jobs.  This patch should go
> through him.
>
> Stefan
thank you.

I think I have a better patch for this without additional flag.
I have used approach similar one in qemu-img. I am going
to resubmit more with driver-mirror stuff at the beginning
of next week.

Den
diff mbox

Patch

diff --git a/block/mirror.c b/block/mirror.c
index 80fd3c7..9604cae 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -65,6 +65,8 @@  typedef struct MirrorBlockJob {
     bool waiting_for_io;
     int target_cluster_sectors;
     int max_iov;
+
+    bool target_zeroed;
 } MirrorBlockJob;
 
 typedef struct MirrorOp {
@@ -570,7 +572,9 @@  static void coroutine_fn mirror_run(void *opaque)
     if (!s->is_none_mode) {
         /* First part, loop on the sectors and initialize the dirty bitmap.  */
         BlockDriverState *base = s->base;
-        bool mark_all_dirty = s->base == NULL && !bdrv_has_zero_init(target_bs);
+
+        bool mark_all_dirty = s->base == NULL &&
+            !(s->target_zeroed || bdrv_has_zero_init(target_bs));
 
         for (sector_num = 0; sector_num < end; ) {
             /* Just to make sure we are not exceeding int limit. */
@@ -801,7 +805,7 @@  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
                              int64_t buf_size,
                              BlockdevOnError on_source_error,
                              BlockdevOnError on_target_error,
-                             bool unmap,
+                             bool unmap, bool target_zeroed,
                              BlockCompletionFunc *cb,
                              void *opaque, Error **errp,
                              const BlockJobDriver *driver,
@@ -840,6 +844,7 @@  static void mirror_start_job(BlockDriverState *bs, BlockDriverState *target,
     s->granularity = granularity;
     s->buf_size = ROUND_UP(buf_size, granularity);
     s->unmap = unmap;
+    s->target_zeroed = target_zeroed;
 
     s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
     if (!s->dirty_bitmap) {
@@ -861,7 +866,7 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  bool unmap,
+                  bool unmap, bool target_zeroed,
                   BlockCompletionFunc *cb,
                   void *opaque, Error **errp)
 {
@@ -876,7 +881,8 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
     base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
     mirror_start_job(bs, target, replaces,
                      speed, granularity, buf_size,
-                     on_source_error, on_target_error, unmap, cb, opaque, errp,
+                     on_source_error, on_target_error, unmap, target_zeroed,
+                     cb, opaque, errp,
                      &mirror_job_driver, is_none_mode, base);
 }
 
@@ -923,7 +929,7 @@  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
     }
 
     mirror_start_job(bs, base, NULL, speed, 0, 0,
-                     on_error, on_error, false, cb, opaque, &local_err,
+                     on_error, on_error, false, false, cb, opaque, &local_err,
                      &commit_active_job_driver, false, base);
     if (local_err) {
         error_propagate(errp, local_err);
diff --git a/blockdev.c b/blockdev.c
index 717785e..f70fb1d 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3418,6 +3418,7 @@  static void blockdev_mirror_common(BlockDriverState *bs,
                                    bool has_on_target_error,
                                    BlockdevOnError on_target_error,
                                    bool has_unmap, bool unmap,
+                                   bool has_target_zeroed, bool target_zeroed,
                                    Error **errp)
 {
 
@@ -3439,6 +3440,9 @@  static void blockdev_mirror_common(BlockDriverState *bs,
     if (!has_unmap) {
         unmap = true;
     }
+    if (!has_target_zeroed) {
+        target_zeroed = false;
+    }
 
     if (granularity != 0 && (granularity < 512 || granularity > 1048576 * 64)) {
         error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "granularity",
@@ -3468,7 +3472,7 @@  static void blockdev_mirror_common(BlockDriverState *bs,
     mirror_start(bs, target,
                  has_replaces ? replaces : NULL,
                  speed, granularity, buf_size, sync,
-                 on_source_error, on_target_error, unmap,
+                 on_source_error, on_target_error, unmap, target_zeroed,
                  block_job_cb, bs, errp);
 }
 
@@ -3484,6 +3488,7 @@  void qmp_drive_mirror(const char *device, const char *target,
                       bool has_on_source_error, BlockdevOnError on_source_error,
                       bool has_on_target_error, BlockdevOnError on_target_error,
                       bool has_unmap, bool unmap,
+                      bool has_target_zeroed, bool target_zeroed,
                       Error **errp)
 {
     BlockDriverState *bs;
@@ -3618,6 +3623,7 @@  void qmp_drive_mirror(const char *device, const char *target,
                            has_on_source_error, on_source_error,
                            has_on_target_error, on_target_error,
                            has_unmap, unmap,
+                           has_target_zeroed, target_zeroed,
                            &local_err);
     bdrv_unref(target_bs);
     if (local_err) {
@@ -3675,6 +3681,7 @@  void qmp_blockdev_mirror(const char *device, const char *target,
                            has_on_source_error, on_source_error,
                            has_on_target_error, on_target_error,
                            true, true,
+                           true, false,
                            &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
diff --git a/hmp.c b/hmp.c
index a4b1d3d..5574dad 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1097,7 +1097,7 @@  void hmp_drive_mirror(Monitor *mon, const QDict *qdict)
                      false, NULL, false, NULL,
                      full ? MIRROR_SYNC_MODE_FULL : MIRROR_SYNC_MODE_TOP,
                      true, mode, false, 0, false, 0, false, 0,
-                     false, 0, false, 0, false, true, &err);
+                     false, 0, false, 0, false, true, false, false, &err);
     hmp_handle_error(mon, &err);
 }
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 30a9717..f19ff88 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -687,7 +687,7 @@  void mirror_start(BlockDriverState *bs, BlockDriverState *target,
                   int64_t speed, uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
-                  bool unmap,
+                  bool unmap, bool target_zeroed,
                   BlockCompletionFunc *cb,
                   void *opaque, Error **errp);
 
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 98a20d2..a2b3706 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1154,6 +1154,9 @@ 
 #         written. Both will result in identical contents.
 #         Default is true. (Since 2.4)
 #
+# @target-zeroed: #optional Whether target is already zeroed, so most of zeroes
+#                 should not be transferred. (Since 2.7)
+#
 # Returns: nothing on success
 #          If @device is not a valid block device, DeviceNotFound
 #
@@ -1166,7 +1169,7 @@ 
             '*speed': 'int', '*granularity': 'uint32',
             '*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
             '*on-target-error': 'BlockdevOnError',
-            '*unmap': 'bool' } }
+            '*unmap': 'bool', '*target-zeroed': 'bool' } }
 
 ##
 # @BlockDirtyBitmap
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 28801a2..1c1d454 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -1632,7 +1632,7 @@  EQMP
         .args_type  = "sync:s,device:B,target:s,speed:i?,mode:s?,format:s?,"
                       "node-name:s?,replaces:s?,"
                       "on-source-error:s?,on-target-error:s?,"
-                      "unmap:b?,"
+                      "unmap:b?,target-zeroed:b?"
                       "granularity:i?,buf-size:i?",
         .mhandler.cmd_new = qmp_marshal_drive_mirror,
     },
@@ -1674,6 +1674,8 @@  Arguments:
   (BlockdevOnError, default 'report')
 - "unmap": whether the target sectors should be discarded where source has only
   zeroes. (json-bool, optional, default true)
+- "target-zeroed": whether target is already zeroed, so most of zeroes should
+  not be transferred. (json-bool, optional, default false)
 
 The default value of the granularity is the image cluster size clamped
 between 4096 and 65536, if the image format defines one.  If the format