
[3/3] migration/colo: Merge multi checkpoint request into one.

Message ID 20200515042818.17908-4-chen.zhang@intel.com
State New, archived
Series migration/colo: Optimize COLO framework code

Commit Message

Zhang Chen May 15, 2020, 4:28 a.m. UTC
From: Zhang Chen <chen.zhang@intel.com>

When the COLO guest runs into problems, COLO-compare may catch many
mismatched network packets and trigger checkpoint notifications multiple
times, and a forced periodic checkpoint may happen at the same time.
Merge the checkpoint requests that arrive within COLO_CHECKPOINT_INTERVAL
into one to handle this more efficiently.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 migration/colo.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Comments

Zhanghailiang June 2, 2020, 6:59 a.m. UTC | #1
> -----Original Message-----
> From: Zhang Chen [mailto:chen.zhang@intel.com]
> Sent: Friday, May 15, 2020 12:28 PM
> To: Dr . David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela
> <quintela@redhat.com>; Zhanghailiang <zhang.zhanghailiang@huawei.com>;
> qemu-dev <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang
> <jasowang@redhat.com>; Zhang Chen <chen.zhang@intel.com>
> Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request into
> one.
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index d5bced22cb..e6a7d8c6e2 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -47,6 +47,9 @@ static COLOMode last_colo_mode;
> 
>  #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
> 
> +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */
> +#define COLO_CHECKPOINT_INTERVAL 1000
> +
>  bool migration_in_colo_state(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -651,13 +654,20 @@ out:
>  void colo_checkpoint_notify(void *opaque)
>  {
>      MigrationState *s = opaque;
> -    int64_t next_notify_time;
> +    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> 
> -    qemu_sem_post(&s->colo_checkpoint_sem);
> -    s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
> -    next_notify_time = s->colo_checkpoint_time +
> -                    s->parameters.x_checkpoint_delay;
> -    timer_mod(s->colo_delay_timer, next_notify_time);
> +    /*
> +     * When the COLO guest runs into problems, COLO-compare may catch many
> +     * mismatched network packets and trigger notifications multiple times,
> +     * and a forced periodic checkpoint may happen at the same time. Merge
> +     * the checkpoint requests within COLO_CHECKPOINT_INTERVAL into one.
> +     */
> +    if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) {
> +        qemu_sem_post(&s->colo_checkpoint_sem);

This is not right: this notification should not be controlled by the interval time.
I see what happens here: when multiple checkpoint requests come in, Colo_delay_time
is increased every time and ends up as a large value, which is not what we want.

Besides, please rebase this patch on the "[PATCH 0/6] colo: migration related bugfixes" series, which
has modified the same place.



> +        timer_mod(s->colo_delay_timer, now +
> +                  s->parameters.x_checkpoint_delay);
> +        s->colo_checkpoint_time = now;
> +    }
>  }
> 
>  void migrate_start_colo_process(MigrationState *s)
> --
> 2.17.1
Zhang Chen June 3, 2020, 9:11 a.m. UTC | #2
> -----Original Message-----
> From: Zhanghailiang <zhang.zhanghailiang@huawei.com>
> Sent: Tuesday, June 2, 2020 2:59 PM
> To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert
> <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu-dev
> <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang
> <jasowang@redhat.com>
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> This is not right: this notification should not be controlled by the interval
> time. I see what happens here: when multiple checkpoint requests come in,
> Colo_delay_time is increased every time and ends up as a large value, which
> is not what we want.

Not just that: multiple checkpoints spend a lot of resources syncing memory from the PVM to the SVM,
and they stop/start the VM multiple times, yet the result is the same as with a single checkpoint.
So within a short period only one checkpoint is needed, because each checkpoint itself takes some time.

Thanks
Zhang Chen

Zhanghailiang June 3, 2020, 9:38 a.m. UTC | #3
> -----Original Message-----
> From: Zhang, Chen [mailto:chen.zhang@intel.com]
> Sent: Wednesday, June 3, 2020 5:11 PM
> To: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan
> Gilbert <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang
> <jasowang@redhat.com>
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> Not just that: multiple checkpoints spend a lot of resources syncing memory
> from the PVM to the SVM, and they stop/start the VM multiple times, yet the
> result is the same as with a single checkpoint.
> So within a short period only one checkpoint is needed, because each
> checkpoint itself takes some time.
> 

Yes, this is because we use a semaphore here, so it gets incremented multiple times.
I think Lukas's patch 'migration/colo.c: Use event instead of semaphore' has fixed this problem.
Did you try upstream QEMU, which has merged this patch?

Zhang Chen June 4, 2020, 6:35 a.m. UTC | #4
> -----Original Message-----
> From: Zhanghailiang <zhang.zhanghailiang@huawei.com>
> Sent: Wednesday, June 3, 2020 5:39 PM
> To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert
> <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu-dev
> <qemu-devel@nongnu.org>
> Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang
> <jasowang@redhat.com>
> Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request
> into one.
> 
> Yes, this is because we use a semaphore here, so it gets incremented multiple
> times. I think Lukas's patch 'migration/colo.c: Use event instead of semaphore'
> has fixed this problem.
> Did you try upstream QEMU, which has merged this patch?

Oh, thanks for the reminder. I will drop this patch and rebase on upstream for V2.

Thanks
Zhang Chen


Patch

diff --git a/migration/colo.c b/migration/colo.c
index d5bced22cb..e6a7d8c6e2 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -47,6 +47,9 @@  static COLOMode last_colo_mode;
 
 #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
 
+/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */
+#define COLO_CHECKPOINT_INTERVAL 1000
+
 bool migration_in_colo_state(void)
 {
     MigrationState *s = migrate_get_current();
@@ -651,13 +654,20 @@  out:
 void colo_checkpoint_notify(void *opaque)
 {
     MigrationState *s = opaque;
-    int64_t next_notify_time;
+    int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST);
 
-    qemu_sem_post(&s->colo_checkpoint_sem);
-    s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
-    next_notify_time = s->colo_checkpoint_time +
-                    s->parameters.x_checkpoint_delay;
-    timer_mod(s->colo_delay_timer, next_notify_time);
+    /*
+     * When the COLO guest runs into problems, COLO-compare may catch many
+     * mismatched network packets and trigger notifications multiple times,
+     * and a forced periodic checkpoint may happen at the same time. Merge
+     * the checkpoint requests within COLO_CHECKPOINT_INTERVAL into one.
+     */
+    if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) {
+        qemu_sem_post(&s->colo_checkpoint_sem);
+        timer_mod(s->colo_delay_timer, now +
+                  s->parameters.x_checkpoint_delay);
+        s->colo_checkpoint_time = now;
+    }
 }
 
 void migrate_start_colo_process(MigrationState *s)