Message ID | 20200515042818.17908-4-chen.zhang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration/colo: Optimize COLO framework code | expand |
> -----Original Message----- > From: Zhang Chen [mailto:chen.zhang@intel.com] > Sent: Friday, May 15, 2020 12:28 PM > To: Dr . David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela > <quintela@redhat.com>; Zhanghailiang <zhang.zhanghailiang@huawei.com>; > qemu-dev <qemu-devel@nongnu.org> > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com>; Zhang Chen <chen.zhang@intel.com> > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request into > one. > > From: Zhang Chen <chen.zhang@intel.com> > > When COLO guest occur issues, COLO-compare will catch lots of different > network packet and trigger notification multi times, force periodic may > happen at the same time. So this can be efficient merge checkpoint request > within COLO_CHECKPOINT_INTERVAL. > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > --- > migration/colo.c | 22 ++++++++++++++++------ > 1 file changed, 16 insertions(+), 6 deletions(-) > > diff --git a/migration/colo.c b/migration/colo.c index > d5bced22cb..e6a7d8c6e2 100644 > --- a/migration/colo.c > +++ b/migration/colo.c > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode; > > #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define > +COLO_CHECKPOINT_INTERVAL 1000 > + > bool migration_in_colo_state(void) > { > MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@ > out: > void colo_checkpoint_notify(void *opaque) { > MigrationState *s = opaque; > - int64_t next_notify_time; > + int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > - qemu_sem_post(&s->colo_checkpoint_sem); > - s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); > - next_notify_time = s->colo_checkpoint_time + > - s->parameters.x_checkpoint_delay; > - timer_mod(s->colo_delay_timer, next_notify_time); > + /* > + * When COLO guest occur issues, COLO-compare will catch lots of > + * different network packet and trigger notification multi times, > + * force periodic may happen at the same time. So this can be > + * efficient merge checkpoint request within > COLO_CHECKPOINT_INTERVAL. > + */ > + if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) { > + qemu_sem_post(&s->colo_checkpoint_sem); It is not right here, this notification should not be controlled by the interval time, I got what happened here, when multiple checkpoint requires come, this Colo_delay_time will be added every time and it will be a big value which is not what we want. Besides, please update this patch based on [PATCH 0/6] colo: migration related bugfixes series which Has modified the same place. > + timer_mod(s->colo_delay_timer, now + > + s->parameters.x_checkpoint_delay); > + s->colo_checkpoint_time = now; > + } > } > > void migrate_start_colo_process(MigrationState *s) > -- > 2.17.1
> -----Original Message----- > From: Zhanghailiang <zhang.zhanghailiang@huawei.com> > Sent: Tuesday, June 2, 2020 2:59 PM > To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert > <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu-dev > <qemu-devel@nongnu.org> > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request > into one. > > > > > -----Original Message----- > > From: Zhang Chen [mailto:chen.zhang@intel.com] > > Sent: Friday, May 15, 2020 12:28 PM > > To: Dr . David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela > > <quintela@redhat.com>; Zhanghailiang > <zhang.zhanghailiang@huawei.com>; > > qemu-dev <qemu-devel@nongnu.org> > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > > <jasowang@redhat.com>; Zhang Chen <chen.zhang@intel.com> > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request > > into one. > > > > From: Zhang Chen <chen.zhang@intel.com> > > > > When COLO guest occur issues, COLO-compare will catch lots of > > different network packet and trigger notification multi times, force > > periodic may happen at the same time. So this can be efficient merge > > checkpoint request within COLO_CHECKPOINT_INTERVAL. > > > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > > --- > > migration/colo.c | 22 ++++++++++++++++------ > > 1 file changed, 16 insertions(+), 6 deletions(-) > > > > diff --git a/migration/colo.c b/migration/colo.c index > > d5bced22cb..e6a7d8c6e2 100644 > > --- a/migration/colo.c > > +++ b/migration/colo.c > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode; > > > > #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) > > > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define > > +COLO_CHECKPOINT_INTERVAL 1000 > > + > > bool migration_in_colo_state(void) > > { > > MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@ > > out: > > void colo_checkpoint_notify(void *opaque) { > > MigrationState *s = opaque; > > - int64_t next_notify_time; > > + int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > > > - qemu_sem_post(&s->colo_checkpoint_sem); > > - s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > - next_notify_time = s->colo_checkpoint_time + > > - s->parameters.x_checkpoint_delay; > > - timer_mod(s->colo_delay_timer, next_notify_time); > > + /* > > + * When COLO guest occur issues, COLO-compare will catch lots of > > + * different network packet and trigger notification multi times, > > + * force periodic may happen at the same time. So this can be > > + * efficient merge checkpoint request within > > COLO_CHECKPOINT_INTERVAL. > > + */ > > + if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) { > > + qemu_sem_post(&s->colo_checkpoint_sem); > > It is not right here, this notification should not be controlled by the interval > time, I got what happened here, when multiple checkpoint requires come, > this Colo_delay_time will be added every time and it will be a big value which > is not what we want. Not just this, multi checkpoint will spend lots of resource to sync memory from PVM to SVM, It will make VM stop/start multi times, but for the results are same with one checkpoint. So in short time just need one checkpoint, because do checkpoint still need some time... Thanks Zhang Chen > > Besides, please update this patch based on [PATCH 0/6] colo: migration > related bugfixes series which Has modified the same place. > > > > > + timer_mod(s->colo_delay_timer, now + > > + s->parameters.x_checkpoint_delay); > > + s->colo_checkpoint_time = now; > > + } > > } > > > > void migrate_start_colo_process(MigrationState *s) > > -- > > 2.17.1
> -----Original Message----- > From: Zhang, Chen [mailto:chen.zhang@intel.com] > Sent: Wednesday, June 3, 2020 5:11 PM > To: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan > Gilbert <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; > qemu-dev <qemu-devel@nongnu.org> > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request > into one. > > > > > -----Original Message----- > > From: Zhanghailiang <zhang.zhanghailiang@huawei.com> > > Sent: Tuesday, June 2, 2020 2:59 PM > > To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert > > <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu-dev > > <qemu-devel@nongnu.org> > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com> > > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint > > request into one. > > > > > > > > > -----Original Message----- > > > From: Zhang Chen [mailto:chen.zhang@intel.com] > > > Sent: Friday, May 15, 2020 12:28 PM > > > To: Dr . David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela > > > <quintela@redhat.com>; Zhanghailiang > > <zhang.zhanghailiang@huawei.com>; > > > qemu-dev <qemu-devel@nongnu.org> > > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > > > <jasowang@redhat.com>; Zhang Chen <chen.zhang@intel.com> > > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint request > > > into one. > > > > > > From: Zhang Chen <chen.zhang@intel.com> > > > > > > When COLO guest occur issues, COLO-compare will catch lots of > > > different network packet and trigger notification multi times, force > > > periodic may happen at the same time. So this can be efficient merge > > > checkpoint request within COLO_CHECKPOINT_INTERVAL. > > > > > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > > > --- > > > migration/colo.c | 22 ++++++++++++++++------ > > > 1 file changed, 16 insertions(+), 6 deletions(-) > > > > > > diff --git a/migration/colo.c b/migration/colo.c index > > > d5bced22cb..e6a7d8c6e2 100644 > > > --- a/migration/colo.c > > > +++ b/migration/colo.c > > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode; > > > > > > #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) > > > > > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define > > > +COLO_CHECKPOINT_INTERVAL 1000 > > > + > > > bool migration_in_colo_state(void) > > > { > > > MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 > > > @@ > > > out: > > > void colo_checkpoint_notify(void *opaque) { > > > MigrationState *s = opaque; > > > - int64_t next_notify_time; > > > + int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > > > > > - qemu_sem_post(&s->colo_checkpoint_sem); > > > - s->colo_checkpoint_time = > qemu_clock_get_ms(QEMU_CLOCK_HOST); > > > - next_notify_time = s->colo_checkpoint_time + > > > - s->parameters.x_checkpoint_delay; > > > - timer_mod(s->colo_delay_timer, next_notify_time); > > > + /* > > > + * When COLO guest occur issues, COLO-compare will catch lots of > > > + * different network packet and trigger notification multi times, > > > + * force periodic may happen at the same time. So this can be > > > + * efficient merge checkpoint request within > > > COLO_CHECKPOINT_INTERVAL. > > > + */ > > > + if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) > { > > > + qemu_sem_post(&s->colo_checkpoint_sem); > > > > It is not right here, this notification should not be controlled by > > the interval time, I got what happened here, when multiple checkpoint > > requires come, this Colo_delay_time will be added every time and it > > will be a big value which is not what we want. > > Not just this, multi checkpoint will spend lots of resource to sync memory > from PVM to SVM, It will make VM stop/start multi times, but for the results > are same with one checkpoint. > So in short time just need one checkpoint, because do checkpoint still need > some time... > Yes, this because we use semaphore here, it will be increased multiple times, And I think Lukas's patch 'migration/colo.c: Use event instead of semaphore' has fixed this problem. Did you try the qemu upstream which has merged this patch ? > Thanks > Zhang Chen > > > > > Besides, please update this patch based on [PATCH 0/6] colo: migration > > related bugfixes series which Has modified the same place. > > > > > > > > > + timer_mod(s->colo_delay_timer, now + > > > + s->parameters.x_checkpoint_delay); > > > + s->colo_checkpoint_time = now; > > > + } > > > } > > > > > > void migrate_start_colo_process(MigrationState *s) > > > -- > > > 2.17.1
> -----Original Message----- > From: Zhanghailiang <zhang.zhanghailiang@huawei.com> > Sent: Wednesday, June 3, 2020 5:39 PM > To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert > <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu-dev > <qemu-devel@nongnu.org> > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com> > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint request > into one. > > > -----Original Message----- > > From: Zhang, Chen [mailto:chen.zhang@intel.com] > > Sent: Wednesday, June 3, 2020 5:11 PM > > To: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan > > Gilbert <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; > > qemu-dev <qemu-devel@nongnu.org> > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > <jasowang@redhat.com> > > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint > > request into one. > > > > > > > > > -----Original Message----- > > > From: Zhanghailiang <zhang.zhanghailiang@huawei.com> > > > Sent: Tuesday, June 2, 2020 2:59 PM > > > To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert > > > <dgilbert@redhat.com>; Juan Quintela <quintela@redhat.com>; qemu- > dev > > > <qemu-devel@nongnu.org> > > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > > <jasowang@redhat.com> > > > Subject: RE: [PATCH 3/3] migration/colo: Merge multi checkpoint > > > request into one. > > > > > > > > > > > > > -----Original Message----- > > > > From: Zhang Chen [mailto:chen.zhang@intel.com] > > > > Sent: Friday, May 15, 2020 12:28 PM > > > > To: Dr . David Alan Gilbert <dgilbert@redhat.com>; Juan Quintela > > > > <quintela@redhat.com>; Zhanghailiang > > > <zhang.zhanghailiang@huawei.com>; > > > > qemu-dev <qemu-devel@nongnu.org> > > > > Cc: Zhang Chen <zhangckid@gmail.com>; Jason Wang > > > > <jasowang@redhat.com>; Zhang Chen <chen.zhang@intel.com> > > > > Subject: [PATCH 3/3] migration/colo: Merge multi checkpoint > > > > request into one. > > > > > > > > From: Zhang Chen <chen.zhang@intel.com> > > > > > > > > When COLO guest occur issues, COLO-compare will catch lots of > > > > different network packet and trigger notification multi times, > > > > force periodic may happen at the same time. So this can be > > > > efficient merge checkpoint request within > COLO_CHECKPOINT_INTERVAL. > > > > > > > > Signed-off-by: Zhang Chen <chen.zhang@intel.com> > > > > --- > > > > migration/colo.c | 22 ++++++++++++++++------ > > > > 1 file changed, 16 insertions(+), 6 deletions(-) > > > > > > > > diff --git a/migration/colo.c b/migration/colo.c index > > > > d5bced22cb..e6a7d8c6e2 100644 > > > > --- a/migration/colo.c > > > > +++ b/migration/colo.c > > > > @@ -47,6 +47,9 @@ static COLOMode last_colo_mode; > > > > > > > > #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) > > > > > > > > +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ #define > > > > +COLO_CHECKPOINT_INTERVAL 1000 > > > > + > > > > bool migration_in_colo_state(void) { > > > > MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 > > > > @@ > > > > out: > > > > void colo_checkpoint_notify(void *opaque) { > > > > MigrationState *s = opaque; > > > > - int64_t next_notify_time; > > > > + int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); > > > > > > > > - qemu_sem_post(&s->colo_checkpoint_sem); > > > > - s->colo_checkpoint_time = > > qemu_clock_get_ms(QEMU_CLOCK_HOST); > > > > - next_notify_time = s->colo_checkpoint_time + > > > > - s->parameters.x_checkpoint_delay; > > > > - timer_mod(s->colo_delay_timer, next_notify_time); > > > > + /* > > > > + * When COLO guest occur issues, COLO-compare will catch lots of > > > > + * different network packet and trigger notification multi times, > > > > + * force periodic may happen at the same time. So this can be > > > > + * efficient merge checkpoint request within > > > > COLO_CHECKPOINT_INTERVAL. > > > > + */ > > > > + if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) > > { > > > > + qemu_sem_post(&s->colo_checkpoint_sem); > > > > > > It is not right here, this notification should not be controlled by > > > the interval time, I got what happened here, when multiple > > > checkpoint requires come, this Colo_delay_time will be added every > > > time and it will be a big value which is not what we want. > > > > Not just this, multi checkpoint will spend lots of resource to sync > > memory from PVM to SVM, It will make VM stop/start multi times, but > > for the results are same with one checkpoint. > > So in short time just need one checkpoint, because do checkpoint still > > need some time... > > > > Yes, this because we use semaphore here, it will be increased multiple times, > And I think Lukas's patch 'migration/colo.c: Use event instead of semaphore' > has fixed this problem. > Did you try the qemu upstream which has merged this patch ? Oh, Thanks reminder, I will drop this patch and rebase on upstream for V2. Thank Zhang Chen > > > Thanks > > Zhang Chen > > > > > > > > Besides, please update this patch based on [PATCH 0/6] colo: > > > migration related bugfixes series which Has modified the same place. > > > > > > > > > > > > > + timer_mod(s->colo_delay_timer, now + > > > > + s->parameters.x_checkpoint_delay); > > > > + s->colo_checkpoint_time = now; > > > > + } > > > > } > > > > > > > > void migrate_start_colo_process(MigrationState *s) > > > > -- > > > > 2.17.1
diff --git a/migration/colo.c b/migration/colo.c index d5bced22cb..e6a7d8c6e2 100644 --- a/migration/colo.c +++ b/migration/colo.c @@ -47,6 +47,9 @@ static COLOMode last_colo_mode; #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) +/* Default COLO_CHECKPOINT_INTERVAL is 1000 ms */ +#define COLO_CHECKPOINT_INTERVAL 1000 + bool migration_in_colo_state(void) { MigrationState *s = migrate_get_current(); @@ -651,13 +654,20 @@ out: void colo_checkpoint_notify(void *opaque) { MigrationState *s = opaque; - int64_t next_notify_time; + int64_t now = qemu_clock_get_ms(QEMU_CLOCK_HOST); - qemu_sem_post(&s->colo_checkpoint_sem); - s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST); - next_notify_time = s->colo_checkpoint_time + - s->parameters.x_checkpoint_delay; - timer_mod(s->colo_delay_timer, next_notify_time); + /* + * When COLO guest occur issues, COLO-compare will catch lots of + * different network packet and trigger notification multi times, + * force periodic may happen at the same time. So this can be + * efficient merge checkpoint request within COLO_CHECKPOINT_INTERVAL. + */ + if (now > s->colo_checkpoint_time + COLO_CHECKPOINT_INTERVAL) { + qemu_sem_post(&s->colo_checkpoint_sem); + timer_mod(s->colo_delay_timer, now + + s->parameters.x_checkpoint_delay); + s->colo_checkpoint_time = now; + } } void migrate_start_colo_process(MigrationState *s)