Message ID | 1610505995-144129-8-git-send-email-lei.rao@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Fixed some bugs and optimized some codes for COLO |
* leirao (lei.rao@intel.com) wrote:
> From: "Rao, Lei" <lei.rao@intel.com>
>
> If we don't disable the feature of auto-converge for live migration
> before entering COLO mode, it will continue to run with COLO running,
> and eventually the system will hang due to the CPU throttle reaching
> DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
>
> Signed-off-by: Lei Rao <lei.rao@intel.com>

I don't think that's the right answer, because it would seem reasonable
to use auto-converge to ensure that a COLO snapshot succeeded by limiting
guest CPU time. Is the right fix here to reset the state of the
auto-converge counters at the start of each colo snapshot?

Dave

> ---
>  migration/migration.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 31417ce..6ab37e5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error **errp)
>      qapi_free_MigrationCapabilityStatusList(cap);
>  }
>
> +static void colo_auto_converge_enabled(bool value, Error **errp)
> +{
> +    MigrationCapabilityStatusList *cap = NULL;
> +
> +    if (migrate_colo_enabled() && migrate_auto_converge()) {
> +        QAPI_LIST_PREPEND(cap,
> +                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
> +                                          value));
> +        qmp_migrate_set_capabilities(cap, errp);
> +        qapi_free_MigrationCapabilityStatusList(cap);
> +    }
> +    cpu_throttle_stop();
> +}
> +
>  static void migrate_set_block_incremental(MigrationState *s, bool value)
>  {
>      s->parameters.block_incremental = value;
> @@ -3401,7 +3415,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>  static void migration_iteration_finish(MigrationState *s)
>  {
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
> -    cpu_throttle_stop();
> +    colo_auto_converge_enabled(false, &error_abort);
>
>      qemu_mutex_lock_iothread();
>      switch (s->state) {
> --
> 1.8.3.1
>
I think there is a difference between doing checkpoints in COLO and live migration. The purpose of auto-converge is to ensure that live migration succeeds even when dirty pages are generated faster than data can be transferred. For COLO, however, we force the VM to stop whenever a checkpoint is taken. This ensures that the checkpoint succeeds, and it has nothing to do with auto-converge.

Thanks,
Lei.

-----Original Message-----
From: Dr. David Alan Gilbert <dgilbert@redhat.com>
Sent: Wednesday, January 13, 2021 7:32 PM
To: Rao, Lei <lei.rao@intel.com>
Cc: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com; jasowang@redhat.com; zhang.zhanghailiang@huawei.com; quintela@redhat.com; qemu-devel@nongnu.org
Subject: Re: [PATCH 07/10] Disable auto-coverge before entering COLO mode.

I don't think that's the right answer, because it would seem reasonable
to use auto-converge to ensure that a COLO snapshot succeeded by limiting
guest CPU time. Is the right fix here to reset the state of the
auto-converge counters at the start of each colo snapshot?

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, 13 Jan 2021 10:46:32 +0800
leirao <lei.rao@intel.com> wrote:

> From: "Rao, Lei" <lei.rao@intel.com>
>
> If we don't disable the feature of auto-converge for live migration
> before entering COLO mode, it will continue to run with COLO running,
> and eventually the system will hang due to the CPU throttle reaching
> DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
>
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> ---
>  migration/migration.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 31417ce..6ab37e5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error **errp)
>      qapi_free_MigrationCapabilityStatusList(cap);
>  }
>
> +static void colo_auto_converge_enabled(bool value, Error **errp)
> +{
> +    MigrationCapabilityStatusList *cap = NULL;
> +
> +    if (migrate_colo_enabled() && migrate_auto_converge()) {
> +        QAPI_LIST_PREPEND(cap,
> +                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
> +                                          value));
> +        qmp_migrate_set_capabilities(cap, errp);
> +        qapi_free_MigrationCapabilityStatusList(cap);
> +    }
> +    cpu_throttle_stop();
> +}
> +

I think it's better to error out in migration_prepare or
migrate_caps_check if both COLO and auto-converge are enabled.

>  static void migrate_set_block_incremental(MigrationState *s, bool value)
>  {
>      s->parameters.block_incremental = value;
> @@ -3401,7 +3415,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>  static void migration_iteration_finish(MigrationState *s)
>  {
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
> -    cpu_throttle_stop();
> +    colo_auto_converge_enabled(false, &error_abort);
>
>      qemu_mutex_lock_iothread();
>      switch (s->state) {
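The suggestion above, rejecting the combination up front instead of silently flipping the capability later, could look roughly like the following standalone sketch. It does not use QEMU's real types: the `CAP_*` constants, `caps_check` and the error string are hypothetical and only model the shape of a capability check such as the `migrate_caps_check` mentioned in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability indices (QEMU's real enum is not reproduced here). */
enum {
    CAP_AUTO_CONVERGE,
    CAP_X_COLO,
    CAP_COUNT
};

/*
 * Validate a requested capability set.
 * Returns true if the combination is acceptable. On rejection it returns
 * false and, if errmsg is non-NULL, stores a static message in *errmsg.
 */
static bool caps_check(const bool caps[CAP_COUNT], const char **errmsg)
{
    assert(caps != NULL);
    if (caps[CAP_X_COLO] && caps[CAP_AUTO_CONVERGE]) {
        if (errmsg) {
            *errmsg = "COLO is not compatible with auto-converge";
        }
        return false;
    }
    if (errmsg) {
        *errmsg = NULL;
    }
    return true;
}
```

Failing at capability-setting time makes the conflict visible to the user before migration starts, at the cost of forbidding a combination that may be wanted in some setups.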
Sorry for the late reply due to CNY.

Auto-converge ensures that live migration can complete smoothly when there are too many dirty pages. COLO may encounter the same situation when rebuilding a new secondary VM. So I think it is necessary to be able to enable COLO and auto-converge at the same time.

Thanks,
Lei.

-----Original Message-----
From: Lukas Straub <lukasstraub2@web.de>
Sent: Sunday, February 14, 2021 6:52 PM
To: Rao, Lei <lei.rao@intel.com>
Cc: Zhang, Chen <chen.zhang@intel.com>; lizhijian@cn.fujitsu.com; jasowang@redhat.com; zhang.zhanghailiang@huawei.com; quintela@redhat.com; dgilbert@redhat.com; qemu-devel@nongnu.org
Subject: Re: [PATCH 07/10] Disable auto-coverge before entering COLO mode.

I think it's better to error out in migration_prepare or
migrate_caps_check if both COLO and auto-converge are enabled.
diff --git a/migration/migration.c b/migration/migration.c
index 31417ce..6ab37e5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error **errp)
     qapi_free_MigrationCapabilityStatusList(cap);
 }
 
+static void colo_auto_converge_enabled(bool value, Error **errp)
+{
+    MigrationCapabilityStatusList *cap = NULL;
+
+    if (migrate_colo_enabled() && migrate_auto_converge()) {
+        QAPI_LIST_PREPEND(cap,
+                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
+                                          value));
+        qmp_migrate_set_capabilities(cap, errp);
+        qapi_free_MigrationCapabilityStatusList(cap);
+    }
+    cpu_throttle_stop();
+}
+
 static void migrate_set_block_incremental(MigrationState *s, bool value)
 {
     s->parameters.block_incremental = value;
@@ -3401,7 +3415,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 static void migration_iteration_finish(MigrationState *s)
 {
     /* If we enabled cpu throttling for auto-converge, turn it off. */
-    cpu_throttle_stop();
+    colo_auto_converge_enabled(false, &error_abort);
 
     qemu_mutex_lock_iothread();
     switch (s->state) {