[07/10] Disable auto-converge before entering COLO mode.

Message ID 1610505995-144129-8-git-send-email-lei.rao@intel.com (mailing list archive)
State New, archived
Series Fixed some bugs and optimized some codes for COLO

Commit Message

Rao, Lei Jan. 13, 2021, 2:46 a.m. UTC
From: "Rao, Lei" <lei.rao@intel.com>

If we don't disable the feature of auto-converge for live migration
before entering COLO mode, it will continue to run with COLO running,
and eventually the system will hang due to the CPU throttle reaching
DEFAULT_MIGRATE_MAX_CPU_THROTTLE.

Signed-off-by: Lei Rao <lei.rao@intel.com>
---
 migration/migration.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)
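
The escalation described in the commit message can be pictured with a small stand-alone model. This is only a sketch, not QEMU code: the function names are hypothetical, and the three percentages are assumed to match QEMU's default throttle parameters (initial, increment, maximum), which should be checked against the tree in use.

```c
#include <assert.h>

/* Assumed values for illustration only; believed to mirror QEMU's
 * default cpu-throttle-initial, cpu-throttle-increment and
 * max-cpu-throttle, but not taken from this patch. */
enum {
    THROTTLE_INITIAL   = 20,  /* percent */
    THROTTLE_INCREMENT = 10,  /* percent */
    THROTTLE_MAX       = 99   /* percent */
};

/* Hypothetical model of one auto-converge escalation step:
 * start at the initial value, then add the increment, capped at max. */
int throttle_step(int current_pct)
{
    assert(current_pct >= 0 && current_pct <= THROTTLE_MAX);
    if (current_pct == 0) {
        return THROTTLE_INITIAL;
    }
    int next = current_pct + THROTTLE_INCREMENT;
    return next > THROTTLE_MAX ? THROTTLE_MAX : next;
}

/* Count how many escalation steps it takes to saturate the throttle
 * if nothing ever resets or stops it. */
int steps_to_saturation(void)
{
    int pct = 0;
    int steps = 0;
    while (pct < THROTTLE_MAX) {
        pct = throttle_step(pct);
        steps++;
    }
    return steps;
}
```

Under these assumptions the throttle saturates after a handful of escalations. If nothing stops auto-converge once COLO is running, the guest ends up with almost no CPU time, which is consistent with the hang reported above.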

Comments

Dr. David Alan Gilbert Jan. 13, 2021, 11:31 a.m. UTC | #1
* leirao (lei.rao@intel.com) wrote:
> From: "Rao, Lei" <lei.rao@intel.com>
> 
> If we don't disable the feature of auto-converge for live migration
> before entering COLO mode, it will continue to run with COLO running,
> and eventually the system will hang due to the CPU throttle reaching
> DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>

I don't think that's the right answer, because it would seem reasonable
to use auto-converge to ensure that a COLO snapshot succeeded by
limiting guest CPU time.  Is the right fix here to reset the state of
the auto-converge counters at the start of each colo snapshot?

Dave
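
A rough sketch of the alternative raised above, resetting the auto-converge state at the start of each COLO snapshot instead of disabling the capability. It is a stand-alone model under stated assumptions, not a proposed patch: `AutoConvergeModel`, `model_sync_period()`, `model_run_periods()` and `model_checkpoint_begin()` are hypothetical names, and the percentages and the "escalate on every second high period" rule are simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the auto-converge state. */
typedef struct {
    int throttle_pct;         /* current guest CPU throttle, 0 = off */
    int dirty_rate_high_cnt;  /* high-dirty-rate periods since last escalation */
} AutoConvergeModel;

/* One sync period. In this model the throttle escalates on every
 * second period in which the dirty rate was too high. */
void model_sync_period(AutoConvergeModel *m, bool dirty_rate_high)
{
    assert(m != NULL);
    if (!dirty_rate_high) {
        return;
    }
    if (++m->dirty_rate_high_cnt < 2) {
        return;
    }
    m->dirty_rate_high_cnt = 0;
    if (m->throttle_pct == 0) {
        m->throttle_pct = 20;            /* assumed initial throttle */
    } else {
        m->throttle_pct += 10;           /* assumed increment */
        if (m->throttle_pct > 99) {
            m->throttle_pct = 99;        /* assumed maximum */
        }
    }
}

/* Convenience helper: run n identical sync periods. */
void model_run_periods(AutoConvergeModel *m, int n, bool dirty_rate_high)
{
    for (int i = 0; i < n; i++) {
        model_sync_period(m, dirty_rate_high);
    }
}

/* The suggested fix in model form: forget the accumulated state
 * whenever a new COLO snapshot begins. */
void model_checkpoint_begin(AutoConvergeModel *m)
{
    assert(m != NULL);
    m->throttle_pct = 0;
    m->dirty_rate_high_cnt = 0;
}
```

With a reset like this, each snapshot starts unthrottled and can still escalate within that snapshot if it needs to, so the throttle does not ratchet up across checkpoints.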

> ---
>  migration/migration.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 31417ce..6ab37e5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error **errp)
>      qapi_free_MigrationCapabilityStatusList(cap);
>  }
>  
> +static void colo_auto_converge_enabled(bool value, Error **errp)
> +{
> +    MigrationCapabilityStatusList *cap = NULL;
> +
> +    if (migrate_colo_enabled() && migrate_auto_converge()) {
> +        QAPI_LIST_PREPEND(cap,
> +                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
> +                                          value));
> +        qmp_migrate_set_capabilities(cap, errp);
> +        qapi_free_MigrationCapabilityStatusList(cap);
> +    }
> +    cpu_throttle_stop();
> +}
> +
>  static void migrate_set_block_incremental(MigrationState *s, bool value)
>  {
>      s->parameters.block_incremental = value;
> @@ -3401,7 +3415,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>  static void migration_iteration_finish(MigrationState *s)
>  {
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
> -    cpu_throttle_stop();
> +    colo_auto_converge_enabled(false, &error_abort);
>  
>      qemu_mutex_lock_iothread();
>      switch (s->state) {
> -- 
> 1.8.3.1
>
Rao, Lei Jan. 14, 2021, 3:21 a.m. UTC | #2
I think there is a difference between doing checkpoints in COLO and live migration.
The purpose of auto-converge is to ensure that live migration succeeds even when the guest dirties pages faster than they can be transferred.
But for COLO, we force the VM to stop while a checkpoint is being taken. This ensures that the checkpoint succeeds, and it has nothing to do with auto-converge.

Thanks,
Lei.

Lukas Straub Feb. 14, 2021, 10:52 a.m. UTC | #3
On Wed, 13 Jan 2021 10:46:32 +0800
leirao <lei.rao@intel.com> wrote:

> From: "Rao, Lei" <lei.rao@intel.com>
> 
> If we don't disable the feature of auto-converge for live migration
> before entering COLO mode, it will continue to run with COLO running,
> and eventually the system will hang due to the CPU throttle reaching
> DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> ---
>  migration/migration.c | 16 +++++++++++++++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 31417ce..6ab37e5 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1673,6 +1673,20 @@ void migrate_set_block_enabled(bool value, Error **errp)
>      qapi_free_MigrationCapabilityStatusList(cap);
>  }
>  
> +static void colo_auto_converge_enabled(bool value, Error **errp)
> +{
> +    MigrationCapabilityStatusList *cap = NULL;
> +
> +    if (migrate_colo_enabled() && migrate_auto_converge()) {
> +        QAPI_LIST_PREPEND(cap,
> +                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
> +                                          value));
> +        qmp_migrate_set_capabilities(cap, errp);
> +        qapi_free_MigrationCapabilityStatusList(cap);
> +    }
> +    cpu_throttle_stop();
> +}
> +

I think it's better to error out in migration_prepare or migrate_caps_check
if both colo and auto-converge are enabled.
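
A minimal sketch of what such an up-front check could look like, written as a stand-alone model rather than against the real capability-check code. The capability enum, `caps_check_sketch()` and the error string are all hypothetical. Judging by the `Error **errp` parameters in the patch, real QEMU code would report the failure through `errp` rather than a plain string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability indices, for this sketch only. */
enum {
    SKETCH_CAP_COLO,
    SKETCH_CAP_AUTO_CONVERGE,
    SKETCH_CAP__MAX
};

/* Returns true if the requested capability set is acceptable.
 * On rejection, returns false and points *errmsg at a static message;
 * on success, *errmsg is set to NULL. */
bool caps_check_sketch(const bool caps[SKETCH_CAP__MAX], const char **errmsg)
{
    assert(caps != NULL && errmsg != NULL);
    if (caps[SKETCH_CAP_COLO] && caps[SKETCH_CAP_AUTO_CONVERGE]) {
        *errmsg = "COLO is not compatible with auto-converge";
        return false;
    }
    *errmsg = NULL;
    return true;
}
```

Rejecting the combination when capabilities are set makes the conflict visible to the user immediately, instead of silently flipping a capability when the migration iteration finishes.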

>  static void migrate_set_block_incremental(MigrationState *s, bool value)
>  {
>      s->parameters.block_incremental = value;
> @@ -3401,7 +3415,7 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>  static void migration_iteration_finish(MigrationState *s)
>  {
>      /* If we enabled cpu throttling for auto-converge, turn it off. */
> -    cpu_throttle_stop();
> +    colo_auto_converge_enabled(false, &error_abort);
>  
>      qemu_mutex_lock_iothread();
>      switch (s->state) {



--
Rao, Lei Feb. 25, 2021, 9:22 a.m. UTC | #4
Sorry for the late reply due to CNY.
Auto-converge ensures that live migration can complete smoothly when there are too many dirty pages.
COLO may encounter the same situation when rebuilding a new secondary VM.
So I think it is necessary to be able to enable COLO and auto-converge at the same time.

Thanks,
Lei.

Patch

diff --git a/migration/migration.c b/migration/migration.c
index 31417ce..6ab37e5 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1673,6 +1673,20 @@  void migrate_set_block_enabled(bool value, Error **errp)
     qapi_free_MigrationCapabilityStatusList(cap);
 }
 
+static void colo_auto_converge_enabled(bool value, Error **errp)
+{
+    MigrationCapabilityStatusList *cap = NULL;
+
+    if (migrate_colo_enabled() && migrate_auto_converge()) {
+        QAPI_LIST_PREPEND(cap,
+                          migrate_cap_add(MIGRATION_CAPABILITY_AUTO_CONVERGE,
+                                          value));
+        qmp_migrate_set_capabilities(cap, errp);
+        qapi_free_MigrationCapabilityStatusList(cap);
+    }
+    cpu_throttle_stop();
+}
+
 static void migrate_set_block_incremental(MigrationState *s, bool value)
 {
     s->parameters.block_incremental = value;
@@ -3401,7 +3415,7 @@  static MigIterateState migration_iteration_run(MigrationState *s)
 static void migration_iteration_finish(MigrationState *s)
 {
     /* If we enabled cpu throttling for auto-converge, turn it off. */
-    cpu_throttle_stop();
+    colo_auto_converge_enabled(false, &error_abort);
 
     qemu_mutex_lock_iothread();
     switch (s->state) {