diff mbox

[v12,6/6] migration: add postcopy total blocktime into query-migrate

Message ID 1509369390-8285-7-git-send-email-a.perevalov@samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Alexey Perevalov Oct. 30, 2017, 1:16 p.m. UTC
Postcopy total blocktime is available on destination side only.
But query-migrate was possible only for source. This patch
adds ability to call query-migrate on destination.
To be able to see postcopy blocktime, need to request postcopy-blocktime
capability.

The query-migrate command will show following sample result:
{"return":
    "postcopy-vcpu-blocktime": [115, 100],
    "status": "completed",
    "postcopy-blocktime": 100
}}

postcopy_vcpu_blocktime contains list, where the first item is the first
vCPU in QEMU.

This patch has a drawback, it combines states of incoming and
outgoing migration. Ongoing migration state will overwrite incoming
state. Looks like better to separate query-migrate for incoming and
outgoing migration or add parameter to indicate type of migration.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
---
 hmp.c                    | 15 +++++++++++++
 migration/migration.c    | 42 ++++++++++++++++++++++++++++++++----
 migration/migration.h    |  4 ++++
 migration/postcopy-ram.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/trace-events   |  1 +
 qapi/migration.json      | 11 +++++++++-
 6 files changed, 124 insertions(+), 5 deletions(-)

Comments

Eric Blake Jan. 2, 2018, 9:26 p.m. UTC | #1
On 10/30/2017 08:16 AM, Alexey Perevalov wrote:
> Postcopy total blocktime is available on destination side only.
> But query-migrate was possible only for source. This patch
> adds ability to call query-migrate on destination.
> To be able to see postcopy blocktime, need to request postcopy-blocktime
> capability.

Why not display the stats unconditionally when they are available,
instead of having to set a capability knob to request them?

> 
> The query-migrate command will show following sample result:
> {"return":
>     "postcopy-vcpu-blocktime": [115, 100],
>     "status": "completed",
>     "postcopy-blocktime": 100
> }}
> 
> postcopy_vcpu_blocktime contains list, where the first item is the first
> vCPU in QEMU.
> 
> This patch has a drawback, it combines states of incoming and
> outgoing migration. Ongoing migration state will overwrite incoming
> state. Looks like better to separate query-migrate for incoming and
> outgoing migration or add parameter to indicate type of migration.
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
> ---

> +++ b/qapi/migration.json
> @@ -156,6 +156,13 @@
>  #              @status is 'failed'. Clients should not attempt to parse the
>  #              error strings. (Since 2.7)
>  #
> +# @postcopy-blocktime: total time when all vCPU were blocked during postcopy
> +#           live migration (Since 2.11)

2.12 now.

Should this mention the capability knob needed to enable this stat (or
else get rid of the capability knob and always expose this when possible)?

> +#
> +# @postcopy-vcpu-blocktime: list of the postcopy blocktime per vCPU (Since 2.11)

Also 2.12.

> +#
> +
> +#
>  # Since: 0.14.0
>  ##
>  { 'struct': 'MigrationInfo',
> @@ -167,7 +174,9 @@
>             '*downtime': 'int',
>             '*setup-time': 'int',
>             '*cpu-throttle-percentage': 'int',
> -           '*error-desc': 'str'} }
> +           '*error-desc': 'str',
> +           '*postcopy-blocktime' : 'int64',
> +           '*postcopy-vcpu-blocktime': ['int64']} }
>  
>  ##
>  # @query-migrate:
>
Alexey Perevalov Jan. 5, 2018, 12:15 p.m. UTC | #2
On 01/03/2018 12:26 AM, Eric Blake wrote:
> On 10/30/2017 08:16 AM, Alexey Perevalov wrote:
>> Postcopy total blocktime is available on destination side only.
>> But query-migrate was possible only for source. This patch
>> adds ability to call query-migrate on destination.
>> To be able to see postcopy blocktime, need to request postcopy-blocktime
>> capability.
> Why not display the stats unconditionally when they are available,
> instead of having to set a capability knob to request them?
That knob necessary to avoid regression if this information
is not necessary, we decided during so long discussion in previous
version of the patch set - it's not necessary always.
But if user requested blocktime and host can't calculate it,
e.g. due to appropriate feature isn't supported in host kernel,
yes, the value will be 0.

>
>> The query-migrate command will show following sample result:
>> {"return":
>>      "postcopy-vcpu-blocktime": [115, 100],
>>      "status": "completed",
>>      "postcopy-blocktime": 100
>> }}
>>
>> postcopy_vcpu_blocktime contains list, where the first item is the first
>> vCPU in QEMU.
>>
>> This patch has a drawback, it combines states of incoming and
>> outgoing migration. Ongoing migration state will overwrite incoming
>> state. Looks like better to separate query-migrate for incoming and
>> outgoing migration or add parameter to indicate type of migration.
>>
>> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
>> ---
>> +++ b/qapi/migration.json
>> @@ -156,6 +156,13 @@
>>   #              @status is 'failed'. Clients should not attempt to parse the
>>   #              error strings. (Since 2.7)
>>   #
>> +# @postcopy-blocktime: total time when all vCPU were blocked during postcopy
>> +#           live migration (Since 2.11)
> 2.12 now.
>
> Should this mention the capability knob needed to enable this stat (or
> else get rid of the capability knob and always expose this when possible)?
>
>> +#
>> +# @postcopy-vcpu-blocktime: list of the postcopy blocktime per vCPU (Since 2.11)
> Also 2.12.
>
>> +#
>> +
>> +#
>>   # Since: 0.14.0
>>   ##
>>   { 'struct': 'MigrationInfo',
>> @@ -167,7 +174,9 @@
>>              '*downtime': 'int',
>>              '*setup-time': 'int',
>>              '*cpu-throttle-percentage': 'int',
>> -           '*error-desc': 'str'} }
>> +           '*error-desc': 'str',
>> +           '*postcopy-blocktime' : 'int64',
>> +           '*postcopy-vcpu-blocktime': ['int64']} }
>>   
>>   ##
>>   # @query-migrate:
>>
Eric Blake Jan. 5, 2018, 4:21 p.m. UTC | #3
On 01/05/2018 06:15 AM, Alexey Perevalov wrote:
> On 01/03/2018 12:26 AM, Eric Blake wrote:
>> On 10/30/2017 08:16 AM, Alexey Perevalov wrote:
>>> Postcopy total blocktime is available on destination side only.
>>> But query-migrate was possible only for source. This patch
>>> adds ability to call query-migrate on destination.
>>> To be able to see postcopy blocktime, need to request postcopy-blocktime
>>> capability.
>> Why not display the stats unconditionally when they are available,
>> instead of having to set a capability knob to request them?
> That knob necessary to avoid regression if this information
> is not necessary, we decided during so long discussion in previous
> version of the patch set - it's not necessary always.
> But if user requested blocktime and host can't calculate it,
> e.g. due to appropriate feature isn't supported in host kernel,
> yes, the value will be 0.
> 

>>>   #
>>> +# @postcopy-blocktime: total time when all vCPU were blocked during
>>> postcopy
>>> +#           live migration (Since 2.11)
>> 2.12 now.
>>
>> Should this mention the capability knob needed to enable this stat (or
>> else get rid of the capability knob and always expose this when
>> possible)?
>>

Okay, so you've explained that the knob is necessary because there is a
noticeable performance difference for users that don't care about the
statistic; in which case, we DO need a followup patch (or a v2 of Juan's
pull request) that documents that this statistic is useless unless you
set the migration capability.
diff mbox

Patch

diff --git a/hmp.c b/hmp.c
index 41fcce6..4f42eb8 100644
--- a/hmp.c
+++ b/hmp.c
@@ -264,6 +264,21 @@  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
                        info->cpu_throttle_percentage);
     }
 
+    if (info->has_postcopy_blocktime) {
+        monitor_printf(mon, "postcopy blocktime: %" PRId64 "\n",
+                       info->postcopy_blocktime);
+    }
+
+    if (info->has_postcopy_vcpu_blocktime) {
+        Visitor *v;
+        char *str;
+        v = string_output_visitor_new(false, &str);
+        visit_type_int64List(v, NULL, &info->postcopy_vcpu_blocktime, NULL);
+        visit_complete(v, &str);
+        monitor_printf(mon, "postcopy vcpu blocktime: %s\n", str);
+        g_free(str);
+        visit_free(v);
+    }
     qapi_free_MigrationInfo(info);
     qapi_free_MigrationCapabilityStatusList(caps);
 }
diff --git a/migration/migration.c b/migration/migration.c
index c5244ae..cd09ba4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -589,14 +589,15 @@  static void populate_disk_info(MigrationInfo *info)
     }
 }
 
-MigrationInfo *qmp_query_migrate(Error **errp)
+static void fill_source_migration_info(MigrationInfo *info)
 {
-    MigrationInfo *info = g_malloc0(sizeof(*info));
     MigrationState *s = migrate_get_current();
 
     switch (s->state) {
     case MIGRATION_STATUS_NONE:
         /* no migration has happened ever */
+        /* do not overwrite destination migration status */
+        return;
         break;
     case MIGRATION_STATUS_SETUP:
         info->has_status = true;
@@ -647,8 +648,6 @@  MigrationInfo *qmp_query_migrate(Error **errp)
         break;
     }
     info->status = s->state;
-
-    return info;
 }
 
 /**
@@ -712,6 +711,41 @@  static bool migrate_caps_check(bool *cap_list,
     return true;
 }
 
+static void fill_destination_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+
+    switch (mis->state) {
+    case MIGRATION_STATUS_NONE:
+        return;
+        break;
+    case MIGRATION_STATUS_SETUP:
+    case MIGRATION_STATUS_CANCELLING:
+    case MIGRATION_STATUS_CANCELLED:
+    case MIGRATION_STATUS_ACTIVE:
+    case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+    case MIGRATION_STATUS_FAILED:
+    case MIGRATION_STATUS_COLO:
+        info->has_status = true;
+        break;
+    case MIGRATION_STATUS_COMPLETED:
+        info->has_status = true;
+        fill_destination_postcopy_migration_info(info);
+        break;
+    }
+    info->status = mis->state;
+}
+
+MigrationInfo *qmp_query_migrate(Error **errp)
+{
+    MigrationInfo *info = g_malloc0(sizeof(*info));
+
+    fill_destination_migration_info(info);
+    fill_source_migration_info(info);
+
+    return info;
+}
+
 void qmp_migrate_set_capabilities(MigrationCapabilityStatusList *params,
                                   Error **errp)
 {
diff --git a/migration/migration.h b/migration/migration.h
index fb8d2ef..99f294f 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -71,6 +71,10 @@  struct MigrationIncomingState {
 
 MigrationIncomingState *migration_incoming_get_current(void);
 void migration_incoming_state_destroy(void);
+/*
+ * Functions to work with blocktime context
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info);
 
 #define TYPE_MIGRATION "migration"
 
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 6bf24e9..2823133 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -108,6 +108,55 @@  static struct PostcopyBlocktimeContext *blocktime_context_new(void)
     return ctx;
 }
 
+static int64List *get_vcpu_blocktime_list(PostcopyBlocktimeContext *ctx)
+{
+    int64List *list = NULL, *entry = NULL;
+    int i;
+
+    for (i = smp_cpus - 1; i >= 0; i--) {
+        entry = g_new0(int64List, 1);
+        entry->value = ctx->vcpu_blocktime[i];
+        entry->next = list;
+        list = entry;
+    }
+
+    return list;
+}
+
+/*
+ * This function just populates MigrationInfo from postcopy's
+ * blocktime context. It will not populate MigrationInfo,
+ * unless postcopy-blocktime capability was set.
+ *
+ * @info: pointer to MigrationInfo to populate
+ */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return;
+    }
+
+    info->has_postcopy_blocktime = true;
+    info->postcopy_blocktime = bc->total_blocktime;
+    info->has_postcopy_vcpu_blocktime = true;
+    info->postcopy_vcpu_blocktime = get_vcpu_blocktime_list(bc);
+}
+
+static uint64_t get_postcopy_total_blocktime(void)
+{
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    PostcopyBlocktimeContext *bc = mis->blocktime_ctx;
+
+    if (!bc) {
+        return 0;
+    }
+
+    return bc->total_blocktime;
+}
+
 /**
  * receive_ufd_features: check userfault fd features, to request only supported
  * features in the future.
@@ -482,6 +531,9 @@  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)
         munmap(mis->postcopy_tmp_zero_page, mis->largest_page_size);
         mis->postcopy_tmp_zero_page = NULL;
     }
+    trace_postcopy_ram_incoming_cleanup_blocktime(
+            get_postcopy_total_blocktime());
+
     trace_postcopy_ram_incoming_cleanup_exit();
     return 0;
 }
@@ -959,6 +1011,10 @@  void *postcopy_get_tmp_page(MigrationIncomingState *mis)
 
 #else
 /* No target OS support, stubs just fail */
+void fill_destination_postcopy_migration_info(MigrationInfo *info)
+{
+}
+
 bool postcopy_ram_supported_by_host(MigrationIncomingState *mis)
 {
     error_report("%s: No OS support", __func__);
diff --git a/migration/trace-events b/migration/trace-events
index 462d157..141e773 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -198,6 +198,7 @@  postcopy_ram_incoming_cleanup_closeuf(void) ""
 postcopy_ram_incoming_cleanup_entry(void) ""
 postcopy_ram_incoming_cleanup_exit(void) ""
 postcopy_ram_incoming_cleanup_join(void) ""
+postcopy_ram_incoming_cleanup_blocktime(uint64_t total) "total blocktime %" PRIu64
 save_xbzrle_page_skipping(void) ""
 save_xbzrle_page_overflow(void) ""
 ram_save_iterate_big_wait(uint64_t milliconds, int iterations) "big wait: %" PRIu64 " milliseconds, %d iterations"
diff --git a/qapi/migration.json b/qapi/migration.json
index c20caf4..be4a869 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -156,6 +156,13 @@ 
 #              @status is 'failed'. Clients should not attempt to parse the
 #              error strings. (Since 2.7)
 #
+# @postcopy-blocktime: total time when all vCPU were blocked during postcopy
+#           live migration (Since 2.11)
+#
+# @postcopy-vcpu-blocktime: list of the postcopy blocktime per vCPU (Since 2.11)
+#
+
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationInfo',
@@ -167,7 +174,9 @@ 
            '*downtime': 'int',
            '*setup-time': 'int',
            '*cpu-throttle-percentage': 'int',
-           '*error-desc': 'str'} }
+           '*error-desc': 'str',
+           '*postcopy-blocktime' : 'int64',
+           '*postcopy-vcpu-blocktime': ['int64']} }
 
 ##
 # @query-migrate: