diff mbox series

[4/5] migration: Provide QMP access to downtime stats

Message ID 20230926161841.98464-5-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show
Series migration: Downtime observability improvements | expand

Commit Message

Joao Martins Sept. 26, 2023, 4:18 p.m. UTC
Deliver the downtime breakdown also via `query-migrate`
to allow users to understand what their downtime value
represents.

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
 qapi/migration.json   | 22 ++++++++++++++++++++++
 migration/migration.c | 14 ++++++++++++++
 2 files changed, 36 insertions(+)

Comments

Peter Xu Oct. 4, 2023, 5:10 p.m. UTC | #1
Hi, Joao,

On Tue, Sep 26, 2023 at 05:18:40PM +0100, Joao Martins wrote:
> Deliver the downtime breakdown also via `query-migrate`
> to allow users to understand what their downtime value
> represents.

I agree downtime is an area we definitely need to improve.. however do we
need to make it part of qapi?  Or do we need these mostly for debugging
purposes?

Could you add some introduction on the motivation for this work, especially
for exporting the values to the mgmt apps?

> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
>  qapi/migration.json   | 22 ++++++++++++++++++++++
>  migration/migration.c | 14 ++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2d91fbcb22ff..088e1b2bf440 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -217,6 +217,27 @@
>    'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
>              'resume-return-path' ] }
>  
> +##
> +# @DowntimeStats:
> +#
> +# Detailed migration downtime statistics.
> +#
> +# @stop: Time taken to stop the VM during switchover.
> +#
> +# @precopy: Time taken to save all precopy state during switchover.
> +#
> +# @precopy-iterable: Time taken to save all precopy iterable state.
> +#
> +# @precopy-noniterable: Time taken to save all precopy non-iterable state.
> +#
> +# @resume-return-path: Time taken to resume if return path is enabled,
> +#                      otherwise zero.

All these fields will more or less duplicate the ones in the other
MigrationDowntime just introduced.

We already suffer from duplicated fields for migration parameters, and
experience shows that dropping the duplication later is normally harder..  I'm
trying to dedup the existing Migration*Parameter* objects and that is still in
progress, so even if we want to expose downtime in qapi I hope we can expose
one single object.

IIUC we can already do that by keeping DowntimeStats, keeping an object in
MigrationState, and just dropping MigrationDowntime?

Thanks,
Joao Martins Oct. 6, 2023, 11:37 a.m. UTC | #2
On 04/10/2023 18:10, Peter Xu wrote:
> Hi, Joao,
> 
> On Tue, Sep 26, 2023 at 05:18:40PM +0100, Joao Martins wrote:
>> Deliver the downtime breakdown also via `query-migrate`
>> to allow users to understand what their downtime value
>> represents.
> 
> I agree downtime is an area we definitely need to improve.. however do we
> need to make it part of qapi?  Or do we need these mostly for debugging
> purposes?
> 
> Could you add some introduction on the motivation for this work, especially
> for exporting the values to the mgmt apps?
> 

I added the statistics mainly for observability (e.g. a non-developer could
grep the libvirt logs and understand how the downtime breaks down). I wasn't
specifically thinking about a management app using this, just broad access to
the metrics.

One can get the same level of observability with a BPF/dtrace/systemtap script,
albeit in a less obvious way.

With respect to motivation: I am doing migration with VFs and sometimes
vhost-net, and the downtime/switchover is the only thing that is either
non-deterministic or not captured in the migration math. There are some things
that aren't accounted for (e.g. vhost with enough queues will give you high
downtimes), and that is algorithmically not really possible to account for, as
one would need to account for every possible instruction when we quiesce the
guest (or at least that's my understanding).

Just having these metrics helps the developer *and* user see why the downtime
is high, and may open a window for fixes/bug reports or show where to improve.

Furthermore, hopefully these tracepoints or stats can be a starting point for
developers to understand how much downtime is spent in a particular device in
QEMU (as a follow-up to this series), or allow implementing bounds checks on
switchover limits in a way that doesn't violate downtime-limit SLAs (I have a
small set of patches for this).
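As a rough, purely illustrative sketch (not taken from those unpublished
patches; the function name, field names and millisecond units are assumptions),
such a bounds check could combine per-phase downtime estimates before
committing to switchover:

```python
# Illustrative sketch only: NOT from the actual patches mentioned above.
# It shows how a per-phase downtime breakdown (assumed to be in
# milliseconds) could feed a bounds check ahead of switchover.

def switchover_within_limit(stats, downtime_limit_ms):
    """Return True if the estimated switchover downtime, summed from
    the per-phase estimates, stays within the configured downtime-limit."""
    estimated = (
        stats.get("stop", 0)
        + stats.get("precopy-iterable", 0)
        + stats.get("precopy-noniterable", 0)
        + stats.get("resume-return-path", 0)
    )
    return estimated <= downtime_limit_ms

# Example: a breakdown where non-iterable device state dominates.
sample = {
    "stop": 2,
    "precopy-iterable": 80,
    "precopy-noniterable": 150,  # e.g. vhost-net with many queues
    "resume-return-path": 5,
}
print(switchover_within_limit(sample, 300))  # True: 237 ms <= 300 ms
print(switchover_within_limit(sample, 200))  # False: 237 ms > 200 ms
```

The point is only that a per-phase breakdown makes such a check possible at
all; with a single opaque downtime number there is nothing to bound.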

>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>>  qapi/migration.json   | 22 ++++++++++++++++++++++
>>  migration/migration.c | 14 ++++++++++++++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 2d91fbcb22ff..088e1b2bf440 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -217,6 +217,27 @@
>>    'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
>>              'resume-return-path' ] }
>>  
>> +##
>> +# @DowntimeStats:
>> +#
>> +# Detailed migration downtime statistics.
>> +#
>> +# @stop: Time taken to stop the VM during switchover.
>> +#
>> +# @precopy: Time taken to save all precopy state during switchover.
>> +#
>> +# @precopy-iterable: Time taken to save all precopy iterable state.
>> +#
>> +# @precopy-noniterable: Time taken to save all precopy non-iterable state.
>> +#
>> +# @resume-return-path: Time taken to resume if return path is enabled,
>> +#                      otherwise zero.
> 
> All these fields will more or less duplicate the ones in the other
> MigrationDowntime just introduced.
> 
> We already suffer from duplicated fields for migration parameters, and
> experience shows that dropping the duplication later is normally harder..  I'm
> trying to dedup the existing Migration*Parameter* objects and that is still in
> progress, so even if we want to expose downtime in qapi I hope we can expose
> one single object.
>

Thanks for the background; I am now recalling your other series dealing with
this sort of duplication [0].

[0] https://lore.kernel.org/qemu-devel/20230905162335.235619-5-peterx@redhat.com/

> IIUC we can already do that by keeping DowntimeStats, keeping an object in
> MigrationState, and just dropping MigrationDowntime?
> 

I can try that, it sounds way cleaner. I didn't like the duplication either.
Peter Xu Oct. 6, 2023, 2:27 p.m. UTC | #3
On Fri, Oct 06, 2023 at 12:37:15PM +0100, Joao Martins wrote:
> I added the statistics mainly for observability (e.g. a non-developer could
> grep the libvirt logs and understand how the downtime breaks down). I wasn't
> specifically thinking about a management app using this, just broad access to
> the metrics.
> 
> One can get the same level of observability with a BPF/dtrace/systemtap script,
> albeit in a less obvious way.

Makes sense.

> 
> With respect to motivation: I am doing migration with VFs and sometimes
> vhost-net, and the downtime/switchover is the only thing that is either
> non-deterministic or not captured in the migration math. There are some things
> that aren't accounted for (e.g. vhost with enough queues will give you high
> downtimes),

Will this be something relevant to the loading of the queues?  There used to
be work on greatly reducing downtime, especially for virtio scenarios with
multiple queues (and IIRC even 1 queue also benefits from it); it wasn't
merged, probably because of not enough review:

https://lore.kernel.org/r/20230317081904.24389-1-xuchuangxclwt@bytedance.com

Though personally I think that's a direction worth exploring at least; maybe
a slight enhancement to that series will work for us.

> and that is algorithmically not really possible to account for, as one would
> need to account for every possible instruction when we quiesce the guest (or
> at least that's my understanding).
> 
> Just having these metrics helps the developer *and* user see why the downtime
> is high, and may open a window for fixes/bug reports or show where to improve.
> 
> Furthermore, hopefully these tracepoints or stats can be a starting point for
> developers to understand how much downtime is spent in a particular device in
> QEMU (as a follow-up to this series),

Yes, I was actually expecting that when I read the cover letter. :) This also
makes sense.  One thing worth mentioning is that the real downtime measured
can, IMHO, differ between src/dst because "pre_save" and "post_load" may not
really be doing similar things.  IIUC it can happen that some device sends its
state fast but loads it slowly.  I'm not sure whether there's a reversed use
case.  Maybe we want to capture that on both sides in some metrics?

> or allow implementing bounds checks on switchover limits in a way that
> doesn't violate downtime-limit SLAs (I have a small set of patches for
> this).

I assume that decision will always be synchronized between src/dst in some
way, or guaranteed to be the same.  But I can wait to read the series first.

Thanks,

Patch

diff --git a/qapi/migration.json b/qapi/migration.json
index 2d91fbcb22ff..088e1b2bf440 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -217,6 +217,27 @@ 
   'data': [ 'start', 'stop', 'precopy-iterable', 'precopy-noniterable',
             'resume-return-path' ] }
 
+##
+# @DowntimeStats:
+#
+# Detailed migration downtime statistics.
+#
+# @stop: Time taken to stop the VM during switchover.
+#
+# @precopy: Time taken to save all precopy state during switchover.
+#
+# @precopy-iterable: Time taken to save all precopy iterable state.
+#
+# @precopy-noniterable: Time taken to save all precopy non-iterable state.
+#
+# @resume-return-path: Time taken to resume if return path is enabled,
+#                      otherwise zero.
+#
+# Since: 8.2
+##
+{ 'struct': 'DowntimeStats',
+  'data': {'stop': 'int64', 'precopy': 'int64', 'precopy-iterable': 'int64',
+           'precopy-noniterable': 'int64', 'resume-return-path': 'int64' } }
 
 ##
 # @MigrationInfo:
@@ -308,6 +329,7 @@ 
            '*total-time': 'int',
            '*expected-downtime': 'int',
            '*downtime': 'int',
+           '*downtime-stats': 'DowntimeStats',
            '*setup-time': 'int',
            '*cpu-throttle-percentage': 'int',
            '*error-desc': 'str',
diff --git a/migration/migration.c b/migration/migration.c
index 4b5bed3eb09b..dec6c88fbff9 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -921,6 +921,19 @@  static int64_t migrate_get_downtime_resume_rp(MigrationState *s)
     return 0;
 }
 
+static void populate_downtime_info(MigrationInfo *info, MigrationState *s)
+{
+    DowntimeStats *stats;
+
+    info->downtime_stats = g_malloc0(sizeof(*info->downtime_stats));
+    stats = info->downtime_stats;
+    stats->stop = migrate_get_downtime_stop(s);
+    stats->precopy_iterable = migrate_get_downtime_precopy_iterable(s);
+    stats->precopy_noniterable = migrate_get_downtime_precopy_noniterable(s);
+    stats->precopy = stats->precopy_iterable + stats->precopy_noniterable;
+    stats->resume_return_path = migrate_get_downtime_resume_rp(s);
+}
+
 static void populate_time_info(MigrationInfo *info, MigrationState *s)
 {
     info->has_status = true;
@@ -939,6 +952,7 @@  static void populate_time_info(MigrationInfo *info, MigrationState *s)
     if (migrate_show_downtime(s)) {
         info->has_downtime = true;
         info->downtime = s->downtime;
+        populate_downtime_info(info, s);
     } else {
         info->has_expected_downtime = true;
         info->expected_downtime = s->expected_downtime;
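For reference, the new `downtime-stats` member of MigrationInfo could be
consumed by a QMP client along these lines. This is a minimal sketch assuming
the patch above is applied; the sample reply is hand-written from the QAPI
struct definition, not captured from a real QEMU:

```python
# Sketch of consuming the `downtime-stats` field from a `query-migrate`
# reply (assumes this patch is applied).  The sample reply below is
# hand-written for illustration.

def format_downtime_stats(reply):
    """Render the downtime breakdown from a query-migrate reply dict."""
    stats = reply.get("downtime-stats")
    if stats is None:
        return "no downtime breakdown available"
    lines = ["downtime: %d ms" % reply.get("downtime", 0)]
    for key in ("stop", "precopy", "precopy-iterable",
                "precopy-noniterable", "resume-return-path"):
        lines.append("  %s: %d ms" % (key, stats.get(key, 0)))
    return "\n".join(lines)

# `precopy` is the sum of the iterable and non-iterable parts, matching
# populate_downtime_info() above.
sample_reply = {
    "status": "completed",
    "downtime": 237,
    "downtime-stats": {
        "stop": 2,
        "precopy": 230,
        "precopy-iterable": 80,
        "precopy-noniterable": 150,
        "resume-return-path": 5,
    },
}
print(format_downtime_stats(sample_reply))
```

Since `downtime-stats` is optional in the schema, a client must cope with the
field being absent, e.g. when the migration has not reached switchover yet.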