diff mbox series

[QEMU,v25,17/17] qapi: Add VFIO devices migration stats in Migration stats

Message ID 1592684486-18511-18-git-send-email-kwankhede@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Add migration support for VFIO devices | expand

Commit Message

Kirti Wankhede June 20, 2020, 8:21 p.m. UTC
Added amount of bytes transferred to the target VM by all VFIO devices

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
---
 hw/vfio/common.c            | 20 ++++++++++++++++++++
 hw/vfio/migration.c         | 11 ++++++++++-
 include/qemu/vfio-helpers.h |  3 +++
 migration/migration.c       | 14 ++++++++++++++
 monitor/hmp-cmds.c          |  6 ++++++
 qapi/migration.json         | 17 +++++++++++++++++
 6 files changed, 70 insertions(+), 1 deletion(-)

Comments

Markus Armbruster June 23, 2020, 7:21 a.m. UTC | #1
QAPI review only.

The only changes since I reviewed v23 is the rename of VfioStats member
@bytes to @transferred, and the move of MigrationInfo member @vfio next
to @ram and @disk.  Good.  I'm copying my other questions in the hope of
getting answers :)

Kirti Wankhede <kwankhede@nvidia.com> writes:

> Added amount of bytes transferred to the target VM by all VFIO devices
>
> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index d5000558c6c9..952864b05455 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -146,6 +146,18 @@
>              'active', 'postcopy-active', 'postcopy-paused',
>              'postcopy-recover', 'completed', 'failed', 'colo',
>              'pre-switchover', 'device', 'wait-unplug' ] }
> +##
> +# @VfioStats:
> +#
> +# Detailed VFIO devices migration statistics
> +#
> +# @transferred: amount of bytes transferred to the target VM by VFIO devices
> +#
> +# Since: 5.1
> +#
> +##
> +{ 'struct': 'VfioStats',
> +  'data': {'transferred': 'int' } }

Pardon my ignorance...  What exactly do VFIO devices transfer to the
target VM?  How is that related to MigrationInfo member @ram?

MigrationStats has much more information, and some of it is pretty
useful to track how migration is doing, in particular whether it
converges, and how fast.  Absent in VfioStats due to "not implemented",
or due to "can't be done"?

Byte counts should use QAPI type 'size'.  Many existing ones don't.
Since MigrationStats uses 'int', I'll let the migration maintainers
decide whether they want 'int' or 'size' here.

>  ##
>  # @MigrationInfo:
> @@ -207,11 +219,16 @@
>  #
>  # @socket-address: Only used for tcp, to know what the real port is (Since 4.0)
>  #
> +# @vfio: @VfioStats containing detailed VFIO devices migration statistics,
> +#        only returned if VFIO device is present, migration is supported by all
> +#         VFIO devices and status is 'active' or 'completed' (since 5.1)
> +#
>  # Since: 0.14.0
>  ##
>  { 'struct': 'MigrationInfo',
>    'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
>             '*disk': 'MigrationStats',
> +           '*vfio': 'VfioStats',
>             '*xbzrle-cache': 'XBZRLECacheStats',
>             '*total-time': 'int',
>             '*expected-downtime': 'int',
Kirti Wankhede June 23, 2020, 9:16 p.m. UTC | #2
On 6/23/2020 12:51 PM, Markus Armbruster wrote:
> QAPI review only.
> 
> The only changes since I reviewed v23 is the rename of VfioStats member
> @bytes to @transferred, and the move of MigrationInfo member @vfio next
> to @ram and @disk.  Good.  I'm copying my other questions in the hope of
> getting answers :)
> 
> Kirti Wankhede <kwankhede@nvidia.com> writes:
> 
>> Added amount of bytes transferred to the target VM by all VFIO devices
>>
>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> [...]
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index d5000558c6c9..952864b05455 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -146,6 +146,18 @@
>>               'active', 'postcopy-active', 'postcopy-paused',
>>               'postcopy-recover', 'completed', 'failed', 'colo',
>>               'pre-switchover', 'device', 'wait-unplug' ] }
>> +##
>> +# @VfioStats:
>> +#
>> +# Detailed VFIO devices migration statistics
>> +#
>> +# @transferred: amount of bytes transferred to the target VM by VFIO devices
>> +#
>> +# Since: 5.1
>> +#
>> +##
>> +{ 'struct': 'VfioStats',
>> +  'data': {'transferred': 'int' } }
> 
> Pardon my ignorance...  What exactly do VFIO devices transfer to the
> target VM? How is that related to MigrationInfo member @ram? 
> 

Sorry I missed to reply your question on earlier version.

VFIO device transfer vfio device's state, data from VFIO device and 
guest memory pages pinned for dma operation.
For example in case of GPU, vfio device state is GPUs current state to 
be saved that will be restored during resume and device data is data 
from onboard framebuffer. Pinned memory is marked dirty and transferred 
to target VM as part of global dirty page tracking for RAM.
VFIO device can add significant amount of data in migration stream 
(depending on FB size in GB), transferred byte count is important 
parameter to be monitored.

> MigrationStats has much more information, and some of it is pretty
> useful to track how migration is doing, in particular whether it
> converges, and how fast.  Absent in VfioStats due to "not implemented",
> or due to "can't be done"?
>

Vfio device migration interface is same as RAM's migration interface 
(using SaveVMHandlers). Converge part is already take care by 
.save_live_pending hook where *res_precopy_only is set to vfio devices 
pending_bytes, migration->pending_bytes

How fast - I'm not sure how this can be calculated.

Thanks,
Kirti

> Byte counts should use QAPI type 'size'.  Many existing ones don't.
> Since MigrationStats uses 'int', I'll let the migration maintainers
> decide whether they want 'int' or 'size' here.
> 
>>   ##
>>   # @MigrationInfo:
>> @@ -207,11 +219,16 @@
>>   #
>>   # @socket-address: Only used for tcp, to know what the real port is (Since 4.0)
>>   #
>> +# @vfio: @VfioStats containing detailed VFIO devices migration statistics,
>> +#        only returned if VFIO device is present, migration is supported by all
>> +#         VFIO devices and status is 'active' or 'completed' (since 5.1)
>> +#
>>   # Since: 0.14.0
>>   ##
>>   { 'struct': 'MigrationInfo',
>>     'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
>>              '*disk': 'MigrationStats',
>> +           '*vfio': 'VfioStats',
>>              '*xbzrle-cache': 'XBZRLECacheStats',
>>              '*total-time': 'int',
>>              '*expected-downtime': 'int',
>
Markus Armbruster June 25, 2020, 5:51 a.m. UTC | #3
Kirti Wankhede <kwankhede@nvidia.com> writes:

> On 6/23/2020 12:51 PM, Markus Armbruster wrote:
>> QAPI review only.
>>
>> The only changes since I reviewed v23 is the rename of VfioStats member
>> @bytes to @transferred, and the move of MigrationInfo member @vfio next
>> to @ram and @disk.  Good.  I'm copying my other questions in the hope of
>> getting answers :)
>>
>> Kirti Wankhede <kwankhede@nvidia.com> writes:
>>
>>> Added amount of bytes transferred to the target VM by all VFIO devices
>>>
>>> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
>> [...]
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index d5000558c6c9..952864b05455 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -146,6 +146,18 @@
>>>               'active', 'postcopy-active', 'postcopy-paused',
>>>               'postcopy-recover', 'completed', 'failed', 'colo',
>>>               'pre-switchover', 'device', 'wait-unplug' ] }
>>> +##
>>> +# @VfioStats:
>>> +#
>>> +# Detailed VFIO devices migration statistics
>>> +#
>>> +# @transferred: amount of bytes transferred to the target VM by VFIO devices
>>> +#
>>> +# Since: 5.1
>>> +#
>>> +##
>>> +{ 'struct': 'VfioStats',
>>> +  'data': {'transferred': 'int' } }
>>
>> Pardon my ignorance...  What exactly do VFIO devices transfer to the
>> target VM? How is that related to MigrationInfo member @ram? 
>>
>
> Sorry I missed to reply your question on earlier version.

Happens :)

> VFIO device transfer vfio device's state, data from VFIO device and
> guest memory pages pinned for dma operation.
> For example in case of GPU, vfio device state is GPUs current state to
> be saved that will be restored during resume and device data is data
> from onboard framebuffer. Pinned memory is marked dirty and
> transferred to target VM as part of global dirty page tracking for
> RAM.
> VFIO device can add significant amount of data in migration stream
> (depending on FB size in GB), transferred byte count is important
> parameter to be monitored.

Can we work this into documentation somehow?

Have you considered adding something on VFIO migration to docs/?  Then a
link with a short description could suffice here.

>> MigrationStats has much more information, and some of it is pretty
>> useful to track how migration is doing, in particular whether it
>> converges, and how fast.  Absent in VfioStats due to "not implemented",
>> or due to "can't be done"?
>>
>
> Vfio device migration interface is same as RAM's migration interface
> (using SaveVMHandlers). Converge part is already take care by
> .save_live_pending hook where *res_precopy_only is set to vfio devices
> pending_bytes, migration->pending_bytes
>
> How fast - I'm not sure how this can be calculated.

My concern is providing management applications the means they need to
monitor migration.  Have you solicited input from management application
developers on what's needed?

"Same as RAM's migration" makes me suspect the same stats are needed.
This may well be a subset of the stats provided for RAM.

Missing stats we need can be added on top, as long as it's done in a
timely manner.  But we better know how to compute them, or how to do
without.

> Thanks,
> Kirti
>
>> Byte counts should use QAPI type 'size'.  Many existing ones don't.
>> Since MigrationStats uses 'int', I'll let the migration maintainers
>> decide whether they want 'int' or 'size' here.
>>
>>>   ##
>>>   # @MigrationInfo:
>>> @@ -207,11 +219,16 @@
>>>   #
>>>   # @socket-address: Only used for tcp, to know what the real port is (Since 4.0)
>>>   #
>>> +# @vfio: @VfioStats containing detailed VFIO devices migration statistics,
>>> +#        only returned if VFIO device is present, migration is supported by all
>>> +#         VFIO devices and status is 'active' or 'completed' (since 5.1)
>>> +#
>>>   # Since: 0.14.0
>>>   ##
>>>   { 'struct': 'MigrationInfo',
>>>     'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
>>>              '*disk': 'MigrationStats',
>>> +           '*vfio': 'VfioStats',
>>>              '*xbzrle-cache': 'XBZRLECacheStats',
>>>              '*total-time': 'int',
>>>              '*expected-downtime': 'int',
>>
diff mbox series

Patch

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a06b8f2f66e2..ff0a4072107f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@ 
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "qemu/vfio-helpers.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -293,6 +294,25 @@  const MemoryRegionOps vfio_region_ops = {
  * Device state interfaces
  */
 
+bool vfio_mig_active(void)
+{
+    VFIOGroup *group;
+    VFIODevice *vbasedev;
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        return false;
+    }
+
+    QLIST_FOREACH(group, &vfio_group_list, next) {
+        QLIST_FOREACH(vbasedev, &group->device_list, next) {
+            if (vbasedev->migration_blocker) {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
 static bool vfio_devices_are_stopped_and_saving(void)
 {
     VFIOGroup *group;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e0fbb3a01855..09eec9cafdd5 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -27,7 +27,7 @@ 
 #include "exec/ram_addr.h"
 #include "pci.h"
 #include "trace.h"
-
+#include "qemu/vfio-helpers.h"
 /*
  * Flags used as delimiter:
  * 0xffffffff => MSB 32-bit all 1s
@@ -39,6 +39,8 @@ 
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
 
+static int64_t bytes_transferred;
+
 static void vfio_migration_region_exit(VFIODevice *vbasedev)
 {
     VFIOMigration *migration = vbasedev->migration;
@@ -261,6 +263,7 @@  static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
         return ret;
     }
 
+    bytes_transferred += data_size;
     return data_size;
 }
 
@@ -742,6 +745,7 @@  static void vfio_migration_state_notifier(Notifier *notifier, void *data)
         }
 
         vfio_start_dirty_page_tracking(vbasedev, false);
+        bytes_transferred = 0;
     }
 }
 
@@ -789,6 +793,11 @@  static int vfio_migration_init(VFIODevice *vbasedev,
 
 /* ---------------------------------------------------------------------- */
 
+int64_t vfio_mig_bytes_transferred(void)
+{
+    return bytes_transferred;
+}
+
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
     struct vfio_region_info *info;
diff --git a/include/qemu/vfio-helpers.h b/include/qemu/vfio-helpers.h
index 1f057c2b9e40..26a7df0767b1 100644
--- a/include/qemu/vfio-helpers.h
+++ b/include/qemu/vfio-helpers.h
@@ -29,4 +29,7 @@  void qemu_vfio_pci_unmap_bar(QEMUVFIOState *s, int index, void *bar,
 int qemu_vfio_pci_init_irq(QEMUVFIOState *s, EventNotifier *e,
                            int irq_type, Error **errp);
 
+bool vfio_mig_active(void);
+int64_t vfio_mig_bytes_transferred(void);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 481a590f7222..8b010a002c3d 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -54,6 +54,7 @@ 
 #include "net/announce.h"
 #include "qemu/queue.h"
 #include "multifd.h"
+#include "qemu/vfio-helpers.h"
 
 #define MAX_THROTTLE  (32 << 20)      /* Migration transfer speed throttling */
 
@@ -970,6 +971,17 @@  static void populate_disk_info(MigrationInfo *info)
     }
 }
 
+static void populate_vfio_info(MigrationInfo *info)
+{
+#ifdef CONFIG_LINUX
+    if (vfio_mig_active()) {
+        info->has_vfio = true;
+        info->vfio = g_malloc0(sizeof(*info->vfio));
+        info->vfio->transferred = vfio_mig_bytes_transferred();
+    }
+#endif
+}
+
 static void fill_source_migration_info(MigrationInfo *info)
 {
     MigrationState *s = migrate_get_current();
@@ -995,6 +1007,7 @@  static void fill_source_migration_info(MigrationInfo *info)
         populate_time_info(info, s);
         populate_ram_info(info, s);
         populate_disk_info(info);
+        populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_COLO:
         info->has_status = true;
@@ -1003,6 +1016,7 @@  static void fill_source_migration_info(MigrationInfo *info)
     case MIGRATION_STATUS_COMPLETED:
         populate_time_info(info, s);
         populate_ram_info(info, s);
+        populate_vfio_info(info);
         break;
     case MIGRATION_STATUS_FAILED:
         info->has_status = true;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 2b0b58a336e6..29bb4d2d949c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -355,6 +355,12 @@  void hmp_info_migrate(Monitor *mon, const QDict *qdict)
         }
         monitor_printf(mon, "]\n");
     }
+
+    if (info->has_vfio) {
+        monitor_printf(mon, "vfio device transferred: %" PRIu64 " kbytes\n",
+                       info->vfio->transferred >> 10);
+    }
+
     qapi_free_MigrationInfo(info);
 }
 
diff --git a/qapi/migration.json b/qapi/migration.json
index d5000558c6c9..952864b05455 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -146,6 +146,18 @@ 
             'active', 'postcopy-active', 'postcopy-paused',
             'postcopy-recover', 'completed', 'failed', 'colo',
             'pre-switchover', 'device', 'wait-unplug' ] }
+##
+# @VfioStats:
+#
+# Detailed VFIO devices migration statistics
+#
+# @transferred: amount of bytes transferred to the target VM by VFIO devices
+#
+# Since: 5.1
+#
+##
+{ 'struct': 'VfioStats',
+  'data': {'transferred': 'int' } }
 
 ##
 # @MigrationInfo:
@@ -207,11 +219,16 @@ 
 #
 # @socket-address: Only used for tcp, to know what the real port is (Since 4.0)
 #
+# @vfio: @VfioStats containing detailed VFIO devices migration statistics,
+#        only returned if VFIO device is present, migration is supported by all
+#         VFIO devices and status is 'active' or 'completed' (since 5.1)
+#
 # Since: 0.14.0
 ##
 { 'struct': 'MigrationInfo',
   'data': {'*status': 'MigrationStatus', '*ram': 'MigrationStats',
            '*disk': 'MigrationStats',
+           '*vfio': 'VfioStats',
            '*xbzrle-cache': 'XBZRLECacheStats',
            '*total-time': 'int',
            '*expected-downtime': 'int',