diff mbox series

[v2,02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported

Message ID 20220530170739.19072-3-avihaih@nvidia.com (mailing list archive)
State New, archived
Headers show
Series vfio/migration: Implement VFIO migration protocol v2 | expand

Commit Message

Avihai Horon May 30, 2022, 5:07 p.m. UTC
Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked completely. This is because a DMA-able
VFIO device can dirty RAM pages without updating QEMU about it, thus
breaking the migration.

However, this doesn't mean that migration can't be done at all. If
migration pre-copy phase is skipped, the VFIO device doesn't have a
chance to dirty RAM pages that have been migrated already, thus
eliminating the problem previously mentioned.

Hence, in such case allow migration but skip pre-copy phase.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c   | 9 ++++++++-
 migration/migration.c | 5 +++++
 migration/migration.h | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)

Comments

Avihai Horon May 30, 2022, 5:12 p.m. UTC | #1
On 5/30/2022 8:07 PM, Avihai Horon wrote:
> Currently, if IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked completely. This is because a DMA-able
> VFIO device can dirty RAM pages without updating QEMU about it, thus
> breaking the migration.
>
> However, this doesn't mean that migration can't be done at all. If
> migration pre-copy phase is skipped, the VFIO device doesn't have a
> chance to dirty RAM pages that have been migrated already, thus
> eliminating the problem previously mentioned.
>
> Hence, in such case allow migration but skip pre-copy phase.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>   hw/vfio/migration.c   | 9 ++++++++-
>   migration/migration.c | 5 +++++
>   migration/migration.h | 3 +++
>   3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34f9f894ed..d8f9b086c2 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>       struct vfio_region_info *info = NULL;
>       int ret = -ENOTSUP;
>   
> -    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
> +    if (!vbasedev->enable_migration) {
>           goto add_blocker;
>       }
>   
> +    if (!container->dirty_pages_supported) {
> +        warn_report_once(
> +            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
> +            vbasedev->name);
> +        migrate_get_current()->skip_precopy = true;
> +    }
> +
>       ret = vfio_get_dev_region_info(vbasedev,
>                                      VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>                                      VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> diff --git a/migration/migration.c b/migration/migration.c
> index 31739b2af9..217f0e3e94 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3636,6 +3636,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>       uint64_t pending_size, pend_pre, pend_compat, pend_post;
>       bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>   
> +    if (s->skip_precopy) {
> +        migration_completion(s);
> +        return MIG_ITERATE_BREAK;
> +    }
> +
>       qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
>                                 &pend_compat, &pend_post);
>       pending_size = pend_pre + pend_compat + pend_post;
> diff --git a/migration/migration.h b/migration/migration.h
> index 485d58b95f..0920a0950e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -332,6 +332,9 @@ struct MigrationState {
>        * This save hostname when out-going migration starts
>        */
>       char *hostname;
> +
> +    /* Whether to skip pre-copy phase of migration or not */
> +    bool skip_precopy;
>   };
>   
>   void migrate_set_state(int *state, int old_state, int new_state);

This patch still has the problem that it doesn't respect configured 
downtime limit.

Maybe adding an option to set "no downtime limit" will solve it?
Then we can allow migration with VFIO device that doesn't support dirty 
tracking only if this option is set.
Can we use migration param downtime_limit with value 0 to mark "no 
downtime limit"? Does it make sense?

Do you have other ideas how to solve this issue?

Thanks!
Avihai Horon June 7, 2022, 5:53 p.m. UTC | #2
On 5/30/2022 8:12 PM, Avihai Horon wrote:
>
> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>> Currently, if IOMMU of a VFIO container doesn't support dirty page
>> tracking, migration is blocked completely. This is because a DMA-able
>> VFIO device can dirty RAM pages without updating QEMU about it, thus
>> breaking the migration.
>>
>> However, this doesn't mean that migration can't be done at all. If
>> migration pre-copy phase is skipped, the VFIO device doesn't have a
>> chance to dirty RAM pages that have been migrated already, thus
>> eliminating the problem previously mentioned.
>>
>> Hence, in such case allow migration but skip pre-copy phase.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c   | 9 ++++++++-
>>   migration/migration.c | 5 +++++
>>   migration/migration.h | 3 +++
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34f9f894ed..d8f9b086c2 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, 
>> Error **errp)
>>       struct vfio_region_info *info = NULL;
>>       int ret = -ENOTSUP;
>>   -    if (!vbasedev->enable_migration || 
>> !container->dirty_pages_supported) {
>> +    if (!vbasedev->enable_migration) {
>>           goto add_blocker;
>>       }
>>   +    if (!container->dirty_pages_supported) {
>> +        warn_report_once(
>> +            "%s: IOMMU of the device's VFIO container doesn't 
>> support dirty page tracking, migration pre-copy phase will be skipped",
>> +            vbasedev->name);
>> +        migrate_get_current()->skip_precopy = true;
>> +    }
>> +
>>       ret = vfio_get_dev_region_info(vbasedev,
>> VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 31739b2af9..217f0e3e94 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3636,6 +3636,11 @@ static MigIterateState 
>> migration_iteration_run(MigrationState *s)
>>       uint64_t pending_size, pend_pre, pend_compat, pend_post;
>>       bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>>   +    if (s->skip_precopy) {
>> +        migration_completion(s);
>> +        return MIG_ITERATE_BREAK;
>> +    }
>> +
>>       qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, 
>> &pend_pre,
>>                                 &pend_compat, &pend_post);
>>       pending_size = pend_pre + pend_compat + pend_post;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 485d58b95f..0920a0950e 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -332,6 +332,9 @@ struct MigrationState {
>>        * This save hostname when out-going migration starts
>>        */
>>       char *hostname;
>> +
>> +    /* Whether to skip pre-copy phase of migration or not */
>> +    bool skip_precopy;
>>   };
>>     void migrate_set_state(int *state, int old_state, int new_state);
>
> This patch still has the problem that it doesn't respect configured 
> downtime limit.
>
> Maybe adding an option to set "no downtime limit" will solve it?
> Then we can allow migration with VFIO device that doesn't support 
> dirty tracking only if this option is set.
> Can we use migration param downtime_limit with value 0 to mark "no 
> downtime limit"? Does it make sense?
>
> Do you have other ideas how to solve this issue?
>
What about letting QEMU mark all RAM dirty instead of kernel? Same 
effect but no need for kernel support.
Is this a reasonable approach?

Thanks.
diff mbox series

Patch

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34f9f894ed..d8f9b086c2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -863,10 +863,17 @@  int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
 
+    if (!container->dirty_pages_supported) {
+        warn_report_once(
+            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
+            vbasedev->name);
+        migrate_get_current()->skip_precopy = true;
+    }
+
     ret = vfio_get_dev_region_info(vbasedev,
                                    VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
                                    VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
diff --git a/migration/migration.c b/migration/migration.c
index 31739b2af9..217f0e3e94 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3636,6 +3636,11 @@  static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
+    if (s->skip_precopy) {
+        migration_completion(s);
+        return MIG_ITERATE_BREAK;
+    }
+
     qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
                               &pend_compat, &pend_post);
     pending_size = pend_pre + pend_compat + pend_post;
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..0920a0950e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -332,6 +332,9 @@  struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* Whether to skip pre-copy phase of migration or not */
+    bool skip_precopy;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);