| Message ID | 20220530170739.19072-3-avihaih@nvidia.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | vfio/migration: Implement VFIO migration protocol v2 |
On 5/30/2022 8:07 PM, Avihai Horon wrote:
> Currently, if IOMMU of a VFIO container doesn't support dirty page
> tracking, migration is blocked completely. This is because a DMA-able
> VFIO device can dirty RAM pages without updating QEMU about it, thus
> breaking the migration.
>
> However, this doesn't mean that migration can't be done at all. If
> migration pre-copy phase is skipped, the VFIO device doesn't have a
> chance to dirty RAM pages that have been migrated already, thus
> eliminating the problem previously mentioned.
>
> Hence, in such case allow migration but skip pre-copy phase.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  hw/vfio/migration.c   | 9 ++++++++-
>  migration/migration.c | 5 +++++
>  migration/migration.h | 3 +++
>  3 files changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 34f9f894ed..d8f9b086c2 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
>      struct vfio_region_info *info = NULL;
>      int ret = -ENOTSUP;
>
> -    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
> +    if (!vbasedev->enable_migration) {
>          goto add_blocker;
>      }
>
> +    if (!container->dirty_pages_supported) {
> +        warn_report_once(
> +            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
> +            vbasedev->name);
> +        migrate_get_current()->skip_precopy = true;
> +    }
> +
>      ret = vfio_get_dev_region_info(vbasedev,
>                                     VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>                                     VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
> diff --git a/migration/migration.c b/migration/migration.c
> index 31739b2af9..217f0e3e94 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3636,6 +3636,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>      uint64_t pending_size, pend_pre, pend_compat, pend_post;
>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>
> +    if (s->skip_precopy) {
> +        migration_completion(s);
> +        return MIG_ITERATE_BREAK;
> +    }
> +
>      qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
>                                &pend_compat, &pend_post);
>      pending_size = pend_pre + pend_compat + pend_post;
> diff --git a/migration/migration.h b/migration/migration.h
> index 485d58b95f..0920a0950e 100644
> --- a/migration/migration.h
> +++ b/migration/migration.h
> @@ -332,6 +332,9 @@ struct MigrationState {
>       * This save hostname when out-going migration starts
>       */
>      char *hostname;
> +
> +    /* Whether to skip pre-copy phase of migration or not */
> +    bool skip_precopy;
>  };
>
>  void migrate_set_state(int *state, int old_state, int new_state);

This patch still has the problem that it doesn't respect the configured
downtime limit.

Maybe adding an option to set "no downtime limit" will solve it?
Then we can allow migration with a VFIO device that doesn't support dirty
tracking only if this option is set.
Can we use the migration param downtime_limit with value 0 to mark "no
downtime limit"? Does it make sense?

Do you have other ideas how to solve this issue?

Thanks!
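To make the downtime_limit question above concrete, here is a minimal standalone C sketch, not QEMU code, of how a value of 0 could be read as "no downtime limit" when deciding whether the switch-over phase may start. The struct, field names, and the bandwidth-times-downtime threshold are simplified stand-ins for MigrationState and the counters logic in migration.c; treat every name below as an assumption made for illustration.

/*
 * Toy model (not QEMU code) of "downtime_limit == 0 means unlimited":
 * the pre-copy loop may enter stop-and-copy regardless of how much data
 * is still pending.  All names and numbers are illustrative only.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t downtime_limit_ms;   /* 0 would mean "no downtime limit" */
    uint64_t bandwidth_bytes_ms;  /* measured transfer rate */
    uint64_t pending_bytes;       /* dirty data still to send */
} MigModel;

/* Decide whether the switch-over (stop-and-copy) phase may start. */
static bool may_complete(const MigModel *s)
{
    if (s->downtime_limit_ms == 0) {
        /* No limit configured: any amount of remaining data is acceptable. */
        return true;
    }
    /* Otherwise mimic the usual threshold = bandwidth * downtime check. */
    uint64_t threshold = s->bandwidth_bytes_ms * s->downtime_limit_ms;
    return s->pending_bytes <= threshold;
}

int main(void)
{
    MigModel strict = { .downtime_limit_ms = 300,
                        .bandwidth_bytes_ms = 1000,
                        .pending_bytes = 10 * 1000 * 1000 };
    MigModel unlimited = strict;
    unlimited.downtime_limit_ms = 0;

    printf("strict:    may complete? %d\n", may_complete(&strict));
    printf("unlimited: may complete? %d\n", may_complete(&unlimited));
    return 0;
}

With such a convention, allowing migration of a VFIO device without dirty tracking only when the user has explicitly set downtime_limit to 0 would make the unbounded downtime an opt-in rather than a silent side effect.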
On 5/30/2022 8:12 PM, Avihai Horon wrote:
>
> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>> Currently, if IOMMU of a VFIO container doesn't support dirty page
>> tracking, migration is blocked completely. This is because a DMA-able
>> VFIO device can dirty RAM pages without updating QEMU about it, thus
>> breaking the migration.
>>
>> However, this doesn't mean that migration can't be done at all. If
>> migration pre-copy phase is skipped, the VFIO device doesn't have a
>> chance to dirty RAM pages that have been migrated already, thus
>> eliminating the problem previously mentioned.
>>
>> Hence, in such case allow migration but skip pre-copy phase.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>   hw/vfio/migration.c   | 9 ++++++++-
>>   migration/migration.c | 5 +++++
>>   migration/migration.h | 3 +++
>>   3 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 34f9f894ed..d8f9b086c2 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev,
>> Error **errp)
>>      struct vfio_region_info *info = NULL;
>>      int ret = -ENOTSUP;
>> -    if (!vbasedev->enable_migration ||
>> !container->dirty_pages_supported) {
>>          goto add_blocker;
>>      }
>> +    if (!vbasedev->enable_migration) {
>> +    if (!container->dirty_pages_supported) {
>> +        warn_report_once(
>> +            "%s: IOMMU of the device's VFIO container doesn't
>> support dirty page tracking, migration pre-copy phase will be skipped",
>> +            vbasedev->name);
>> +        migrate_get_current()->skip_precopy = true;
>> +    }
>> +
>>      ret = vfio_get_dev_region_info(vbasedev,
>> VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
>> VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 31739b2af9..217f0e3e94 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3636,6 +3636,11 @@ static MigIterateState
>> migration_iteration_run(MigrationState *s)
>>      uint64_t pending_size, pend_pre, pend_compat, pend_post;
>>      bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
>> +    if (s->skip_precopy) {
>> +        migration_completion(s);
>> +        return MIG_ITERATE_BREAK;
>> +    }
>> +
>>      qemu_savevm_state_pending(s->to_dst_file, s->threshold_size,
>> &pend_pre,
>>                                &pend_compat, &pend_post);
>>      pending_size = pend_pre + pend_compat + pend_post;
>> diff --git a/migration/migration.h b/migration/migration.h
>> index 485d58b95f..0920a0950e 100644
>> --- a/migration/migration.h
>> +++ b/migration/migration.h
>> @@ -332,6 +332,9 @@ struct MigrationState {
>>       * This save hostname when out-going migration starts
>>       */
>>      char *hostname;
>> +
>> +    /* Whether to skip pre-copy phase of migration or not */
>> +    bool skip_precopy;
>>  };
>>  void migrate_set_state(int *state, int old_state, int new_state);
>
> This patch still has the problem that it doesn't respect configured
> downtime limit.
>
> Maybe adding an option to set "no downtime limit" will solve it?
> Then we can allow migration with VFIO device that doesn't support
> dirty tracking only if this option is set.
> Can we use migration param downtime_limit with value 0 to mark "no
> downtime limit"? Does it make sense?
>
> Do you have other ideas how to solve this issue?
>

What about letting QEMU mark all RAM dirty instead of the kernel?
Same effect but no need for kernel support.

Is this a reasonable approach?

Thanks.
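The "mark all RAM dirty in QEMU" alternative can likewise be sketched as a toy model rather than real QEMU listener code; every identifier below is made up for illustration. What it shows is that, without IOMMU dirty tracking, a log-sync step would conservatively set every page dirty, so each pre-copy pass starts from a fully dirty bitmap and convergence still depends on how the downtime limit is handled.

/*
 * Toy model (not QEMU code) of "let QEMU mark all RAM dirty": when the
 * container cannot report dirty pages, the sync step marks every page
 * dirty, so each pre-copy iteration re-sends all of guest RAM.
 * All names are illustrative.
 */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define RAM_PAGES 64                      /* pretend guest RAM size in pages */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS ((RAM_PAGES + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct {
    bool dirty_pages_supported;            /* what the IOMMU reports */
    unsigned long dirty[BITMAP_LONGS];     /* migration dirty bitmap */
} ContainerModel;

/* Called once per pre-copy iteration to refresh the dirty bitmap. */
static void log_sync(ContainerModel *c)
{
    if (!c->dirty_pages_supported) {
        /* No tracking available: conservatively mark everything dirty. */
        memset(c->dirty, 0xff, sizeof(c->dirty));
        return;
    }
    /* Real tracking would query the kernel here and OR in its bitmap. */
}

static unsigned count_dirty(const ContainerModel *c)
{
    unsigned n = 0;
    for (unsigned i = 0; i < RAM_PAGES; i++) {
        if (c->dirty[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
            n++;
        }
    }
    return n;
}

int main(void)
{
    ContainerModel c = { .dirty_pages_supported = false };
    log_sync(&c);
    printf("dirty pages after sync: %u of %u\n", count_dirty(&c), RAM_PAGES);
    return 0;
}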
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 34f9f894ed..d8f9b086c2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -863,10 +863,17 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
     struct vfio_region_info *info = NULL;
     int ret = -ENOTSUP;
 
-    if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+    if (!vbasedev->enable_migration) {
         goto add_blocker;
     }
 
+    if (!container->dirty_pages_supported) {
+        warn_report_once(
+            "%s: IOMMU of the device's VFIO container doesn't support dirty page tracking, migration pre-copy phase will be skipped",
+            vbasedev->name);
+        migrate_get_current()->skip_precopy = true;
+    }
+
     ret = vfio_get_dev_region_info(vbasedev,
                                    VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
                                    VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
diff --git a/migration/migration.c b/migration/migration.c
index 31739b2af9..217f0e3e94 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3636,6 +3636,11 @@ static MigIterateState migration_iteration_run(MigrationState *s)
     uint64_t pending_size, pend_pre, pend_compat, pend_post;
     bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
+    if (s->skip_precopy) {
+        migration_completion(s);
+        return MIG_ITERATE_BREAK;
+    }
+
     qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
                               &pend_compat, &pend_post);
     pending_size = pend_pre + pend_compat + pend_post;
diff --git a/migration/migration.h b/migration/migration.h
index 485d58b95f..0920a0950e 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -332,6 +332,9 @@ struct MigrationState {
      * This save hostname when out-going migration starts
      */
     char *hostname;
+
+    /* Whether to skip pre-copy phase of migration or not */
+    bool skip_precopy;
 };
 
 void migrate_set_state(int *state, int old_state, int new_state);
Currently, if the IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked completely. This is because a DMA-able VFIO
device can dirty RAM pages without updating QEMU about it, thus breaking
the migration.

However, this doesn't mean that migration can't be done at all. If the
migration pre-copy phase is skipped, the VFIO device doesn't have a chance
to dirty RAM pages that have already been migrated, thus eliminating the
problem previously mentioned.

Hence, in such a case allow migration but skip the pre-copy phase.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 hw/vfio/migration.c   | 9 ++++++++-
 migration/migration.c | 5 +++++
 migration/migration.h | 3 +++
 3 files changed, 16 insertions(+), 1 deletion(-)