Message ID | 20230530144821.1557-2-avihaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | migration: Add switchover ack capability and VFIO precopy support |
Test in the following two scenarios:

[1] Test scenario: Both source VM and target VM (in listening mode)
have enabled return-path and switchover-ack capability:

Test result : The VFIO migration completed successfully

[2] Test scenario : The source VM has enabled return-path and
switchover-ack capability while the target VM (in listening mode) not

Test result : The VFIO migration fails

The detailed error thrown by qemu-kvm when VFIO migration fails:
Target VM:
    0000:17:00.2: Received INIT_DATA_SENT but switchover ack is not used
    error while loading state section id 81(0000:00:02.4:00.0/vfio)
    load of migration failed: Invalid argument
Source VM:
    failed to save SaveStateEntry with id(name): 2(ram): -5
    Unable to write to socket: Connection reset by peer
    Unable to write to socket: Connection reset by peer

Tested-by: YangHang Liu <yanghliu@redhat.com>

On Wed, May 31, 2023 at 1:46 AM Avihai Horon <avihaih@nvidia.com> wrote:
>
> Migration downtime estimation is calculated based on bandwidth and
> remaining migration data. This assumes that loading of migration data in
> the destination takes a negligible amount of time and that downtime
> depends only on network speed.
>
> While this may be true for RAM, it's not necessarily true for other
> migrated devices. For example, loading the data of a VFIO device in the
> destination might require from the device to allocate resources, prepare
> internal data structures and so on. These operations can take a
> significant amount of time which can increase migration downtime.
>
> This patch adds a new capability "switchover ack" that prevents the
> source from stopping the VM and completing the migration until an ACK
> is received from the destination that it's OK to do so.
>
> This can be used by migrated devices in various ways to reduce downtime.
> For example, a device can send initial precopy metadata to pre-allocate
> resources in the destination and use this capability to make sure that
> the pre-allocation is completed before the source VM is stopped, so it
> will have full effect.
>
> This new capability relies on the return path capability to communicate
> from the destination back to the source.
>
> The actual implementation of the capability will be added in the
> following patches.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>
> ---
>  qapi/migration.json | 12 +++++++++++-
>  migration/options.h |  1 +
>  migration/options.c | 21 +++++++++++++++++++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 179af0c4d8..061ea512e0 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -487,6 +487,16 @@
>  #     and should not affect the correctness of postcopy migration.
>  #     (since 7.1)
>  #
> +# @switchover-ack: If enabled, migration will not stop the source VM
> +#     and complete the migration until an ACK is received from the
> +#     destination that it's OK to do so. Exactly when this ACK is
> +#     sent depends on the migrated devices that use this feature.
> +#     For example, a device can use it to make sure some of its data
> +#     is sent and loaded in the destination before doing switchover.
> +#     This can reduce downtime if devices that support this capability
> +#     are present. 'return-path' capability must be enabled to use
> +#     it. (since 8.1)
> +#
>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -502,7 +512,7 @@
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
> -           'zero-copy-send', 'postcopy-preempt'] }
> +           'zero-copy-send', 'postcopy-preempt', 'switchover-ack'] }
>
>  ##
>  # @MigrationCapabilityStatus:
> diff --git a/migration/options.h b/migration/options.h
> index 45991af3c2..9aaf363322 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -40,6 +40,7 @@ bool migrate_postcopy_ram(void);
>  bool migrate_rdma_pin_all(void);
>  bool migrate_release_ram(void);
>  bool migrate_return_path(void);
> +bool migrate_switchover_ack(void);
>  bool migrate_validate_uuid(void);
>  bool migrate_xbzrle(void);
>  bool migrate_zero_blocks(void);
> diff --git a/migration/options.c b/migration/options.c
> index b62ab30cd5..16007afca6 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -185,6 +185,8 @@ Property migration_properties[] = {
>      DEFINE_PROP_MIG_CAP("x-zero-copy-send",
>              MIGRATION_CAPABILITY_ZERO_COPY_SEND),
>  #endif
> +    DEFINE_PROP_MIG_CAP("x-switchover-ack",
> +                        MIGRATION_CAPABILITY_SWITCHOVER_ACK),
>
>      DEFINE_PROP_END_OF_LIST(),
>  };
> @@ -308,6 +310,13 @@ bool migrate_return_path(void)
>      return s->capabilities[MIGRATION_CAPABILITY_RETURN_PATH];
>  }
>
> +bool migrate_switchover_ack(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->capabilities[MIGRATION_CAPABILITY_SWITCHOVER_ACK];
> +}
> +
>  bool migrate_validate_uuid(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -547,6 +556,18 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          }
>      }
>
> +    if (new_caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK]) {
> +        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
> +            error_setg(errp, "Capability 'switchover-ack' requires capability "
> +                             "'return-path'");
> +            return false;
> +        }
> +
> +        /* Disable this capability until it's implemented */
> +        error_setg(errp, "'switchover-ack' is not implemented yet");
> +        return false;
> +    }
> +
>      return true;
>  }
>
> --
> 2.26.3
>
>
On 6/15/23 14:38, YangHang Liu wrote:
> Test in the following two scenarios:
>
> [1] Test scenario: Both source VM and target VM (in listening mode)
> have enabled return-path and switchover-ack capability:
>
> Test result : The VFIO migration completed successfully
>
> [2] Test scenario : The source VM has enabled return-path and
> switchover-ack capability while the target VM (in listening mode) not
>
> Test result : The VFIO migration fails
>
> The detailed error thrown by qemu-kvm when VFIO migration fails:
> Target VM:
>     0000:17:00.2: Received INIT_DATA_SENT but switchover ack is not used
>     error while loading state section id 81(0000:00:02.4:00.0/vfio)
>     load of migration failed: Invalid argument
> Source VM:
>     failed to save SaveStateEntry with id(name): 2(ram): -5
>     Unable to write to socket: Connection reset by peer
>     Unable to write to socket: Connection reset by peer
>
> Tested-by: YangHang Liu <yanghliu@redhat.com>

Some more info,

Tests were performed with a mainline Linux and a mainline QEMU including
this series - patch8.

The amount of precopy data for a CX-7 VF is not very large. Any idea how
to generate some more initial state with such devices ?

I suppose pre-copy will be more important with vGPUs.

YangHang,

Could you please reply with a Tested-by on the cover letter, so that the
whole series is tagged and not only patch 1.

Thanks,

C.
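The commit message quoted in this thread says the source must not stop the VM and complete the migration until the destination has sent an ACK. A minimal sketch of that gating decision follows, assuming a single ACK flag on the source side. The `SwitchoverGate` type and the `gate_*` functions are hypothetical names invented for illustration; this patch only adds the capability flag, and the real logic lands in later patches of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical source-side bookkeeping. The struct and function names
 * are illustrative and are not taken from QEMU.
 */
typedef struct {
    bool switchover_ack_enabled; /* 'switchover-ack' capability is on */
    bool ack_received;           /* destination said it is OK to switch */
} SwitchoverGate;

/* Called when the ACK arrives from the destination over the return path. */
static void gate_on_ack(SwitchoverGate *g)
{
    assert(g != NULL);
    g->ack_received = true;
}

/*
 * Decide whether the source may stop the VM and complete the migration.
 * below_downtime_limit stands for the usual estimate (remaining data
 * versus bandwidth) saying the remaining data fits the allowed downtime.
 */
static bool gate_can_switchover(const SwitchoverGate *g,
                                bool below_downtime_limit)
{
    assert(g != NULL);

    if (!below_downtime_limit) {
        return false;
    }
    /* Without the capability, behave as before: no ACK is needed. */
    if (!g->switchover_ack_enabled) {
        return true;
    }
    return g->ack_received;
}
```

With the capability enabled, `gate_can_switchover()` keeps returning false until `gate_on_ack()` has been called, even when the bandwidth-based estimate alone would already allow switchover.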
On 15/06/2023 16:49, Cédric Le Goater wrote:
> Some more info,
>
> Tests were performed with a mainline Linux and a mainline QEMU including
> this series - patch8.
>
> The amount of precopy data for a CX-7 VF is not very large. Any idea how
> to generate some more initial state with such devices ?
>
> I suppose pre-copy will be more important with vGPUs.

In CX-7 the precopy data is not expected to be very large, because it's
mainly used to pre-allocate resources in the destination.

However, precopy and switchover-ack are very important for CX-7, because
they allow doing the resource pre-allocation in the destination when the
source VM is running and reduce downtime significantly (see the example
I gave in the cover letter).

Thanks.

> YangHang,
>
> Could you please reply with a Tested-by on the cover letter, so that the
> whole series is tagged and not only patch 1.
>
> Thanks,
>
> C.
diff --git a/qapi/migration.json b/qapi/migration.json
index 179af0c4d8..061ea512e0 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -487,6 +487,16 @@
 #     and should not affect the correctness of postcopy migration.
 #     (since 7.1)
 #
+# @switchover-ack: If enabled, migration will not stop the source VM
+#     and complete the migration until an ACK is received from the
+#     destination that it's OK to do so. Exactly when this ACK is
+#     sent depends on the migrated devices that use this feature.
+#     For example, a device can use it to make sure some of its data
+#     is sent and loaded in the destination before doing switchover.
+#     This can reduce downtime if devices that support this capability
+#     are present. 'return-path' capability must be enabled to use
+#     it. (since 8.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -502,7 +512,7 @@
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'switchover-ack'] }

 ##
 # @MigrationCapabilityStatus:
diff --git a/migration/options.h b/migration/options.h
index 45991af3c2..9aaf363322 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -40,6 +40,7 @@ bool migrate_postcopy_ram(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_release_ram(void);
 bool migrate_return_path(void);
+bool migrate_switchover_ack(void);
 bool migrate_validate_uuid(void);
 bool migrate_xbzrle(void);
 bool migrate_zero_blocks(void);
diff --git a/migration/options.c b/migration/options.c
index b62ab30cd5..16007afca6 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -185,6 +185,8 @@ Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-zero-copy-send",
             MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+    DEFINE_PROP_MIG_CAP("x-switchover-ack",
+                        MIGRATION_CAPABILITY_SWITCHOVER_ACK),

     DEFINE_PROP_END_OF_LIST(),
 };
@@ -308,6 +310,13 @@ bool migrate_return_path(void)
     return s->capabilities[MIGRATION_CAPABILITY_RETURN_PATH];
 }

+bool migrate_switchover_ack(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_SWITCHOVER_ACK];
+}
+
 bool migrate_validate_uuid(void)
 {
     MigrationState *s = migrate_get_current();
@@ -547,6 +556,18 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         }
     }

+    if (new_caps[MIGRATION_CAPABILITY_SWITCHOVER_ACK]) {
+        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+            error_setg(errp, "Capability 'switchover-ack' requires capability "
+                             "'return-path'");
+            return false;
+        }
+
+        /* Disable this capability until it's implemented */
+        error_setg(errp, "'switchover-ack' is not implemented yet");
+        return false;
+    }
+
     return true;
 }
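The dependency rule added to migrate_caps_check() above can be tried outside QEMU with a standalone sketch like the one below. It is a simplified model, not the QEMU code: the `CAP_*` enum and `caps_check()` are hypothetical stand-ins for the QAPI-generated capability enum and for error_setg() reporting, and the temporary "not implemented yet" rejection from this patch is deliberately left out so that only the 'return-path' requirement is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative capability indices; QEMU generates its own enum from QAPI. */
enum {
    CAP_RETURN_PATH,
    CAP_SWITCHOVER_ACK,
    CAP_COUNT
};

/*
 * Return NULL if the requested capability set is consistent, otherwise a
 * static error string. Only the dependency rule is modelled here.
 */
static const char *caps_check(const bool *new_caps)
{
    assert(new_caps != NULL);

    if (new_caps[CAP_SWITCHOVER_ACK] && !new_caps[CAP_RETURN_PATH]) {
        return "Capability 'switchover-ack' requires capability 'return-path'";
    }
    return NULL;
}
```

Passing an array with only `CAP_SWITCHOVER_ACK` set yields the error string, while enabling both capabilities (or neither) yields NULL. Returning a string instead of filling an `Error **` keeps the sketch free of QEMU headers.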