Message ID | 20230501140141.11743-2-avihaih@nvidia.com |
---|---|
State | New, archived |
Series | migration: Add precopy initial data capability and VFIO precopy support |
Avihai Horon <avihaih@nvidia.com> wrote:
> Migration downtime estimation is calculated based on bandwidth and
> remaining migration data. This assumes that loading of migration data in
> the destination takes a negligible amount of time and that downtime
> depends only on network speed.
[...]
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>

Capability definition is correct. I am not giving my Reviewed-by until the
rest of the series is discussed, but there is nothing else to do here.
Avihai Horon <avihaih@nvidia.com> writes:

> Migration downtime estimation is calculated based on bandwidth and
> remaining migration data. [...]
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 82000adce4..d496148386 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -478,6 +478,13 @@
>  #     should not affect the correctness of postcopy migration.
>  #     (since 7.1)
>  #
> +# @precopy-initial-data: If enabled, migration will not attempt to stop source
> +#                        VM and complete the migration until an ACK is received
> +#                        from the destination that initial precopy data has
> +#                        been loaded. This can improve downtime if there are
> +#                        migration users that support precopy initial data.
> +#                        (since 8.1)
> +#

Please format like

# @precopy-initial-data: If enabled, migration will not attempt to
#     stop source VM and complete the migration until an ACK is
#     received from the destination that initial precopy data has been
#     loaded. This can improve downtime if there are migration users
#     that support precopy initial data. (since 8.1)

to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
to conform to current conventions).

What do you mean by "if there are migration users that support precopy
initial data"?

Do I have to ensure the ACK comes by configuring the destination VM in a
certain way, and if yes, how exactly?

[...]

> @@ -546,6 +555,17 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
>          }
>      }
>
> +    if (new_caps[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA]) {
> +        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
> +            error_setg(errp, "Precopy initial data requires return path");

Shouldn't we mention this requirement in the docs?

To make sense of the message, the user needs to make the connection from
"Precopy initial data" to capability @precopy-initial-data, and from
"return path" to capability @return-path. More helpful, I think:
"capability 'precopy-initial-data' requires capability 'return-path'".

> +            return false;
> +        }
> +
> +        /* Disable this capability until it's implemented */
> +        error_setg(errp, "Precopy initial data is not implemented yet");
> +        return false;
> +    }
> +
>      return true;
>  }
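A minimal sketch of the dependency check with the error wording suggested in the review, so the message names both capabilities exactly as a user would type them. Everything here (`Capability`, `cap_names`, `cap_lookup()`, `caps_check()`, and the caller-provided error buffer standing in for QEMU's `Error **errp`/`error_setg()`) is hypothetical and only mirrors the shape of the new hunk in `migrate_caps_check()`; it also leaves out the patch's temporary "not implemented yet" guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the QAPI-generated MigrationCapability enum. */
typedef enum {
    CAP_RETURN_PATH,
    CAP_PRECOPY_INITIAL_DATA,
    CAP__MAX
} Capability;

/* User-visible capability names, spelled as in the QAPI schema. */
static const char *const cap_names[CAP__MAX] = {
    [CAP_RETURN_PATH] = "return-path",
    [CAP_PRECOPY_INITIAL_DATA] = "precopy-initial-data",
};

/* Map a capability name to its enum value, or -1 if it is unknown. */
static int cap_lookup(const char *name)
{
    for (int i = 0; i < CAP__MAX; i++) {
        if (strcmp(cap_names[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * Reject 'precopy-initial-data' unless 'return-path' is also requested.
 * On failure, a message naming both capabilities is written to err.
 */
static bool caps_check(const bool *new_caps, char *err, size_t errlen)
{
    assert(err != NULL && errlen > 0);
    err[0] = '\0';
    if (new_caps[CAP_PRECOPY_INITIAL_DATA] && !new_caps[CAP_RETURN_PATH]) {
        snprintf(err, errlen, "capability '%s' requires capability '%s'",
                 cap_names[CAP_PRECOPY_INITIAL_DATA],
                 cap_names[CAP_RETURN_PATH]);
        return false;
    }
    return true;
}
```

Building the message from the same name table used for lookup keeps the error text in sync with what the user can actually set.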
On 17/05/2023 12:17, Markus Armbruster wrote:
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> Migration downtime estimation is calculated based on bandwidth and
>> remaining migration data. [...]
>>
>> +# @precopy-initial-data: If enabled, migration will not attempt to stop source
[...]
> Please format like
>
> # @precopy-initial-data: If enabled, migration will not attempt to
> #     stop source VM and complete the migration until an ACK is
> #     received from the destination that initial precopy data has been
> #     loaded. This can improve downtime if there are migration users
> #     that support precopy initial data. (since 8.1)
>
> to blend in with recent commit a937b6aa739 (qapi: Reformat doc comments
> to conform to current conventions).

Sure.

> What do you mean by "if there are migration users that support precopy
> initial data"?

This capability only provides the framework to send precopy initial data
and ACK that it was loaded in the destination. To actually benefit from
it, migration users (such as VFIO devices, RAM, etc.) must implement
support for it and use it.

What I wanted to say here is that there is no point in enabling this
capability if no migration users support it. For example, if you are
migrating a VM without VFIO devices, then enabling this capability will
have no effect.

> Do I have to ensure the ACK comes by configuring the destination VM in a
> certain way, and if yes, how exactly?

In v2 of the series, which I will send later, you will have to enable
this capability in the destination as well.

[...]
>> +    if (new_caps[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA]) {
>> +        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
>> +            error_setg(errp, "Precopy initial data requires return path");
> Shouldn't we mention this requirement in the docs?

Yes, I will add it.

> To make sense of the message, the user needs to make the connection from
> "Precopy initial data" to capability @precopy-initial-data, and from
> "return path" to capability @return-path. More helpful, I think:
> "capability 'precopy-initial-data' requires capability 'return-path'".

Sure, I will change it.

Thanks.
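To illustrate the point that the capability is only a framework and has no effect without supporting users, here is a hypothetical sketch of how a destination could decide when to send the ACK: each migration user that opts in registers once, reports when its initial precopy data is loaded, and the ACK goes out when nothing is pending. `InitialDataTracker` and the `tracker_*()` functions are invented for illustration; this patch contains no such code, and the series' real mechanism may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical destination-side bookkeeping; names are illustrative only. */
typedef struct {
    size_t pending;   /* opted-in users whose initial data is not loaded yet */
    bool ack_sent;    /* has the ACK been sent on the return path? */
} InitialDataTracker;

/* Called once per migration user that supports initial precopy data. */
static void tracker_register(InitialDataTracker *t)
{
    t->pending++;
}

/* Send the ACK once nothing is pending; calling it again is harmless. */
static void tracker_maybe_ack(InitialDataTracker *t)
{
    if (t->pending == 0 && !t->ack_sent) {
        /* A real implementation would write a message to the return path. */
        t->ack_sent = true;
    }
}

/* Called when one opted-in user has finished loading its initial data. */
static void tracker_loaded(InitialDataTracker *t)
{
    assert(t->pending > 0);
    t->pending--;
    tracker_maybe_ack(t);
}
```

If setup ends with a call to `tracker_maybe_ack()`, a VM with no supporting users (for example, no VFIO devices) ACKs immediately, so enabling the capability changes nothing for it.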
Avihai Horon <avihaih@nvidia.com> writes:

> On 17/05/2023 12:17, Markus Armbruster wrote:
[...]
>> What do you mean by "if there are migration users that support precopy
>> initial data"?
>
> This capability only provides the framework to send precopy initial data
> and ACK that it was loaded in the destination. To actually benefit from
> it, migration users (such as VFIO devices, RAM, etc.) must implement
> support for it and use it.
>
> What I wanted to say here is that there is no point in enabling this
> capability if no migration users support it. For example, if you are
> migrating a VM without VFIO devices, then enabling this capability will
> have no effect.

I see.

Which "migration users" support it now?

Which could support it in the future?

Is the "initial precopy data" feature described in more detail anywhere?

>> Do I have to ensure the ACK comes by configuring the destination VM in a
>> certain way, and if yes, how exactly?
>
> In v2 of the series, which I will send later, you will have to enable
> this capability in the destination as well.

What happens when you enable it on the source and not on the
destination?

[...]
On 17/05/2023 15:21, Markus Armbruster wrote:
> Avihai Horon <avihaih@nvidia.com> writes:
>
>> On 17/05/2023 12:17, Markus Armbruster wrote:
[...]
>> What I wanted to say here is that there is no point in enabling this
>> capability if no migration users support it. For example, if you are
>> migrating a VM without VFIO devices, then enabling this capability will
>> have no effect.
> I see.
>
> Which "migration users" support it now?

Currently, only VFIO devices.

> Which could support it in the future?

Any device that uses "iterative migration" [1], such as RAM, block, or
some new type of device in the future.

[1] https://www.qemu.org/docs/master/devel/migration.html#iterative-device-migration

> Is the "initial precopy data" feature described in more detail anywhere?

The cover letter of this series contains more information on the
background and motivation behind it, as well as on the workflow of the
feature.

>>> Do I have to ensure the ACK comes by configuring the destination VM in a
>>> certain way, and if yes, how exactly?
>> In v2 of the series, which I will send later, you will have to enable
>> this capability in the destination as well.
> What happens when you enable it on the source and not on the
> destination?

Migration may fail. For example, this is what happens if I migrate a VM
with a VFIO device and enable this capability only on the source.

Thanks.
diff --git a/qapi/migration.json b/qapi/migration.json
index 82000adce4..d496148386 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -478,6 +478,13 @@
 #     should not affect the correctness of postcopy migration.
 #     (since 7.1)
 #
+# @precopy-initial-data: If enabled, migration will not attempt to stop source
+#                        VM and complete the migration until an ACK is received
+#                        from the destination that initial precopy data has
+#                        been loaded. This can improve downtime if there are
+#                        migration users that support precopy initial data.
+#                        (since 8.1)
+#
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
 #
@@ -492,7 +499,7 @@
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'precopy-initial-data'] }

 ##
 # @MigrationCapabilityStatus:
diff --git a/migration/options.h b/migration/options.h
index 3c322867cd..d004b6321e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -44,6 +44,7 @@ bool migrate_pause_before_switchover(void);
 bool migrate_postcopy_blocktime(void);
 bool migrate_postcopy_preempt(void);
 bool migrate_postcopy_ram(void);
+bool migrate_precopy_initial_data(void);
 bool migrate_rdma_pin_all(void);
 bool migrate_release_ram(void);
 bool migrate_return_path(void);
diff --git a/migration/options.c b/migration/options.c
index 53b7fc5d5d..c4ef0c60c7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -184,6 +184,8 @@ Property migration_properties[] = {
     DEFINE_PROP_MIG_CAP("x-zero-copy-send",
             MIGRATION_CAPABILITY_ZERO_COPY_SEND),
 #endif
+    DEFINE_PROP_MIG_CAP("x-precopy-initial-data",
+                        MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA),

     DEFINE_PROP_END_OF_LIST(),
 };
@@ -286,6 +288,13 @@ bool migrate_postcopy_ram(void)
     return s->capabilities[MIGRATION_CAPABILITY_POSTCOPY_RAM];
 }

+bool migrate_precopy_initial_data(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA];
+}
+
 bool migrate_rdma_pin_all(void)
 {
     MigrationState *s = migrate_get_current();
@@ -546,6 +555,17 @@ bool migrate_caps_check(bool *old_caps, bool *new_caps, Error **errp)
         }
     }

+    if (new_caps[MIGRATION_CAPABILITY_PRECOPY_INITIAL_DATA]) {
+        if (!new_caps[MIGRATION_CAPABILITY_RETURN_PATH]) {
+            error_setg(errp, "Precopy initial data requires return path");
+            return false;
+        }
+
+        /* Disable this capability until it's implemented */
+        error_setg(errp, "Precopy initial data is not implemented yet");
+        return false;
+    }
+
     return true;
 }
Migration downtime estimation is calculated based on bandwidth and
remaining migration data. This assumes that loading of migration data in
the destination takes a negligible amount of time and that downtime
depends only on network speed.

While this may be true for RAM, it's not necessarily true for other
migration users. For example, loading the data of a VFIO device in the
destination might require the device to allocate resources, prepare
internal data structures and so on. These operations can take a
significant amount of time, which can increase migration downtime.

This patch adds a new capability "precopy initial data" that allows the
source to send initial precopy data and the destination to ACK that this
data has been loaded. Migration will not attempt to stop the source VM
and complete the migration until this ACK is received.

This will allow migration users to send initial precopy data which can
be used to reduce downtime (e.g., by pre-allocating resources), while
making sure that the source will stop the VM and complete the migration
only after this initial precopy data is sent and loaded in the
destination so it will have full effect.

This new capability relies on the return path capability to communicate
from the destination back to the source.

The actual implementation of the capability will be added in the
following patches.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 qapi/migration.json |  9 ++++++++-
 migration/options.h |  1 +
 migration/options.c | 20 ++++++++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)