Message ID: 20210930170926.1298118-1-lvivier@redhat.com (mailing list archive)
State: New, archived
Series: failover: allow to pause the VM during the migration
On 9/30/21 1:09 PM, Laurent Vivier wrote:
> If we want to save a snapshot of a VM to a file, we used to follow the
> following steps:
>
> 1- stop the VM:
> (qemu) stop
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> 3- resume the VM:
> (qemu) cont
>
> After that we can restore the snapshot with:
> qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> (qemu) cont

This is the basics of what libvirt does for a snapshot, and steps 1+2 are
what it does for a "managedsave" (where it saves the snapshot to disk and
then terminates the qemu process, for later re-animation).

In those cases, it seems like this new parameter could work for us - instead
of explicitly pausing the guest prior to migrating it to disk, we would set
this new parameter to on, then directly migrate-to-disk (relying on qemu to
do the pause). Care will need to be taken to assure that error recovery
behaves the same though.

There are a couple of cases when libvirt apparently *doesn't* pause the
guest during the migrate-to-disk, both having to do with saving a coredump
of the guest. Since I really have no idea of how common/important that is
(or even if my assessment of the code is correct), I'm Cc'ing this patch to
libvir-list to make sure it catches the attention of someone who knows the
answers and implications.

> But when failover is configured, it doesn't work anymore.
>
> As the failover needs to ask the guest OS to unplug the card
> the machine cannot be paused.
>
> This patch introduces a new migration parameter, "pause-vm", that
> asks the migration to pause the VM during the migration startup
> phase after the card is unplugged.
>
> Once the migration is done, we only need to resume the VM with
> "cont" and the card is plugged back:
>
> 1- set the parameter:
> (qemu) migrate_set_parameter pause-vm on
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> The primary failover card (VFIO) is unplugged and the VM is paused.
> > 3- resume the VM: > (qemu) cont > > The VM restarts and the primary failover card is plugged back > > The VM state sent in the migration stream is "paused", it means > when the snapshot is loaded or if the stream is sent to a destination > QEMU, the VM needs to be resumed manually. > > Signed-off-by: Laurent Vivier <lvivier@redhat.com> > --- > qapi/migration.json | 20 +++++++++++++++--- > include/hw/virtio/virtio-net.h | 1 + > hw/net/virtio-net.c | 33 ++++++++++++++++++++++++++++++ > migration/migration.c | 37 +++++++++++++++++++++++++++++++++- > monitor/hmp-cmds.c | 8 ++++++++ > 5 files changed, 95 insertions(+), 4 deletions(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index 88f07baedd06..86284d96ad2a 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -743,6 +743,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. > +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. (since 6.2) > +# > # Since: 2.4 > ## > { 'enum': 'MigrationParameter', > @@ -758,7 +762,7 @@ > 'xbzrle-cache-size', 'max-postcopy-bandwidth', > 'max-cpu-throttle', 'multifd-compression', > 'multifd-zlib-level' ,'multifd-zstd-level', > - 'block-bitmap-mapping' ] } > + 'block-bitmap-mapping', 'pause-vm' ] } > > ## > # @MigrateSetParameters: > @@ -903,6 +907,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. > +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. 
(since 6.2) > +# > # Since: 2.4 > ## > # TODO either fuse back into MigrationParameters, or make > @@ -934,7 +942,8 @@ > '*multifd-compression': 'MultiFDCompression', > '*multifd-zlib-level': 'uint8', > '*multifd-zstd-level': 'uint8', > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > + '*pause-vm': 'bool' } } > > ## > # @migrate-set-parameters: > @@ -1099,6 +1108,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. > +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. (since 6.2) > +# > # Since: 2.4 > ## > { 'struct': 'MigrationParameters', > @@ -1128,7 +1141,8 @@ > '*multifd-compression': 'MultiFDCompression', > '*multifd-zlib-level': 'uint8', > '*multifd-zstd-level': 'uint8', > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > + '*pause-vm': 'bool' } } > > ## > # @query-migrate-parameters: > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h > index 824a69c23f06..a6c186e28b45 100644 > --- a/include/hw/virtio/virtio-net.h > +++ b/include/hw/virtio/virtio-net.h > @@ -210,6 +210,7 @@ struct VirtIONet { > bool failover; > DeviceListener primary_listener; > Notifier migration_state; > + VMChangeStateEntry *vm_state; > VirtioNetRssData rss_data; > struct NetRxPkt *rx_pkt; > struct EBPFRSSContext ebpf_rss; > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index f205331dcf8c..c83364eac47b 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -39,6 +39,7 @@ > #include "migration/misc.h" > #include "standard-headers/linux/ethtool.h" > #include "sysemu/sysemu.h" > +#include "sysemu/runstate.h" > #include "trace.h" > #include "monitor/qdev.h" > #include "hw/pci/pci.h" > @@ -3303,6 +3304,35 @@ static void 
virtio_net_migration_state_notifier(Notifier *notifier, void *data) > virtio_net_handle_migration_primary(n, s); > } > > +static void virtio_net_failover_restart_cb(void *opaque, bool running, > + RunState state) > +{ > + DeviceState *dev; > + VirtIONet *n = opaque; > + Error *err = NULL; > + PCIDevice *pdev; > + > + if (!running) { > + return; > + } > + > + dev = failover_find_primary_device(n); > + if (!dev) { > + return; > + } > + > + pdev = PCI_DEVICE(dev); > + if (!pdev->partially_hotplugged) { > + return; > + } > + > + if (!failover_replug_primary(n, dev, &err)) { > + if (err) { > + error_report_err(err); > + } > + } > +} > + > static bool failover_hide_primary_device(DeviceListener *listener, > QemuOpts *device_opts) > { > @@ -3360,6 +3390,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) > device_listener_register(&n->primary_listener); > n->migration_state.notify = virtio_net_migration_state_notifier; > add_migration_state_change_notifier(&n->migration_state); > + n->vm_state = qemu_add_vm_change_state_handler( > + virtio_net_failover_restart_cb, n); > n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); > } > > @@ -3508,6 +3540,7 @@ static void virtio_net_device_unrealize(DeviceState *dev) > if (n->failover) { > device_listener_unregister(&n->primary_listener); > remove_migration_state_change_notifier(&n->migration_state); > + qemu_del_vm_change_state_handler(n->vm_state); > } > > max_queues = n->multiqueue ? 
n->max_queues : 1; > diff --git a/migration/migration.c b/migration/migration.c > index bb909781b7f5..9c96986d4abf 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -901,6 +901,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) > s->parameters.block_bitmap_mapping); > } > > + params->has_pause_vm = true; > + params->pause_vm = s->parameters.pause_vm; > + > return params; > } > > @@ -1549,6 +1552,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params, > dest->has_block_bitmap_mapping = true; > dest->block_bitmap_mapping = params->block_bitmap_mapping; > } > + > + if (params->has_pause_vm) { > + dest->has_pause_vm = true; > + dest->pause_vm = params->pause_vm; > + } > } > > static void migrate_params_apply(MigrateSetParameters *params, Error **errp) > @@ -1671,6 +1679,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) > QAPI_CLONE(BitmapMigrationNodeAliasList, > params->block_bitmap_mapping); > } > + > + if (params->has_pause_vm) { > + s->parameters.pause_vm = params->pause_vm; > + } > } > > void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp) > @@ -1718,6 +1730,12 @@ void qmp_migrate_start_postcopy(Error **errp) > " started"); > return; > } > + > + if (s->parameters.pause_vm) { > + error_setg(errp, "Postcopy cannot be started if pause-vm is on"); > + return; > + } > + > /* > * we don't error if migration has finished since that would be racy > * with issuing this command. 
> @@ -3734,13 +3752,27 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, > "failure"); > } > } > - > migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); > } else { > migrate_set_state(&s->state, old_state, new_state); > } > } > > +/* stop the VM before starting the migration but after device unplug */ > +static void pause_vm_after_unplug(MigrationState *s) > +{ > + if (s->parameters.pause_vm) { > + qemu_mutex_lock_iothread(); > + qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > + s->vm_was_running = runstate_is_running(); > + if (vm_stop_force_state(RUN_STATE_PAUSED)) { > + migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, > + MIGRATION_STATUS_FAILED); > + } > + qemu_mutex_unlock_iothread(); > + } > +} > + > /* > * Master migration thread on the source VM. > * It drives the migration and pumps the data down the outgoing channel. > @@ -3790,6 +3822,8 @@ static void *migration_thread(void *opaque) > qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, > MIGRATION_STATUS_ACTIVE); > > + pause_vm_after_unplug(s); > + > s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; > > trace_migration_thread_setup_complete(); > @@ -4265,6 +4299,7 @@ static void migration_instance_init(Object *obj) > params->has_announce_max = true; > params->has_announce_rounds = true; > params->has_announce_step = true; > + params->has_pause_vm = true; > > qemu_sem_init(&ms->postcopy_pause_sem, 0); > qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c > index b5e71d9e6f52..71bc8c15335b 100644 > --- a/monitor/hmp-cmds.c > +++ b/monitor/hmp-cmds.c > @@ -513,6 +513,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) > } > } > } > + > + monitor_printf(mon, "%s: %s\n", > + MigrationParameter_str(MIGRATION_PARAMETER_PAUSE_VM), > + params->pause_vm ? 
"on" : "off"); > } > > qapi_free_MigrationParameters(params); > @@ -1399,6 +1403,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) > error_setg(&err, "The block-bitmap-mapping parameter can only be set " > "through QMP"); > break; > + case MIGRATION_PARAMETER_PAUSE_VM: > + p->has_pause_vm = true; > + visit_type_bool(v, param, &p->pause_vm, &err); > + break; > default: > assert(0); > } >
On 30/09/2021 22:17, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
>> If we want to save a snapshot of a VM to a file, we used to follow the
>> following steps:
>>
>> 1- stop the VM:
>> (qemu) stop
>>
>> 2- migrate the VM to a file:
>> (qemu) migrate "exec:cat > snapshot"
>>
>> 3- resume the VM:
>> (qemu) cont
>>
>> After that we can restore the snapshot with:
>> qemu-system-x86_64 ... -incoming "exec:cat snapshot"
>> (qemu) cont
>
> This is the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
>
> In those cases, it seems like this new parameter could work for us -
> instead of explicitly pausing the guest prior to migrating it to disk, we
> would set this new parameter to on, then directly migrate-to-disk (relying
> on qemu to do the pause). Care will need to be taken to assure that error
> recovery behaves the same though.

In case of error, the VM is restarted as is done for a standard migration.
I can change that if you need.

Another point: the VM state sent in the migration stream is "paused", which
means the machine needs to be resumed after the stream is loaded (from the
file, or on the destination in the case of a real migration). It could also
be changed to "running", so that the machine is resumed automatically at
the end of the file loading (or of the real migration).

> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea of how common/important that is
> (or even if my assessment of the code is correct), I'm Cc'ing this patch
> to libvir-list to make sure it catches the attention of someone who knows
> the answers and implications.
It's an interesting point I need to test and think about: in case of a
coredump I guess the machine has crashed and doesn't answer the unplug
request, so the failover unplug cannot be done. For the moment the
migration will hang until it is canceled. It can be annoying if we want to
debug the cause of the crash...

>> But when failover is configured, it doesn't work anymore.
>>
>> As the failover needs to ask the guest OS to unplug the card
>> the machine cannot be paused.
>>
>> This patch introduces a new migration parameter, "pause-vm", that
>> asks the migration to pause the VM during the migration startup
>> phase after the card is unplugged.
>>
>> Once the migration is done, we only need to resume the VM with
>> "cont" and the card is plugged back:
>>
>> 1- set the parameter:
>> (qemu) migrate_set_parameter pause-vm on
>>
>> 2- migrate the VM to a file:
>> (qemu) migrate "exec:cat > snapshot"
>>
>> The primary failover card (VFIO) is unplugged and the VM is paused.
>>
>> 3- resume the VM:
>> (qemu) cont
>>
>> The VM restarts and the primary failover card is plugged back
>>
>> The VM state sent in the migration stream is "paused", it means
>> when the snapshot is loaded or if the stream is sent to a destination
>> QEMU, the VM needs to be resumed manually.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>>  qapi/migration.json            | 20 +++++++++++++++---
>>  include/hw/virtio/virtio-net.h |  1 +
>>  hw/net/virtio-net.c            | 33 ++++++++++++++++++++++++++++++
>>  migration/migration.c          | 37 +++++++++++++++++++++++++++++++++-
>>  monitor/hmp-cmds.c             |  8 ++++++++
>>  5 files changed, 95 insertions(+), 4 deletions(-)
>> ...

Thanks,
Laurent
On Thu, Sep 30, 2021 at 16:17:44 -0400, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
> > If we want to save a snapshot of a VM to a file, we used to follow the
> > following steps:
> >
> > 1- stop the VM:
> > (qemu) stop
> >
> > 2- migrate the VM to a file:
> > (qemu) migrate "exec:cat > snapshot"
> >
> > 3- resume the VM:
> > (qemu) cont
> >
> > After that we can restore the snapshot with:
> > qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> > (qemu) cont
>
> This is the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
>
> In those cases, it seems like this new parameter could work for us -
> instead of explicitly pausing the guest prior to migrating it to disk, we
> would set this new parameter to on, then directly migrate-to-disk (relying
> on qemu to do the pause). Care will need to be taken to assure that error
> recovery behaves the same though.

Yup, see below ...

> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea of how common/important that is

In most cases when doing a coredump the guest is paused because of an
emulation/guest error. One example where the VM is not paused is a 'live'
snapshot. It wastes disk space and is not commonly used, though.

Where it might become interesting is with the 'background-snapshot'
migration flag. Ideally failover will be fixed to properly work with that
one too. In such a case we don't want to pause the VM (but we have to,
AFAIK: the background-snapshot migration can't be done as part of a
'transaction' yet, so we need to pause the VM to kick off the migration
(memory snapshot) and then snapshot the disks).

> (or even if my assessment of the code is correct), I'm Cc'ing this patch
> to libvir-list to make sure it catches the attention of someone who knows
> the answers and implications.

Well, cc-ing relevant patches to libvirt is always good, especially if
we'll need to adapt the code to support the new feature.

> > But when failover is configured, it doesn't work anymore.
> >
> > As the failover needs to ask the guest OS to unplug the card
> > the machine cannot be paused.
> >
> > This patch introduces a new migration parameter, "pause-vm", that
> > asks the migration to pause the VM during the migration startup
> > phase after the card is unplugged.

Is there a time limit to this? If guest interaction is required it might
take unbounded time. In the case of snapshots, the expectation from the
user is that the state capture happens "reasonably" immediately after
issuing the command. If we introduce a possibly unbounded wait time, it
will need a re-imagining of the snapshot workflow, and the feature will
need to be opt-in.

> >
> > Once the migration is done, we only need to resume the VM with
> > "cont" and the card is plugged back:
> >
> > 1- set the parameter:
> > (qemu) migrate_set_parameter pause-vm on
> >
> > 2- migrate the VM to a file:
> > (qemu) migrate "exec:cat > snapshot"
> >
> > The primary failover card (VFIO) is unplugged and the VM is paused.
> >
> > 3- resume the VM:
> > (qemu) cont
> >
> > The VM restarts and the primary failover card is plugged back
> >
> > The VM state sent in the migration stream is "paused", it means
> > when the snapshot is loaded or if the stream is sent to a destination
> > QEMU, the VM needs to be resumed manually.

This is not a problem, libvirt is already dealing with this internally
anyway.
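The unbounded-wait concern raised above could be addressed by putting a deadline on the unplug phase and cancelling if the guest never responds. A sketch of that idea, with the callables as hypothetical stand-ins for the QMP `query-migrate` and `migrate_cancel` commands (nothing here is real libvirt or QEMU code):

```python
import time

def wait_for_unplug(query_status, deadline_s, cancel, clock=time.monotonic):
    """Poll a query-migrate-like callable until the 'wait-unplug' phase
    ends; cancel the migration if the guest never acks the unplug within
    deadline_s seconds. All callables are hypothetical stand-ins."""
    start = clock()
    while True:
        status = query_status()
        if status != "wait-unplug":
            return status              # guest acked; unplug phase is over
        if clock() - start > deadline_s:
            cancel()                   # e.g. a QMP migrate_cancel
            return "cancelled"
```

A real implementation would also sleep between polls or consume QMP MIGRATION events instead of busy-polling; the point is only that the wait can be bounded without a new QEMU-side knob.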
On Thu, Sep 30, 2021 at 04:17:44PM -0400, Laine Stump wrote:
> On 9/30/21 1:09 PM, Laurent Vivier wrote:
> > If we want to save a snapshot of a VM to a file, we used to follow the
> > following steps:
> >
> > 1- stop the VM:
> > (qemu) stop
> >
> > 2- migrate the VM to a file:
> > (qemu) migrate "exec:cat > snapshot"
> >
> > 3- resume the VM:
> > (qemu) cont
> >
> > After that we can restore the snapshot with:
> > qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> > (qemu) cont
>
> This is the basics of what libvirt does for a snapshot, and steps 1+2 are
> what it does for a "managedsave" (where it saves the snapshot to disk and
> then terminates the qemu process, for later re-animation).
>
> In those cases, it seems like this new parameter could work for us -
> instead of explicitly pausing the guest prior to migrating it to disk, we
> would set this new parameter to on, then directly migrate-to-disk (relying
> on qemu to do the pause). Care will need to be taken to assure that error
> recovery behaves the same though.

What libvirt does is actually quite different from this in a significant
way. In the HMP example here, 'migrate' is a blocking command that does not
return until migration is finished. Libvirt uses QMP, and 'migrate' there
is an asynchronous command that merely launches the migration and returns
control to the client. IOW, what libvirt does is:

  stop
  migrate
  while status not in (failed, completed):
      query-migrate
      ...also receive any QMP migration events...
  ...possibly modify migration parameters...
  cont

With this pattern I'm not seeing any need for a new migration parameter for
libvirt. The migration status lets us distinguish when QEMU is in the
"waiting for unplug" phase vs the "active" phase. So AFAICT, libvirt can do:

  migrate
  while status not in (failed, completed):
      query-migrate
      ...also receive any QMP migration events...
      if status changed wait-for-unplug to active:
          stop
  ...possibly modify migration parameters...
  cont

There is a small window here when the guest CPUs are running but migration
is active. In most cases for libvirt that is harmless. If there are cases
where libvirt needs a strong guarantee to synchronize the 'stop' with some
other operation, then the newly proposed "pause-vm" parameter has the same
problem, as libvirt can't synchronize against that either.

> There are a couple of cases when libvirt apparently *doesn't* pause the
> guest during the migrate-to-disk, both having to do with saving a coredump
> of the guest. Since I really have no idea of how common/important that is
> (or even if my assessment of the code is correct), I'm Cc'ing this patch
> to libvir-list to make sure it catches the attention of someone who knows
> the answers and implications.

IIUC, the problem with unplug only happens when libvirt pauses the guest.
So surely if there are some scenarios where we're not pausing the guest,
there's no problem to solve for those.

Regards,
Daniel
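The second polling loop above can be written out concretely. In this sketch `query_migrate` and `stop_vm` are hypothetical stand-ins for the QMP `query-migrate` and `stop` commands; the loop issues the stop exactly on the wait-unplug-to-active transition, as described:

```python
def managed_save(query_migrate, stop_vm):
    """Drive a migrate-to-disk the way the loop above sketches it: let the
    migration run, and issue 'stop' as soon as the status leaves
    'wait-unplug' (i.e. once the failover card has been unplugged).
    query_migrate/stop_vm are hypothetical stand-ins for QMP calls."""
    prev = None
    while True:
        status = query_migrate()
        if prev == "wait-unplug" and status == "active":
            stop_vm()        # the small CPUs-running window discussed above
        if status in ("completed", "failed"):
            return status
        prev = status
```

This keeps the pause policy entirely on the management side, which is the argument being made against adding the new parameter.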
* Laurent Vivier (lvivier@redhat.com) wrote:
> If we want to save a snapshot of a VM to a file, we used to follow the
> following steps:
>
> 1- stop the VM:
> (qemu) stop
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> 3- resume the VM:
> (qemu) cont
>
> After that we can restore the snapshot with:
> qemu-system-x86_64 ... -incoming "exec:cat snapshot"
> (qemu) cont
>
> But when failover is configured, it doesn't work anymore.
>
> As the failover needs to ask the guest OS to unplug the card
> the machine cannot be paused.
>
> This patch introduces a new migration parameter, "pause-vm", that
> asks the migration to pause the VM during the migration startup
> phase after the card is unplugged.
>
> Once the migration is done, we only need to resume the VM with
> "cont" and the card is plugged back:
>
> 1- set the parameter:
> (qemu) migrate_set_parameter pause-vm on
>
> 2- migrate the VM to a file:
> (qemu) migrate "exec:cat > snapshot"
>
> The primary failover card (VFIO) is unplugged and the VM is paused.
>
> 3- resume the VM:
> (qemu) cont
>
> The VM restarts and the primary failover card is plugged back
>
> The VM state sent in the migration stream is "paused", it means
> when the snapshot is loaded or if the stream is sent to a destination
> QEMU, the VM needs to be resumed manually.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

A mix of comments:

a) As a boolean, this should be a MigrationCapability rather than a
   parameter.

b) We already have a pause-before-switchover capability for a pause that
   happens later in the flow, so this would be something like
   pause-after-unplug.

c) Is this really the right answer? Could this be done a different way, by
   doing the unplugs using a (possibly new) QMP command, so that you can
   explicitly trigger the unplug prior to the migration?
Dave > --- > qapi/migration.json | 20 +++++++++++++++--- > include/hw/virtio/virtio-net.h | 1 + > hw/net/virtio-net.c | 33 ++++++++++++++++++++++++++++++ > migration/migration.c | 37 +++++++++++++++++++++++++++++++++- > monitor/hmp-cmds.c | 8 ++++++++ > 5 files changed, 95 insertions(+), 4 deletions(-) > > diff --git a/qapi/migration.json b/qapi/migration.json > index 88f07baedd06..86284d96ad2a 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -743,6 +743,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. > +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. (since 6.2) > +# > # Since: 2.4 > ## > { 'enum': 'MigrationParameter', > @@ -758,7 +762,7 @@ > 'xbzrle-cache-size', 'max-postcopy-bandwidth', > 'max-cpu-throttle', 'multifd-compression', > 'multifd-zlib-level' ,'multifd-zstd-level', > - 'block-bitmap-mapping' ] } > + 'block-bitmap-mapping', 'pause-vm' ] } > > ## > # @MigrateSetParameters: > @@ -903,6 +907,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. > +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. (since 6.2) > +# > # Since: 2.4 > ## > # TODO either fuse back into MigrationParameters, or make > @@ -934,7 +942,8 @@ > '*multifd-compression': 'MultiFDCompression', > '*multifd-zlib-level': 'uint8', > '*multifd-zstd-level': 'uint8', > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > + '*pause-vm': 'bool' } } > > ## > # @migrate-set-parameters: > @@ -1099,6 +1108,10 @@ > # block device name if there is one, and to their node name > # otherwise. (Since 5.2) > # > +# @pause-vm: Pause the virtual machine before doing the migration. 
> +# This allows QEMU to unplug a card before doing the > +# migration as it depends on the guest kernel. (since 6.2) > +# > # Since: 2.4 > ## > { 'struct': 'MigrationParameters', > @@ -1128,7 +1141,8 @@ > '*multifd-compression': 'MultiFDCompression', > '*multifd-zlib-level': 'uint8', > '*multifd-zstd-level': 'uint8', > - '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } } > + '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ], > + '*pause-vm': 'bool' } } > > ## > # @query-migrate-parameters: > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h > index 824a69c23f06..a6c186e28b45 100644 > --- a/include/hw/virtio/virtio-net.h > +++ b/include/hw/virtio/virtio-net.h > @@ -210,6 +210,7 @@ struct VirtIONet { > bool failover; > DeviceListener primary_listener; > Notifier migration_state; > + VMChangeStateEntry *vm_state; > VirtioNetRssData rss_data; > struct NetRxPkt *rx_pkt; > struct EBPFRSSContext ebpf_rss; > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c > index f205331dcf8c..c83364eac47b 100644 > --- a/hw/net/virtio-net.c > +++ b/hw/net/virtio-net.c > @@ -39,6 +39,7 @@ > #include "migration/misc.h" > #include "standard-headers/linux/ethtool.h" > #include "sysemu/sysemu.h" > +#include "sysemu/runstate.h" > #include "trace.h" > #include "monitor/qdev.h" > #include "hw/pci/pci.h" > @@ -3303,6 +3304,35 @@ static void virtio_net_migration_state_notifier(Notifier *notifier, void *data) > virtio_net_handle_migration_primary(n, s); > } > > +static void virtio_net_failover_restart_cb(void *opaque, bool running, > + RunState state) > +{ > + DeviceState *dev; > + VirtIONet *n = opaque; > + Error *err = NULL; > + PCIDevice *pdev; > + > + if (!running) { > + return; > + } > + > + dev = failover_find_primary_device(n); > + if (!dev) { > + return; > + } > + > + pdev = PCI_DEVICE(dev); > + if (!pdev->partially_hotplugged) { > + return; > + } > + > + if (!failover_replug_primary(n, dev, &err)) { > + if (err) { > + error_report_err(err); > 
+ } > + } > +} > + > static bool failover_hide_primary_device(DeviceListener *listener, > QemuOpts *device_opts) > { > @@ -3360,6 +3390,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp) > device_listener_register(&n->primary_listener); > n->migration_state.notify = virtio_net_migration_state_notifier; > add_migration_state_change_notifier(&n->migration_state); > + n->vm_state = qemu_add_vm_change_state_handler( > + virtio_net_failover_restart_cb, n); > n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY); > } > > @@ -3508,6 +3540,7 @@ static void virtio_net_device_unrealize(DeviceState *dev) > if (n->failover) { > device_listener_unregister(&n->primary_listener); > remove_migration_state_change_notifier(&n->migration_state); > + qemu_del_vm_change_state_handler(n->vm_state); > } > > max_queues = n->multiqueue ? n->max_queues : 1; > diff --git a/migration/migration.c b/migration/migration.c > index bb909781b7f5..9c96986d4abf 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -901,6 +901,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp) > s->parameters.block_bitmap_mapping); > } > > + params->has_pause_vm = true; > + params->pause_vm = s->parameters.pause_vm; > + > return params; > } > > @@ -1549,6 +1552,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params, > dest->has_block_bitmap_mapping = true; > dest->block_bitmap_mapping = params->block_bitmap_mapping; > } > + > + if (params->has_pause_vm) { > + dest->has_pause_vm = true; > + dest->pause_vm = params->pause_vm; > + } > } > > static void migrate_params_apply(MigrateSetParameters *params, Error **errp) > @@ -1671,6 +1679,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp) > QAPI_CLONE(BitmapMigrationNodeAliasList, > params->block_bitmap_mapping); > } > + > + if (params->has_pause_vm) { > + s->parameters.pause_vm = params->pause_vm; > + } > } > > void qmp_migrate_set_parameters(MigrateSetParameters 
*params, Error **errp) > @@ -1718,6 +1730,12 @@ void qmp_migrate_start_postcopy(Error **errp) > " started"); > return; > } > + > + if (s->parameters.pause_vm) { > + error_setg(errp, "Postcopy cannot be started if pause-vm is on"); > + return; > + } > + > /* > * we don't error if migration has finished since that would be racy > * with issuing this command. > @@ -3734,13 +3752,27 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state, > "failure"); > } > } > - > migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state); > } else { > migrate_set_state(&s->state, old_state, new_state); > } > } > > +/* stop the VM before starting the migration but after device unplug */ > +static void pause_vm_after_unplug(MigrationState *s) > +{ > + if (s->parameters.pause_vm) { > + qemu_mutex_lock_iothread(); > + qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL); > + s->vm_was_running = runstate_is_running(); > + if (vm_stop_force_state(RUN_STATE_PAUSED)) { > + migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE, > + MIGRATION_STATUS_FAILED); > + } > + qemu_mutex_unlock_iothread(); > + } > +} > + > /* > * Master migration thread on the source VM. > * It drives the migration and pumps the data down the outgoing channel. 
> @@ -3790,6 +3822,8 @@ static void *migration_thread(void *opaque) > qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP, > MIGRATION_STATUS_ACTIVE); > > + pause_vm_after_unplug(s); > + > s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start; > > trace_migration_thread_setup_complete(); > @@ -4265,6 +4299,7 @@ static void migration_instance_init(Object *obj) > params->has_announce_max = true; > params->has_announce_rounds = true; > params->has_announce_step = true; > + params->has_pause_vm = true; > > qemu_sem_init(&ms->postcopy_pause_sem, 0); > qemu_sem_init(&ms->postcopy_pause_rp_sem, 0); > diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c > index b5e71d9e6f52..71bc8c15335b 100644 > --- a/monitor/hmp-cmds.c > +++ b/monitor/hmp-cmds.c > @@ -513,6 +513,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict) > } > } > } > + > + monitor_printf(mon, "%s: %s\n", > + MigrationParameter_str(MIGRATION_PARAMETER_PAUSE_VM), > + params->pause_vm ? "on" : "off"); > } > > qapi_free_MigrationParameters(params); > @@ -1399,6 +1403,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict) > error_setg(&err, "The block-bitmap-mapping parameter can only be set " > "through QMP"); > break; > + case MIGRATION_PARAMETER_PAUSE_VM: > + p->has_pause_vm = true; > + visit_type_bool(v, param, &p->pause_vm, &err); > + break; > default: > assert(0); > } > -- > 2.31.1 >
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote: > * Laurent Vivier (lvivier@redhat.com) wrote: >> If we want to save a snapshot of a VM to a file, we used to follow the >> following steps: >> >> 1- stop the VM: >> (qemu) stop >> >> 2- migrate the VM to a file: >> (qemu) migrate "exec:cat > snapshot" >> >> 3- resume the VM: >> (qemu) cont >> >> After that we can restore the snapshot with: >> qemu-system-x86_64 ... -incoming "exec:cat snapshot" >> (qemu) cont >> >> But when failover is configured, it doesn't work anymore. >> >> As the failover needs to ask the guest OS to unplug the card >> the machine cannot be paused. >> >> This patch introduces a new migration parameter, "pause-vm", that >> asks the migration to pause the VM during the migration startup >> phase after the the card is unplugged. >> >> Once the migration is done, we only need to resume the VM with >> "cont" and the card is plugged back: >> >> 1- set the parameter: >> (qemu) migrate_set_parameter pause-vm on >> >> 2- migrate the VM to a file: >> (qemu) migrate "exec:cat > snapshot" >> >> The primary failover card (VFIO) is unplugged and the VM is paused. >> >> 3- resume the VM: >> (qemu) cont >> >> The VM restarts and the primary failover card is plugged back >> >> The VM state sent in the migration stream is "paused", it means >> when the snapshot is loaded or if the stream is sent to a destination >> QEMU, the VM needs to be resumed manually. >> >> Signed-off-by: Laurent Vivier <lvivier@redhat.com> > > A mix of comments: > a) As a boolean, this should be a MigrationCapability rather than a > parameter > b) We already have a pause-before-switchover capability for a pause > that happens later in the flow; so this would be something like > pause-after-unplug > c) Is this really the right answer? Could this be done a different > way by doing the unplugs using (a possibly new) qmp command - so > that you can explicitly trigger the unplug prior to the migration? 
Not if you want the wait to be minimal.

What managedsave wants to do is perform the migration with the guest
stopped, and wait for it until the last moment. Doing this in qemu is
"relatively" simple. Doing it in libvirt is extremely complex, because
you basically have to:
- unplug the device
- wait for the unplug to finish
- stop the guest
- migrate paused
- (restart the guest)

If you do it in libvirt, you are increasing the time between the wait
for the unplug to finish and stopping the guest. But the biggest
problem is what happens if the migration (or anything else) fails.
The qemu failover code already knows how to handle the
stop/continuation of the VFIO device; it is what happens on a normal
run. If you do this in libvirt, it needs to be able to recover from
all scenarios, which is much more complex, in my humble opinion.

Later, Juan.
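The sequence Juan describes can be sketched as the QMP commands a management application would have to drive itself today ("hostdev0" is a placeholder device id; all of the error recovery, which is exactly the hard part he points out, is omitted):

```
{ "execute": "device_del", "arguments": { "id": "hostdev0" } }
   ... wait for the DEVICE_DELETED event for "hostdev0" ...
{ "execute": "stop" }
{ "execute": "migrate", "arguments": { "uri": "exec:cat > snapshot" } }
   ... poll query-migrate until the status is "completed" ...
{ "execute": "cont" }
```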
Laurent Vivier <lvivier@redhat.com> wrote:
> If we want to save a snapshot of a VM to a file, we used to follow
> these steps:
>
> 1- stop the VM:
>    (qemu) stop
>
> 2- migrate the VM to a file:
>    (qemu) migrate "exec:cat > snapshot"
>
> 3- resume the VM:
>    (qemu) cont
>
> After that we can restore the snapshot with:
>    qemu-system-x86_64 ... -incoming "exec:cat snapshot"
>    (qemu) cont
>
> But when failover is configured, it doesn't work anymore.
>
> As the failover needs to ask the guest OS to unplug the card,
> the machine cannot be paused.
>
> This patch introduces a new migration parameter, "pause-vm", that
> asks the migration to pause the VM during the migration startup
> phase, after the card is unplugged.
>
> Once the migration is done, we only need to resume the VM with
> "cont" and the card is plugged back:
>
> 1- set the parameter:
>    (qemu) migrate_set_parameter pause-vm on
>
> 2- migrate the VM to a file:
>    (qemu) migrate "exec:cat > snapshot"
>
> The primary failover card (VFIO) is unplugged and the VM is paused.
>
> 3- resume the VM:
>    (qemu) cont
>
> The VM restarts and the primary failover card is plugged back.
>
> The VM state sent in the migration stream is "paused"; it means that
> when the snapshot is loaded, or if the stream is sent to a destination
> QEMU, the VM needs to be resumed manually.
>
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

I agree with Dave that you should use a capability instead of a
parameter. Other than that, the code for the new parameter looks ok.

> @@ -3734,13 +3752,27 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
>                               "failure");
>              }
>          }
> -
>          migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
>      } else {
>          migrate_set_state(&s->state, old_state, new_state);
>      }
>  }

This change is spurious.

And to make this more generic, I think you can consider changing the
name to pause_during_migration.
Because that is basically what managedsave needs (if I understood
Laine correctly).

Later, Juan.
diff --git a/qapi/migration.json b/qapi/migration.json
index 88f07baedd06..86284d96ad2a 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -743,6 +743,10 @@
 #                          block device name if there is one, and to their node name
 #                          otherwise. (Since 5.2)
 #
+# @pause-vm: Pause the virtual machine before doing the migration.
+#            This allows QEMU to unplug a card before doing the
+#            migration as it depends on the guest kernel. (since 6.2)
+#
 # Since: 2.4
 ##
 { 'enum': 'MigrationParameter',
@@ -758,7 +762,7 @@
            'xbzrle-cache-size', 'max-postcopy-bandwidth',
            'max-cpu-throttle', 'multifd-compression',
            'multifd-zlib-level' ,'multifd-zstd-level',
-           'block-bitmap-mapping' ] }
+           'block-bitmap-mapping', 'pause-vm' ] }
 
 ##
 # @MigrateSetParameters:
@@ -903,6 +907,10 @@
 #                          block device name if there is one, and to their node name
 #                          otherwise. (Since 5.2)
 #
+# @pause-vm: Pause the virtual machine before doing the migration.
+#            This allows QEMU to unplug a card before doing the
+#            migration as it depends on the guest kernel. (since 6.2)
+#
 # Since: 2.4
 ##
 # TODO either fuse back into MigrationParameters, or make
@@ -934,7 +942,8 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*pause-vm': 'bool' } }
 
 ##
 # @migrate-set-parameters:
@@ -1099,6 +1108,10 @@
 #                          block device name if there is one, and to their node name
 #                          otherwise. (Since 5.2)
 #
+# @pause-vm: Pause the virtual machine before doing the migration.
+#            This allows QEMU to unplug a card before doing the
+#            migration as it depends on the guest kernel. (since 6.2)
+#
 # Since: 2.4
 ##
 { 'struct': 'MigrationParameters',
@@ -1128,7 +1141,8 @@
             '*multifd-compression': 'MultiFDCompression',
             '*multifd-zlib-level': 'uint8',
             '*multifd-zstd-level': 'uint8',
-            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ] } }
+            '*block-bitmap-mapping': [ 'BitmapMigrationNodeAlias' ],
+            '*pause-vm': 'bool' } }
 
 ##
 # @query-migrate-parameters:
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 824a69c23f06..a6c186e28b45 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -210,6 +210,7 @@ struct VirtIONet {
     bool failover;
     DeviceListener primary_listener;
     Notifier migration_state;
+    VMChangeStateEntry *vm_state;
     VirtioNetRssData rss_data;
     struct NetRxPkt *rx_pkt;
     struct EBPFRSSContext ebpf_rss;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f205331dcf8c..c83364eac47b 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -39,6 +39,7 @@
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
 #include "trace.h"
 #include "monitor/qdev.h"
 #include "hw/pci/pci.h"
@@ -3303,6 +3304,35 @@ static void virtio_net_migration_state_notifier(Notifier *notifier, void *data)
     virtio_net_handle_migration_primary(n, s);
 }
 
+static void virtio_net_failover_restart_cb(void *opaque, bool running,
+                                           RunState state)
+{
+    DeviceState *dev;
+    VirtIONet *n = opaque;
+    Error *err = NULL;
+    PCIDevice *pdev;
+
+    if (!running) {
+        return;
+    }
+
+    dev = failover_find_primary_device(n);
+    if (!dev) {
+        return;
+    }
+
+    pdev = PCI_DEVICE(dev);
+    if (!pdev->partially_hotplugged) {
+        return;
+    }
+
+    if (!failover_replug_primary(n, dev, &err)) {
+        if (err) {
+            error_report_err(err);
+        }
+    }
+}
+
 static bool failover_hide_primary_device(DeviceListener *listener,
                                          QemuOpts *device_opts)
 {
@@ -3360,6 +3390,8 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         device_listener_register(&n->primary_listener);
         n->migration_state.notify = virtio_net_migration_state_notifier;
         add_migration_state_change_notifier(&n->migration_state);
+        n->vm_state = qemu_add_vm_change_state_handler(
+                                        virtio_net_failover_restart_cb, n);
         n->host_features |= (1ULL << VIRTIO_NET_F_STANDBY);
     }
 
@@ -3508,6 +3540,7 @@ static void virtio_net_device_unrealize(DeviceState *dev)
     if (n->failover) {
         device_listener_unregister(&n->primary_listener);
         remove_migration_state_change_notifier(&n->migration_state);
+        qemu_del_vm_change_state_handler(n->vm_state);
     }
 
     max_queues = n->multiqueue ? n->max_queues : 1;
diff --git a/migration/migration.c b/migration/migration.c
index bb909781b7f5..9c96986d4abf 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -901,6 +901,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
                                     s->parameters.block_bitmap_mapping);
     }
 
+    params->has_pause_vm = true;
+    params->pause_vm = s->parameters.pause_vm;
+
     return params;
 }
 
@@ -1549,6 +1552,11 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->has_block_bitmap_mapping = true;
         dest->block_bitmap_mapping = params->block_bitmap_mapping;
     }
+
+    if (params->has_pause_vm) {
+        dest->has_pause_vm = true;
+        dest->pause_vm = params->pause_vm;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1671,6 +1679,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
             QAPI_CLONE(BitmapMigrationNodeAliasList,
                        params->block_bitmap_mapping);
     }
+
+    if (params->has_pause_vm) {
+        s->parameters.pause_vm = params->pause_vm;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
@@ -1718,6 +1730,12 @@ void qmp_migrate_start_postcopy(Error **errp)
                          " started");
         return;
     }
+
+    if (s->parameters.pause_vm) {
+        error_setg(errp, "Postcopy cannot be started if pause-vm is on");
+        return;
+    }
+
     /*
      * we don't error if migration has finished since that would be racy
      * with issuing this command.
@@ -3734,13 +3752,27 @@ static void qemu_savevm_wait_unplug(MigrationState *s, int old_state,
                              "failure");
             }
         }
-
         migrate_set_state(&s->state, MIGRATION_STATUS_WAIT_UNPLUG, new_state);
     } else {
         migrate_set_state(&s->state, old_state, new_state);
     }
 }
 
+/* stop the VM before starting the migration but after device unplug */
+static void pause_vm_after_unplug(MigrationState *s)
+{
+    if (s->parameters.pause_vm) {
+        qemu_mutex_lock_iothread();
+        qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+        s->vm_was_running = runstate_is_running();
+        if (vm_stop_force_state(RUN_STATE_PAUSED)) {
+            migrate_set_state(&s->state, MIGRATION_STATUS_ACTIVE,
+                              MIGRATION_STATUS_FAILED);
+        }
+        qemu_mutex_unlock_iothread();
+    }
+}
+
 /*
  * Master migration thread on the source VM.
  * It drives the migration and pumps the data down the outgoing channel.
@@ -3790,6 +3822,8 @@ static void *migration_thread(void *opaque)
     qemu_savevm_wait_unplug(s, MIGRATION_STATUS_SETUP,
                             MIGRATION_STATUS_ACTIVE);
 
+    pause_vm_after_unplug(s);
+
     s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 
     trace_migration_thread_setup_complete();
@@ -4265,6 +4299,7 @@ static void migration_instance_init(Object *obj)
     params->has_announce_max = true;
     params->has_announce_rounds = true;
     params->has_announce_step = true;
+    params->has_pause_vm = true;
 
     qemu_sem_init(&ms->postcopy_pause_sem, 0);
     qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index b5e71d9e6f52..71bc8c15335b 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -513,6 +513,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
             }
         }
     }
+
+        monitor_printf(mon, "%s: %s\n",
+                       MigrationParameter_str(MIGRATION_PARAMETER_PAUSE_VM),
+                       params->pause_vm ? "on" : "off");
     }
 
     qapi_free_MigrationParameters(params);
@@ -1399,6 +1403,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         error_setg(&err, "The block-bitmap-mapping parameter can only be set "
                    "through QMP");
         break;
+    case MIGRATION_PARAMETER_PAUSE_VM:
+        p->has_pause_vm = true;
+        visit_type_bool(v, param, &p->pause_vm, &err);
+        break;
     default:
         assert(0);
     }
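The ordering the patch establishes in migration_thread() — pause_vm_after_unplug() runs only after qemu_savevm_wait_unplug() returns, so the guest OS is still running while it services the failover unplug — can be sketched as a toy model (illustrative pseudo-flow, not QEMU code; the event names are made up):

```python
def migration_thread(pause_vm):
    """Toy model of the source-side migration thread ordering."""
    events = []
    events.append("wait_unplug")   # guest unplugs the primary (VFIO) card
    if pause_vm:
        events.append("vm_stop")   # pause_vm_after_unplug(): VM paused only now
    events.append("ram_transfer")  # normal migration iteration follows
    return events

# With pause-vm on, the stop lands between unplug and RAM transfer:
print(migration_thread(True))   # ['wait_unplug', 'vm_stop', 'ram_transfer']
print(migration_thread(False))  # ['wait_unplug', 'ram_transfer']
```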
If we want to save a snapshot of a VM to a file, we used to follow
these steps:

1- stop the VM:
   (qemu) stop

2- migrate the VM to a file:
   (qemu) migrate "exec:cat > snapshot"

3- resume the VM:
   (qemu) cont

After that we can restore the snapshot with:
   qemu-system-x86_64 ... -incoming "exec:cat snapshot"
   (qemu) cont

But when failover is configured, it doesn't work anymore.

As the failover needs to ask the guest OS to unplug the card,
the machine cannot be paused.

This patch introduces a new migration parameter, "pause-vm", that
asks the migration to pause the VM during the migration startup
phase, after the card is unplugged.

Once the migration is done, we only need to resume the VM with
"cont" and the card is plugged back:

1- set the parameter:
   (qemu) migrate_set_parameter pause-vm on

2- migrate the VM to a file:
   (qemu) migrate "exec:cat > snapshot"

The primary failover card (VFIO) is unplugged and the VM is paused.

3- resume the VM:
   (qemu) cont

The VM restarts and the primary failover card is plugged back.

The VM state sent in the migration stream is "paused"; it means that
when the snapshot is loaded, or if the stream is sent to a destination
QEMU, the VM needs to be resumed manually.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 qapi/migration.json            | 20 +++++++++++++++---
 include/hw/virtio/virtio-net.h |  1 +
 hw/net/virtio-net.c            | 33 ++++++++++++++++++++++++++++++
 migration/migration.c          | 37 +++++++++++++++++++++++++++++++++-
 monitor/hmp-cmds.c             |  8 ++++++++
 5 files changed, 95 insertions(+), 4 deletions(-)
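The HMP flow in the commit message has a direct QMP equivalent, which is what a management application like libvirt would use; a sketch, assuming the parameter keeps the proposed "pause-vm" name (the reviewers suggested making it a capability or renaming it):

```
{ "execute": "migrate-set-parameters", "arguments": { "pause-vm": true } }
{ "execute": "migrate", "arguments": { "uri": "exec:cat > snapshot" } }
   ... once the migration completes, resuming replugs the card: ...
{ "execute": "cont" }
```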