mbox series

[V2,00/11] allow cpr-reboot for vfio

Message ID 1705071910-174321-1-git-send-email-steven.sistare@oracle.com (mailing list archive)
Headers show
Series allow cpr-reboot for vfio | expand

Message

Steven Sistare Jan. 12, 2024, 3:04 p.m. UTC
Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.  The
user is responsible for suspending the guest before initiating cpr, such as
by issuing guest-suspend-ram to the qemu guest agent.

Most of the patches in this series enhance migration notifiers so they can
return an error status and message.  The last two patches register a notifier
for vfio that returns an error if the guest is not suspended.

Steve Sistare (11):
  notify: pass error to notifier with return
  migration: remove error from notifier data
  migration: convert to NotifierWithReturn
  migration: remove migration_in_postcopy parameter
  migration: MigrationEvent for notifiers
  migration: MigrationNotifyFunc
  migration: per-mode notifiers
  migration: refactor migrate_fd_connect failures
  migration: notifier error checking
  vfio: register container for cpr
  vfio: allow cpr-reboot migration if suspended

 hw/net/virtio-net.c            |  14 ++---
 hw/vfio/common.c               |   2 +-
 hw/vfio/container.c            |  11 +++-
 hw/vfio/cpr.c                  |  39 ++++++++++++++
 hw/vfio/meson.build            |   1 +
 hw/vfio/migration.c            |  13 +++--
 hw/virtio/vhost-user.c         |  10 ++--
 hw/virtio/virtio-balloon.c     |   3 +-
 include/hw/vfio/vfio-common.h  |   6 ++-
 include/hw/virtio/virtio-net.h |   2 +-
 include/migration/misc.h       |  21 +++++---
 include/qemu/notify.h          |   7 ++-
 migration/migration.c          | 117 +++++++++++++++++++++++++++--------------
 migration/postcopy-ram.c       |   3 +-
 migration/postcopy-ram.h       |   1 -
 migration/ram.c                |  12 ++---
 net/vhost-vdpa.c               |  15 +++---
 ui/spice-core.c                |  19 +++----
 util/notify.c                  |   5 +-
 19 files changed, 206 insertions(+), 95 deletions(-)
 create mode 100644 hw/vfio/cpr.c

Comments

Alex Williamson Jan. 12, 2024, 9:38 p.m. UTC | #1
On Fri, 12 Jan 2024 07:04:59 -0800
Steve Sistare <steven.sistare@oracle.com> wrote:

> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Most of the patches in this series enhance migration notifiers so they can
> return an error status and message.  The last two patches register a notifier
> for vfio that returns an error if the guest is not suspended.

Hi Steve,

This appears to only support legacy container mode and not cdev/iommufd
mode.  Shouldn't this support both?  Thanks,

Alex

> Steve Sistare (11):
>   notify: pass error to notifier with return
>   migration: remove error from notifier data
>   migration: convert to NotifierWithReturn
>   migration: remove migration_in_postcopy parameter
>   migration: MigrationEvent for notifiers
>   migration: MigrationNotifyFunc
>   migration: per-mode notifiers
>   migration: refactor migrate_fd_connect failures
>   migration: notifier error checking
>   vfio: register container for cpr
>   vfio: allow cpr-reboot migration if suspended
> 
>  hw/net/virtio-net.c            |  14 ++---
>  hw/vfio/common.c               |   2 +-
>  hw/vfio/container.c            |  11 +++-
>  hw/vfio/cpr.c                  |  39 ++++++++++++++
>  hw/vfio/meson.build            |   1 +
>  hw/vfio/migration.c            |  13 +++--
>  hw/virtio/vhost-user.c         |  10 ++--
>  hw/virtio/virtio-balloon.c     |   3 +-
>  include/hw/vfio/vfio-common.h  |   6 ++-
>  include/hw/virtio/virtio-net.h |   2 +-
>  include/migration/misc.h       |  21 +++++---
>  include/qemu/notify.h          |   7 ++-
>  migration/migration.c          | 117 +++++++++++++++++++++++++++--------------
>  migration/postcopy-ram.c       |   3 +-
>  migration/postcopy-ram.h       |   1 -
>  migration/ram.c                |  12 ++---
>  net/vhost-vdpa.c               |  15 +++---
>  ui/spice-core.c                |  19 +++----
>  util/notify.c                  |   5 +-
>  19 files changed, 206 insertions(+), 95 deletions(-)
>  create mode 100644 hw/vfio/cpr.c
>
David Hildenbrand Jan. 15, 2024, 10:48 a.m. UTC | #2
On 12.01.24 16:04, Steve Sistare wrote:
> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.  The
> user is responsible for suspending the guest before initiating cpr, such as
> by issuing guest-suspend-ram to the qemu guest agent.

Can you briefly explain what cpr-reboot is, or do you have a pointer to 
some more details?

What is is good for, why would you use it, how does it work?

I *suspect* that you want to live-migrate a VM with vfio devices 
attached, whereby you don't want to migrate device state.
David Hildenbrand Jan. 15, 2024, 10:51 a.m. UTC | #3
On 15.01.24 11:48, David Hildenbrand wrote:
> On 12.01.24 16:04, Steve Sistare wrote:
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
> 
> Can you briefly explain what cpr-reboot is, or do you have a pointer to
> some more details?
> 
> What is is good for, why would you use it, how does it work?
> 
> I *suspect* that you want to live-migrate a VM with vfio devices
> attached, whereby you don't want to migrate device state.
> 

Ah, found it:

+# @cpr-reboot: The migrate command saves state to a file, allowing one to
+#              quit qemu, reboot to an updated kernel, and restart an updated
+#              version of qemu.  The caller must specify a migration URI
+#              that writes to and reads from a file.  Unlike normal mode,
+#              the use of certain local storage options does not block the
+#              migration, but the caller must not modify guest block devices
+#              between the quit and restart.  To avoid saving guest RAM to the
+#              file, the memory backend must be shared, and the @x-ignore-shared
+#              migration capability must be set.  Guest RAM must be non-volatile
+#              across reboot, such as by backing it with a dax device, but this
+#              is not enforced.  The restarted qemu arguments must match those
+#              used to initially start qemu, plus the -incoming option.
+#              (since 8.2)

I'll note that the use of "reboot" is extremely confusing in this context.
You *might* want to reboot the hypervisor, but that's actually completely
independent from the QEMU implementation.
Steven Sistare Jan. 16, 2024, 8:36 p.m. UTC | #4
On 1/12/2024 4:38 PM, Alex Williamson wrote:
> On Fri, 12 Jan 2024 07:04:59 -0800
> Steve Sistare <steven.sistare@oracle.com> wrote:
> 
>> Allow cpr-reboot for vfio if the guest is in the suspended runstate.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.  The
>> user is responsible for suspending the guest before initiating cpr, such as
>> by issuing guest-suspend-ram to the qemu guest agent.
>>
>> Most of the patches in this series enhance migration notifiers so they can
>> return an error status and message.  The last two patches register a notifier
>> for vfio that returns an error if the guest is not suspended.
> 
> Hi Steve,
> 
> This appears to only support legacy container mode and not cdev/iommufd
> mode.  Shouldn't this support both?  Thanks,

My bad, thanks! I'll move cpr_reboot_notifier to the base container, and also call
cpr register/unregister from iommufd_cdev_attach/iommufd_cdev_detach.

- Steve

>> Steve Sistare (11):
>>   notify: pass error to notifier with return
>>   migration: remove error from notifier data
>>   migration: convert to NotifierWithReturn
>>   migration: remove migration_in_postcopy parameter
>>   migration: MigrationEvent for notifiers
>>   migration: MigrationNotifyFunc
>>   migration: per-mode notifiers
>>   migration: refactor migrate_fd_connect failures
>>   migration: notifier error checking
>>   vfio: register container for cpr
>>   vfio: allow cpr-reboot migration if suspended
>>
>>  hw/net/virtio-net.c            |  14 ++---
>>  hw/vfio/common.c               |   2 +-
>>  hw/vfio/container.c            |  11 +++-
>>  hw/vfio/cpr.c                  |  39 ++++++++++++++
>>  hw/vfio/meson.build            |   1 +
>>  hw/vfio/migration.c            |  13 +++--
>>  hw/virtio/vhost-user.c         |  10 ++--
>>  hw/virtio/virtio-balloon.c     |   3 +-
>>  include/hw/vfio/vfio-common.h  |   6 ++-
>>  include/hw/virtio/virtio-net.h |   2 +-
>>  include/migration/misc.h       |  21 +++++---
>>  include/qemu/notify.h          |   7 ++-
>>  migration/migration.c          | 117 +++++++++++++++++++++++++++--------------
>>  migration/postcopy-ram.c       |   3 +-
>>  migration/postcopy-ram.h       |   1 -
>>  migration/ram.c                |  12 ++---
>>  net/vhost-vdpa.c               |  15 +++---
>>  ui/spice-core.c                |  19 +++----
>>  util/notify.c                  |   5 +-
>>  19 files changed, 206 insertions(+), 95 deletions(-)
>>  create mode 100644 hw/vfio/cpr.c
>>
>