[v2,0/6] migration/block: disk activation rewrite

Message ID: 20241206230838.1111496-1-peterx@redhat.com

Message

Peter Xu Dec. 6, 2024, 11:08 p.m. UTC
CI: https://gitlab.com/peterx/qemu/-/pipelines/1577280033
 (note: it's a pipeline of two patchsets, to save CI credits and time)

v1: https://lore.kernel.org/r/20241204005138.702289-1-peterx@redhat.com

This is v2 of the series, with the RFC tag removed, because my goal is to
have these patches (or some newer version of them) merged.

The major change is that I merged the last three patches and made quite a
few changes here and there to make sure the global disk activation status is
always consistent.  The whole idea is still the same, so a detailed changelog
wouldn't help much.

I also temporarily dropped Fabiano's ping-pong test cases to avoid having
different versions floating around on the list (I know a new version is
coming at some point.  Fabiano: you're taking over the 10.0 pulls, so I
assume you're aware of this and there's no concern about the order of
merges).  I'll review the test cases separately when they're ready, but this
series is still tested with that ping-pong test and it keeps working.

I started looking at this problem as a whole when reviewing Fabiano's
series, especially this patch (fixing a QEMU crash [1]):

https://lore.kernel.org/r/20241125144612.16194-5-farosas@suse.de

The proposed patch could work, but adding such a side effect to migration is
unwanted.  So I started to think about whether we can provide a cleaner
approach, because migration doesn't need the disks to be active to work at
all.  Hence we should try to avoid adding a migration ABI (which doesn't
matter now, but may matter some day) to the prepare phase based on disk
activation status.  Migration should happen with disks inactivated.
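
To illustrate what "migration should happen with disks inactivated" means on
the destination side, below is a minimal sketch of the late activation
ordering (the behavior that patch 3 makes the default).  This is only an
illustration, not code from the series; apart from bdrv_activate_all(),
error_report_err(), autostart and vm_start(), the names are placeholders:

#include "qemu/osdep.h"
#include "block/block.h"
#include "qapi/error.h"
#include "sysemu/runstate.h"    /* vm_start() */
#include "sysemu/sysemu.h"      /* autostart */

/*
 * Sketch only: all device/RAM state has been loaded while the disks
 * were still inactive; activation happens right before the guest runs.
 */
static void incoming_finish_sketch(void)
{
    Error *local_err = NULL;

    if (autostart) {
        /* Take ownership of the images only now, just before running. */
        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            return;
        }
        vm_start();
    }
    /* Otherwise activation is deferred further, until the QMP "cont". */
}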

It's also a pure wish that bdrv_inactivate_all() could be benign to call
even when all disks are already inactive; then the bug would also be gone.
After all, the similar call to bdrv_activate_all() on all-active disks is
fine.  I hope that wish is still fair, but I don't know the block layer well
enough to say anything meaningful there.
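
The way this series tries to keep things consistent (ending with the rewrite
in patch 6 and the new migration/block-active.c) is to track the activation
status globally in migration code, so that activating or inactivating becomes
a no-op when the disks are already in the requested state.  Below is a
minimal sketch of that idea, not the actual code from the series; the helper
names and the single flag are hypothetical, only bdrv_activate_all() and
bdrv_inactivate_all() are the real block-layer entry points:

#include "qemu/osdep.h"
#include "block/block.h"
#include "qapi/error.h"

static bool disks_active = true;    /* QEMU starts owning its images */

static bool sketch_block_activate(void)
{
    Error *local_err = NULL;

    if (disks_active) {
        return true;                /* already active: nothing to do */
    }
    bdrv_activate_all(&local_err);
    if (local_err) {
        error_report_err(local_err);
        return false;
    }
    disks_active = true;
    return true;
}

static bool sketch_block_inactivate(void)
{
    if (!disks_active) {
        return true;                /* already inactive: skip the call */
    }
    if (bdrv_inactivate_all() != 0) {
        return false;
    }
    disks_active = false;
    return true;
}

With every caller in migration/ and monitor/ going through one pair of
helpers like these, the idea is that the "inactivate twice" crash in [1] can
no longer be hit, regardless of which cleanup or cancel path runs first.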

While looking at that, I found more disk-activation logic spread all over
the place.  I decided to clean all of it up, while hopefully fixing the QEMU
crash [1] too.

For this v2, I did some more tests to make sure all the existing paths keep
working, at least in failure or cancel races, and also in postcopy failure
cases.  I ran the tests below and they all pass (where I say "emulated"
below, I mean I hacked something in to trigger those races / rare failures,
because they aren't easy to trigger with a vanilla binary; a sketch of that
kind of injection hack follows the list):

- Tested generic migrate_cancel during precopy; disk activation isn't
  affected.  Disk status reports correct values in tracepoints.

- Ran Fabiano's ping-pong migration tests on a PAUSED-state VM.

- Emulated a precopy failure before sending the non-iterable data; disk
  inactivation doesn't happen, and activation isn't triggered after migration
  cleanup either (even though activating already-active disks is benign, I
  checked traces to make sure the disk status stays consistent and the
  activation is skipped).

- Emulated a precopy failure right after sending the non-iterable data.
  Disks get inactivated, but then are reactivated properly before the VM
  starts.

- Emulated a postcopy failure while sending the packed data (which happens
  after the disks are invalidated), making sure the src VM gets back to live
  properly and re-activates the disks before starting.

- Emulated a concurrent migrate_cancel at the end of precopy's
  migration_completion(), after the disks are inactivated.  Disks are
  reactivated properly.

  NOTE: here, if the dest QEMU didn't quit before migrate_cancel,
  bdrv_activate_all() can crash the src QEMU.  This behavior should be the
  same before/after this series.
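
For the "emulated" cases above, the hack is a one-shot failure injection of
roughly this flavor (purely illustrative, not part of the series), dropped at
whichever point needs to fail, e.g. right before or right after sending the
non-iterable device state, or in the postcopy packaging path:

#include "qemu/osdep.h"
#include "qapi/error.h"

/* Hypothetical helper: fail exactly once at the chosen call site. */
static bool failure_injected;

static int maybe_inject_failure(Error **errp)
{
    if (!failure_injected) {
        failure_injected = true;
        error_setg(errp, "injected failure for testing");
        return -EINVAL;
    }
    return 0;
}

Combined with the disk status tracepoints mentioned above, that is enough to
verify each of the bullets without waiting for the real race to happen.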

Comments welcomed, thanks.

[1] https://gitlab.com/qemu-project/qemu/-/issues/2395

Peter Xu (6):
  migration: Add helper to get target runstate
  qmp/cont: Only activate disks if migration completed
  migration/block: Make late-block-active the default
  migration/block: Apply late-block-active behavior to postcopy
  migration/block: Fix possible race with block_inactive
  migration/block: Rewrite disk activation

 include/migration/misc.h |   4 ++
 migration/migration.h    |   6 +-
 migration/block-active.c |  94 +++++++++++++++++++++++++++
 migration/colo.c         |   2 +-
 migration/migration.c    | 136 +++++++++++++++------------------------
 migration/savevm.c       |  46 ++++++-------
 monitor/qmp-cmds.c       |  22 +++----
 migration/meson.build    |   1 +
 migration/trace-events   |   3 +
 9 files changed, 188 insertions(+), 126 deletions(-)
 create mode 100644 migration/block-active.c

Comments

Fabiano Rosas Dec. 17, 2024, 3:26 p.m. UTC | #1
Peter Xu <peterx@redhat.com> writes:

[...]

Queued, thanks!