[00/16] migration: Switchover phase refactoring

Message ID: 20250114230746.3268797-1-peterx@redhat.com

Message

Peter Xu Jan. 14, 2025, 11:07 p.m. UTC
CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692
    (note: warning is present on rust stuff, but shouldn't be relevant)

This series refactors the migration switchover path quite a bit.  I started
this work initially to measure the JSON writer overhead, but then I decided
to clean up the switchover path in general while at it, as I had wanted to
do for a long time.

A few major things I tried to do:

  - About the JSON writer

    Currently, precopy migration always dumps a chunk of data called the VM
    description (QEMU_VM_VMDESCRIPTION) for debugging purposes.  That is a
    JSON blob describing all the vmstates dumped in the migration stream.
    QEMU has a machine property, suppress-vmdesc, that decides whether
    migration will include that JSON chunk.

    Postcopy does not have such a JSON dump, because postcopy is a live
    session and cannot normally be debugged at the stream level (e.g. as a
    streamed file).

    A tiny problem is that we do not yet know how many CPU cycles it takes
    to construct and dump these JSONs, even though they are only for
    debugging; and even with suppress-vmdesc=on, QEMU will still construct
    them (e.g. also for postcopy).

    This series has a few patches just to make sure the JSON blob is not
    constructed when not needed (either for postcopy, or with
    suppress-vmdesc=on).  I tried to measure the downtime difference
    with/without these changes; the time QEMU takes to construct and dump
    the JSON blob is still not measurable, so I suppose constructing it
    unconditionally would be OK.  That said, let's still keep these changes
    so we avoid JSON operations when not needed.

  - DEVICE migration state

    QEMU has a very special DEVICE migration state that only happens with
    precopy, and only when the pause-before-switchover capability is
    enabled.  Due to that special case we cannot merge the precopy and
    postcopy code at switchover start, because the state machines would
    differ.

    However, after checking the history and also with libvirt developers,
    this special case seems unnecessary.  So one patch makes the DEVICE
    state the "switchover" phase for precopy/postcopy unconditionally.
    That will make the state machine much simpler for both modes, and
    nothing is expected to break (but please still shout if anyone knows,
    or suspects, that something will or could break).

  - General cleanups and fixes

    Most of the remaining changes are assorted cleanups and fixes in the
    switchover path.

    E.g., postcopy_start() has some code that is not easy to read due to
    special flags scattered around, mostly around the two calls to
    qemu_savevm_state_complete_precopy().  This series removes most of
    those special treatments.

    Some things used to be done twice in the postcopy switchover path
    (e.g. I believe we synchronized CPU states twice, but only for
    postcopy); those should all be sorted out now.

    Quite a few other things can hopefully be discussed and justified
    separately in each patch.  After these cleanups, we will be able to
    have a unified entry point for precopy/postcopy at switchover.

Initially I thought this could optimize the downtime slightly, but after
some tests it turns out there is no measurable difference, at least in my
current setup...  So let's take this as a cleanup series, at least for now;
I hope it still makes some sense.  Comments welcome.

Thanks,

Peter Xu (16):
  migration: Remove postcopy implications in should_send_vmdesc()
  migration: Do not construct JSON description if suppressed
  migration: Optimize postcopy on downtime by avoiding JSON writer
  migration: Avoid two src-downtime-end tracepoints for postcopy
  migration: Drop inactivate_disk param in qemu_savevm_state_complete*
  migration: Synchronize all CPU states only for non-iterable dump
  migration: Adjust postcopy bandwidth during switchover
  migration: Adjust locking in migration_maybe_pause()
  migration: Drop cached migration state in migration_maybe_pause()
  migration: Take BQL slightly longer in postcopy_start()
  migration: Notify COMPLETE once for postcopy
  migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy
  migration: Cleanup qemu_savevm_state_complete_precopy()
  migration: Always set DEVICE state
  migration: Merge precopy/postcopy on switchover start
  migration: Trivial cleanup on JSON writer of vmstate_save()

 qapi/migration.json         |   7 +-
 migration/migration.h       |   1 +
 migration/savevm.h          |   6 +-
 migration/migration.c       | 209 +++++++++++++++++++++++-------------
 migration/savevm.c          | 116 ++++++++------------
 migration/vmstate.c         |   6 +-
 tests/qtest/libqos/libqos.c |   3 +-
 migration/trace-events      |   2 +-
 tests/qemu-iotests/194.out  |   1 +
 tests/qemu-iotests/203.out  |   1 +
 tests/qemu-iotests/234.out  |   2 +
 tests/qemu-iotests/262.out  |   1 +
 tests/qemu-iotests/280.out  |   1 +
 13 files changed, 200 insertions(+), 156 deletions(-)

Comments

Jiri Denemark Jan. 15, 2025, 9:12 a.m. UTC | #1
On Tue, Jan 14, 2025 at 18:07:30 -0500, Peter Xu wrote:
> CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692
>     (note: warning is present on rust stuff, but shouldn't be relevant)
> 
> This series refactors the migration switchover path quite a bit.  I started
> this work initially to measure the JSON writer overhead, but then I decided
> to clean up the switchover path in general while at it, as I had wanted to
> do for a long time.
> 
> A few major things I tried to do:
> 
...
>   - DEVICE migration state
> 
>     QEMU has a very special DEVICE migration state that only happens with
>     precopy, and only when the pause-before-switchover capability is
>     enabled.  Due to that special case we cannot merge the precopy and
>     postcopy code at switchover start, because the state machines would
>     differ.
> 
>     However, after checking the history and also with libvirt developers,
>     this special case seems unnecessary.  So one patch makes the DEVICE
>     state the "switchover" phase for precopy/postcopy unconditionally.
>     That will make the state machine much simpler for both modes, and
>     nothing is expected to break (but please still shout if anyone knows,
>     or suspects, that something will or could break).

No problem from libvirt side...

Tested-by: Jiri Denemark <jdenemar@redhat.com>
Peter Xu Jan. 15, 2025, 12:55 p.m. UTC | #2
On Wed, Jan 15, 2025 at 10:12:45AM +0100, Jiri Denemark wrote:
> On Tue, Jan 14, 2025 at 18:07:30 -0500, Peter Xu wrote:
> > CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692
> >     (note: warning is present on rust stuff, but shouldn't be relevant)
> > 
> > This series refactors the migration switchover path quite a bit.  I started
> > this work initially to measure the JSON writer overhead, but then I decided
> > to clean up the switchover path in general while at it, as I had wanted to
> > do for a long time.
> > 
> > A few major things I tried to do:
> > 
> ...
> >   - DEVICE migration state
> > 
> >     QEMU has a very special DEVICE migration state that only happens with
> >     precopy, and only when the pause-before-switchover capability is
> >     enabled.  Due to that special case we cannot merge the precopy and
> >     postcopy code at switchover start, because the state machines would
> >     differ.
> > 
> >     However, after checking the history and also with libvirt developers,
> >     this special case seems unnecessary.  So one patch makes the DEVICE
> >     state the "switchover" phase for precopy/postcopy unconditionally.
> >     That will make the state machine much simpler for both modes, and
> >     nothing is expected to break (but please still shout if anyone knows,
> >     or suspects, that something will or could break).
> 
> No problem from libvirt side...
> 
> Tested-by: Jiri Denemark <jdenemar@redhat.com>

This is definitely reassuring.. thanks a lot, Jiri!