mbox series

[v2,0/3] Allow to enable multifd and postcopy migration together

Message ID 20241129122256.96778-1-ppandit@redhat.com (mailing list archive)
Headers show
Series Allow to enable multifd and postcopy migration together | expand

Message

Prasad Pandit Nov. 29, 2024, 12:22 p.m. UTC
From: Prasad Pandit <pjp@fedoraproject.org>

Hello,

* Currently Multifd and Postcopy migration can not be used together.
  QEMU shows "Postcopy is not yet compatible with multifd" message.

  When migrating guests with large (100's GB) RAM, Multifd threads
  help to accelerate migration, but inability to use it with the
  Postcopy mode delays guest start up on the destination side.

* This patch series allows to enable both Multifd and Postcopy
  migration together. Precopy and Multifd threads work during
  the initial guest (RAM) transfer. When migration moves to the
  Postcopy phase, Multifd threads are restrained and the Postcopy
  threads start to request pages from the source side.

* This series removes magic value (4-bytes) introduced in the
  previous series for the Postcopy channel. And refactoring of
  the 'ram_save_target_page' function is made independent of
  the multifd & postcopy change.

  v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
  v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u


Thank you.
---
Prasad Pandit (3):
  migration/multifd: move macros to multifd header
  migration: refactor ram_save_target_page functions
  migration: enable multifd and postcopy together

 migration/migration.c      | 90 +++++++++++++++++++++++---------------
 migration/multifd-nocomp.c |  3 +-
 migration/multifd.c        |  5 ---
 migration/multifd.h        |  5 +++
 migration/options.c        |  5 ---
 migration/ram.c            | 73 +++++++++----------------------
 6 files changed, 82 insertions(+), 99 deletions(-)

Comments

Peter Xu Nov. 29, 2024, 4:45 p.m. UTC | #1
On Fri, Nov 29, 2024 at 05:52:53PM +0530, Prasad Pandit wrote:
> From: Prasad Pandit <pjp@fedoraproject.org>
> 
> Hello,
> 
> * Currently Multifd and Postcopy migration can not be used together.
>   QEMU shows "Postcopy is not yet compatible with multifd" message.
> 
>   When migrating guests with large (100's GB) RAM, Multifd threads
>   help to accelerate migration, but inability to use it with the
>   Postcopy mode delays guest start up on the destination side.
> 
> * This patch series allows to enable both Multifd and Postcopy
>   migration together. Precopy and Multifd threads work during
>   the initial guest (RAM) transfer. When migration moves to the
>   Postcopy phase, Multifd threads are restrained and the Postcopy
>   threads start to request pages from the source side.
> 
> * This series removes magic value (4-bytes) introduced in the
>   previous series for the Postcopy channel. And refactoring of
>   the 'ram_save_target_page' function is made independent of
>   the multifd & postcopy change.
> 
>   v1: https://lore.kernel.org/qemu-devel/20241126115748.118683-1-ppandit@redhat.com/T/#u
>   v0: https://lore.kernel.org/qemu-devel/20241029150908.1136894-1-ppandit@redhat.com/T/#u
> 
> 
> Thank you.
> ---
> Prasad Pandit (3):
>   migration/multifd: move macros to multifd header
>   migration: refactor ram_save_target_page functions
>   migration: enable multifd and postcopy together

Prasad,

I saw that there's still discussion in the previous version, while this
cover letter doesn't mention why it was ignored.  Especially, at least to
me, what Fabiano commented on simplifying the flush condition check makes
senes to me to be addressed.

Please cherish reviewer's feedback and time contributed, and let's finish
the disucssion first before rushing for a new version.  I'll try to join
the discussion later too there.

Meanwhile, before I read into any details, I found that all the tests I
requested are still missing.  Would you please consider adding them?

My previous comment regarding to test is here:

https://lore.kernel.org/qemu-devel/ZykJBq7ME5jgSzCA@x1n/

I listed exactly the minimum set of tests that I think we should have.

In general, any migration new feature must have both doc updates and test
coverage as long as applicable.

Multifd still has its doc missing, which is unfortunate.  We could have one
doc prior to this work, but it can also be done later.

OTOH on testing: this is not a new feature alone, but it's more dangerous
than a new feature due to what I mentioned before, that postcopy needs
atomicity on page movements, and multifd is completely against that from
design POV due to NIC drivers being able to modify guest pages directly.

It means multifd+postcopy bugs will be extremely hard to debug if we don't
put it right first.  So please be serious on the test coverage on this
work.

Thanks,