mbox series

[v2,00/10] migration: New postcopy state, and some cleanups

Message ID 20240617181534.1425179-1-peterx@redhat.com (mailing list archive)
Headers show
Series migration: New postcopy state, and some cleanups | expand

Message

Peter Xu June 17, 2024, 6:15 p.m. UTC
v2:
- Collect tags
- Patch 3
  - cover all states in migration_postcopy_is_alive()
- Patch 4 (old)
  - English changes [Fabiano]
  - Split the migration_incoming_state_setup() cleanup into a new patch
    [Fabiano]
  - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
  - Keep using explicit state check in migrate_fd_connect() for resume
    [Fabiano]
- New patches
  - New doc update: "migration/docs: Update postcopy recover session for
    SETUP phase"
  - New test case: last four patches

v1: https://lore.kernel.org/r/20240612144228.1179240-1-peterx@redhat.com

The major goal of this patchset is patch 5, which introduced a new postcopy
state so that we will send an event in postcopy reconnect failures that
Libvirt would prefer to have.  There's more information for that issue in
the commit message alone.

Patch 1-2 are cleanups that are not directly relevant but I found/stored
that could be good to have.  I made it simple by putting them together in
one thread to make patch management easier, but I can send them separately
when necessary.

Patch 3 is also a cleanup, but will be needed for patch 4 as dependency.

Patch 4-5 is the core patches.

Patch 6 updates doc for the new state.

Patch 7-10 adds a new test for the new state.

Comments welcomed, thanks.

CI: https://gitlab.com/peterx/qemu/-/pipelines/1335604588
    (check-dco & check-patch fail to git-fetch, but doesn't look relevant)

Peter Xu (10):
  migration/multifd: Avoid the final FLUSH in complete()
  migration: Rename thread debug names
  migration: Use MigrationStatus instead of int
  migration: Cleanup incoming migration setup state change
  migration/postcopy: Add postcopy-recover-setup phase
  migration/docs: Update postcopy recover session for SETUP phase
  tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure
    tests
  tests/migration-tests: Always enable migration events
  tests/migration-tests: Verify postcopy-recover-setup status
  tests/migration-tests: Cover postcopy failure on reconnect

 docs/devel/migration/postcopy.rst |  31 +++++----
 qapi/migration.json               |   4 ++
 migration/migration.h             |   9 +--
 migration/postcopy-ram.h          |   3 +
 tests/qtest/migration-helpers.h   |   2 +
 migration/colo.c                  |   2 +-
 migration/migration.c             |  98 ++++++++++++++++++--------
 migration/multifd.c               |   6 +-
 migration/postcopy-ram.c          |  10 ++-
 migration/ram.c                   |   4 --
 migration/savevm.c                |   6 +-
 tests/qtest/migration-helpers.c   |  20 ++++++
 tests/qtest/migration-test.c      | 110 ++++++++++++++++++++++++------
 13 files changed, 223 insertions(+), 82 deletions(-)

Comments

Peter Xu June 17, 2024, 7:34 p.m. UTC | #1
Hello,

On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
> v2:
> - Collect tags
> - Patch 3
>   - cover all states in migration_postcopy_is_alive()
> - Patch 4 (old)
>   - English changes [Fabiano]
>   - Split the migration_incoming_state_setup() cleanup into a new patch
>     [Fabiano]
>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>   - Keep using explicit state check in migrate_fd_connect() for resume
>     [Fabiano]
> - New patches
>   - New doc update: "migration/docs: Update postcopy recover session for
>     SETUP phase"
>   - New test case: last four patches

I just found that this won't apply on top of latest master, and also has a
trivial conflict against the direct-io stuffs.  Fabiano, I'll wait for a
few days on comments, and resend v3 on top of your direct-io stuff.

Meanwhile I also plan to squash below fixup to the last test patch, just to
fix up a spelling error I just found, and also renamed the test cases (as
the new test is actually also a "double failure" test, just at different
phase).  Comments welcomed for that fixup even before a repost.

===8<===
From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Mon, 17 Jun 2024 14:40:15 -0400
Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
 reconnect

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index a4fed4cc6b..fe33b86783 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to,
 
     /*
      * Kick dest QEMU out too. This is normally not needed in reality
-     * because when the channel is shutdown it should also happens on src.
+     * because when the channel is shutdown it should also happen on src.
      * However here we used separate socket pairs so we need to do that
      * explicitly.
      */
@@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
     test_postcopy_recovery_common(&args);
 }
 
-static void test_postcopy_recovery_double_fail(void)
+static void test_postcopy_recovery_fail_handshake(void)
 {
     MigrateCommon args = {
         .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
@@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
     test_postcopy_recovery_common(&args);
 }
 
-static void test_postcopy_recovery_channel_reconnect(void)
+static void test_postcopy_recovery_fail_reconnect(void)
 {
     MigrateCommon args = {
         .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
@@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
                            test_postcopy_preempt);
         migration_test_add("/migration/postcopy/preempt/recovery/plain",
                            test_postcopy_preempt_recovery);
-        migration_test_add("/migration/postcopy/recovery/double-failures",
-                           test_postcopy_recovery_double_fail);
-        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
-                           test_postcopy_recovery_channel_reconnect);
+        migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
+                           test_postcopy_recovery_fail_handshake);
+        migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
+                           test_postcopy_recovery_fail_reconnect);
         if (is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
Fabiano Rosas June 17, 2024, 8:12 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> Hello,
>
> On Mon, Jun 17, 2024 at 02:15:24PM -0400, Peter Xu wrote:
>> v2:
>> - Collect tags
>> - Patch 3
>>   - cover all states in migration_postcopy_is_alive()
>> - Patch 4 (old)
>>   - English changes [Fabiano]
>>   - Split the migration_incoming_state_setup() cleanup into a new patch
>>     [Fabiano]
>>   - Drop RECOVER_SETUP in fill_destination_migration_info() [Fabiano]
>>   - Keep using explicit state check in migrate_fd_connect() for resume
>>     [Fabiano]
>> - New patches
>>   - New doc update: "migration/docs: Update postcopy recover session for
>>     SETUP phase"
>>   - New test case: last four patches
>
> I just found that this won't apply on top of latest master, and also has a
> trivial conflict against the direct-io stuffs.  Fabiano, I'll wait for a
> few days on comments, and resend v3 on top of your direct-io stuff.
>
> Meanwhile I also plan to squash below fixup to the last test patch, just to
> fix up a spelling error I just found, and also renamed the test cases (as
> the new test is actually also a "double failure" test, just at different
> phase).  Comments welcomed for that fixup even before a repost.
>
> ===8<===
> From 5b8fbc3a9d9e87ebfef1a3e5592fd196eecd5923 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Mon, 17 Jun 2024 14:40:15 -0400
> Subject: [PATCH] fixup! tests/migration-tests: Cover postcopy failure on
>  reconnect
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  tests/qtest/migration-test.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index a4fed4cc6b..fe33b86783 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1474,7 +1474,7 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to,
>  
>      /*
>       * Kick dest QEMU out too. This is normally not needed in reality
> -     * because when the channel is shutdown it should also happens on src.
> +     * because when the channel is shutdown it should also happen on src.
>       * However here we used separate socket pairs so we need to do that
>       * explicitly.
>       */
> @@ -1565,7 +1565,7 @@ static void test_postcopy_recovery(void)
>      test_postcopy_recovery_common(&args);
>  }
>  
> -static void test_postcopy_recovery_double_fail(void)
> +static void test_postcopy_recovery_fail_handshake(void)
>  {
>      MigrateCommon args = {
>          .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
> @@ -1574,7 +1574,7 @@ static void test_postcopy_recovery_double_fail(void)
>      test_postcopy_recovery_common(&args);
>  }
>  
> -static void test_postcopy_recovery_channel_reconnect(void)
> +static void test_postcopy_recovery_fail_reconnect(void)
>  {
>      MigrateCommon args = {
>          .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
> @@ -3759,10 +3759,10 @@ int main(int argc, char **argv)
>                             test_postcopy_preempt);
>          migration_test_add("/migration/postcopy/preempt/recovery/plain",
>                             test_postcopy_preempt_recovery);
> -        migration_test_add("/migration/postcopy/recovery/double-failures",
> -                           test_postcopy_recovery_double_fail);
> -        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
> -                           test_postcopy_recovery_channel_reconnect);
> +        migration_test_add("/migration/postcopy/recovery/double-failures/handshake",
> +                           test_postcopy_recovery_fail_handshake);
> +        migration_test_add("/migration/postcopy/recovery/double-failures/reconnect",
> +                           test_postcopy_recovery_fail_reconnect);
>          if (is_x86) {
>              migration_test_add("/migration/postcopy/suspend",
>                                 test_postcopy_suspend);
> -- 
> 2.45.0

LGTM