[1/2] migration/multifd: Fix compat with QEMU < 9.0

Message ID 20241213160120.23880-2-farosas@suse.de
State New
Series migration: Fix regressions

Commit Message

Fabiano Rosas Dec. 13, 2024, 4:01 p.m. UTC
Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
normal jobs") changed the multifd source side to stop sending data
along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
of a SYNC-only packet. Relying on that, commit d7e58f412c
("migration/multifd: Don't send ram data during SYNC") later came
along and skipped reading data from SYNC packets.

In a version timeline like this:

  8.2 f5f48a7 9.0 9.1 d7e58f41 9.2

The problem is that QEMUs < 9.0 still send data along with SYNC, but
QEMUs > 9.1 no longer read that data. This leads to various kinds of
migration failures due to desync/missing data.

Stop checking for a SYNC packet on the destination and unconditionally
unfill the packet.

From now on:

old -> new:
source sends data + sync, destination reads normally

new -> new:
source sends only sync, destination reads zeros

new -> old:
source sends only sync, destination reads zeros

CC: qemu-stable@nongnu.org
Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)
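
The compatibility cases in the commit message can be illustrated with a
small standalone model. This is only a sketch, not the QEMU code: the
struct, the flag macro and the function names below are hypothetical, and
the model reduces the receiver to the single has_data decision. It assumes
a SYNC-only packet carries zero for both page counts, which is what
"destination reads zeros" in the commit message describes.

```c
/* Simplified model, NOT the QEMU code: every name here is hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_FLAG_SYNC (1u << 0)   /* stands in for MULTIFD_FLAG_SYNC */

struct sketch_packet {
    uint32_t flags;
    uint32_t normal_num;   /* number of pages carrying data */
    uint32_t zero_num;     /* number of zero pages */
};

/* Receiver before the fix: page counts are ignored on SYNC packets. */
static bool has_data_before_fix(const struct sketch_packet *p)
{
    if (p->flags & SKETCH_FLAG_SYNC) {
        return false;
    }
    return p->normal_num || p->zero_num;
}

/* Receiver after the fix: page counts are always honoured. */
static bool has_data_after_fix(const struct sketch_packet *p)
{
    return p->normal_num || p->zero_num;
}

/* Walks the packet shapes described in the commit message. */
static void sketch_check_compat(void)
{
    /* Source < 9.0: SYNC flag and page data in the same packet. */
    struct sketch_packet old_sync = { SKETCH_FLAG_SYNC, 4, 1 };
    /* Source >= 9.0: SYNC-only packet, both counts are zero. */
    struct sketch_packet new_sync = { SKETCH_FLAG_SYNC, 0, 0 };

    assert(!has_data_before_fix(&old_sync));  /* pages would be dropped */
    assert(has_data_after_fix(&old_sync));    /* pages are received */
    assert(!has_data_after_fix(&new_sync));   /* nothing to receive */
}
```

Calling sketch_check_compat() from a test program exercises both packet
shapes. The point of the model is that dropping the SYNC check costs
nothing for new sources, since their SYNC packets have no pages to
receive, while restoring correct behaviour for sources older than 9.0.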

Comments

Peter Xu Dec. 16, 2024, 4:02 p.m. UTC | #1
On Fri, Dec 13, 2024 at 01:01:19PM -0300, Fabiano Rosas wrote:
> Commit f5f48a7891 ("migration/multifd: Separate SYNC request with
> normal jobs") changed the multifd source side to stop sending data
> along with the MULTIFD_FLAG_SYNC, effectively introducing the concept
> of a SYNC-only packet. Relying on that, commit d7e58f412c
> ("migration/multifd: Don't send ram data during SYNC") later came
> along and skipped reading data from SYNC packets.
> 
> In a versions timeline like this:
> 
>   8.2 f5f48a7 9.0 9.1 d7e58f41 9.2
> 
> The issue arises that QEMUs < 9.0 still send data along with SYNC, but
> QEMUs > 9.1 don't gather that data anymore. This leads to various
> kinds of migration failures due to desync/missing data.
> 
> Stop checking for a SYNC packet on the destination and unconditionally
> unfill the packet.
> 
> From now on:
> 
> old -> new:
> the source sends data + sync, destination reads normally
> 
> new -> new:
> source sends only sync, destination reads zeros
> 
> new -> old:
> source sends only sync, destination reads zeros
> 
> CC: qemu-stable@nongnu.org
> Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC")
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720
> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 498e71fd10..8d0a763a72 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -252,9 +252,8 @@  static int multifd_recv_unfill_packet(MultiFDRecvParams *p, Error **errp)
     p->packet_num = be64_to_cpu(packet->packet_num);
     p->packets_recved++;
 
-    if (!(p->flags & MULTIFD_FLAG_SYNC)) {
-        ret = multifd_ram_unfill_packet(p, errp);
-    }
+    /* Always unfill, old QEMUs (<9.0) send data along with SYNC */
+    ret = multifd_ram_unfill_packet(p, errp);
 
     trace_multifd_recv_unfill(p->id, p->packet_num, p->flags,
                               p->next_packet_size);
@@ -1151,9 +1150,13 @@  static void *multifd_recv_thread(void *opaque)
             flags = p->flags;
             /* recv methods don't know how to handle the SYNC flag */
             p->flags &= ~MULTIFD_FLAG_SYNC;
-            if (!(flags & MULTIFD_FLAG_SYNC)) {
-                has_data = p->normal_num || p->zero_num;
-            }
+
+            /*
+             * Even if it's a SYNC packet, this needs to be set
+             * because older QEMUs (<9.0) still send data along with
+             * the SYNC packet.
+             */
+            has_data = p->normal_num || p->zero_num;
             qemu_mutex_unlock(&p->mutex);
         } else {
             /*