[4/4] multifd: reset next_packet_len after sending pages

Message ID: 20230922065625.21848-5-elena.ufimtseva@oracle.com
State: New, archived
Series: multifd: various fixes

Commit Message

Elena Ufimtseva Sept. 22, 2023, 6:56 a.m. UTC
Sometimes multifd sends just a sync packet with no pages
(normal_num is 0). In this case the old value of
next_packet_size is preserved and accounted for, even though
only packet_len is actually transferred.
Reset it to 0 after it has been sent and accounted for.
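
For context, here is a simplified excerpt of the send path in
migration/multifd.c (an assumption based on the 8.1-era code at this
index; the exact source may differ) showing where the stale value
comes from: send_prepare() is the only place that refreshes
next_packet_size, and it is skipped when there are no normal pages.

    if (p->normal_num) {
        /* send_prepare() is where p->next_packet_size gets refreshed;
         * e.g. the nocomp method sets it to
         * p->normal_num * p->page_size. */
        ret = multifd_send_state->ops->send_prepare(p, &local_err);
        if (ret != 0) {
            qemu_mutex_unlock(&p->mutex);
            break;
        }
    }
    /* For a sync-only packet (normal_num == 0) the branch above is
     * skipped, so p->next_packet_size still holds the value from the
     * previous packet and gets added to the transfer stats. */
    multifd_send_fill_packet(p);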

TODO: Fix duplicate packet ids in the stream.
With this patch, there is still an issue with duplicated
packet ids being sent (with different numbers of pages/flags).
See the multifd_send trace below (before this change):
multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000

With this commit there are still duplicated packets, but since no
pages are sent when the sync flag is set, next_packet_size is 0:
multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
If there is a suggestion on how to fix this properly, I will be
glad to use it.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
---
 migration/multifd.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Fabiano Rosas Sept. 22, 2023, 6:32 p.m. UTC | #1
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In this case the old value of
> next_packet_size is preserved and accounted for, even though
> only packet_len is actually transferred.
> Reset it to 0 after it has been sent and accounted for.
>
> TODO: Fix duplicate packet ids in the stream.
> With this patch, there is still an issue with duplicated
> packet ids being sent (with different numbers of pages/flags).
> See the multifd_send trace below (before this change):
> multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
> multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000
>
> With this commit there are still duplicated packets, but since no
> pages are sent when the sync flag is set, next_packet_size is 0:
> multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
> multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
> If there is a suggestion on how to fix this properly, I will be
> glad to use it.
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> ---
>  migration/multifd.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 3281397b18..8b4e26051b 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
>                         p->next_packet_size + p->packet_len);
>              stat64_add(&mig_stats.transferred,
>                         p->next_packet_size + p->packet_len);
> +            p->next_packet_size = 0;
>              qemu_mutex_lock(&p->mutex);
>              p->pending_job--;
>              qemu_mutex_unlock(&p->mutex);

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Fabiano Rosas Sept. 22, 2023, 6:44 p.m. UTC | #2
Elena Ufimtseva <elena.ufimtseva@oracle.com> writes:

> Sometimes multifd sends just a sync packet with no pages
> (normal_num is 0). In this case the old value of
> next_packet_size is preserved and accounted for, even though
> only packet_len is actually transferred.
> Reset it to 0 after it has been sent and accounted for.
>

Usually you'd finish your commit message here with your Signed-off-by
tag, and the comments below would be included after the following ---
marker. That way we can merge this patch without bringing the unrelated
discussion along.

> TODO: Fix duplicate packet ids in the stream.
> With this patch, there is still an issue with duplicated
> packet ids being sent (with different numbers of pages/flags).
> See the multifd_send trace below (before this change):
> multifd_send 394.774 pid=55477 id=0x1 packet_num=0x6f0 normal=0x57 flags=0x1 next_packet_size=0x57000
> multifd_send 181.244 pid=55477 id=0x1 packet_num=0x6f0 normal=0x0 flags=0x0 next_packet_size=0x57000
>
> With this commit there are still duplicated packets, but since no
> pages are sent when the sync flag is set, next_packet_size is 0:
> multifd_send 27.814 pid=18602 id=0x1 packet_num=0x574 normal=0x7b flags=0x1 next_packet_size=0x7b000
> multifd_send 136054.792 pid=18602 id=0x1 packet_num=0x574 normal=0x0 flags=0x0 next_packet_size=0x0
> If there is a suggestion on how to fix this properly, I will be
> glad to use it.

What is going on here? Is it that we're incrementing the
multifd_send_state->packet_num under different locks? Because p->mutex
is per-channel? You could try turning that into an atomic operation
instead.
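
Something like the following, perhaps (an untested sketch; it assumes
the id keeps being assigned where the counter is incremented today,
and uses qatomic_fetch_inc() from include/qemu/atomic.h):

    /* Hand out packet ids from a single atomic counter instead of
     * doing "p->packet_num = multifd_send_state->packet_num++" under
     * the per-channel p->mutex, so two channels can never observe
     * the same id. */
    p->packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);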

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 3281397b18..8b4e26051b 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -730,6 +730,7 @@ static void *multifd_send_thread(void *opaque)
                        p->next_packet_size + p->packet_len);
             stat64_add(&mig_stats.transferred,
                        p->next_packet_size + p->packet_len);
+            p->next_packet_size = 0;
             qemu_mutex_lock(&p->mutex);
             p->pending_job--;
             qemu_mutex_unlock(&p->mutex);