[2/2] migration/multifd: Fix multifd_send_setup cleanup when channel creation fails

Message ID 20240801174101.31806-3-farosas@suse.de (mailing list archive)
State New, archived
Series Multifd fixes

Commit Message

Fabiano Rosas Aug. 1, 2024, 5:41 p.m. UTC
When a channel fails to create, the code currently just returns. This
is wrong for two reasons:

1) Channel n+1 will not get to initialize its semaphores, leading to
   an assert when terminate_threads tries to post to it:

 qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
 qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.

2) (theoretical) If channel n-1 has already started creation, returning
   early defeats the purpose of the channels_created logic, which is in
   place to prevent migrate_fd_cleanup() from running while channels
   are still being created.

   This cannot really happen today because the current failure cases
   for multifd_new_send_channel_create() are all synchronous,
   resulting from qio_channel_file_new_path() getting a bad
   filename. This would hit all channels equally.

   But I don't want to set a trap for future people, so have all
   channels try to create (even if failing), and only fail after the
   channels_created semaphore has been posted.

While here, remove the error_report_err call. There's one already at
migrate_fd_cleanup later on.

Cc: qemu-stable@nongnu.org
Reported-by: Jim Fehlig <jfehlig@suse.com>
Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/multifd.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)
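
Not part of the commit message: a minimal, single-threaded model of the two
control flows described above, as a sketch only. Every name in it (ToySem,
ToyChannel, toy_channel_create, setup_early_return, setup_deferred_failure,
toy_terminate) is hypothetical and stands in for the real QEMU code. The toy
semaphore only tracks an "initialized" flag and a counter, and it returns
false where the real primitives would assert. The sketch also assumes that a
synchronously failing channel still posts channels_created.

```c
#include <assert.h>
#include <stdbool.h>

#define N_CHANNELS 4

/* Toy semaphore: tracks only whether it was initialized, plus a counter. */
typedef struct {
    bool initialized;
    int count;
} ToySem;

/* Hypothetical stand-in for the per-channel parameters. */
typedef struct {
    ToySem sem;
} ToyChannel;

static void toy_sem_init(ToySem *s)
{
    s->initialized = true;
    s->count = 0;
}

/* Returns false in the situation where the real code would assert. */
static bool toy_sem_post(ToySem *s)
{
    if (!s->initialized) {
        return false;
    }
    s->count++;
    return true;
}

/*
 * Hypothetical creation hook: channel 'fail_at' fails synchronously.
 * Assumption: channels_created is posted on success and on failure.
 */
static bool toy_channel_create(int idx, int fail_at, ToySem *created)
{
    toy_sem_post(created);
    return idx != fail_at;
}

/* Old flow: return at the first failure, later channels stay uninitialized. */
static bool setup_early_return(ToyChannel *ch, int n, int fail_at,
                               ToySem *created)
{
    assert(n > 0 && n <= N_CHANNELS);
    toy_sem_init(created);
    for (int i = 0; i < n; i++) {
        toy_sem_init(&ch[i].sem);
        if (!toy_channel_create(i, fail_at, created)) {
            return false;
        }
    }
    return true;
}

/* Patched flow: record the failure, let every channel try, fail afterwards. */
static bool setup_deferred_failure(ToyChannel *ch, int n, int fail_at,
                                   ToySem *created)
{
    int ret = 0;

    assert(n > 0 && n <= N_CHANNELS);
    toy_sem_init(created);
    for (int i = 0; i < n; i++) {
        toy_sem_init(&ch[i].sem);
        if (!toy_channel_create(i, fail_at, created)) {
            ret = -1; /* remember the failure, keep going */
        }
    }
    /* The real code waits on channels_created once per channel here. */
    if (created->count != n) {
        ret = -1;
    }
    return ret == 0;
}

/* Cleanup posts to every channel's semaphore, initialized or not. */
static bool toy_terminate(ToyChannel *ch, int n)
{
    bool ok = true;

    for (int i = 0; i < n; i++) {
        if (!toy_sem_post(&ch[i].sem)) {
            ok = false;
        }
    }
    return ok;
}
```

With a failure injected at channel 1, the early-return flow leaves channels 2
and 3 uninitialized, so the cleanup pass reports a problem. The deferred flow
still fails setup, but cleanup can safely post to every channel.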

Comments

Peter Xu Aug. 1, 2024, 6:38 p.m. UTC | #1
On Thu, Aug 01, 2024 at 02:41:01PM -0300, Fabiano Rosas wrote:
> When a channel fails to create, the code currently just returns. This
> is wrong for two reasons:
> 
> 1) Channel n+1 will not get to initialize it's semaphores, leading to
>    an assert when terminate_threads tries to post to it:
> 
>  qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
>  qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
> 
> 2) (theoretical) If channel n-1 already started creation it will
>    defeat the purpose of the channels_created logic which is in place
>    to avoid migrate_fd_cleanup() to run while channels are still being
>    created.
> 
>    This cannot really happen today because the current failure cases
>    for multifd_new_send_channel_create() are all synchronous,
>    resulting from qio_channel_file_new_path() getting a bad
>    filename. This would hit all channels equally.
> 
>    But I don't want to set a trap for future people, so have all
>    channels try to create (even if failing), and only fail after the
>    channels_created semaphore has been posted.
> 
> While here, remove the error_report_err call. There's one already at
> migrate_fd_cleanup later on.
> 
> Cc: qemu-stable@nongnu.org
> Reported-by: Jim Fehlig <jfehlig@suse.com>
> Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")

Should it be this one instead?

b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")

> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Reviewed-by: Peter Xu <peterx@redhat.com>

PS: what's your plan on your other multifd SendData series?  I got a bit
overloaded on downstream stuff and I still have plenty review debts
recently (CPR one of them.. needs follow ups), so just to say I may delay a
bit on reading that one.  I assume it's next-release stuff anyway, but let
me know otherwise.

Thanks,
Fabiano Rosas Aug. 1, 2024, 7:14 p.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Thu, Aug 01, 2024 at 02:41:01PM -0300, Fabiano Rosas wrote:
>> When a channel fails to create, the code currently just returns. This
>> is wrong for two reasons:
>> 
>> 1) Channel n+1 will not get to initialize it's semaphores, leading to
>>    an assert when terminate_threads tries to post to it:
>> 
>>  qemu-system-x86_64: ../util/qemu-thread-posix.c:92:
>>  qemu_mutex_lock_impl: Assertion `mutex->initialized' failed.
>> 
>> 2) (theoretical) If channel n-1 already started creation it will
>>    defeat the purpose of the channels_created logic which is in place
>>    to avoid migrate_fd_cleanup() to run while channels are still being
>>    created.
>> 
>>    This cannot really happen today because the current failure cases
>>    for multifd_new_send_channel_create() are all synchronous,
>>    resulting from qio_channel_file_new_path() getting a bad
>>    filename. This would hit all channels equally.
>> 
>>    But I don't want to set a trap for future people, so have all
>>    channels try to create (even if failing), and only fail after the
>>    channels_created semaphore has been posted.
>> 
>> While here, remove the error_report_err call. There's one already at
>> migrate_fd_cleanup later on.
>> 
>> Cc: qemu-stable@nongnu.org
>> Reported-by: Jim Fehlig <jfehlig@suse.com>
>> Fixes: bd8b0a8f82 ("migration/multifd: Move multifd_send_setup error handling in to the function")
>
> Should it be this one instead?
>
> b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support")

Yep, thanks. I'll fix it up.

>
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> PS: what's your plan on your other multifd SendData series?  I got a bit
> overloaded on downstream stuff and I still have plenty review debts
> recently (CPR one of them.. needs follow ups), so just to say I may delay a
> bit on reading that one.  I assume it's next-release stuff anyway, but let
> me know otherwise.

That one is pretty ready. From my side I don't intend to change anything
else, save for review comments. And it's definitely 9.2 material.

I think CPR is more important at this point because it's been lagging
behind for a while.

I have a PR to send with these fixes and catch up on that virtio-net
discussion. After that I should be able to get some reviews done.

>
> Thanks,

Patch

diff --git a/migration/multifd.c b/migration/multifd.c
index 0b4cbaddfe..552f9723c8 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -1156,7 +1156,6 @@  static bool multifd_new_send_channel_create(gpointer opaque, Error **errp)
 bool multifd_send_setup(void)
 {
     MigrationState *s = migrate_get_current();
-    Error *local_err = NULL;
     int thread_count, ret = 0;
     uint32_t page_count = MULTIFD_PACKET_SIZE / qemu_target_page_size();
     bool use_packets = multifd_use_packets();
@@ -1177,6 +1176,7 @@  bool multifd_send_setup(void)
 
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
+        Error *local_err = NULL;
 
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->sem_sync, 0);
@@ -1196,7 +1196,8 @@  bool multifd_send_setup(void)
         p->write_flags = 0;
 
         if (!multifd_new_send_channel_create(p, &local_err)) {
-            return false;
+            migrate_set_error(s, local_err);
+            ret = -1;
         }
     }
 
@@ -1209,24 +1210,27 @@  bool multifd_send_setup(void)
         qemu_sem_wait(&multifd_send_state->channels_created);
     }
 
+    if (ret) {
+        goto err;
+    }
+
     for (i = 0; i < thread_count; i++) {
         MultiFDSendParams *p = &multifd_send_state->params[i];
+        Error *local_err = NULL;
 
         ret = multifd_send_state->ops->send_setup(p, &local_err);
         if (ret) {
-            break;
+            migrate_set_error(s, local_err);
+            goto err;
         }
     }
 
-    if (ret) {
-        migrate_set_error(s, local_err);
-        error_report_err(local_err);
-        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
-                          MIGRATION_STATUS_FAILED);
-        return false;
-    }
-
     return true;
+
+err:
+    migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
+                      MIGRATION_STATUS_FAILED);
+    return false;
 }
 
 bool multifd_recv(void)