[RFC,1/2] migration: Report error in incoming migration

Message ID 20231109165856.15224-2-farosas@suse.de (mailing list archive)
State New, archived
Series migration: Fix multifd qemu_mutex_destroy race

Commit Message

Fabiano Rosas Nov. 9, 2023, 4:58 p.m. UTC
We're not currently reporting the errors set with migrate_set_error()
when incoming migration fails.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 migration/migration.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Peter Xu Nov. 9, 2023, 6:57 p.m. UTC | #1
On Thu, Nov 09, 2023 at 01:58:55PM -0300, Fabiano Rosas wrote:
> We're not currently reporting the errors set with migrate_set_error()
> when incoming migration fails.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  migration/migration.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 28a34c9068..cca32c553c 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -698,6 +698,13 @@ process_incoming_migration_co(void *opaque)
>      }
>  
>      if (ret < 0) {
> +        MigrationState *s = migrate_get_current();
> +
> +        if (migrate_has_error(s)) {
> +            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> +                error_report_err(s->error);
> +            }
> +        }

What's the major benefit of dumping this explicitly?

And this is not relevant to the multifd problem, correct?

>          error_report("load of migration failed: %s", strerror(-ret));
>          goto fail;
>      }
> -- 
> 2.35.3
>

Fabiano Rosas Nov. 10, 2023, 10:58 a.m. UTC | #2
Peter Xu <peterx@redhat.com> writes:

> On Thu, Nov 09, 2023 at 01:58:55PM -0300, Fabiano Rosas wrote:
>> We're not currently reporting the errors set with migrate_set_error()
>> when incoming migration fails.
>> 
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>  migration/migration.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>> 
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 28a34c9068..cca32c553c 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -698,6 +698,13 @@ process_incoming_migration_co(void *opaque)
>>      }
>>  
>>      if (ret < 0) {
>> +        MigrationState *s = migrate_get_current();
>> +
>> +        if (migrate_has_error(s)) {
>> +            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> +                error_report_err(s->error);
>> +            }
>> +        }
>
> What's the major benefit of dumping this explicitly?

This is incoming migration, so there's no centralized error reporting
aside from the useless "load of migration failed: -5". If the code has
not called error_report we just never see the error message.

> And this is not relevant to the multifd problem, correct?

Yes, I'm being sneaky.

Peter Xu Nov. 13, 2023, 4:51 p.m. UTC | #3
On Fri, Nov 10, 2023 at 07:58:00AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Thu, Nov 09, 2023 at 01:58:55PM -0300, Fabiano Rosas wrote:
> >> We're not currently reporting the errors set with migrate_set_error()
> >> when incoming migration fails.
> >> 
> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> ---
> >>  migration/migration.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >> 
> >> diff --git a/migration/migration.c b/migration/migration.c
> >> index 28a34c9068..cca32c553c 100644
> >> --- a/migration/migration.c
> >> +++ b/migration/migration.c
> >> @@ -698,6 +698,13 @@ process_incoming_migration_co(void *opaque)
> >>      }
> >>  
> >>      if (ret < 0) {
> >> +        MigrationState *s = migrate_get_current();
> >> +
> >> +        if (migrate_has_error(s)) {
> >> +            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
> >> +                error_report_err(s->error);
> >> +            }
> >> +        }
> >
> > What's the major benefit of dumping this explicitly?
> 
> This is incoming migration, so there's no centralized error reporting
> aside from the useless "load of migration failed: -5". If the code has
> not called error_report we just never see the error message.
> 
> > And this is not relevant to the multifd problem, correct?
> 
> Yes, I'm being sneaky.

Sneaking one patch into a two-patch series is likely to get noticed and
defeat the purpose. :-)

I remember we had the verbose error before. Was it lost in some commit?
In any case, feel free to post that separately if you think we should get
it back.

The multifd fixes do not look like they address a regression in this
release either. If so, both patches may be better suited to the next release?

Fabiano Rosas Nov. 14, 2023, 1:54 a.m. UTC | #4
Peter Xu <peterx@redhat.com> writes:

> On Fri, Nov 10, 2023 at 07:58:00AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Thu, Nov 09, 2023 at 01:58:55PM -0300, Fabiano Rosas wrote:
>> >> We're not currently reporting the errors set with migrate_set_error()
>> >> when incoming migration fails.
>> >> 
>> >> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> ---
>> >>  migration/migration.c | 7 +++++++
>> >>  1 file changed, 7 insertions(+)
>> >> 
>> >> diff --git a/migration/migration.c b/migration/migration.c
>> >> index 28a34c9068..cca32c553c 100644
>> >> --- a/migration/migration.c
>> >> +++ b/migration/migration.c
>> >> @@ -698,6 +698,13 @@ process_incoming_migration_co(void *opaque)
>> >>      }
>> >>  
>> >>      if (ret < 0) {
>> >> +        MigrationState *s = migrate_get_current();
>> >> +
>> >> +        if (migrate_has_error(s)) {
>> >> +            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
>> >> +                error_report_err(s->error);
>> >> +            }
>> >> +        }
>> >
>> > What's the major benefit of dumping this explicitly?
>> 
>> This is incoming migration, so there's no centralized error reporting
>> aside from the useless "load of migration failed: -5". If the code has
>> not called error_report we just never see the error message.
>> 
>> > And this is not relevant to the multifd problem, correct?
>> 
>> Yes, I'm being sneaky.
>
> Sneaking one patch into a two-patch series is likely to get noticed and
> defeat the purpose. :-)
>
> I remember we had the verbose error before. Was it lost in some commit?
> In any case, feel free to post that separately if you think we should get
> it back.
>
> The multifd fixes do not look like they address a regression in this
> release either. If so, both patches may be better suited to the next release?

People have complained about it on IRC and I hit it twice in a week. I
would call it a regression. However, we _do_ have an indication that it
might have been there all along since someone already tried to fix a
very similar issue, maybe even the same one. So I'm fine with punting to
the next release.

Patch

diff --git a/migration/migration.c b/migration/migration.c
index 28a34c9068..cca32c553c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -698,6 +698,13 @@  process_incoming_migration_co(void *opaque)
     }
 
     if (ret < 0) {
+        MigrationState *s = migrate_get_current();
+
+        if (migrate_has_error(s)) {
+            WITH_QEMU_LOCK_GUARD(&s->error_mutex) {
+                error_report_err(s->error);
+            }
+        }
         error_report("load of migration failed: %s", strerror(-ret));
         goto fail;
     }
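
For readers unfamiliar with the pattern the hunk above relies on, here is a
minimal standalone sketch, not QEMU code. It assumes a state object that
stores one error behind a mutex and a single failure path that prints the
stored error before the generic message. DemoState, demo_set_error() and
demo_report_failure() are hypothetical names made up for this illustration,
and the stored error is a plain string rather than QEMU's Error type. The
sketch keeps only the first recorded error and clears it after reporting;
both choices are assumptions of the sketch, not claims about the patch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the migration state: one stored error
 * message protected by a mutex. NULL means no error was recorded. */
typedef struct {
    pthread_mutex_t error_mutex;
    char *error;
} DemoState;

/* Record an error. Only the first one is kept (sketch assumption). */
void demo_set_error(DemoState *s, const char *msg)
{
    assert(s && msg);
    pthread_mutex_lock(&s->error_mutex);
    if (!s->error) {
        size_t n = strlen(msg) + 1;
        s->error = malloc(n);
        if (s->error) {
            memcpy(s->error, msg, n);
        }
    }
    pthread_mutex_unlock(&s->error_mutex);
}

/* Central failure path: print the stored error, if any, then the
 * generic message derived from the negative errno in ret.
 * Returns the number of lines written to out. */
int demo_report_failure(DemoState *s, int ret, FILE *out)
{
    int lines = 0;

    assert(s && out);
    pthread_mutex_lock(&s->error_mutex);
    if (s->error) {
        fprintf(out, "%s\n", s->error);
        free(s->error);
        s->error = NULL;   /* avoid leaving a dangling pointer behind */
        lines++;
    }
    pthread_mutex_unlock(&s->error_mutex);

    fprintf(out, "load of migration failed: %s\n", strerror(-ret));
    return lines + 1;
}
```

A caller would initialize a DemoState with PTHREAD_MUTEX_INITIALIZER and a
NULL error, call demo_set_error() wherever a failure is detected, and call
demo_report_failure() once from the failure path. Without a stored error,
only the generic line is printed, which is the situation described in
comment #2.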