[v2,0/2] Field 'reason' for MIGRATION event

Message ID 20240215122759.1438581-1-rkhapov@yandex-team.ru (mailing list archive)

Message

Roman Khapov Feb. 15, 2024, 12:27 p.m. UTC
This is a resend of series 20240215082659.1378342-1-rkhapov@yandex-team.ru,
in which the patch subject numbering was broken in patch 2/2.

Sometimes, when migration fails, it is hard to find out
the cause of the problem: you have to grep the QEMU logs.
At the same time, there is the MIGRATION event, which looks like
a suitable place to hold such error descriptions.

To handle situations like this (maybe one day it will be useful
for other MIGRATION statuses to have additional 'reason' strings),
a general optional field 'reason' can be added.

The series proposes the following changes:

1. Add an optional 'reason' field of type str to the
   MIGRATION event in qapi/migration.json

2. Pass an error description as the reason at every place that
   sets the migration state to MIGRATION_STATUS_FAILED

After the series, the MIGRATION event will look like this:
{"execute": "qmp_capabilities"}
{"return": {}}
{"event": "MIGRATION", "data": {"status": "setup"}}
{"event": "MIGRATION", "data": {"status": "failed", "reason": "Failed to connect to '/tmp/sock.sock': No such file or directory"}}
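
For illustration, a minimal Python sketch of how a QMP client might read
the proposed field. The event lines are copied from the example above; the
helper name migration_failure_reason is hypothetical, and the sketch assumes
'reason' stays optional, so a failed event may arrive without it.

```python
import json

# Event lines copied from the example output above.
QMP_LINES = [
    '{"return": {}}',
    '{"event": "MIGRATION", "data": {"status": "setup"}}',
    '{"event": "MIGRATION", "data": {"status": "failed", "reason": '
    '"Failed to connect to \'/tmp/sock.sock\': No such file or directory"}}',
]


def migration_failure_reason(line):
    """Return the 'reason' of a failed MIGRATION event, or None.

    None is returned for non-MIGRATION messages, for statuses other
    than "failed", and for failed events that carry no 'reason'.
    """
    msg = json.loads(line)
    if msg.get("event") != "MIGRATION":
        return None
    data = msg.get("data", {})
    if data.get("status") != "failed":
        return None
    # 'reason' is optional in the proposed schema, so it may be absent.
    return data.get("reason")


reasons = [r for r in map(migration_failure_reason, QMP_LINES) if r is not None]
for reason in reasons:
    print("migration failed:", reason)
```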

Roman Khapov (2):
  qapi/migration.json: add reason to MIGRATION event
  migration: add error reason for failed MIGRATION events

 migration/colo.c      |   6 +-
 migration/migration.c | 128 ++++++++++++++++++++++++++++--------------
 migration/migration.h |   5 +-
 migration/multifd.c   |  10 ++--
 migration/savevm.c    |  24 ++++----
 qapi/migration.json   |   3 +-
 6 files changed, 112 insertions(+), 64 deletions(-)

Comments

Fabiano Rosas Feb. 21, 2024, 2:45 p.m. UTC | #1
Roman Khapov <rkhapov@yandex-team.ru> writes:

Hi Roman,

> This is resending of series 20240215082659.1378342-1-rkhapov@yandex-team.ru,
> where patch subjects numbers were broken in patch 2/2.
>
> Sometimes, when migration fails, it is hard to find out
> the cause of the problems: you have to grep qemu logs.
> At the same time, there is MIGRATION event, which looks like
> suitable place to hold such error descriptions.

query-migrate after the event is received should be enough for giving
you the failure reason. We have that in error-desc. See commit
c94143e587 ("migration: Display error in query-migrate irrelevant of
status").
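
As a rough sketch of that flow (Python, no real socket involved): once the
failed MIGRATION event is seen, the client sends query-migrate and reads
error-desc from the reply. The helper names and the sample reply below are
made up for illustration; only the query-migrate command and the error-desc
field come from the existing interface.

```python
import json


def query_migrate_command():
    """Build the QMP command to send after a failed MIGRATION event."""
    return json.dumps({"execute": "query-migrate"})


def error_desc_from_reply(reply_line):
    """Extract 'error-desc' from a query-migrate reply, or None if absent."""
    reply = json.loads(reply_line)
    return reply.get("return", {}).get("error-desc")


# Hypothetical reply, reusing the error text from the cover letter.
SAMPLE_REPLY = json.dumps({
    "return": {
        "status": "failed",
        "error-desc": "Failed to connect to '/tmp/sock.sock': "
                      "No such file or directory",
    }
})

print(query_migrate_command())
print(error_desc_from_reply(SAMPLE_REPLY))
```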

>
> To handle situation like this (maybe one day it will be useful
> for other MIGRATION statuses to have additional 'reason' strings),

I find it unlikely. There's no "reason" for making progress except
that's how things work. Only the exceptional (i.e. failure) statuses
would have a reason. Today that's FAILED only, maybe also
POSTCOPY_PAUSED.

> the general optional field 'reason' can be added.
>
> The series proposes next changes:
>
> 1. Adding optional 'reason' field of type str into
>    qapi/migration.json MIGRATION event
>
> 2. Passing some error description as reason for every place, which
>    sets migration state to MIGRATION_STATUS_FAILED
>
> After the series, MIGRATION event will looks like this:
> {"execute": "qmp_capabilities"}
> {"return": {}}
> {"event": "MIGRATION", "data": {"status": "setup"}}
> {"event": "MIGRATION", "data": {"status": "failed", "reason": "Failed to connect to '/tmp/sock.sock': No such file or directory"}}
>
> Roman Khapov (2):
>   qapi/migration.json: add reason to MIGRATION event
>   migration: add error reason for failed MIGRATION events
>
>  migration/colo.c      |   6 +-
>  migration/migration.c | 128 ++++++++++++++++++++++++++++--------------
>  migration/migration.h |   5 +-
>  migration/multifd.c   |  10 ++--
>  migration/savevm.c    |  24 ++++----
>  qapi/migration.json   |   3 +-
>  6 files changed, 112 insertions(+), 64 deletions(-)

Please remember to run make check:

380/383 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR
104.77s killed by signal 6 SIGABRT
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr: Broken pipe ../tests/qtest/libqtest.c:204: kill_qemu() detected
QEMU death from signal 11 (Segmentation fault) (core dumped)


Most likely one of the new error_setg() calls has broken postcopy recovery.
Some of those paths are not intended to trigger cleanup.

Markus Armbruster Feb. 22, 2024, 7:01 a.m. UTC | #2
Fabiano Rosas <farosas@suse.de> writes:

> Roman Khapov <rkhapov@yandex-team.ru> writes:
>
> Hi Roman,
>
>> This is resending of series 20240215082659.1378342-1-rkhapov@yandex-team.ru,
>> where patch subjects numbers were broken in patch 2/2.
>>
>> Sometimes, when migration fails, it is hard to find out
>> the cause of the problems: you have to grep qemu logs.
>> At the same time, there is MIGRATION event, which looks like
>> suitable place to hold such error descriptions.
>
> query-migrate after the event is received should be enough for giving
> you the failure reason. We have that in error-desc. See commit
> c94143e587 ("migration: Display error in query-migrate irrelevant of
> status").
>
>>
>> To handle situation like this (maybe one day it will be useful
>> for other MIGRATION statuses to have additional 'reason' strings),
>
> I find it unlikely. There's no "reason" for making progress except
> that's how things work. Only the exceptional (i.e. failure) statuses
> would have a reason. Today that's FAILED only, maybe also
> POSTCOPY_PAUSED.

I can't see a need for the proposed feature then.

>> the general optional field 'reason' can be added.

[...]