diff mbox series

tests/migration: Print some debug on bad status

Message ID 20191108104307.125020-1-dgilbert@redhat.com (mailing list archive)
State New, archived
Headers show
Series tests/migration: Print some debug on bad status | expand

Commit Message

Dr. David Alan Gilbert Nov. 8, 2019, 10:43 a.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

We're seeing occasional asserts in 'wait_for_migraiton_fail', that
I can't reliably reproduce, and where the cores don't have any useful
state.  Print the 'status' out, so we can see which unexpected state
we're ending up in.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/migration-test.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Thomas Huth Nov. 8, 2019, 11:49 a.m. UTC | #1
On 08/11/2019 11.43, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> We're seeing occasional asserts in 'wait_for_migraiton_fail', that
> I can't reliably reproduce, and where the cores don't have any useful
> state.  Print the 'status' out, so we can see which unexpected state
> we're ending up in.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tests/migration-test.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 59f291c654..ac780dffda 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -899,8 +899,13 @@ static void wait_for_migration_fail(QTestState *from, bool allow_active)
>   
>       do {
>           status = migrate_query_status(from);
> -        g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") ||
> -                 (allow_active && !strcmp(status, "active")));
> +        bool result = !strcmp(status, "setup") || !strcmp(status, "failed") ||
> +                 (allow_active && !strcmp(status, "active"));
> +        if (!result) {
> +            fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n",
> +                    __func__, status, allow_active);
> +        }
> +        g_assert(result);
>           failed = !strcmp(status, "failed");
>           g_free(status);
>       } while (!failed);
> 

Reviewed-by: Thomas Huth <thuth@redhat.com>
Philippe Mathieu-Daudé Nov. 8, 2019, 12:01 p.m. UTC | #2
On 11/8/19 11:43 AM, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> We're seeing occasional asserts in 'wait_for_migraiton_fail', that
> I can't reliably reproduce, and where the cores don't have any useful
> state.  Print the 'status' out, so we can see which unexpected state
> we're ending up in.

Back to the good remote printf() debugging :)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>   tests/migration-test.c | 9 +++++++--
>   1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 59f291c654..ac780dffda 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -899,8 +899,13 @@ static void wait_for_migration_fail(QTestState *from, bool allow_active)
>   
>       do {
>           status = migrate_query_status(from);
> -        g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") ||
> -                 (allow_active && !strcmp(status, "active")));
> +        bool result = !strcmp(status, "setup") || !strcmp(status, "failed") ||
> +                 (allow_active && !strcmp(status, "active"));
> +        if (!result) {
> +            fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n",
> +                    __func__, status, allow_active);
> +        }
> +        g_assert(result);
>           failed = !strcmp(status, "failed");
>           g_free(status);
>       } while (!failed);
>
no-reply@patchew.org Nov. 8, 2019, 5:46 p.m. UTC | #3
Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-unit: tests/test-bdrv-drain
wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
**
ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result)
ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result)
make: *** [check-qtest-aarch64] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-unit: tests/test-bdrv-graph-mod
  TEST    check-unit: tests/test-blockjob
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=fa574cc0377f47f39447919cb9419965', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-4cu751sh/src/docker-src.2019-11-08-12.33.13.24545:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=fa574cc0377f47f39447919cb9419965
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-4cu751sh/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    13m5.136s
user    0m8.186s


The full log is available at
http://patchew.org/logs/20191108104307.125020-1-dgilbert@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
no-reply@patchew.org Nov. 8, 2019, 5:47 p.m. UTC | #4
Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===




The full log is available at
http://patchew.org/logs/20191108104307.125020-1-dgilbert@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Dr. David Alan Gilbert Nov. 8, 2019, 6:38 p.m. UTC | #5
Hi Jens,
  the unplug failover stuff is triggering an assertion occasionally on
aarch64; but
  a) I'm not sure the right way to fix it
  b) And I'm out for a little over a week

so...

* no-reply@patchew.org (no-reply@patchew.org) wrote:
> Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/
> 
> 
> 
> Hi,
> 
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
> 
>   TEST    check-unit: tests/test-bdrv-drain
> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1

In tests/migration-test.c we've got wait_for_migration_fail, and it's
expecting the state to be any one of:
   setup, failed or maybe active

but it's getting surprised by seeing a 'wait-unplug'

So the question is should we see a wait-unplug?

the migration code has:

    if (qemu_savevm_nr_failover_devices()) {
        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
                          MIGRATION_STATUS_WAIT_UNPLUG);

Should qemu_savevm_nr_failover_devices() be true?
On aarch64 it seems to have a virtio-net device by default
and qemu_savevm_nr_failover_devices() checks for devices
having dev_unplug_pending but doesn't call it.

I see two fixes but am not sure which is right:
   a) Add 'wait-unplug' to the wait_for_migration_fail
      (easy)
   b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices
      so that on a guest which has a virtio-net, but no failover device,
      the state isn't entered.

I think (b) is better, since we shouldn't be exposing the wait-unplug
event on setups that don't expect it; but I don't understand the unplug
enough to know if this is a safe change.

Thoughts?

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Jens Freimann Nov. 11, 2019, 10:15 a.m. UTC | #6
On Fri, Nov 08, 2019 at 06:38:17PM +0000, Dr. David Alan Gilbert wrote:
>Hi Jens,
>  the unplug failover stuff is triggering an assertion occasionally on
>aarch64; but
>  a) I'm not sure the right way to fix it
>  b) And I'm out for a little over a week
>
>so...

I'll take care of it. 

>
>* no-reply@patchew.org (no-reply@patchew.org) wrote:
>> Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/
>>
>>
>>
>> Hi,
>>
>> This series failed the docker-quick@centos7 build test. Please find the testing commands and
>> their output below. If you have Docker installed, you can probably reproduce it
>> locally.
>>
>> === TEST SCRIPT BEGIN ===
>> #!/bin/bash
>> make docker-image-centos7 V=1 NETWORK=1
>> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
>> === TEST SCRIPT END ===
>>
>>   TEST    check-unit: tests/test-bdrv-drain
>> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
>
>In tests/migration-test.c we've got wait_for_migration_fail, and it's
>expecting the state to be any one of:
>   setup, failed or maybe active
>
>but it's getting surprised by seeing a 'wait-unplug'
>
>So the question is should we see a wait-unplug?
>
>the migration code has:
>
>    if (qemu_savevm_nr_failover_devices()) {
>        migrate_set_state(&s->state, MIGRATION_STATUS_SETUP,
>                          MIGRATION_STATUS_WAIT_UNPLUG);
>
>Should qemu_savevm_nr_failover_devices() be true?
>On aarch64 it seems to have a virtio-net device by default
>and qemu_savevm_nr_failover_devices() checks for devices
>having dev_unplug_pending but doesn't call it.
>
>I see two fixes but am not sure which is right:
>   a) Add 'wait-unplug' to the wait_for_migration_fail
>      (easy)
>   b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices
>      so that on a guest which has a virtio-net, but no failover device,
>      the state isn't entered.

I also prefer b over a, but I should only set up dev_unplug_pending when
failover=on to avoid this. I'll send a patch.

regards
Jens
diff mbox series

Patch

diff --git a/tests/migration-test.c b/tests/migration-test.c
index 59f291c654..ac780dffda 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -899,8 +899,13 @@  static void wait_for_migration_fail(QTestState *from, bool allow_active)
 
     do {
         status = migrate_query_status(from);
-        g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") ||
-                 (allow_active && !strcmp(status, "active")));
+        bool result = !strcmp(status, "setup") || !strcmp(status, "failed") ||
+                 (allow_active && !strcmp(status, "active"));
+        if (!result) {
+            fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n",
+                    __func__, status, allow_active);
+        }
+        g_assert(result);
         failed = !strcmp(status, "failed");
         g_free(status);
     } while (!failed);