Message ID | 20191108104307.125020-1-dgilbert@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tests/migration: Print some debug on bad status | expand |
On 08/11/2019 11.43, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > We're seeing occasional asserts in 'wait_for_migraiton_fail', that > I can't reliably reproduce, and where the cores don't have any useful > state. Print the 'status' out, so we can see which unexpected state > we're ending up in. > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > tests/migration-test.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/tests/migration-test.c b/tests/migration-test.c > index 59f291c654..ac780dffda 100644 > --- a/tests/migration-test.c > +++ b/tests/migration-test.c > @@ -899,8 +899,13 @@ static void wait_for_migration_fail(QTestState *from, bool allow_active) > > do { > status = migrate_query_status(from); > - g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") || > - (allow_active && !strcmp(status, "active"))); > + bool result = !strcmp(status, "setup") || !strcmp(status, "failed") || > + (allow_active && !strcmp(status, "active")); > + if (!result) { > + fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n", > + __func__, status, allow_active); > + } > + g_assert(result); > failed = !strcmp(status, "failed"); > g_free(status); > } while (!failed); > Reviewed-by: Thomas Huth <thuth@redhat.com>
On 11/8/19 11:43 AM, Dr. David Alan Gilbert (git) wrote: > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > > We're seeing occasional asserts in 'wait_for_migraiton_fail', that > I can't reliably reproduce, and where the cores don't have any useful > state. Print the 'status' out, so we can see which unexpected state > we're ending up in. Back to the good remote printf() debugging :) Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> > --- > tests/migration-test.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/tests/migration-test.c b/tests/migration-test.c > index 59f291c654..ac780dffda 100644 > --- a/tests/migration-test.c > +++ b/tests/migration-test.c > @@ -899,8 +899,13 @@ static void wait_for_migration_fail(QTestState *from, bool allow_active) > > do { > status = migrate_query_status(from); > - g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") || > - (allow_active && !strcmp(status, "active"))); > + bool result = !strcmp(status, "setup") || !strcmp(status, "failed") || > + (allow_active && !strcmp(status, "active")); > + if (!result) { > + fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n", > + __func__, status, allow_active); > + } > + g_assert(result); > failed = !strcmp(status, "failed"); > g_free(status); > } while (!failed); >
Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === TEST check-unit: tests/test-bdrv-drain wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 ** ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result) make: *** [check-qtest-aarch64] Error 1 make: *** Waiting for unfinished jobs.... TEST check-unit: tests/test-bdrv-graph-mod TEST check-unit: tests/test-blockjob --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=fa574cc0377f47f39447919cb9419965', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-4cu751sh/src/docker-src.2019-11-08-12.33.13.24545:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=fa574cc0377f47f39447919cb9419965 make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-4cu751sh/src' make: *** [docker-run-test-quick@centos7] Error 2 real 13m5.136s user 0m8.186s The full log is available at http://patchew.org/logs/20191108104307.125020-1-dgilbert@redhat.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/ Hi, This series failed the docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #! /bin/bash export ARCH=x86_64 make docker-image-fedora V=1 NETWORK=1 time make docker-test-mingw@fedora J=14 NETWORK=1 === TEST SCRIPT END === The full log is available at http://patchew.org/logs/20191108104307.125020-1-dgilbert@redhat.com/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Hi Jens, the unplug failover stuff is triggering an assertion occasionally on aarch64; but a) I'm not sure the right way to fix it b) And I'm out for a little over a week so... * no-reply@patchew.org (no-reply@patchew.org) wrote: > Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/ > > > > Hi, > > This series failed the docker-quick@centos7 build test. Please find the testing commands and > their output below. If you have Docker installed, you can probably reproduce it > locally. > > === TEST SCRIPT BEGIN === > #!/bin/bash > make docker-image-centos7 V=1 NETWORK=1 > time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 > === TEST SCRIPT END === > > TEST check-unit: tests/test-bdrv-drain > wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 In tests/migration-test.c we've got wait_for_migration_fail, and it's expecting the state to be any one of: setup, failed or maybe active but it's getting surprised by seeing a 'wait-unplug' So the question is should we see a wait-unplug? the migration code has: if (qemu_savevm_nr_failover_devices()) { migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, MIGRATION_STATUS_WAIT_UNPLUG); Should qemu_savevm_nr_failover_devices() be true? On aarch64 it seems to have a virtio-net device by default and qemu_savevm_nr_failover_devices() checks for devices having dev_unplug_pending but doesn't call it. I see two fixes but am not sure which is right: a) Add 'wait-unplug' to the wait_for_migration_fail (easy) b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices so that on a guest which has a virtio-net, but no failover device, the state isn't entered. I think (b) is better, since we shouldn't be exposing the wait-unplug event on setups that don't expect it; but I don't understand the unplug enough to know if this is a safe change. Thoughts? Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Nov 08, 2019 at 06:38:17PM +0000, Dr. David Alan Gilbert wrote: >Hi Jens, > the unplug failover stuff is triggering an assertion occasionally on >aarch64; but > a) I'm not sure the right way to fix it > b) And I'm out for a little over a week > >so... I'll take care of it. > >* no-reply@patchew.org (no-reply@patchew.org) wrote: >> Patchew URL: https://patchew.org/QEMU/20191108104307.125020-1-dgilbert@redhat.com/ >> >> >> >> Hi, >> >> This series failed the docker-quick@centos7 build test. Please find the testing commands and >> their output below. If you have Docker installed, you can probably reproduce it >> locally. >> >> === TEST SCRIPT BEGIN === >> #!/bin/bash >> make docker-image-centos7 V=1 NETWORK=1 >> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 >> === TEST SCRIPT END === >> >> TEST check-unit: tests/test-bdrv-drain >> wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1 > >In tests/migration-test.c we've got wait_for_migration_fail, and it's >expecting the state to be any one of: > setup, failed or maybe active > >but it's getting surprised by seeing a 'wait-unplug' > >So the question is should we see a wait-unplug? > >the migration code has: > > if (qemu_savevm_nr_failover_devices()) { > migrate_set_state(&s->state, MIGRATION_STATUS_SETUP, > MIGRATION_STATUS_WAIT_UNPLUG); > >Should qemu_savevm_nr_failover_devices() be true? >On aarch64 it seems to have a virtio-net device by default >and qemu_savevm_nr_failover_devices() checks for devices >having dev_unplug_pending but doesn't call it. > >I see two fixes but am not sure which is right: > a) Add 'wait-unplug' to the wait_for_migration_fail > (easy) > b) Actually call dev_unplug_pending in qemu_savevm_nr_failover_devices > so that on a guest which has a virtio-net, but no failover device, > the state isn't entered. I also prefer b over a, but I should only set up dev_unplug_pending when failover=on to avoid this. I'll send a patch. regards Jens
diff --git a/tests/migration-test.c b/tests/migration-test.c index 59f291c654..ac780dffda 100644 --- a/tests/migration-test.c +++ b/tests/migration-test.c @@ -899,8 +899,13 @@ static void wait_for_migration_fail(QTestState *from, bool allow_active) do { status = migrate_query_status(from); - g_assert(!strcmp(status, "setup") || !strcmp(status, "failed") || - (allow_active && !strcmp(status, "active"))); + bool result = !strcmp(status, "setup") || !strcmp(status, "failed") || + (allow_active && !strcmp(status, "active")); + if (!result) { + fprintf(stderr, "%s: unexpected status status=%s allow_active=%d\n", + __func__, status, allow_active); + } + g_assert(result); failed = !strcmp(status, "failed"); g_free(status); } while (!failed);