[RFC] tests/device-introspect: Test devices with all machines, not only with "none"

Message ID 1521452376-25099-1-git-send-email-thuth@redhat.com (mailing list archive)
State New, archived

Commit Message

Thomas Huth March 19, 2018, 9:39 a.m. UTC
Many device introspection crashes only happen if you are using a
certain machine, e.g.:

$ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
 "package": "build-all"}, "capabilities": []}}
{ 'execute': 'qmp_capabilities' }
{"return": {}}
{ 'execute': 'device-list-properties',
  'arguments': {'typename': 'macio-newworld'}}
Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
Device 'serial0' is in use
Aborted (core dumped)

To be able to catch these problems, let's extend the device-introspect
test to check the devices on all machine types. Since this is a rather
slow operation, the test is only run in "SPEED=slow" mode.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 In case someone wants to help with creating some bug fix patches
 during the QEMU hard freeze phase: This test can now be used to
 trigger lots of introspection bugs that we were not aware of yet.
 I think most of the bugs are due to wrong handling of instance_init
 vs. realize functions.
 For example:
 $ make check-qtest SPEED=slow
  GTESTER check-qtest-aarch64
 RAMBlock "integrator.flash" already registered, abort!
 Broken pipe
 GTester: last random seed: R02S8e52709605790d290d2c8261cefb8b0e
 Unsupported NIC model: lan9118
 Broken pipe
 GTester: last random seed: R02S326d4ea43bfce860ebe2d554192540f7
 qemu-system-aarch64: warning: nic lan9118.0 has no peer
 Unsupported NIC model: smc91c111
 Broken pipe
 GTester: last random seed: R02Se9783b450806f350a14e757b175e3dc4
 qemu-system-aarch64: missing SecureDigital device
 Broken pipe
 GTester: last random seed: R02S5c718b8f4c4fd48a358de8daafcf1b6f
 qemu-system-aarch64: warning: nic lan9118.0 has no peer
 Unexpected error in error_set_from_qdev_prop_error() at hw/core/qdev-properties.c:1095:
 Property 'allwinner-emac.netdev' can't take value 'hub0port0', it's in use
 Broken pipe
 GTester: last random seed: R02S597848ddcfdc76a695a946a9d4e50146
 qemu-system-aarch64: warning: nic ftgmac100.0 has no peer
 GTester: last random seed: R02Seea0f0b769a2161fa53a50479fd68d84
 qemu-system-aarch64: warning: nic imx.fec.0 has no peer
 qemu-system-aarch64: missing SecureDigital device
 Broken pipe
 GTester: last random seed: R02S9c2d3e34427162e7a56aa4ac859f1a6b
 Unsupported NIC model: virtio-net-pci
 Broken pipe
 GTester: last random seed: R02Sd61c0e9ed52d50a17c784213e5c6590c
 Unsupported NIC model: mv88w8618
 Broken pipe
 GTester: last random seed: R02Sbfaecfe58dd643f2faca218e3051d464
 qemu-system-aarch64: warning: nic mv88w8618_eth.0 has no peer
 qemu-system-aarch64: missing SecureDigital device
 Broken pipe
 Unsupported NIC model: xgmac
 Broken pipe
 GTester: last random seed: R02Sc61e65e884e364652c3a0c4190023565
 fsl,imx7: Only 2 CPUs are supported (4 requested)
 Broken pipe
 GTester: last random seed: R02S0cfda43bc17e3e052d5a994b2c96457b
 etc.

 tests/device-introspect-test.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

Comments

Eduardo Habkost March 19, 2018, 8:37 p.m. UTC | #1
On Mon, Mar 19, 2018 at 10:39:36AM +0100, Thomas Huth wrote:
> Many device introspection crashes only happen if you are using a
> certain machine, e.g.:
> 
> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> { 'execute': 'qmp_capabilities' }
> {"return": {}}
> { 'execute': 'device-list-properties',
>   'arguments': {'typename': 'macio-newworld'}}
> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
> Device 'serial0' is in use
> Aborted (core dumped)
> 
> To be able to catch these problems, let's extend the device-introspect
> test to check the devices on all machine types. Since this is a rather
> slow operation, the test is only run in "SPEED=slow" mode.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  In case someone wants to help with creating some bug fix patches
>  during the QEMU hard freeze phase: This test can now be used to
>  trigger lots of introspection bugs that we were not aware of yet.
>  I think most of the bugs are due to wrong handling of instance_init
>  vs. realize functions.
[...]

This looks very useful, thanks!

I wonder if we could have something that would make it simpler
for us to cover more command-line combinations + QMP commands in
simple "validate output and check if QEMU won't crash" test cases
without writing extra C or Python code every time.

device-crash-test could be used for that, but I'd like to make it
simpler to extend.

Markus Armbruster April 17, 2018, 12:12 p.m. UTC | #2
Thomas Huth <thuth@redhat.com> writes:

> Many device introspection crashes only happen if you are using a
> certain machine, e.g.:
>
> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> { 'execute': 'qmp_capabilities' }
> {"return": {}}
> { 'execute': 'device-list-properties',
>   'arguments': {'typename': 'macio-newworld'}}
> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
> Device 'serial0' is in use
> Aborted (core dumped)
>
> To be able to catch these problems, let's extend the device-introspect
> test to check the devices on all machine types. Since this is a rather
> slow operation, the test is only run in "SPEED=slow" mode.

If the device works with one machine type, it has a decent chance to
work with others, too.  Thus, testing each device with every machine
type is overkill.  I appreciate having overkill as an option :)

What I'd like to see for a quick "make check" is testing each device
once.  That should flush out most bugs.  

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  In case someone wants to help with creating some bug fix patches
>  during the QEMU hard freeze phase: This test can now be used to
>  trigger lots of introspection bugs that we were not aware of yet.
>  I think most of the bugs are due to wrong handling of instance_init
>  vs. realize functions.

Yes, that's a common class of bugs.  There's little guidance on what
kind of work belongs where, and plenty of bad examples.  Some of the bad
examples crash (as you found).  Some work fine, typically because the
device doesn't support unplug.

Bad examples breed more bad code, so we better fix them all.  The ones
that work are harder to find...
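
A minimal sketch of the instance_init vs. realize split described above, written in plain C rather than against QEMU's actual QOM API. All names (ToyDevice, toy_instance_init, toy_realize, toy_unrealize, backend_in_use) are hypothetical. The assumption is that instance_init only sets up the object's own fields, so throwaway instances can be created for introspection, while realize is the only step that claims a shared resource and is allowed to fail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared backend, standing in for something like a chardev. */
static bool backend_in_use;

typedef struct ToyDevice {
    bool realized;
    bool owns_backend;
} ToyDevice;

/* Touches only the instance itself: safe for throwaway introspection objects. */
void toy_instance_init(ToyDevice *dev)
{
    assert(dev != NULL);
    dev->realized = false;
    dev->owns_backend = false;
}

/* Claims the backend; reports failure instead of aborting if it is taken. */
bool toy_realize(ToyDevice *dev)
{
    assert(dev != NULL);
    if (dev->realized) {
        return true;
    }
    if (backend_in_use) {
        return false;
    }
    backend_in_use = true;
    dev->owns_backend = true;
    dev->realized = true;
    return true;
}

/* Releases whatever realize claimed, so the instance can be destroyed. */
void toy_unrealize(ToyDevice *dev)
{
    assert(dev != NULL);
    if (dev->owns_backend) {
        backend_in_use = false;
        dev->owns_backend = false;
    }
    dev->realized = false;
}
```

In this toy model, a second instance can always be initialized while the first one is realized; only its realize step fails, and it succeeds again once the first instance has been unrealized.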

Peter Maydell April 17, 2018, 12:52 p.m. UTC | #3
On 17 April 2018 at 13:12, Markus Armbruster <armbru@redhat.com> wrote:
> Thomas Huth <thuth@redhat.com> writes:
>>  In case someone wants to help with creating some bug fix patches
>>  during the QEMU hard freeze phase: This test can now be used to
>>  trigger lots of introspection bugs that we were not aware of yet.
>>  I think most of the bugs are due to wrong handling of instance_init
>>  vs. realize functions.
>
> Yes, that's a common class of bugs.  There's little guidance on what
> kind of work belongs where, and plenty of bad examples.  Some of the bad
> examples crash (as you found).  Some work fine, typically because the
> device doesn't support unplug.

I've been vaguely wondering if we should start to recommend that
all devices have a correctly implemented code path for destroying
them post-realize, even if they don't actually implement hotplug...

thanks
-- PMM

Markus Armbruster April 17, 2018, 1:15 p.m. UTC | #4
Peter Maydell <peter.maydell@linaro.org> writes:

> On 17 April 2018 at 13:12, Markus Armbruster <armbru@redhat.com> wrote:
>> Thomas Huth <thuth@redhat.com> writes:
>>>  In case someone wants to help with creating some bug fix patches
>>>  during the QEMU hard freeze phase: This test can now be used to
>>>  trigger lots of introspection bugs that we were not aware of yet.
>>>  I think most of the bugs are due to wrong handling of instance_init
>>>  vs. realize functions.
>>
>> Yes, that's a common class of bugs.  There's little guidance on what
>> kind of work belongs where, and plenty of bad examples.  Some of the bad
>> examples crash (as you found).  Some work fine, typically because the
>> device doesn't support unplug.
>
> I've been vaguely wondering if we should start to recommend that
> all devices have a correctly implemented code path for destroying
> them post-realize, even if they don't actually implement hotplug...

Possibly crazy idea: make devices implement *cold* unplug.  Not really
useful in practice, but it would make the destroy path testable.

Thomas Huth April 26, 2018, 10:24 a.m. UTC | #5
On 17.04.2018 14:12, Markus Armbruster wrote:
> Thomas Huth <thuth@redhat.com> writes:
> 
>> Many device introspection crashes only happen if you are using a
>> certain machine, e.g.:
>>
>> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>>  "package": "build-all"}, "capabilities": []}}
>> { 'execute': 'qmp_capabilities' }
>> {"return": {}}
>> { 'execute': 'device-list-properties',
>>   'arguments': {'typename': 'macio-newworld'}}
>> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
>> Device 'serial0' is in use
>> Aborted (core dumped)
>>
>> To be able to catch these problems, let's extend the device-introspect
>> test to check the devices on all machine types. Since this is a rather
>> slow operation, the test is only run in "SPEED=slow" mode.
> 
> If the device works with one machine type, it has a decent chance to
> work with others, too.  Thus, testing each device with every machine
> type is overkill.  I appreciate having overkill as an option :)
> 
> What I'd like to see for a quick "make check" is testing each device
> once.  That should flush out most bugs.  

That's already done with the "none" machine.

Anyway, do you think my patch here is useful and has a chance of getting
included? I.e. shall I re-spin this as a non-RFC patch? Or shall we
rather wait for Eduardo's python-based tests to get included into the
repository?

>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>  In case someone wants to help with creating some bug fix patches
>>  during the QEMU hard freeze phase: This test can now be used to
>>  trigger lots of introspection bugs that we were not aware of yet.
>>  I think most of the bugs are due to wrong handling of instance_init
>>  vs. realize functions.
> 
> Yes, that's a common class of bugs.  There's little guidance on what
> kind of work belongs where, and plenty of bad examples.

I think we urgently need a file in docs/devel/ that describes the various
states / functions of a device, where we should properly describe the
differences between instance_init and realize. ... I'll try to come up
with something when I've got some spare time (unless somebody else
volunteers to do that first).

 Thomas

Markus Armbruster April 26, 2018, 11:45 a.m. UTC | #6
Thomas Huth <thuth@redhat.com> writes:

> Many device introspection crashes only happen if you are using a
> certain machine, e.g.:
>
> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>  "package": "build-all"}, "capabilities": []}}
> { 'execute': 'qmp_capabilities' }
> {"return": {}}
> { 'execute': 'device-list-properties',
>   'arguments': {'typename': 'macio-newworld'}}
> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
> Device 'serial0' is in use
> Aborted (core dumped)
>
> To be able to catch these problems, let's extend the device-introspect
> test to check the devices on all machine types. Since this is a rather
> slow operation, the test is only run in "SPEED=slow" mode.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  In case someone wants to help with creating some bug fix patches
>  during the QEMU hard freeze phase: This test can now be used to
>  trigger lots of introspection bugs that we were not aware of yet.
>  I think most of the bugs are due to wrong handling of instance_init
>  vs. realize functions.
>  For example:
>  $ make check-qtest SPEED=slow
>   GTESTER check-qtest-aarch64
>  RAMBlock "integrator.flash" already registered, abort!
>  Broken pipe
>  GTester: last random seed: R02S8e52709605790d290d2c8261cefb8b0e
>  Unsupported NIC model: lan9118
>  Broken pipe
>  GTester: last random seed: R02S326d4ea43bfce860ebe2d554192540f7
>  qemu-system-aarch64: warning: nic lan9118.0 has no peer
>  Unsupported NIC model: smc91c111
>  Broken pipe
>  GTester: last random seed: R02Se9783b450806f350a14e757b175e3dc4
>  qemu-system-aarch64: missing SecureDigital device
>  Broken pipe
>  GTester: last random seed: R02S5c718b8f4c4fd48a358de8daafcf1b6f
>  qemu-system-aarch64: warning: nic lan9118.0 has no peer
>  Unexpected error in error_set_from_qdev_prop_error() at hw/core/qdev-properties.c:1095:
>  Property 'allwinner-emac.netdev' can't take value 'hub0port0', it's in use
>  Broken pipe
>  GTester: last random seed: R02S597848ddcfdc76a695a946a9d4e50146
>  qemu-system-aarch64: warning: nic ftgmac100.0 has no peer
>  GTester: last random seed: R02Seea0f0b769a2161fa53a50479fd68d84
>  qemu-system-aarch64: warning: nic imx.fec.0 has no peer
>  qemu-system-aarch64: missing SecureDigital device
>  Broken pipe
>  GTester: last random seed: R02S9c2d3e34427162e7a56aa4ac859f1a6b
>  Unsupported NIC model: virtio-net-pci
>  Broken pipe
>  GTester: last random seed: R02Sd61c0e9ed52d50a17c784213e5c6590c
>  Unsupported NIC model: mv88w8618
>  Broken pipe
>  GTester: last random seed: R02Sbfaecfe58dd643f2faca218e3051d464
>  qemu-system-aarch64: warning: nic mv88w8618_eth.0 has no peer
>  qemu-system-aarch64: missing SecureDigital device
>  Broken pipe
>  Unsupported NIC model: xgmac
>  Broken pipe
>  GTester: last random seed: R02Sc61e65e884e364652c3a0c4190023565
>  fsl,imx7: Only 2 CPUs are supported (4 requested)
>  Broken pipe
>  GTester: last random seed: R02S0cfda43bc17e3e052d5a994b2c96457b
>  etc.
>
>  tests/device-introspect-test.c | 33 ++++++++++++++++++++++++++++++---
>  1 file changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/tests/device-introspect-test.c b/tests/device-introspect-test.c
> index b80058f..a9b9cf7 100644
> --- a/tests/device-introspect-test.c
> +++ b/tests/device-introspect-test.c
> @@ -105,6 +105,8 @@ static void test_one_device(const char *type)
>      QDict *resp;
>      char *help, *qom_tree;
>  
> +    g_debug("Testing device '%s'", type);
> +
>      resp = qmp("{'execute': 'device-list-properties',"
>                 " 'arguments': {'typename': %s}}",
>                 type);
> @@ -206,13 +208,13 @@ static void test_device_intro_abstract(void)
>      qtest_end();
>  }
>  
> -static void test_device_intro_concrete(void)
> +static void test_device_intro_concrete(gconstpointer args)

const void *, please, because that's what qtest_add_data_func() takes.

>  {
>      QList *types;
>      QListEntry *entry;
>      const char *type;
>  
> -    qtest_start(common_args);
> +    qtest_start((const char *)args);
>      types = device_type_list(false);
>  
>      QLIST_FOREACH_ENTRY(types, entry) {
> @@ -224,6 +226,7 @@ static void test_device_intro_concrete(void)
>  
>      QDECREF(types);
>      qtest_end();
> +    g_free((void *)args);
>  }
>  
>  static void test_abstract_interfaces(void)
> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>      qtest_end();
>  }
>  
> +static void add_machine_test_case(const char *mname)
> +{
> +    char *path, *args;
> +
> +    /* Ignore blacklisted machines */
> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
> +        return;
> +    }
> +
> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
> +    args = g_strdup_printf("-machine %s", mname);
> +    qtest_add_data_func(path, args, test_device_intro_concrete);

This runs test_device_intro_concrete() with "-machine M" for all machine
types M, in SPEED=slow mode.

> +    g_free(path);
> +
> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
> +    qtest_add_data_func(path, args, test_device_intro_concrete);

This runs test_device_intro_concrete() with "-nodefaults -machine M" for
all machine types M, in SPEED=slow mode.

Has "without -nodefaults" exposed additional bugs?

Please mention "with and without -nodefaults" in the commit message.

I'd try "with -nodefaults" before "without", because "with" is the
simpler test case.

> +    g_free(path);
> +}
> +
>  int main(int argc, char **argv)
>  {
>      g_test_init(&argc, &argv, NULL);
> @@ -268,8 +291,12 @@ int main(int argc, char **argv)
>      qtest_add_func("device/introspect/list-fields", test_qom_list_fields);
>      qtest_add_func("device/introspect/none", test_device_intro_none);
>      qtest_add_func("device/introspect/abstract", test_device_intro_abstract);
> -    qtest_add_func("device/introspect/concrete", test_device_intro_concrete);
>      qtest_add_func("device/introspect/abstract-interfaces", test_abstract_interfaces);
> +    qtest_add_data_func("device/introspect/concrete", g_strdup(common_args),
> +                        test_device_intro_concrete);

This runs test_device_intro_concrete() with "-nodefaults -machine
none".  Duplicate in SPEED=slow mode?

> +    if (g_test_slow()) {
> +        qtest_cb_for_every_machine(add_machine_test_case);
> +    }
>  
>      return g_test_run();
>  }
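
The naming scheme used by add_machine_test_case() in the patch above can be sketched in plain C as follows. This is only an illustration: format_machine_test is a hypothetical helper, it uses snprintf() on caller-supplied buffers instead of g_strdup_printf(), and it does not register anything with qtest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: fills in the test path and the QEMU arguments for
 * one machine. Returns false if either buffer is too small.
 */
bool format_machine_test(const char *mname, bool nodefaults,
                         char *path, size_t path_len,
                         char *args, size_t args_len)
{
    int n;

    assert(mname != NULL && path != NULL && args != NULL);

    n = snprintf(path, path_len, "device/introspect/concrete-%s-%s",
                 nodefaults ? "nodefaults" : "defaults", mname);
    if (n < 0 || (size_t)n >= path_len) {
        return false;
    }

    n = snprintf(args, args_len, "%s-machine %s",
                 nodefaults ? "-nodefaults " : "", mname);
    return n >= 0 && (size_t)n < args_len;
}
```

Calling it first with nodefaults set to true and then with false would give the "with -nodefaults before without" ordering suggested in the review.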

Markus Armbruster April 26, 2018, 11:54 a.m. UTC | #7
Thomas Huth <thuth@redhat.com> writes:

> On 17.04.2018 14:12, Markus Armbruster wrote:
>> Thomas Huth <thuth@redhat.com> writes:
>> 
>>> Many device introspection crashes only happen if you are using a
>>> certain machine, e.g.:
>>>
>>> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
>>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>>>  "package": "build-all"}, "capabilities": []}}
>>> { 'execute': 'qmp_capabilities' }
>>> {"return": {}}
>>> { 'execute': 'device-list-properties',
>>>   'arguments': {'typename': 'macio-newworld'}}
>>> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
>>> Device 'serial0' is in use
>>> Aborted (core dumped)
>>>
>>> To be able to catch these problems, let's extend the device-introspect
>>> test to check the devices on all machine types. Since this is a rather
>>> slow operation, the test is only run in "SPEED=slow" mode.
>> 
>> If the device works with one machine type, it has a decent chance to
>> work with others, too.  Thus, testing each device with every machine
>> type is overkill.  I appreciate having overkill as an option :)
>> 
>> What I'd like to see for a quick "make check" is testing each device
>> once.  That should flush out most bugs.  
>
> That's already done with the "none" machine.

I was too terse.  We test each device with -machine none for every
target.  Fine if that's quick enough.  If not, we might want to reduce
redundancy there.

Actually, a worse offender in the "waste everybody's time via redundancy"
department could be qom-test.

> Anyway, do you think my patch here is useful and has a chance of getting
> included? I.e. shall I re-spin this as a non-RFC patch? Or shall we
> rather wait for Eduardo's python-based tests to get included into the
> repository?

I don't mind having make check SPEED=slow run more extensive tests.
Assuming we actually run them at least once in a while, which seems
doubtful.

>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>> ---
>>>  In case someone wants to help with creating some bug fix patches
>>>  during the QEMU hard freeze phase: This test can now be used to
>>>  trigger lots of introspection bugs that we were not aware of yet.
>>>  I think most of the bugs are due to wrong handling of instance_init
>>>  vs. realize functions.
>> 
>> Yes, that's a common class of bugs.  There's little guidance on what
>> kind of work belongs where, and plenty of bad examples.
>
> I think we urgently need a file in docs/devel/ that describes the various
> states / functions of a device, where we should properly describe the
> differences between instance_init and realize. ... I'll try to come up
> with something when I've got some spare time (unless somebody else
> volunteers to do that first).

Please do.

Widen the scope from just TYPE_DEVICE to all of QOM?

Thomas Huth April 26, 2018, 3:20 p.m. UTC | #8
On 26.04.2018 13:45, Markus Armbruster wrote:
> Thomas Huth <thuth@redhat.com> writes:
[...]
>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>      qtest_end();
>>  }
>>  
>> +static void add_machine_test_case(const char *mname)
>> +{
>> +    char *path, *args;
>> +
>> +    /* Ignore blacklisted machines */
>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>> +        return;
>> +    }
>> +
>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>> +    args = g_strdup_printf("-machine %s", mname);
>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> 
> This runs test_device_intro_concrete() with "-machine M" for all machine
> types M, in SPEED=slow mode.
> 
>> +    g_free(path);
>> +
>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> 
> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
> all machine types M, in SPEED=slow mode.
> 
> Has "without -nodefaults" exposed additional bugs?

After testing this with all machines, I had to discover that
"-nodefaults" does not work so easily: A lot of the embedded machines
(especially the ARM machines) simply refuse to work with "-nodefaults"
and exit immediately instead. E.g.:

$ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
qemu-system-arm: missing SecureDigital device

So we'd either need a rather big black list for the machines that do not
work, or simply drop the "-nodefaults" tests from this patch.
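
For illustration, the blacklist option could look roughly like the sketch below. The table and the function name machine_supports_nodefaults are hypothetical; "n810" is the only entry because it is the one machine shown failing above, and a real list would have to be collected by actually running the test on every target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative entries only; not a verified list of affected machines. */
static const char *const nodefaults_blacklist[] = {
    "n810",
};

/* Returns false for machines known to exit when started with -nodefaults. */
bool machine_supports_nodefaults(const char *mname)
{
    size_t i;
    size_t n = sizeof(nodefaults_blacklist) / sizeof(nodefaults_blacklist[0]);

    assert(mname != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(nodefaults_blacklist[i], mname) == 0) {
            return false;
        }
    }
    return true;
}
```

The cost of this approach is visible even in the sketch: every machine that refuses "-nodefaults" needs its own entry, and the table has to be maintained as boards are added.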

> Please mention "with and without -nodefaults" in the commit message.
> 
> I'd try "with -nodefaults" before "without", because "with" is the
> simpler test case.

For most boards, it rather seems to be the more "difficult" setting,
since most boards are obviously only tested without "-nodefaults".

>> +    g_free(path);
>> +}
>> +
>>  int main(int argc, char **argv)
>>  {
>>      g_test_init(&argc, &argv, NULL);
>> @@ -268,8 +291,12 @@ int main(int argc, char **argv)
>>      qtest_add_func("device/introspect/list-fields", test_qom_list_fields);
>>      qtest_add_func("device/introspect/none", test_device_intro_none);
>>      qtest_add_func("device/introspect/abstract", test_device_intro_abstract);
>> -    qtest_add_func("device/introspect/concrete", test_device_intro_concrete);
>>      qtest_add_func("device/introspect/abstract-interfaces", test_abstract_interfaces);
>> +    qtest_add_data_func("device/introspect/concrete", g_strdup(common_args),
>> +                        test_device_intro_concrete);
> 
> This runs test_device_intro_concrete() with "-nodefaults -machine
> none".  Duplicate in SPEED=slow mode?

Yes, it's a duplicate, we should skip that in SPEED=slow mode.

 Thomas

Thomas Huth April 26, 2018, 3:27 p.m. UTC | #9
On 26.04.2018 13:54, Markus Armbruster wrote:
> Thomas Huth <thuth@redhat.com> writes:
[...]
> Actually, a worse offender in the "waste everybody's time via redundancy"
> department could be qom-test.

I guess we could also change the logic in qom-test to only run with
all machines if we're in SPEED=slow mode, and to use only the "none"
machine by default?
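
A rough sketch of that selection logic, in plain C and independent of the real qom-test code: for_selected_machines, MachineCb and count_machine are hypothetical names, and the machine list is assumed to be passed in by the caller rather than queried from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*MachineCb)(const char *mname);

/*
 * Calls cb once per machine to test and returns how many calls were made.
 * Quick mode covers only "none"; slow mode walks the full list.
 */
size_t for_selected_machines(const char *const *machines, size_t n,
                             bool slow, MachineCb cb)
{
    size_t i;

    assert(cb != NULL);
    if (!slow) {
        cb("none");
        return 1;
    }
    assert(machines != NULL || n == 0);
    for (i = 0; i < n; i++) {
        cb(machines[i]);
    }
    return n;
}

/* Example callback: only counts invocations. */
static size_t machines_visited;

void count_machine(const char *mname)
{
    (void)mname;
    machines_visited++;
}
```

Because the slow path walks the full list, which would normally include "none" itself, no separate "none" run is needed in slow mode.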

>> Anyway, do you think my patch here is useful and has a chance of getting
>> included? I.e. shall I re-spin this as a non-RFC patch? Or shall we
>> rather wait for Eduardo's python-based tests to get included into the
>> repository?
> 
> I don't mind having make check SPEED=slow run more extensive tests.
> Assuming we actually run them at least once in a while, which seems
> doubtful.

If some developers (like myself) are running it at least every couple of
weeks manually, that's already much better than nothing!

>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> ---
>>>>  In case someone wants to help with creating some bug fix patches
>>>>  during the QEMU hard freeze phase: This test can now be used to
>>>>  trigger lots of introspection bugs that we were not aware of yet.
>>>>  I think most of the bugs are due to wrong handling of instance_init
>>>>  vs. realize functions.
>>>
>>> Yes, that's a common class of bugs.  There's little guidance on what
>>> kind of work belongs where, and plenty of bad examples.
>>
>> I think we urgently need a file in docs/devel/ that describes the various
>> states / functions of a device, where we should properly describe the
>> differences between instance_init and realize. ... I'll try to come up
>> with something when I've got some spare time (unless somebody else
>> volunteers to do that first).
> 
> Please do.
> 
> Widen the scope from just TYPE_DEVICE to all of QOM?

I don't have enough experience with QOM yet to dare to write a
doc about it. Would you maybe be interested in writing something up
about QOM?

 Thomas

Eduardo Habkost April 27, 2018, 12:32 a.m. UTC | #10
On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
> On 26.04.2018 13:45, Markus Armbruster wrote:
> > Thomas Huth <thuth@redhat.com> writes:
> [...]
> >> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
> >>      qtest_end();
> >>  }
> >>  
> >> +static void add_machine_test_case(const char *mname)
> >> +{
> >> +    char *path, *args;
> >> +
> >> +    /* Ignore blacklisted machines */
> >> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
> >> +        return;
> >> +    }
> >> +
> >> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
> >> +    args = g_strdup_printf("-machine %s", mname);
> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> > 
> > This runs test_device_intro_concrete() with "-machine M" for all machine
> > types M, in SPEED=slow mode.
> > 
> >> +    g_free(path);
> >> +
> >> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
> >> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> > 
> > This runs test_device_intro_concrete() with "-nodefaults -machine M" for
> > all machine types M, in SPEED=slow mode.
> > 
> > Has "without -nodefaults" exposed additional bugs?
> 
> After testing this with all machines, I had to discover that
> "-nodefaults" does not work so easily: A lot of the embedded machines
> (especially the ARM machines) simply refuse to work with "-nodefaults"
> and exit immediately instead. E.g.:
> 
> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
> qemu-system-arm: missing SecureDigital device
> 
> So we'd either need a rather big black list for the machines that do not
> work, or simply drop the "-nodefaults" tests from this patch.

Or we could try to test all machines anyway, but not consider it
an error if QEMU just does exit(1).  Can the qtest C API give us
that information?

(Or we could simply leave -nodefaults aside for now, and do this
after we implement this test case in Python.)
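
Whether the qtest C API exposes this is left open here. The sketch below only shows the underlying POSIX mechanism for telling a plain exit(1) apart from a crash, using fork() and waitpid(). The names ChildResult, run_and_classify and the child_* functions are hypothetical stand-ins for a QEMU process, not part of libqtest.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef enum {
    CHILD_OK,       /* exited with status 0 */
    CHILD_REFUSED,  /* exited with a non-zero status, e.g. exit(1) */
    CHILD_CRASHED,  /* killed by a signal, e.g. SIGABRT from abort() */
    CHILD_ERROR     /* fork() or waitpid() itself failed */
} ChildResult;

/* Runs child_fn in a forked process and classifies how it ended. */
ChildResult run_and_classify(void (*child_fn)(void))
{
    int status;
    pid_t pid;

    assert(child_fn != NULL);
    pid = fork();
    if (pid < 0) {
        return CHILD_ERROR;
    }
    if (pid == 0) {
        child_fn();
        _exit(0);
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return CHILD_ERROR;
        }
    }
    if (WIFSIGNALED(status)) {
        return CHILD_CRASHED;
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status) == 0 ? CHILD_OK : CHILD_REFUSED;
    }
    return CHILD_ERROR;
}

/* Stand-ins for a process that starts fine, refuses to start, or aborts. */
void child_succeeds(void) { }
void child_refuses(void) { _exit(1); }
void child_aborts(void) { abort(); }
```

With such a classification, a test could treat CHILD_REFUSED as "machine does not support this configuration" and fail only on CHILD_CRASHED.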

Eduardo Habkost April 27, 2018, 12:34 a.m. UTC | #11
On Thu, Apr 26, 2018 at 01:54:43PM +0200, Markus Armbruster wrote:
> Thomas Huth <thuth@redhat.com> writes:
> 
> > On 17.04.2018 14:12, Markus Armbruster wrote:
> >> Thomas Huth <thuth@redhat.com> writes:
> >> 
> >>> Many device introspection crashes only happen if you are using a
> >>> certain machine, e.g.:
> >>>
> >>> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
> >>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
> >>>  "package": "build-all"}, "capabilities": []}}
> >>> { 'execute': 'qmp_capabilities' }
> >>> {"return": {}}
> >>> { 'execute': 'device-list-properties',
> >>>   'arguments': {'typename': 'macio-newworld'}}
> >>> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
> >>> Device 'serial0' is in use
> >>> Aborted (core dumped)
> >>>
> >>> To be able to catch these problems, let's extend the device-introspect
> >>> test to check the devices on all machine types. Since this is a rather
> >>> slow operation, the test is only run in "SPEED=slow" mode.
> >> 
> >> If the device works with one machine type, it has a decent chance to
> >> work with others, too.  Thus, testing each device with every machine
> >> type is overkill.  I appreciate having overkill as an option :)
> >> 
> >> What I'd like to see for a quick "make check" is testing each device
> >> once.  That should flush out most bugs.  
> >
> > That's already done with the "none" machine.
> 
> I was too terse.  We test each device with -machine none for every
> target.  Fine if that's quick enough.  If not, we might want to reduce
> redundancy there.
> 
> > Actually, a worse offender in the "waste everybody's time via redundancy"
> department could be qom-test.
> 
> > Anyway, do you think my patch here is useful and has a chance of getting
> > included? I.e. shall I re-spin this as a non-RFC patch? Or shall we
> > rather wait for Eduardo's python-based tests to get included into the
> > repository?
> 
> I don't mind having make check SPEED=slow run more extensive tests.
> Assuming we actually run them at least once in a while, which seems
> doubtful.

The infrastructure for Python-based tests might take a while to
be included, as I'm busy with other stuff right now.  I wouldn't
mind including this patch, as long as you don't mind seeing it
deleted after we reimplement it in Python.

Thomas Huth April 27, 2018, 3:45 a.m. UTC | #12
On 27.04.2018 02:34, Eduardo Habkost wrote:
> On Thu, Apr 26, 2018 at 01:54:43PM +0200, Markus Armbruster wrote:
>> Thomas Huth <thuth@redhat.com> writes:
>>
>>> On 17.04.2018 14:12, Markus Armbruster wrote:
>>>> Thomas Huth <thuth@redhat.com> writes:
>>>>
>>>>> Many device introspection crashes only happen if you are using a
>>>>> certain machine, e.g.:
>>>>>
>>>>> $ ppc-softmmu/qemu-system-ppc -S -M ref405ep,accel=qtest -qmp stdio
>>>>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 11, "major": 2},
>>>>>  "package": "build-all"}, "capabilities": []}}
>>>>> { 'execute': 'qmp_capabilities' }
>>>>> {"return": {}}
>>>>> { 'execute': 'device-list-properties',
>>>>>   'arguments': {'typename': 'macio-newworld'}}
>>>>> Unexpected error in qemu_chr_fe_init() at chardev/char-fe.c:222:
>>>>> Device 'serial0' is in use
>>>>> Aborted (core dumped)
>>>>>
>>>>> To be able to catch these problems, let's extend the device-introspect
>>>>> test to check the devices on all machine types. Since this is a rather
>>>>> slow operation, the test is only run in "SPEED=slow" mode.
>>>>
>>>> If the device works with one machine type, it has a decent chance to
>>>> work with others, too.  Thus, testing each device with every machine
>>>> type is overkill.  I appreciate having overkill as an option :)
>>>>
>>>> What I'd like to see for a quick "make check" is testing each device
>>>> once.  That should flush out most bugs.  
>>>
>>> That's already done with the "none" machine.
>>
>> I was too terse.  We test each device with -machine none for every
>> target.  Fine if that's quick enough.  If not, we might want to reduce
>> redundancy there.
>>
>> Actually, a worse offender in the "waste everybody's time via redundancy"
>> department could be qom-test.
>>
>>> Anyway, do you think my patch here is useful and has a chance of getting
>>> included? I.e. shall I re-spin this as a non-RFC patch? Or shall we
>>> rather wait for Eduardo's python-based tests to get included into the
>>> repository?
>>
>> I don't mind having make check SPEED=slow run more extensive tests.
>> Assuming we actually run them at least once in a while, which seems
>> doubtful.
> 
> The infrastructure for Python-based tests might take a while to
> be included, as I'm busy with other stuff right now.  I wouldn't
> mind including this patch, as long as you don't mind seeing it
> deleted after we reimplement it in Python.

Fine for me.

 Thomas
Thomas Huth April 27, 2018, 3:52 a.m. UTC | #13
On 27.04.2018 02:32, Eduardo Habkost wrote:
> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>> On 26.04.2018 13:45, Markus Armbruster wrote:
>>> Thomas Huth <thuth@redhat.com> writes:
>> [...]
>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>>>      qtest_end();
>>>>  }
>>>>  
>>>> +static void add_machine_test_case(const char *mname)
>>>> +{
>>>> +    char *path, *args;
>>>> +
>>>> +    /* Ignore blacklisted machines */
>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>>>> +    args = g_strdup_printf("-machine %s", mname);
>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>
>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>>> types M, in SPEED=slow mode.
>>>
>>>> +    g_free(path);
>>>> +
>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>
>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>>> all machine types M, in SPEED=slow mode.
>>>
>>> Has "without -nodefaults" exposed additional bugs?
>>
>> After testing this with all machines, I had to discover that
>> "-nodefaults" does not work so easily: A lot of the embedded machines
>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>> and exit immediately instead. E.g.:
>>
>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>> qemu-system-arm: missing SecureDigital device
>>
>> So we'd either need a rather big black list for the machines that do not
>> work, or simply drop the "-nodefaults" tests from this patch.
> 
> Or we could try to test all machines anyway, but not consider it
> an error if QEMU just does exit(1).  Can the qtest C API give us
> that information?

At first glance, I don't see an easy way to do this. I guess we
could do some polling with waitid() or do something with SIGCHLD, but...

> (Or we could simply let -nodefaults aside by now, and do this
> after we implement this test case in Python.)

... I'd prefer that for now, assuming that the test will later
get replaced by the Python test anyway.
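
For reference, telling "QEMU did a clean exit(1)" apart from "QEMU
crashed" could look roughly like the sketch below when done outside of
libqtest. This is only an illustration under assumptions: the enum and
the two helper names are made up for this example and are not part of
the qtest C API, and it uses a blocking waitpid() instead of waitid()
polling or a SIGCHLD handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical classification, not part of the qtest API. */
enum child_result {
    CHILD_OK,            /* exit(0) */
    CHILD_EXIT_FAILURE,  /* clean exit with non-zero status, e.g. exit(1) */
    CHILD_CRASHED        /* killed by a signal such as SIGABRT or SIGSEGV */
};

/* Map a wait status as returned by waitpid() to one of the cases above. */
static enum child_result classify_wait_status(int status)
{
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status) == 0 ? CHILD_OK : CHILD_EXIT_FAILURE;
    }
    return CHILD_CRASHED;
}

/* Start argv[0] with the given arguments, wait for it and classify it. */
static enum child_result run_and_classify(char *const argv[])
{
    int status;
    pid_t pid;

    assert(argv != NULL && argv[0] != NULL);

    pid = fork();
    if (pid < 0) {
        /* Could not even start the child: report it like a crash. */
        return CHILD_CRASHED;
    }
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);     /* exec failed; shows up as CHILD_EXIT_FAILURE */
    }
    /* Retry if the wait gets interrupted by an unrelated signal. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return CHILD_CRASHED;
        }
    }
    return classify_wait_status(status);
}
```

A test could then skip a machine on CHILD_EXIT_FAILURE and only fail on
CHILD_CRASHED.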

 Thomas
Thomas Huth April 27, 2018, 6:06 a.m. UTC | #14
On 26.04.2018 13:54, Markus Armbruster wrote:
[...]
> Actually, a worse offender in the "waste everybody's time via redundancy"
> department could be qom-test.

Shall we change qom-test to also only test with the "none" machine in
the normal "make check" mode and only do the full test with all machines
in "make check SPEED=slow" ?

 Thomas
Markus Armbruster April 27, 2018, 6:29 a.m. UTC | #15
Thomas Huth <thuth@redhat.com> writes:

> On 26.04.2018 13:54, Markus Armbruster wrote:
> [...]
>> Actually, a worse offender in the "waste everybody's time via redundancy"
>> department could be qom-test.
>
> Shall we change qom-test to also only test with the "none" machine in
> the normal "make check" mode and only do the full test with all machines
> in "make check SPEED=slow" ?

I suggest you give it a try and see how much time it saves on "make
check" without SPEED=slow.  If you like the result, find out which
devices this no longer tests, if any.
Markus Armbruster April 27, 2018, 6:31 a.m. UTC | #16
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>> On 26.04.2018 13:45, Markus Armbruster wrote:
>> > Thomas Huth <thuth@redhat.com> writes:
>> [...]
>> >> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>> >>      qtest_end();
>> >>  }
>> >>  
>> >> +static void add_machine_test_case(const char *mname)
>> >> +{
>> >> +    char *path, *args;
>> >> +
>> >> +    /* Ignore blacklisted machines */
>> >> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>> >> +        return;
>> >> +    }
>> >> +
>> >> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>> >> +    args = g_strdup_printf("-machine %s", mname);
>> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> > 
>> > This runs test_device_intro_concrete() with "-machine M" for all machine
>> > types M, in SPEED=slow mode.
>> > 
>> >> +    g_free(path);
>> >> +
>> >> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>> >> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> > 
>> > This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>> > all machine types M, in SPEED=slow mode.
>> > 
>> > Has "without -nodefaults" exposed additional bugs?
>> 
>> After testing this with all machines, I had to discover that
>> "-nodefaults" does not work so easily: A lot of the embedded machines
>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>> and exit immediately instead. E.g.:
>> 
>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>> qemu-system-arm: missing SecureDigital device

These are all bugs.  --nodefaults is supposed to suppress *optional*
devices, not mandatory ones.

>> So we'd either need a rather big black list for the machines that do not
>> work, or simply drop the "-nodefaults" tests from this patch.
>
> Or we could try to test all machines anyway, but not consider it
> an error if QEMU just does exit(1).  Can the qtest C API give us
> that information?
>
> (Or we could simply let -nodefaults aside by now, and do this
> after we implement this test case in Python.)

Or we could fix the bugs.
Thomas Huth April 27, 2018, 7:31 a.m. UTC | #17
On 27.04.2018 08:31, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>>> On 26.04.2018 13:45, Markus Armbruster wrote:
>>>> Thomas Huth <thuth@redhat.com> writes:
>>> [...]
>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>>>>      qtest_end();
>>>>>  }
>>>>>  
>>>>> +static void add_machine_test_case(const char *mname)
>>>>> +{
>>>>> +    char *path, *args;
>>>>> +
>>>>> +    /* Ignore blacklisted machines */
>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>>>>> +    args = g_strdup_printf("-machine %s", mname);
>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>
>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>>>> types M, in SPEED=slow mode.
>>>>
>>>>> +    g_free(path);
>>>>> +
>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>
>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>>>> all machine types M, in SPEED=slow mode.
>>>>
>>>> Has "without -nodefaults" exposed additional bugs?
>>>
>>> After testing this with all machines, I had to discover that
>>> "-nodefaults" does not work so easily: A lot of the embedded machines
>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>>> and exit immediately instead. E.g.:
>>>
>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>>> qemu-system-arm: missing SecureDigital device
> 
> These are all bugs.  --nodefaults is supposed to suppress *optional*
> devices, not mandatory ones.

Even if we fix all the issues, there is still another cosmetic problem:
Since there are no entries in nd_table[], all the boards with embedded
NICs start to spill out "warning: nic XYZ has no peer". Should we simply
suppress that warning in qtest mode?

 Thomas
Markus Armbruster April 27, 2018, 8:05 a.m. UTC | #18
Thomas Huth <thuth@redhat.com> writes:

> On 27.04.2018 08:31, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>>>> On 26.04.2018 13:45, Markus Armbruster wrote:
>>>>> Thomas Huth <thuth@redhat.com> writes:
>>>> [...]
>>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>>>>>      qtest_end();
>>>>>>  }
>>>>>>  
>>>>>> +static void add_machine_test_case(const char *mname)
>>>>>> +{
>>>>>> +    char *path, *args;
>>>>>> +
>>>>>> +    /* Ignore blacklisted machines */
>>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>>>>>> +    args = g_strdup_printf("-machine %s", mname);
>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>
>>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>>>>> types M, in SPEED=slow mode.
>>>>>
>>>>>> +    g_free(path);
>>>>>> +
>>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>
>>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>>>>> all machine types M, in SPEED=slow mode.
>>>>>
>>>>> Has "without -nodefaults" exposed additional bugs?
>>>>
>>>> After testing this with all machines, I had to discover that
>>>> "-nodefaults" does not work so easily: A lot of the embedded machines
>>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>>>> and exit immediately instead. E.g.:
>>>>
>>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>>>> qemu-system-arm: missing SecureDigital device
>> 
>> These are all bugs.  --nodefaults is supposed to suppress *optional*
>> devices, not mandatory ones.
>
> Even if we fix all the issues, there is still another cosmetic problem:
> Since there are no entries in nd_table[], all the boards with embedded
> NICs start to spill out "warning: nic XYZ has no peer". Should we simply
> suppress that warning in qtest mode?

Makes sense to me.
Peter Maydell April 27, 2018, 10:20 a.m. UTC | #19
On 27 April 2018 at 07:06, Thomas Huth <thuth@redhat.com> wrote:
> On 26.04.2018 13:54, Markus Armbruster wrote:
> [...]
>> Actually, a worse offender in the "waste everybody's time via redundancy"
>> department could be qom-test.
>
> Shall we change qom-test to also only test with the "none" machine in
> the normal "make check" mode and only do the full test with all machines
> in "make check SPEED=slow" ?

We definitely want something that tries to instantiate every
machine, because that does catch bugs.

thanks
-- PMM
Thomas Huth April 27, 2018, 10:24 a.m. UTC | #20
On 27.04.2018 12:20, Peter Maydell wrote:
> On 27 April 2018 at 07:06, Thomas Huth <thuth@redhat.com> wrote:
>> On 26.04.2018 13:54, Markus Armbruster wrote:
>> [...]
>>> Actually, a worse offender in the "waste everybody's time via redundancy"
>>> department could be qom-test.
>>
>> Shall we change qom-test to also only test with the "none" machine in
>> the normal "make check" mode and only do the full test with all machines
>> in "make check SPEED=slow" ?
> 
> We definitely want something that tries to instantiate every
> machine, because that does catch bugs.

Yes, after having a closer look at this one, I also think that we should
*not* change it to run with "none" by default only. The 'qom-list'
command results in quite a different output depending on which machine
you run it on.

 Thomas
Markus Armbruster April 27, 2018, 4:30 p.m. UTC | #21
Thomas Huth <thuth@redhat.com> writes:

> On 27.04.2018 12:20, Peter Maydell wrote:
>> On 27 April 2018 at 07:06, Thomas Huth <thuth@redhat.com> wrote:
>>> On 26.04.2018 13:54, Markus Armbruster wrote:
>>> [...]
>>>> Actually, a worse offender in the "waste everybody's time via redundancy"
>>>> department could be qom-test.

Supporting numbers:

$ time for i in *-softmmu/qemu-system-*; do [ -x $i ] || continue; QTEST_QEMU_BINARY=$i QTEST_QEMU_IMG=qemu-img MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} gtester -k --verbose -m=quick  tests/qom-test ; done
[...]
real	3m27.427s
user	2m7.141s
sys	1m44.354s

aarch64, arm, i386, x86_64 each take more than 30s.

For each target, we walk /machine and qom-get every property.  The test
passes if qom-get doesn't crash; the values we get don't matter.

For x86_64 alone, qom-test executes qom-get more than 45,000 times to
test almost 9,500 objects.  It gets the properties of more than 5,000
qemu:memory-region objects, more than 2,500 irq objects, almost 300
smbus-eeprom objects, 110 IDE objects, ...  It's nice that we can test
that qom-get doesn't crash on any of IDE's properties in 110 very slight
variations.  But most of the time, one of the variations would be
enough.
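
To illustrate "one of the variations would be enough": a quick mode
could pick one representative object per QOM type and only run qom-get
on those. The sketch below is not qom-test code; struct qom_object and
both helpers are hypothetical, and the object list is assumed to have
been collected beforehand (e.g. via qom-list).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one object found while walking /machine. */
struct qom_object {
    const char *path;   /* QOM path of the object */
    const char *type;   /* QOM type name, e.g. "irq" */
};

/* True if objs[idx] is the first entry in the list with its type name. */
static bool is_first_of_type(const struct qom_object *objs, size_t idx)
{
    size_t i;

    for (i = 0; i < idx; i++) {
        if (strcmp(objs[i].type, objs[idx].type) == 0) {
            return false;
        }
    }
    return true;
}

/* Number of objects a "one per type" quick mode would still look at. */
static size_t count_representatives(const struct qom_object *objs, size_t n)
{
    size_t i, count = 0;

    assert(objs != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (is_first_of_type(objs, i)) {
            count++;
        }
    }
    return count;
}
```

The linear scan is quadratic in the number of objects; with the roughly
9,500 objects mentioned above that is still cheap compared to the QMP
round trips it would save, but a hash table would be the obvious
refinement.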

>>> Shall we change qom-test to also only test with the "none" machine in
>>> the normal "make check" mode and only do the full test with all machines
>>> in "make check SPEED=slow" ?
>> 
>> We definitely want something that tries to instantiate every
>> machine, because that does catch bugs.
>
> Yes, after having a closer look at this one, I also think that we should
> *not* change it to run with "none" by default only. The 'qom-list'
> command results in quite a different output depending on which machine
> you run it on.

Only running "none" is too naive.
Thomas Huth April 27, 2018, 4:36 p.m. UTC | #22
On 27.04.2018 18:30, Markus Armbruster wrote:
> Thomas Huth <thuth@redhat.com> writes:
> 
>> On 27.04.2018 12:20, Peter Maydell wrote:
>>> On 27 April 2018 at 07:06, Thomas Huth <thuth@redhat.com> wrote:
[...]
>>>> Shall we change qom-test to also only test with the "none" machine in
>>>> the normal "make check" mode and only do the full test with all machines
>>>> in "make check SPEED=slow" ?
>>>
>>> We definitely want something that tries to instantiate every
>>> machine, because that does catch bugs.
>>
>> Yes, after having a closer look at this one, I also think that we should
>> *not* change it to run with "none" by default only. The 'qom-list'
>> command results in quite a different output depending on which machine
>> you run it on.
> 
> Only running "none" is too naive.

For the targets that have "versioned" machine types, I think we could
skip all the older machine versions, so that we only test with
pc-i440fx-2.12 but not with pc-i440fx-2.11 and older anymore. That would
need some more or less clever algorithm to detect the latest version,
though.
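
A rough sketch of such an algorithm, assuming that versioned machine
names always end in "-<major>.<minor>" (as in pc-i440fx-2.12) and that
the full list of machine names is already available. Both function
names are made up for this example; names with any other suffix are
treated as unversioned and always tested.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split e.g. "pc-i440fx-2.12" into the family prefix "pc-i440fx"
 * (returned as a length), major 2 and minor 12.  Returns false if the
 * name does not end in "-<major>.<minor>".
 */
static bool parse_versioned(const char *name, size_t *family_len,
                            long *major, long *minor)
{
    const char *dash = strrchr(name, '-');
    char *end;

    if (!dash || !isdigit((unsigned char)dash[1])) {
        return false;
    }
    *major = strtol(dash + 1, &end, 10);
    if (*end != '.' || !isdigit((unsigned char)end[1])) {
        return false;
    }
    *minor = strtol(end + 1, &end, 10);
    if (*end != '\0') {
        return false;
    }
    *family_len = (size_t)(dash - name);
    return true;
}

/*
 * True if names[idx] is unversioned, or if no other machine of the
 * same family has a higher version number.
 */
static bool should_test_machine(const char *const *names, size_t n,
                                size_t idx)
{
    size_t i, flen, oflen;
    long maj, min, omaj, omin;

    assert(names != NULL && idx < n);

    if (!parse_versioned(names[idx], &flen, &maj, &min)) {
        return true;
    }
    for (i = 0; i < n; i++) {
        if (i == idx || !parse_versioned(names[i], &oflen, &omaj, &omin)) {
            continue;
        }
        if (oflen == flen && strncmp(names[i], names[idx], flen) == 0 &&
            (omaj > maj || (omaj == maj && omin > min))) {
            return false;   /* a newer version of this family exists */
        }
    }
    return true;
}
```

Versions are compared numerically, so 2.12 counts as newer than 2.9.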

 Thomas
Eduardo Habkost May 7, 2018, 1:53 p.m. UTC | #23
On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
> >> On 26.04.2018 13:45, Markus Armbruster wrote:
> >> > Thomas Huth <thuth@redhat.com> writes:
> >> [...]
> >> >> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
> >> >>      qtest_end();
> >> >>  }
> >> >>  
> >> >> +static void add_machine_test_case(const char *mname)
> >> >> +{
> >> >> +    char *path, *args;
> >> >> +
> >> >> +    /* Ignore blacklisted machines */
> >> >> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
> >> >> +        return;
> >> >> +    }
> >> >> +
> >> >> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
> >> >> +    args = g_strdup_printf("-machine %s", mname);
> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >> > 
> >> > This runs test_device_intro_concrete() with "-machine M" for all machine
> >> > types M, in SPEED=slow mode.
> >> > 
> >> >> +    g_free(path);
> >> >> +
> >> >> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
> >> >> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >> > 
> >> > This runs test_device_intro_concrete() with "-nodefaults -machine M" for
> >> > all machine types M, in SPEED=slow mode.
> >> > 
> >> > Has "without -nodefaults" exposed additional bugs?
> >> 
> >> After testing this with all machines, I had to discover that
> >> "-nodefaults" does not work so easily: A lot of the embedded machines
> >> (especially the ARM machines) simply refuse to work with "-nodefaults"
> >> and exit immediately instead. E.g.:
> >> 
> >> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
> >> qemu-system-arm: missing SecureDigital device
> 
> These are all bugs.  --nodefaults is supposed to suppress *optional*
> devices, not mandatory ones.

I'm not sure I understand the requirements.  What exactly is the
definition of "mandatory"?

A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
is useless because it has no device to boot from.  How is
that different from a n810 machine not booting because there's no
SD device?
Markus Armbruster May 7, 2018, 4:50 p.m. UTC | #24
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>> > On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>> >> On 26.04.2018 13:45, Markus Armbruster wrote:
>> >> > Thomas Huth <thuth@redhat.com> writes:
>> >> [...]
>> >> >> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>> >> >>      qtest_end();
>> >> >>  }
>> >> >>  
>> >> >> +static void add_machine_test_case(const char *mname)
>> >> >> +{
>> >> >> +    char *path, *args;
>> >> >> +
>> >> >> +    /* Ignore blacklisted machines */
>> >> >> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>> >> >> +        return;
>> >> >> +    }
>> >> >> +
>> >> >> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>> >> >> +    args = g_strdup_printf("-machine %s", mname);
>> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> >> > 
>> >> > This runs test_device_intro_concrete() with "-machine M" for all machine
>> >> > types M, in SPEED=slow mode.
>> >> > 
>> >> >> +    g_free(path);
>> >> >> +
>> >> >> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>> >> >> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> >> > 
>> >> > This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>> >> > all machine types M, in SPEED=slow mode.
>> >> > 
>> >> > Has "without -nodefaults" exposed additional bugs?
>> >> 
>> >> After testing this with all machines, I had to discover that
>> >> "-nodefaults" does not work so easily: A lot of the embedded machines
>> >> (especially the ARM machines) simply refuse to work with "-nodefaults"
>> >> and exit immediately instead. E.g.:
>> >> 
>> >> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>> >> qemu-system-arm: missing SecureDigital device
>> 
>> These are all bugs.  --nodefaults is supposed to suppress *optional*
>> devices, not mandatory ones.
>
> I'm not sure I understand the requirements.  What exactly is the
> definition of "mandatory"?
>
> A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
> is useless because it has no device to boot from.  How is
> that different from a n810 machine not booting because there's no
> SD device?

I propose:

* Stuff that's required for QEMU to run is not suppressed by -nodefaults

* Stuff that a real machine has soldered on is also not suppressed

* Stuff that can be pulled out of a real machine may be suppressed, even
  when that means the guest won't run

Does that make some sense?
Thomas Huth May 7, 2018, 5:02 p.m. UTC | #25
On 07.05.2018 18:50, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
>> On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
>>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>>
>>>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>>>>> On 26.04.2018 13:45, Markus Armbruster wrote:
>>>>>> Thomas Huth <thuth@redhat.com> writes:
>>>>> [...]
>>>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>>>>>>      qtest_end();
>>>>>>>  }
>>>>>>>  
>>>>>>> +static void add_machine_test_case(const char *mname)
>>>>>>> +{
>>>>>>> +    char *path, *args;
>>>>>>> +
>>>>>>> +    /* Ignore blacklisted machines */
>>>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>>>>>>> +    args = g_strdup_printf("-machine %s", mname);
>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>>
>>>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>>>>>> types M, in SPEED=slow mode.
>>>>>>
>>>>>>> +    g_free(path);
>>>>>>> +
>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>>>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>>
>>>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>>>>>> all machine types M, in SPEED=slow mode.
>>>>>>
>>>>>> Has "without -nodefaults" exposed additional bugs?
>>>>>
>>>>> After testing this with all machines, I had to discover that
>>>>> "-nodefaults" does not work so easily: A lot of the embedded machines
>>>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>>>>> and exit immediately instead. E.g.:
>>>>>
>>>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>>>>> qemu-system-arm: missing SecureDigital device
>>>
>>> These are all bugs.  --nodefaults is supposed to suppress *optional*
>>> devices, not mandatory ones.
>>
>> I'm not sure I understand the requirements.  What exactly is the
>> definition of "mandatory"?
>>
>> A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
>> is useless because it has no device to boot from.  How is
>> that different from a n810 machine not booting because there's no
>> SD device?
> 
> I propose:
> 
> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
> 
> * Stuff that a real machine has soldered on is also not suppressed
> 
> * Stuff that can be pulled out of a real machine may be suppressed, even
>   when that means the guest won't run
> 
> Does that make some sense?

Makes sense. On a real machine, you could likely also remove the SD card
and load a kernel by other means, e.g. with a JTAG debug connector. So
it makes sense that you could also start the machine in QEMU without an
SD card and load a kernel e.g. with the gdb stub instead.

 Thomas
Peter Maydell May 7, 2018, 5:04 p.m. UTC | #26
On 7 May 2018 at 17:50, Markus Armbruster <armbru@redhat.com> wrote:
> I propose:
>
> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
>
> * Stuff that a real machine has soldered on is also not suppressed
>
> * Stuff that can be pulled out of a real machine may be suppressed, even
>   when that means the guest won't run
>
> Does that make some sense?

We might also want
 * Stuff that you can't add back in with an appropriate command line
   argument is not suppressed

though we might optimistically hope that the category is empty :-)
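
Put together, the proposed rules amount to a simple predicate. The
sketch below only restates the proposal in code form; struct
board_part and its fields are hypothetical and do not correspond to
any existing QEMU data structure.

```c
#include <stdbool.h>

/* Hypothetical description of one component of a board. */
struct board_part {
    bool needed_for_qemu_to_run;    /* QEMU itself cannot start without it */
    bool soldered_on;               /* fixed part of the real machine */
    bool can_be_added_via_cmdline;  /* user can add it back explicitly */
};

/* May -nodefaults leave this part out, according to the proposed rules? */
static bool nodefaults_may_suppress(const struct board_part *part)
{
    if (part->needed_for_qemu_to_run) {
        return false;
    }
    if (part->soldered_on) {
        return false;
    }
    if (!part->can_be_added_via_cmdline) {
        return false;
    }
    /* Removable on real hardware: fine, even if the guest won't run. */
    return true;
}
```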

thanks
-- PMM
Eduardo Habkost May 7, 2018, 6:21 p.m. UTC | #27
On Mon, May 07, 2018 at 06:50:35PM +0200, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
> >> Eduardo Habkost <ehabkost@redhat.com> writes:
> >> 
> >> > On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
> >> >> On 26.04.2018 13:45, Markus Armbruster wrote:
> >> >> > Thomas Huth <thuth@redhat.com> writes:
> >> >> [...]
> >> >> >> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
> >> >> >>      qtest_end();
> >> >> >>  }
> >> >> >>  
> >> >> >> +static void add_machine_test_case(const char *mname)
> >> >> >> +{
> >> >> >> +    char *path, *args;
> >> >> >> +
> >> >> >> +    /* Ignore blacklisted machines */
> >> >> >> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
> >> >> >> +        return;
> >> >> >> +    }
> >> >> >> +
> >> >> >> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
> >> >> >> +    args = g_strdup_printf("-machine %s", mname);
> >> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >> >> > 
> >> >> > This runs test_device_intro_concrete() with "-machine M" for all machine
> >> >> > types M, in SPEED=slow mode.
> >> >> > 
> >> >> >> +    g_free(path);
> >> >> >> +
> >> >> >> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
> >> >> >> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
> >> >> >> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >> >> > 
> >> >> > This runs test_device_intro_concrete() with "-nodefaults -machine M" for
> >> >> > all machine types M, in SPEED=slow mode.
> >> >> > 
> >> >> > Has "without -nodefaults" exposed additional bugs?
> >> >> 
> >> >> After testing this with all machines, I had to discover that
> >> >> "-nodefaults" does not work so easily: A lot of the embedded machines
> >> >> (especially the ARM machines) simply refuse to work with "-nodefaults"
> >> >> and exit immediately instead. E.g.:
> >> >> 
> >> >> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
> >> >> qemu-system-arm: missing SecureDigital device
> >> 
> >> These are all bugs.  --nodefaults is supposed to suppress *optional*
> >> devices, not mandatory ones.
> >
> > I'm not sure I understand the requirements.  What exactly is the
> > definition of "mandatory"?
> >
> > A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
> > is useless because it has no device to boot from.  How is
> > that different from a n810 machine not booting because there's no
> > SD device?
> 
> I propose:
> 
> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
> 
> * Stuff that a real machine has soldered on is also not suppressed
> 
> * Stuff that can be pulled out of a real machine may be suppressed, even
>   when that means the guest won't run

Makes sense to me.  It looks like the only obstacle for
tests/device-introspect and device-crash-test is the first rule.
"Guest won't boot" isn't a problem, but "QEMU won't run" is.

The first rule is easily testable, too: running
"$QEMU -machine $MACHINE -nodefaults" and not having a working
QMP monitor should be reported as a bug by automated tests.

Do we have an up-to-date list of machines that break this rule?
We can add this to
<https://wiki.qemu.org/Contribute/BiteSizedTasks>.
Thomas Huth May 7, 2018, 7:13 p.m. UTC | #28
On 07.05.2018 20:21, Eduardo Habkost wrote:
> On Mon, May 07, 2018 at 06:50:35PM +0200, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>
>>> On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
>>>> Eduardo Habkost <ehabkost@redhat.com> writes:
>>>>
>>>>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>>>>>> On 26.04.2018 13:45, Markus Armbruster wrote:
>>>>>>> Thomas Huth <thuth@redhat.com> writes:
>>>>>> [...]
>>>>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>>>>>>>>      qtest_end();
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> +static void add_machine_test_case(const char *mname)
>>>>>>>> +{
>>>>>>>> +    char *path, *args;
>>>>>>>> +
>>>>>>>> +    /* Ignore blacklisted machines */
>>>>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>>>>>>>> +    args = g_strdup_printf("-machine %s", mname);
>>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>>>
>>>>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>>>>>>> types M, in SPEED=slow mode.
>>>>>>>
>>>>>>>> +    g_free(path);
>>>>>>>> +
>>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>>>>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>>>>>>>
>>>>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>>>>>>> all machine types M, in SPEED=slow mode.
>>>>>>>
>>>>>>> Has "without -nodefaults" exposed additional bugs?
>>>>>>
>>>>>> After testing this with all machines, I had to discover that
>>>>>> "-nodefaults" does not work so easily: A lot of the embedded machines
>>>>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>>>>>> and exit immediately instead. E.g.:
>>>>>>
>>>>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>>>>>> qemu-system-arm: missing SecureDigital device
>>>>
>>>> These are all bugs.  --nodefaults is supposed to suppress *optional*
>>>> devices, not mandatory ones.
>>>
>>> I'm not sure I understand the requirements.  What exactly is the
>>> definition of "mandatory"?
>>>
>>> A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
>>> is useless because it has no device to boot from.  How is
>>> that different from a n810 machine not booting because there's no
>>> SD device?
>>
>> I propose:
>>
>> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
>>
>> * Stuff that a real machine has soldered on is also not suppressed
>>
>> * Stuff that can be pulled out of a real machine may be suppressed, even
>>   when that means the guest won't run
> 
> Makes sense to me.  It looks like the only obstacle for
> tests/device-introspect and device-crash-test is the first rule.
> "Guest won't boot" isn't a problem, but "QEMU won't run" is.
> 
> The first rule is easily testable, too: running
> "$QEMU -machine $MACHINE -nodefaults" and not having a working
> QMP monitor should be reported as a bug by automated tests.

You mean with "-accel qtest" or without? With "-accel qtest" we should
pretty soon be fine, after Peter's current PULL request has been merged
(which contains a patch from me for fixing these SD card problems with
ARM machines).
Without "-accel qtest", things are not that easy, unfortunately. Lots of
boards require "-kernel" or "-bios" and refuse to work without them. So
you can hardly test "-nodefaults" automatically in the normal tcg mode.
(But maybe all boards should allow QEMU to start if you've at least also
specified "-S"? ... In that case we've got plenty of work for
BiteSizedTasks ;-) )

 Thomas
Eduardo Habkost May 7, 2018, 7:32 p.m. UTC | #29
On Mon, May 07, 2018 at 09:13:57PM +0200, Thomas Huth wrote:
> On 07.05.2018 20:21, Eduardo Habkost wrote:
> > On Mon, May 07, 2018 at 06:50:35PM +0200, Markus Armbruster wrote:
> >> Eduardo Habkost <ehabkost@redhat.com> writes:
> >>
> >>> On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
> >>>> Eduardo Habkost <ehabkost@redhat.com> writes:
> >>>>
> >>>>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
> >>>>>> On 26.04.2018 13:45, Markus Armbruster wrote:
> >>>>>>> Thomas Huth <thuth@redhat.com> writes:
> >>>>>> [...]
> >>>>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
> >>>>>>>>      qtest_end();
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>> +static void add_machine_test_case(const char *mname)
> >>>>>>>> +{
> >>>>>>>> +    char *path, *args;
> >>>>>>>> +
> >>>>>>>> +    /* Ignore blacklisted machines */
> >>>>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
> >>>>>>>> +        return;
> >>>>>>>> +    }
> >>>>>>>> +
> >>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
> >>>>>>>> +    args = g_strdup_printf("-machine %s", mname);
> >>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >>>>>>>
> >>>>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
> >>>>>>> types M, in SPEED=slow mode.
> >>>>>>>
> >>>>>>>> +    g_free(path);
> >>>>>>>> +
> >>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
> >>>>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
> >>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
> >>>>>>>
> >>>>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
> >>>>>>> all machine types M, in SPEED=slow mode.
> >>>>>>>
> >>>>>>> Has "without -nodefaults" exposed additional bugs?
> >>>>>>
> >>>>>> After testing this with all machines, I had to discover that
> >>>>>> "-nodefaults" does not work so easily: A lot of the embedded machines
> >>>>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
> >>>>>> and exit immediately instead. E.g.:
> >>>>>>
> >>>>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
> >>>>>> qemu-system-arm: missing SecureDigital device
> >>>>
> >>>> These are all bugs.  --nodefaults is supposed to suppress *optional*
> >>>> devices, not mandatory ones.
> >>>
> >>> I'm not sure I understand the requirements.  What exactly is the
> >>> definition of "mandatory"?
> >>>
> >>> A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
> >>> is useless because it has no device to boot from.  How is
> >>> that different from a n810 machine not booting because there's no
> >>> SD device?
> >>
> >> I propose:
> >>
> >> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
> >>
> >> * Stuff that a real machine has soldered on is also not suppressed
> >>
> >> * Stuff that can be pulled out of a real machine may be suppressed, even
> >>   when that means the guest won't run
> > 
> > Makes sense to me.  It looks like the only obstacle for
> > tests/device-introspect and device-crash-test is the first rule.
> > "Guest won't boot" isn't a problem, but "QEMU won't run" is.
> > 
> > The first rule is easily testable, too: running
> > "$QEMU -machine $MACHINE -nodefaults" and not having a working
> > QMP monitor should be reported as a bug by automated tests.
> 
> You mean with "-accel qtest" or without? With "-accel qtest" we should
> pretty soon be fine, after Peter's current PULL request has been merged
> (which contains a patch from me for fixing these SD card problems with
> ARM machines).
> Without "-accel qtest", things are not that easy, unfortunately. Lots of
> boards require "-kernel" or "-bios" and refuse to work without. So you
> can hardly test "-nodefaults" automatically in the normal tcg mode. (But
> maybe all boards should allow QEMU to start if you've at least also
> specified "-S"? ... In that case we've got plenty of work for
> BiteSizeTasks ;-) )

Hmm, maybe it's not a bite-sized task after all.  :)

Should we do this gradually?

* Working with -accel qtest is useful, and sounds like an easier goal;
* working with -S seems desirable too;
* working without -S (even if the emulated CPU crashes and burns)
  would be interesting.

Related question: what are the use cases where we require
"-accel qtest" and "-S" wouldn't work?

Are the requirements and goals of "-accel qtest" documented
somewhere?  Without documentation, it's hard to say when a given
qtest_enabled() call in the code is reasonable, or a hack we want
to get rid of.
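
To make the three stages above concrete, here is a rough sketch of how a
test could build the QEMU arguments for each of them. The enum and the
function name are made up for this example, and it assumes "-accel qtest"
and "-S" can simply be appended after "-nodefaults -machine M":

```c
#include <stdio.h>

/* Stages of the gradual plan, easiest first (names are hypothetical) */
enum startup_stage {
    STAGE_ACCEL_QTEST,  /* qtest accelerator, no guest code is executed */
    STAGE_PAUSED,       /* default accelerator, CPU held stopped by -S */
    STAGE_RUNNING,      /* default accelerator, CPU is allowed to run */
};

/*
 * Format the arguments for one machine and one stage into 'buf'.
 * Returns the snprintf() result, i.e. the length that was needed.
 */
int format_stage_args(char *buf, size_t size, const char *machine,
                      enum startup_stage stage)
{
    const char *extra = "";

    switch (stage) {
    case STAGE_ACCEL_QTEST:
        extra = " -accel qtest";
        break;
    case STAGE_PAUSED:
        extra = " -S";
        break;
    case STAGE_RUNNING:
        break;
    }

    return snprintf(buf, size, "-nodefaults -machine %s%s", machine, extra);
}
```

A test could then walk the stages in order for each machine and stop at
the first one where QEMU fails to come up.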
Markus Armbruster May 8, 2018, 5:41 a.m. UTC | #30
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Mon, May 07, 2018 at 09:13:57PM +0200, Thomas Huth wrote:
>> On 07.05.2018 20:21, Eduardo Habkost wrote:
>> > On Mon, May 07, 2018 at 06:50:35PM +0200, Markus Armbruster wrote:
>> >> Eduardo Habkost <ehabkost@redhat.com> writes:
>> >>
>> >>> On Fri, Apr 27, 2018 at 08:31:58AM +0200, Markus Armbruster wrote:
>> >>>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> >>>>
>> >>>>> On Thu, Apr 26, 2018 at 05:20:25PM +0200, Thomas Huth wrote:
>> >>>>>> On 26.04.2018 13:45, Markus Armbruster wrote:
>> >>>>>>> Thomas Huth <thuth@redhat.com> writes:
>> >>>>>> [...]
>> >>>>>>>> @@ -260,6 +263,26 @@ static void test_abstract_interfaces(void)
>> >>>>>>>>      qtest_end();
>> >>>>>>>>  }
>> >>>>>>>>  
>> >>>>>>>> +static void add_machine_test_case(const char *mname)
>> >>>>>>>> +{
>> >>>>>>>> +    char *path, *args;
>> >>>>>>>> +
>> >>>>>>>> +    /* Ignore blacklisted machines */
>> >>>>>>>> +    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
>> >>>>>>>> +        return;
>> >>>>>>>> +    }
>> >>>>>>>> +
>> >>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
>> >>>>>>>> +    args = g_strdup_printf("-machine %s", mname);
>> >>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> >>>>>>>
>> >>>>>>> This runs test_device_intro_concrete() with "-machine M" for all machine
>> >>>>>>> types M, in SPEED=slow mode.
>> >>>>>>>
>> >>>>>>>> +    g_free(path);
>> >>>>>>>> +
>> >>>>>>>> +    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
>> >>>>>>>> +    args = g_strdup_printf("-nodefaults -machine %s", mname);
>> >>>>>>>> +    qtest_add_data_func(path, args, test_device_intro_concrete);
>> >>>>>>>
>> >>>>>>> This runs test_device_intro_concrete() with "-nodefaults -machine M" for
>> >>>>>>> all machine types M, in SPEED=slow mode.
>> >>>>>>>
>> >>>>>>> Has "without -nodefaults" exposed additional bugs?
>> >>>>>>
>> >>>>>> After testing this with all machines, I had to discover that
>> >>>>>> "-nodefaults" does not work so easily: A lot of the embedded machines
>> >>>>>> (especially the ARM machines) simply refuse to work with "-nodefaults"
>> >>>>>> and exit immediately instead. E.g.:
>> >>>>>>
>> >>>>>> $ arm-softmmu/qemu-system-arm -nodefaults -nographic -M n810,accel=qtest
>> >>>>>> qemu-system-arm: missing SecureDigital device
>> >>>>
>> >>>> These are all bugs.  --nodefaults is supposed to suppress *optional*
>> >>>> devices, not mandatory ones.
>> >>>
>> >>> I'm not sure I understand the requirements.  What exactly is the
>> >>> definition of "mandatory"?
>> >>>
>> >>> A machine created by "qemu-system-x86_64 -machine pc -nodefaults"
>> >>> is useless because it has no device to boot from.  How is
>> >>> that different from a n810 machine not booting because there's no
>> >>> SD device?
>> >>
>> >> I propose:
>> >>
>> >> * Stuff that's required for QEMU to run is not suppressed by -nodefaults
>> >>
>> >> * Stuff that a real machine has soldered on is also not suppressed
>> >>
>> >> * Stuff that can be pulled out of a real machine may be suppressed, even
>> >>   when that means the guest won't run
>> > 
>> > Makes sense to me.  It looks like the only obstacle for
>> > tests/device-introspect and device-crash-test is the first rule.
>> > "Guest won't boot" isn't a problem, but "QEMU won't run" is.
>> > 
>> > The first rule is easily testable, too: running
>> > "$QEMU -machine $MACHINE -nodefaults" and not having a working
>> > QMP monitor should be reported as a bug by automated tests.
>> 
>> You mean with "-accel qtest" or without? With "-accel qtest" we should
>> pretty soon be fine, after Peter's current PULL request has been merged
>> (which contains a patch from me for fixing these SD card problems with
>> ARM machines).
>> Without "-accel qtest", things are not that easy, unfortunately. Lots of
>> boards require "-kernel" or "-bios" and refuse to work without. So you
>> can hardly test "-nodefaults" automatically in the normal tcg mode. (But
>> maybe all boards should allow QEMU to start if you've at least also
>> specified "-S"? ... In that case we've got plenty of work for
>> BiteSizeTasks ;-) )
>
> Hmm, maybe it's not a bite-sized task after all.  :)
>
> Should we do this gradually?
>
> * Working with -accel qtest is useful, and sounds like an easier goal;

This is immediately useful.

> * working with -S seems desirable too;
> * working without -S (even if the emulated CPU crashes and burns)
>   would be interesting.

Nice to have for consistency, I think.

> Related question: what are the use cases where we require
> "-accel qtest" and "-S" wouldn't work?
>
> Are the requirements and goals of "-accel qtest" documented
> somewhere?  Without documentation, it's hard to say when a given
> qtest_enabled() call in the code is reasonable, or a hack we want
> to get rid of.

Good question.

    $ git-grep -l qtest docs/
    docs/devel/testing.rst

Its section QTest doesn't mention -accel.

The accelerator was added in commit c7f0f3b1c82.  The commit message
mentions it [lines wrapped for readability]:

    The idea behind qtest is pretty simple.  Instead of executing a CPU
    via TCG or KVM, rely on an external process to send events to the
    device model that the CPU would normally generate.
    
    qtest presents itself as an accelerator.  In addition, a new option
    is added to establish a qtest server (-qtest) that takes a character
    device.  This is what allows the external process to send CPU events
    to the device model.
    
    qtest uses a simple line based protocol to send the events.
    Documentation of that protocol is in qtest.c.

Less than clear.

In my understanding, the purpose of the qtest accelerator is to suppress
guest execution.  We later put it to the secondary use of suppressing
whatever stuff (such as warnings) gets in the way of the test suite.
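
A sketch of what that secondary use tends to look like, for illustration
only.  This is not copied from QEMU: fake_qtest_enabled is a stand-in for
the real qtest_enabled(), and warn_nic_without_peer() is a made-up helper
modelled on the "has no peer" warning quoted earlier in the thread:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for QEMU's qtest_enabled(); here just a flag set by the caller */
bool fake_qtest_enabled;

/*
 * Hypothetical helper: print a warning unless we run under the qtest
 * accelerator, where it would only clutter the test suite's output.
 * Returns true if the warning was printed.
 */
bool warn_nic_without_peer(const char *nic)
{
    if (fake_qtest_enabled) {
        return false;
    }

    fprintf(stderr, "warning: nic %s has no peer\n", nic);
    return true;
}
```

Whether a given check of this kind is reasonable or a hack is exactly
what missing documentation makes hard to judge.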

Patch

diff --git a/tests/device-introspect-test.c b/tests/device-introspect-test.c
index b80058f..a9b9cf7 100644
--- a/tests/device-introspect-test.c
+++ b/tests/device-introspect-test.c
@@ -105,6 +105,8 @@  static void test_one_device(const char *type)
     QDict *resp;
     char *help, *qom_tree;
 
+    g_debug("Testing device '%s'", type);
+
     resp = qmp("{'execute': 'device-list-properties',"
                " 'arguments': {'typename': %s}}",
                type);
@@ -206,13 +208,13 @@  static void test_device_intro_abstract(void)
     qtest_end();
 }
 
-static void test_device_intro_concrete(void)
+static void test_device_intro_concrete(gconstpointer args)
 {
     QList *types;
     QListEntry *entry;
     const char *type;
 
-    qtest_start(common_args);
+    qtest_start((const char *)args);
     types = device_type_list(false);
 
     QLIST_FOREACH_ENTRY(types, entry) {
@@ -224,6 +226,7 @@  static void test_device_intro_concrete(void)
 
     QDECREF(types);
     qtest_end();
+    g_free((void *)args);
 }
 
 static void test_abstract_interfaces(void)
@@ -260,6 +263,26 @@  static void test_abstract_interfaces(void)
     qtest_end();
 }
 
+static void add_machine_test_case(const char *mname)
+{
+    char *path, *args;
+
+    /* Ignore blacklisted machines */
+    if (g_str_equal("xenfv", mname) || g_str_equal("xenpv", mname)) {
+        return;
+    }
+
+    path = g_strdup_printf("device/introspect/concrete-defaults-%s", mname);
+    args = g_strdup_printf("-machine %s", mname);
+    qtest_add_data_func(path, args, test_device_intro_concrete);
+    g_free(path);
+
+    path = g_strdup_printf("device/introspect/concrete-nodefaults-%s", mname);
+    args = g_strdup_printf("-nodefaults -machine %s", mname);
+    qtest_add_data_func(path, args, test_device_intro_concrete);
+    g_free(path);
+}
+
 int main(int argc, char **argv)
 {
     g_test_init(&argc, &argv, NULL);
@@ -268,8 +291,12 @@  int main(int argc, char **argv)
     qtest_add_func("device/introspect/list-fields", test_qom_list_fields);
     qtest_add_func("device/introspect/none", test_device_intro_none);
     qtest_add_func("device/introspect/abstract", test_device_intro_abstract);
-    qtest_add_func("device/introspect/concrete", test_device_intro_concrete);
     qtest_add_func("device/introspect/abstract-interfaces", test_abstract_interfaces);
+    qtest_add_data_func("device/introspect/concrete", g_strdup(common_args),
+                        test_device_intro_concrete);
+    if (g_test_slow()) {
+        qtest_cb_for_every_machine(add_machine_test_case);
+    }
 
     return g_test_run();
 }