diff mbox series

iotests: Revert emulator selection to old behaviour

Message ID 20210202142802.119999-1-kwolf@redhat.com (mailing list archive)
State New, archived
Headers show
Series iotests: Revert emulator selection to old behaviour | expand

Commit Message

Kevin Wolf Feb. 2, 2021, 2:28 p.m. UTC
If the qemu-system-{arch} binary for the host architecture can't be
found, the old 'check' implementation selected the alphabetically first
system emulator binary that it could find. The new Python implementation
just uses the first result of glob.iglob(), which has an undefined
order.

This is a problem that breaks CI because the iotests aren't actually
prepared to run on any emulator. They should be, so this is really a bug
in the failing test cases that should be fixed there, but as a quick
fix, let's revert to the old behaviour to let CI runs succeed again.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/testenv.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Philippe Mathieu-Daudé Feb. 2, 2021, 2:35 p.m. UTC | #1
On 2/2/21 3:28 PM, Kevin Wolf wrote:
> If the qemu-system-{arch} binary for the host architecture can't be
> found, the old 'check' implementation selected the alphabetically first
> system emulator binary that it could find. The new Python implementation
> just uses the first result of glob.iglob(), which has an undefined
> order.
> 
> This is a problem that breaks CI because the iotests aren't actually
> prepared to run on any emulator. They should be, so this is really a bug
> in the failing test cases that should be fixed there, but as a quick
> fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/qemu-iotests/testenv.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Daniel P. Berrangé Feb. 2, 2021, 2:41 p.m. UTC | #2
On Tue, Feb 02, 2021 at 03:28:02PM +0100, Kevin Wolf wrote:
> If the qemu-system-{arch} binary for the host architecture can't be
> found, the old 'check' implementation selected the alphabetically first
> system emulator binary that it could find. The new Python implementation
> just uses the first result of glob.iglob(), which has an undefined
> order.
> 
> This is a problem that breaks CI because the iotests aren't actually
> prepared to run on any emulator. They should be, so this is really a bug
> in the failing test cases that should be fixed there, but as a quick
> fix, let's revert to the old behaviour to let CI runs succeed again.

Deterministic CI is critically important.

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/qemu-iotests/testenv.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


> 
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> index b31275f518..1fbec854c1 100644
> --- a/tests/qemu-iotests/testenv.py
> +++ b/tests/qemu-iotests/testenv.py
> @@ -135,7 +135,7 @@ class TestEnv(ContextManager['TestEnv']):
>          if not os.path.exists(self.qemu_prog):
>              pattern = root('qemu-system-*')
>              try:
> -                progs = glob.iglob(pattern)
> +                progs = sorted(glob.iglob(pattern))
>                  self.qemu_prog = next(p for p in progs if isxfile(p))
>              except StopIteration:
>                  sys.exit("Not found any Qemu executable binary by pattern "
> -- 
> 2.29.2
> 

Regards,
Daniel
Philippe Mathieu-Daudé Feb. 2, 2021, 2:46 p.m. UTC | #3
On 2/2/21 3:28 PM, Kevin Wolf wrote:
> If the qemu-system-{arch} binary for the host architecture can't be
> found, the old 'check' implementation selected the alphabetically first
> system emulator binary that it could find. The new Python implementation
> just uses the first result of glob.iglob(), which has an undefined
> order.
> 
> This is a problem that breaks CI because the iotests aren't actually
> prepared to run on any emulator. They should be, so this is really a bug
> in the failing test cases that should be fixed there, but as a quick
> fix, let's revert to the old behaviour to let CI runs succeed again.

FWIW this is the same problem I had 1 year ago and tried to
fix it by sending QMP 'query-version' (introduced in v0.14):
https://www.mail-archive.com/qemu-devel@nongnu.org/msg675075.html

> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/qemu-iotests/testenv.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> index b31275f518..1fbec854c1 100644
> --- a/tests/qemu-iotests/testenv.py
> +++ b/tests/qemu-iotests/testenv.py
> @@ -135,7 +135,7 @@ class TestEnv(ContextManager['TestEnv']):
>          if not os.path.exists(self.qemu_prog):
>              pattern = root('qemu-system-*')
>              try:
> -                progs = glob.iglob(pattern)
> +                progs = sorted(glob.iglob(pattern))
>                  self.qemu_prog = next(p for p in progs if isxfile(p))
>              except StopIteration:
>                  sys.exit("Not found any Qemu executable binary by pattern "
>
Philippe Mathieu-Daudé Feb. 2, 2021, 2:48 p.m. UTC | #4
Forgot to Cc Wainer & Willian in case they are interested in
providing a better long term fix.

On 2/2/21 3:46 PM, Philippe Mathieu-Daudé wrote:
> On 2/2/21 3:28 PM, Kevin Wolf wrote:
>> If the qemu-system-{arch} binary for the host architecture can't be
>> found, the old 'check' implementation selected the alphabetically first
>> system emulator binary that it could find. The new Python implementation
>> just uses the first result of glob.iglob(), which has an undefined
>> order.
>>
>> This is a problem that breaks CI because the iotests aren't actually
>> prepared to run on any emulator. They should be, so this is really a bug
>> in the failing test cases that should be fixed there, but as a quick
>> fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> FWIW this is the same problem I had 1 year ago and tried to
> fix it by sending QMP 'query-version' (introduced in v0.14):
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg675075.html
Kevin Wolf Feb. 2, 2021, 2:51 p.m. UTC | #5
Am 02.02.2021 um 15:41 hat Daniel P. Berrangé geschrieben:
> On Tue, Feb 02, 2021 at 03:28:02PM +0100, Kevin Wolf wrote:
> > If the qemu-system-{arch} binary for the host architecture can't be
> > found, the old 'check' implementation selected the alphabetically first
> > system emulator binary that it could find. The new Python implementation
> > just uses the first result of glob.iglob(), which has an undefined
> > order.
> > 
> > This is a problem that breaks CI because the iotests aren't actually
> > prepared to run on any emulator. They should be, so this is really a bug
> > in the failing test cases that should be fixed there, but as a quick
> > fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> Deterministic CI is critically important.

True. I didn't mean to imply that we don't want deterministic behaviour,
but just that this hides some bugs in the test cases that we'll want to
have fixed eventually, too.

Maybe we should rely on automatic picking less and specify different
emulators explicitly in different CI jobs so that we don't only test the
same binary over and over again and others not at all.

Kevin
Eric Blake Feb. 2, 2021, 2:54 p.m. UTC | #6
On 2/2/21 8:28 AM, Kevin Wolf wrote:
> If the qemu-system-{arch} binary for the host architecture can't be
> found, the old 'check' implementation selected the alphabetically first
> system emulator binary that it could find. The new Python implementation
> just uses the first result of glob.iglob(), which has an undefined
> order.
> 
> This is a problem that breaks CI because the iotests aren't actually
> prepared to run on any emulator. They should be, so this is really a bug
> in the failing test cases that should be fixed there, but as a quick
> fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  tests/qemu-iotests/testenv.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

As fixing the tests is indeed more work than sorting the glob results,
this one-liner is worth checking.

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> index b31275f518..1fbec854c1 100644
> --- a/tests/qemu-iotests/testenv.py
> +++ b/tests/qemu-iotests/testenv.py
> @@ -135,7 +135,7 @@ class TestEnv(ContextManager['TestEnv']):
>          if not os.path.exists(self.qemu_prog):
>              pattern = root('qemu-system-*')
>              try:
> -                progs = glob.iglob(pattern)
> +                progs = sorted(glob.iglob(pattern))
>                  self.qemu_prog = next(p for p in progs if isxfile(p))
>              except StopIteration:
>                  sys.exit("Not found any Qemu executable binary by pattern "
>
Daniel P. Berrangé Feb. 2, 2021, 2:56 p.m. UTC | #7
On Tue, Feb 02, 2021 at 03:46:11PM +0100, Philippe Mathieu-Daudé wrote:
> On 2/2/21 3:28 PM, Kevin Wolf wrote:
> > If the qemu-system-{arch} binary for the host architecture can't be
> > found, the old 'check' implementation selected the alphabetically first
> > system emulator binary that it could find. The new Python implementation
> > just uses the first result of glob.iglob(), which has an undefined
> > order.
> > 
> > This is a problem that breaks CI because the iotests aren't actually
> > prepared to run on any emulator. They should be, so this is really a bug
> > in the failing test cases that should be fixed there, but as a quick
> > fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> FWIW this is the same problem I had 1 year ago and tried to
> fix it by sending QMP 'query-version' (introduced in v0.14):
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg675075.html

In the current failures the issue isn't the version number. Rather some
of the tests (mistakenly) assume the emulator supports PCI, and we're
randomly sometimes picking emulator targets that lack PCI.


Regards,
Daniel
Philippe Mathieu-Daudé Feb. 2, 2021, 3:40 p.m. UTC | #8
On 2/2/21 3:56 PM, Daniel P. Berrangé wrote:
> On Tue, Feb 02, 2021 at 03:46:11PM +0100, Philippe Mathieu-Daudé wrote:
>> On 2/2/21 3:28 PM, Kevin Wolf wrote:
>>> If the qemu-system-{arch} binary for the host architecture can't be
>>> found, the old 'check' implementation selected the alphabetically first
>>> system emulator binary that it could find. The new Python implementation
>>> just uses the first result of glob.iglob(), which has an undefined
>>> order.
>>>
>>> This is a problem that breaks CI because the iotests aren't actually
>>> prepared to run on any emulator. They should be, so this is really a bug
>>> in the failing test cases that should be fixed there, but as a quick
>>> fix, let's revert to the old behaviour to let CI runs succeed again.
>>
>> FWIW this is the same problem I had 1 year ago and tried to
>> fix it by sending QMP 'query-version' (introduced in v0.14):
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg675075.html
> 
> In the current failures the issue isn't the version number. Rather some
> of the tests (mistakenly) assume the emulator supports PCI, and we're
> randomly sometimes picking emulator targets that lack PCI.

Then this patch from the same series:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg675088.html

It queries TYPE_DEVICE, but we can query TYPE_PCI_DEVICE too,
which should be selected if a machine have a TYPE_PCI_HOST_BRIDGE
providing a TYPE_PCI_BUS.

Anyway the idea is to query the binary with QMP to see if it makes
sens to run the test with it.

Regards,

Phil.
Vladimir Sementsov-Ogievskiy Feb. 2, 2021, 3:48 p.m. UTC | #9
02.02.2021 17:28, Kevin Wolf wrote:
> If the qemu-system-{arch} binary for the host architecture can't be
> found, the old 'check' implementation selected the alphabetically first
> system emulator binary that it could find. The new Python implementation
> just uses the first result of glob.iglob(), which has an undefined
> order.
> 
> This is a problem that breaks CI because the iotests aren't actually
> prepared to run on any emulator. They should be, so this is really a bug
> in the failing test cases that should be fixed there, but as a quick
> fix, let's revert to the old behaviour to let CI runs succeed again.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   tests/qemu-iotests/testenv.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
> index b31275f518..1fbec854c1 100644
> --- a/tests/qemu-iotests/testenv.py
> +++ b/tests/qemu-iotests/testenv.py
> @@ -135,7 +135,7 @@ class TestEnv(ContextManager['TestEnv']):
>           if not os.path.exists(self.qemu_prog):
>               pattern = root('qemu-system-*')
>               try:
> -                progs = glob.iglob(pattern)
> +                progs = sorted(glob.iglob(pattern))
>                   self.qemu_prog = next(p for p in progs if isxfile(p))
>               except StopIteration:
>                   sys.exit("Not found any Qemu executable binary by pattern "
> 

Great that you find the problem! Anyway sorted is better than random. Probably we should print some warning when we have to chose from several binaries.. But nobody will read it anyway.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
diff mbox series

Patch

diff --git a/tests/qemu-iotests/testenv.py b/tests/qemu-iotests/testenv.py
index b31275f518..1fbec854c1 100644
--- a/tests/qemu-iotests/testenv.py
+++ b/tests/qemu-iotests/testenv.py
@@ -135,7 +135,7 @@  class TestEnv(ContextManager['TestEnv']):
         if not os.path.exists(self.qemu_prog):
             pattern = root('qemu-system-*')
             try:
-                progs = glob.iglob(pattern)
+                progs = sorted(glob.iglob(pattern))
                 self.qemu_prog = next(p for p in progs if isxfile(p))
             except StopIteration:
                 sys.exit("Not found any Qemu executable binary by pattern "