diff mbox series

iotest 040, 041, intermittent failure in netbsd VM

Message ID CAFEAcA-UKdcTROB7e3jO1qe=WCbuHRuX5WN7HZF2CcdMsmAt=g@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series iotest 040, 041, intermittent failure in netbsd VM | expand

Commit Message

Peter Maydell Jan. 10, 2022, 3:55 p.m. UTC
Just saw this failure of iotests in a netbsd VM (the in-tree
tests/vm stuff). Pretty sure it's an intermittent as the
pulreq being tested has nothing io or block related.


  TEST   iotest-qcow2: 036
  TEST   iotest-qcow2: 037
  TEST   iotest-qcow2: 038 [not run]
  TEST   iotest-qcow2: 039 [not run]
  TEST   iotest-qcow2: 040 [fail]
QEMU          --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine
virt
QEMU_IMG      --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO       --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD      --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT        -- qcow2
IMGPROTO      -- file
PLATFORM      -- NetBSD/amd64 localhost 9.2
TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
SOCK_DIR      -- /tmp/tmpuniuicbi
GDB_OPTIONS   --
VALGRIND_QEMU --
PRINT_QEMU_OUTPUT --

--- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040.out
fcntl(): Invalid argument
+++ 040.out.bad
@@ -1,5 +1,30 @@
-.................................................................
+............................................................ERROR:qemu.aqmp.qmp_client.qemu-7648:Failed
to establish connection: concurrent.futures._base.CancelledError
+E....
+======================================================================
+ERROR: test_top_is_default_active (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040", line
94, in setUp
+    self.vm.launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 399, in launch
+    self._launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 434, in _launch
+    self._post_launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
line 147, in _post_launch
+    super()._post_launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 340, in _post_launch
+    self._qmp.accept(self._qmp_timer)
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
line 69, in accept
+    timeout
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
line 42, in _sync
+    asyncio.wait_for(future, timeout=timeout)
+  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
run_until_complete
+    return future.result()
+  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
+    raise futures.TimeoutError()
+concurrent.futures._base.TimeoutError
+
 ----------------------------------------------------------------------
 Ran 65 tests

-OK
+FAILED (errors=1)
  TEST   iotest-qcow2: 041 [fail]
QEMU          --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine virt
QEMU_IMG      --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO       --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD      --
"/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT        -- qcow2
IMGPROTO      -- file
PLATFORM      -- NetBSD/amd64 localhost 9.2
TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
SOCK_DIR      -- /tmp/tmpuniuicbi
GDB_OPTIONS   --
VALGRIND_QEMU --
PRINT_QEMU_OUTPUT --

run_until_complete
+    return future.result()
+  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
+    raise futures.TimeoutError()
+concurrent.futures._base.TimeoutError
+
 ----------------------------------------------------------------------
 Ran 107 tests

-OK
+FAILED (errors=1)
  TEST   iotest-qcow2: 042
  TEST   iotest-qcow2: 043
[...]
Not run: 005 013 018 019 024 034 038 039 048 060 061 074 079 080 086
097 099 108 114 137 138 140 141 150 154 161 172 176 179 181 184 186
192 203 220 226 229 244 249 251 252 265 267 271 287 290 292 313
nbd-qemu-allocation
Failures: 040 041
Failed 2 of 74 iotests


Does it look familiar ?

-- PMM

Comments

Kevin Wolf Jan. 17, 2022, 10:05 a.m. UTC | #1
Am 10.01.2022 um 16:55 hat Peter Maydell geschrieben:
> Just saw this failure of iotests in a netbsd VM (the in-tree
> tests/vm stuff). Pretty sure it's an intermittent as the
> pulreq being tested has nothing io or block related.
> 
> 
>   TEST   iotest-qcow2: 036
>   TEST   iotest-qcow2: 037
>   TEST   iotest-qcow2: 038 [not run]
>   TEST   iotest-qcow2: 039 [not run]
>   TEST   iotest-qcow2: 040 [fail]
> QEMU          --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine
> virt
> QEMU_IMG      --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO       --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD      --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT        -- qcow2
> IMGPROTO      -- file
> PLATFORM      -- NetBSD/amd64 localhost 9.2
> TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
> SOCK_DIR      -- /tmp/tmpuniuicbi
> GDB_OPTIONS   --
> VALGRIND_QEMU --
> PRINT_QEMU_OUTPUT --
> 
> --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040.out
> fcntl(): Invalid argument
> +++ 040.out.bad
> @@ -1,5 +1,30 @@
> -.................................................................
> +............................................................ERROR:qemu.aqmp.qmp_client.qemu-7648:Failed
> to establish connection: concurrent.futures._base.CancelledError
> +E....
> +======================================================================
> +ERROR: test_top_is_default_active (__main__.TestSingleDrive)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040", line
> 94, in setUp
> +    self.vm.launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 399, in launch
> +    self._launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 434, in _launch
> +    self._post_launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
> line 147, in _post_launch
> +    super()._post_launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 340, in _post_launch
> +    self._qmp.accept(self._qmp_timer)
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> line 69, in accept
> +    timeout
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> line 42, in _sync
> +    asyncio.wait_for(future, timeout=timeout)
> +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> run_until_complete
> +    return future.result()
> +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> +    raise futures.TimeoutError()
> +concurrent.futures._base.TimeoutError
> +
>  ----------------------------------------------------------------------
>  Ran 65 tests
> 
> -OK
> +FAILED (errors=1)
>   TEST   iotest-qcow2: 041 [fail]
> QEMU          --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine virt
> QEMU_IMG      --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO       --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD      --
> "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT        -- qcow2
> IMGPROTO      -- file
> PLATFORM      -- NetBSD/amd64 localhost 9.2
> TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
> SOCK_DIR      -- /tmp/tmpuniuicbi
> GDB_OPTIONS   --
> VALGRIND_QEMU --
> PRINT_QEMU_OUTPUT --
> 
> --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041.out
> +++ 041.out.bad
> @@ -1,5 +1,32 @@
> -...........................................................................................................
> +..........................................ERROR:qemu.aqmp.qmp_client.qemu-15252:Failed
> to establish connection: concurrent.futures._base.CancelledError
> +E................................................................
> +======================================================================
> +ERROR: test_small_buffer (__main__.TestSingleBlockdev)
> +----------------------------------------------------------------------
> +Traceback (most recent call last):
> +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> 233, in setUp
> +    TestSingleDrive.setUp(self)
> +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> 54, in setUp
> +    self.vm.launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 399, in launch
> +    self._launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 434, in _launch
> +    self._post_launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
> line 147, in _post_launch
> +    super()._post_launch()
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> line 340, in _post_launch
> +    self._qmp.accept(self._qmp_timer)
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> line 69, in accept
> +    timeout
> +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> line 42, in _sync
> +    asyncio.wait_for(future, timeout=timeout)
> +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> run_until_complete
> +    return future.result()
> +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> +    raise futures.TimeoutError()
> +concurrent.futures._base.TimeoutError
> +
>  ----------------------------------------------------------------------
>  Ran 107 tests
> 
> -OK
> +FAILED (errors=1)
>   TEST   iotest-qcow2: 042
>   TEST   iotest-qcow2: 043
> [...]
> Not run: 005 013 018 019 024 034 038 039 048 060 061 074 079 080 086
> 097 099 108 114 137 138 140 141 150 154 161 172 176 179 181 184 186
> 192 203 220 226 229 244 249 251 252 265 267 271 287 290 292 313
> nbd-qemu-allocation
> Failures: 040 041
> Failed 2 of 74 iotests
> 
> 
> Does it look familiar ?

Not to me, but since there is a lot of AQMP in the trace, which is
relatively new, and 040 and 041 haven't changed in quite a while, let me
CC John to have a look.

Previously, with the old synchronous QMP library, I think timeout while
connecting often meant that there was an error in the command line, so
QEMU failed to start up and only printed an error message. But that
doesn't make sense for an intermittent failure, and even less for a test
that hasn't changed.

Kevin
John Snow Jan. 17, 2022, 8:35 p.m. UTC | #2
On Mon, Jan 17, 2022 at 5:05 AM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 10.01.2022 um 16:55 hat Peter Maydell geschrieben:
> > Just saw this failure of iotests in a netbsd VM (the in-tree
> > tests/vm stuff). Pretty sure it's an intermittent as the
> > pulreq being tested has nothing io or block related.
> >
> >
> >   TEST   iotest-qcow2: 036
> >   TEST   iotest-qcow2: 037
> >   TEST   iotest-qcow2: 038 [not run]
> >   TEST   iotest-qcow2: 039 [not run]
> >   TEST   iotest-qcow2: 040 [fail]
> > QEMU          --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
> > -nodefaults -display none -accel qtest -machine
> > virt
> > QEMU_IMG      --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
> > QEMU_IO       --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
> > --cache writeback --aio threads -f qcow2
> > QEMU_NBD      --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
> > IMGFMT        -- qcow2
> > IMGPROTO      -- file
> > PLATFORM      -- NetBSD/amd64 localhost 9.2
> > TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
> > SOCK_DIR      -- /tmp/tmpuniuicbi
> > GDB_OPTIONS   --
> > VALGRIND_QEMU --
> > PRINT_QEMU_OUTPUT --
> >
> > --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040.out
> > fcntl(): Invalid argument
> > +++ 040.out.bad
> > @@ -1,5 +1,30 @@
> > -.................................................................
> > +............................................................ERROR:qemu.aqmp.qmp_client.qemu-7648:Failed
> > to establish connection: concurrent.futures._base.CancelledError
> > +E....
> > +======================================================================
> > +ERROR: test_top_is_default_active (__main__.TestSingleDrive)
> > +----------------------------------------------------------------------
> > +Traceback (most recent call last):
> > +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040", line
> > 94, in setUp
> > +    self.vm.launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 399, in launch
> > +    self._launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 434, in _launch
> > +    self._post_launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
> > line 147, in _post_launch
> > +    super()._post_launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 340, in _post_launch
> > +    self._qmp.accept(self._qmp_timer)
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > line 69, in accept
> > +    timeout
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > line 42, in _sync
> > +    asyncio.wait_for(future, timeout=timeout)
> > +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> > run_until_complete
> > +    return future.result()
> > +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> > +    raise futures.TimeoutError()
> > +concurrent.futures._base.TimeoutError
> > +
> >  ----------------------------------------------------------------------
> >  Ran 65 tests
> >

This trace says that we timed out while awaiting a connection from
QEMU during the VM launch phase. i.e. python/qemu/qmp/machine.py line
340:

    def _post_launch(self) -> None:
        if self._qmp_connection:
            self._qmp.accept(self._qmp_timer)  <-- we timed out here.

(Note to self: make this traceback look more obvious as to what was
canceled and why. I think I can improve readability here a bit ...)

Sky's the limit on why QEMU never connected to the socket, but:

> > --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040.out
> > fcntl(): Invalid argument

That looks fairly suspicious, and I don't know which process was
responsible for printing it (or when, relative to the other outputs).
I assume you don't see any such output like this on a good run.

> > -OK
> > +FAILED (errors=1)
> >   TEST   iotest-qcow2: 041 [fail]
> > QEMU          --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
> > -nodefaults -display none -accel qtest -machine virt
> > QEMU_IMG      --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
> > QEMU_IO       --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
> > --cache writeback --aio threads -f qcow2
> > QEMU_NBD      --
> > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
> > IMGFMT        -- qcow2
> > IMGPROTO      -- file
> > PLATFORM      -- NetBSD/amd64 localhost 9.2
> > TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
> > SOCK_DIR      -- /tmp/tmpuniuicbi
> > GDB_OPTIONS   --
> > VALGRIND_QEMU --
> > PRINT_QEMU_OUTPUT --
> >
> > --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041.out
> > +++ 041.out.bad
> > @@ -1,5 +1,32 @@
> > -...........................................................................................................
> > +..........................................ERROR:qemu.aqmp.qmp_client.qemu-15252:Failed
> > to establish connection: concurrent.futures._base.CancelledError
> > +E................................................................
> > +======================================================================
> > +ERROR: test_small_buffer (__main__.TestSingleBlockdev)
> > +----------------------------------------------------------------------
> > +Traceback (most recent call last):
> > +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> > 233, in setUp
> > +    TestSingleDrive.setUp(self)
> > +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> > 54, in setUp
> > +    self.vm.launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 399, in launch
> > +    self._launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 434, in _launch
> > +    self._post_launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
> > line 147, in _post_launch
> > +    super()._post_launch()
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > line 340, in _post_launch
> > +    self._qmp.accept(self._qmp_timer)
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > line 69, in accept
> > +    timeout
> > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > line 42, in _sync
> > +    asyncio.wait_for(future, timeout=timeout)
> > +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> > run_until_complete
> > +    return future.result()
> > +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> > +    raise futures.TimeoutError()
> > +concurrent.futures._base.TimeoutError

Same problem here, except I don't see any output from QEMU to blame.
As far as the Python code knows, it just never got a connection on the
socket, so it timed out and died.

I do expect this to print more information on failure than it
currently is, though (bug somewhere in machine.py, I think).
Can you please try applying this temporary patch and running `./check
-qcow2 040 041` until you see a breakage and show me the output from
that?

diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
index 67ab06ca2b..ca49e6fcd2 100644
--- a/python/qemu/machine/machine.py
+++ b/python/qemu/machine/machine.py
@@ -403,16 +403,19 @@ def launch(self) -> None:
             # Assume the VM didn't launch or is exiting.
             # If we don't wait for the process, exitcode() may still be
             # 'None' by the time control is ceded back to the caller.
+            LOG.error('Error launching VM')
             if self._launched:
+                LOG.error('Process was forked, waiting on it')
                 self.wait()
             else:
+                LOG.error('Process was not forked, just cleaning up')
                 self._post_shutdown()

-            LOG.debug('Error launching VM')
             if self._qemu_full_args:
-                LOG.debug('Command: %r', ' '.join(self._qemu_full_args))
+                LOG.error('Command: %r', ' '.join(self._qemu_full_args))
             if self._iolog:
-                LOG.debug('Output: %r', self._iolog)
+                LOG.error('Output: %r', self._iolog)
+            LOG.error('exitcode: %s', str(self.exitcode()))
             raise

     def _launch(self) -> None:



> > +
> >  ----------------------------------------------------------------------
> >  Ran 107 tests
> >
> > -OK
> > +FAILED (errors=1)
> >   TEST   iotest-qcow2: 042
> >   TEST   iotest-qcow2: 043
> > [...]
> > Not run: 005 013 018 019 024 034 038 039 048 060 061 074 079 080 086
> > 097 099 108 114 137 138 140 141 150 154 161 172 176 179 181 184 186
> > 192 203 220 226 229 244 249 251 252 265 267 271 287 290 292 313
> > nbd-qemu-allocation
> > Failures: 040 041
> > Failed 2 of 74 iotests
> >
> >
> > Does it look familiar ?
>
> Not to me, but since there is a lot of AQMP in the trace, which is
> relatively new, and 040 and 041 haven't changed in quite a while, let me
> CC John to have a look.
>
> Previously, with the old synchronous QMP library, I think timeout while
> connecting often meant that there was an error in the command line, so
> QEMU failed to start up and only printed an error message. But that
> doesn't make sense for an intermittent failure, and even less for a test
> that hasn't changed.
>
> Kevin
>
Peter Maydell Jan. 17, 2022, 8:43 p.m. UTC | #3
On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
>
> On Mon, Jan 17, 2022 at 5:05 AM Kevin Wolf <kwolf@redhat.com> wrote:
> >
> > Am 10.01.2022 um 16:55 hat Peter Maydell geschrieben:
> > > Just saw this failure of iotests in a netbsd VM

> This trace says that we timed out while awaiting a connection from
> QEMU during the VM launch phase. i.e. python/qemu/qmp/machine.py line
> 340:
>
>     def _post_launch(self) -> None:
>         if self._qmp_connection:
>             self._qmp.accept(self._qmp_timer)  <-- we timed out here.
>
> (Note to self: make this traceback look more obvious as to what was
> canceled and why. I think I can improve readability here a bit ...)
>
> Sky's the limit on why QEMU never connected to the socket, but:
>
> > > --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/040.out
> > > fcntl(): Invalid argument
>
> That looks fairly suspicious, and I don't know which process was
> responsible for printing it (or when, relative to the other outputs).
> I assume you don't see any such output like this on a good run.

The NetBSD VM prints those fcntl messages all over the place.
I think something in the build system (make? ninja? who knows)
triggers them.

> > > -OK
> > > +FAILED (errors=1)
> > >   TEST   iotest-qcow2: 041 [fail]
> > > QEMU          --
> > > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-system-aarch64"
> > > -nodefaults -display none -accel qtest -machine virt
> > > QEMU_IMG      --
> > > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-img"
> > > QEMU_IO       --
> > > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-io"
> > > --cache writeback --aio threads -f qcow2
> > > QEMU_NBD      --
> > > "/home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/../../qemu-nbd"
> > > IMGFMT        -- qcow2
> > > IMGPROTO      -- file
> > > PLATFORM      -- NetBSD/amd64 localhost 9.2
> > > TEST_DIR      -- /home/qemu/qemu-test.MPWquy/build/tests/qemu-iotests/scratch
> > > SOCK_DIR      -- /tmp/tmpuniuicbi
> > > GDB_OPTIONS   --
> > > VALGRIND_QEMU --
> > > PRINT_QEMU_OUTPUT --
> > >
> > > --- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041.out
> > > +++ 041.out.bad
> > > @@ -1,5 +1,32 @@
> > > -...........................................................................................................
> > > +..........................................ERROR:qemu.aqmp.qmp_client.qemu-15252:Failed
> > > to establish connection: concurrent.futures._base.CancelledError
> > > +E................................................................
> > > +======================================================================
> > > +ERROR: test_small_buffer (__main__.TestSingleBlockdev)
> > > +----------------------------------------------------------------------
> > > +Traceback (most recent call last):
> > > +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> > > 233, in setUp
> > > +    TestSingleDrive.setUp(self)
> > > +  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
> > > 54, in setUp
> > > +    self.vm.launch()
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > > line 399, in launch
> > > +    self._launch()
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > > line 434, in _launch
> > > +    self._post_launch()
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
> > > line 147, in _post_launch
> > > +    super()._post_launch()
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
> > > line 340, in _post_launch
> > > +    self._qmp.accept(self._qmp_timer)
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > > line 69, in accept
> > > +    timeout
> > > +  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
> > > line 42, in _sync
> > > +    asyncio.wait_for(future, timeout=timeout)
> > > +  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
> > > run_until_complete
> > > +    return future.result()
> > > +  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
> > > +    raise futures.TimeoutError()
> > > +concurrent.futures._base.TimeoutError
>
> Same problem here, except I don't see any output from QEMU to blame.
> As far as the Python code knows, it just never got a connection on the
> socket, so it timed out and died.

I think the NetBSD VM does for some reason get a bit slow to do
stuff. I've never worked out why. In the past we've had to bump
up various overoptimistic timeouts to help it out.

> I do expect this to print more information on failure than it
> currently is, though (bug somewhere in machine.py, I think).
> Can you please try applying this temporary patch and running `./check
> -qcow2 040 041` until you see a breakage and show me the output from
> that?
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index 67ab06ca2b..ca49e6fcd2 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -403,16 +403,19 @@ def launch(self) -> None:
>              # Assume the VM didn't launch or is exiting.
>              # If we don't wait for the process, exitcode() may still be
>              # 'None' by the time control is ceded back to the caller.
> +            LOG.error('Error launching VM')
>              if self._launched:
> +                LOG.error('Process was forked, waiting on it')
>                  self.wait()
>              else:
> +                LOG.error('Process was not forked, just cleaning up')
>                  self._post_shutdown()
>
> -            LOG.debug('Error launching VM')
>              if self._qemu_full_args:
> -                LOG.debug('Command: %r', ' '.join(self._qemu_full_args))
> +                LOG.error('Command: %r', ' '.join(self._qemu_full_args))
>              if self._iolog:
> -                LOG.debug('Output: %r', self._iolog)
> +                LOG.error('Output: %r', self._iolog)
> +            LOG.error('exitcode: %s', str(self.exitcode()))
>              raise
>
>      def _launch(self) -> None:

Yeah, I should be able to test this.

-- PMM
Peter Maydell Jan. 17, 2022, 8:49 p.m. UTC | #4
On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:

> Can you please try applying this temporary patch and running `./check
> -qcow2 040 041` until you see a breakage and show me the output from
> that?

With this temporary patch the VM doesn't launch at all:

peter.maydell@hackbox2.linaro.org:~/qemu-netbsd$ make -C build/
vm-build-netbsd J=8 V=1 2>&1 | tee netbsd.log
make: Entering directory '/home/peter.maydell/qemu-netbsd/build'
/usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/meson/meson.py
introspect --targets --tests --benchmarks | /usr/bin/python3 -B
scripts/mtest2m
ake.py > Makefile.mtest
{ \
  echo 'ninja-targets = \'; \
  /usr/bin/ninja -t targets all | sed 's/:.*//; $!s/$/ \\/'; \
  echo 'build-files = \'; \
  /usr/bin/ninja -t query build.ninja | sed -n '1,/^  input:/d; /^
outputs:/q; s/$/ \\/p'; \
} > Makefile.ninja.tmp && mv Makefile.ninja.tmp Makefile.ninja
(GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
tests/fp/berkeley-softfloat-3 dtc capstone slirp)
(GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
tests/fp/berkeley-softfloat-3 dtc capstone slirp)
/usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/tests/vm/netbsd
--debug  --jobs 8 --verbose    --image
"/home/peter.maydell/.cache/qemu-vm/images/netbsd.img"  --snapshot
--build-qemu /home/peter.maydell/qemu-netbsd --
DEBUG:root:Creating archive
/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar
for src_dir dir: /home/peter.maydell/qemu-netbsd
DEBUG:root:QEMU args: -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
-device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1
DEBUG:asyncio:Using selector: EpollSelector
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Registering
<qemu.aqmp.events.EventListener object at 0x7f3b76bfc9b0>.
DEBUG:qemu.machine.machine:VM launch command: 'qemu-system-x86_64
-display none -vga none -chardev
socket,id=mon,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
-mon chardev=mon,mode=control -machine pc -chardev
socket,id=console,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-console.sock,server=on,wait=off
-serial chardev:console -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
-device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1'
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
'Runstate.IDLE' to 'Runstate.CONNECTING'.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting connection
on /var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
...
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Connection accepted.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting greeting ...
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:<-- {
  "QMP": {
    "version": {
      "qemu": {
        "micro": 1,
        "minor": 11,
        "major": 2
      },
      "package": "(Debian 1:2.11+dfsg-1ubuntu7.38)"
    },
    "capabilities": []
  }
}
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Negotiating capabilities ...
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:--> {
  "execute": "qmp_capabilities",
  "arguments": {
    "enable": []
  }
}
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:<-- {
  "error": {
    "class": "GenericError",
    "desc": "Parameter 'enable' is unexpected"
  }
}
ERROR:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Negotiation failed:
AssertionError
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Negotiation failed:
  | Traceback (most recent call last):
  |   File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 306, in _negotiate
  |     assert 'return' in reply
  | AssertionError

ERROR:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Failed to establish
session: qemu.aqmp.qmp_client.NegotiationError: Negotiation failed:
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Failed to establish session:
  | Traceback (most recent call last):
  |   File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 306, in _negotiate
  |     assert 'return' in reply
  | AssertionError
  |
  | The above exception was the direct cause of the following exception:
  |
  | Traceback (most recent call last):
  |   File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/protocol.py",
line 371, in _new_session
  |     await self._establish_session()
  |   File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 253, in _establish_session
  |     await self._negotiate()
  |   File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 312, in _negotiate
  |     raise NegotiationError(emsg, err) from err
  | qemu.aqmp.qmp_client.NegotiationError: Negotiation failed:

DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:disconnect() called.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
'Runstate.CONNECTING' to 'Runstate.DISCONNECTING'.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Scheduling disconnect.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Closing StreamWriter.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Waiting for
StreamWriter to close ...
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:StreamWriter closed.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Disconnected.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:QMP Disconnected.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
'Runstate.DISCONNECTING' to 'Runstate.IDLE'.
ERROR:qemu.machine.machine:Error launching VM
ERROR:qemu.machine.machine:Process was forked, waiting on it
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:disconnect() called.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
'Runstate.IDLE' to 'Runstate.DISCONNECTING'.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Scheduling disconnect.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Disconnected.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:QMP Disconnected.
DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
'Runstate.DISCONNECTING' to 'Runstate.IDLE'.
WARNING:qemu.machine.machine:qemu received signal 9; command:
"qemu-system-x86_64 -display none -vga none -chardev
socket,id=mon,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
-mon chardev=mon,mode=control -machine pc -chardev
socket,id=console,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-console.sock,server=on,wait=off
-serial chardev:console -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
-device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1"
ERROR:root:Failed to launch QEMU, command line:
ERROR:root:qemu-system-x86_64 -nodefaults -m 4G -cpu max -netdev
user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
-drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
-device virtio-blk,drive=drive0,bootindex=0 -drive
file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
-device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1
ERROR:root:Log:
ERROR:root:VNC server running on 127.0.0.1:5900

Failed to prepare guest environment
Traceback (most recent call last):
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 306, in _negotiate
    assert 'return' in reply
AssertionError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/protocol.py",
line 371, in _new_session
    await self._establish_session()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 253, in _establish_session
    await self._negotiate()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/qmp_client.py",
line 312, in _negotiate
    raise NegotiationError(emsg, err) from err
qemu.aqmp.qmp_client.NegotiationError: Negotiation failed:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 399, in launch
    self._launch()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 437, in _launch
    self._post_launch()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 340, in _post_launch
    self._qmp.accept(self._qmp_timer)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/legacy.py",
line 69, in accept
    timeout
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/legacy.py",
line 42, in _sync
    asyncio.wait_for(future, timeout=timeout)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 484, in
run_until_complete
    return future.result()
  File "/usr/lib/python3.6/asyncio/tasks.py", line 339, in wait_for
    return (yield from fut)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/protocol.py",
line 274, in accept
    await self._new_session(address, ssl, accept=True)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/aqmp/protocol.py",
line 390, in _new_session
    raise ConnectError(emsg, err) from err
qemu.aqmp.protocol.ConnectError: Failed to establish session:
Negotiation failed:

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 529, in _do_shutdown
    self._soft_shutdown(timeout)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 514, in _soft_shutdown
    self._subp.wait(timeout=timeout)
  File "/usr/lib/python3.6/subprocess.py", line 1469, in wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '('qemu-system-x86_64', '-display',
'none', '-vga', 'none', '-chardev',
'socket,id=mon,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock',
'-mon', 'chardev=mon,mode=control', '-machine', 'pc', '-chardev',
'socket,id=console,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-console.sock,server=on,wait=off',
'-serial', 'chardev:console', '-nodefaults', '-m', '4G', '-cpu',
'max', '-netdev', 'user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no',
'-device', 'virtio-net-pci,netdev=vnet', '-vnc', '127.0.0.1:0,to=20',
'-smp', '8', '-enable-kvm', '-drive',
'file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback',
'-device', 'virtio-blk,drive=drive0,bootindex=0', '-drive',
'file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw',
'-device', 'virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1')'
timed out after 30 seconds

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/peter.maydell/qemu-netbsd/tests/vm/basevm.py", line 647, in main
    vm.boot(img)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/basevm.py", line 296, in boot
    guest.launch()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 409, in launch
    self.wait()
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 576, in wait
    self.shutdown(timeout=timeout)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 558, in shutdown
    self._do_shutdown(timeout)
  File "/home/peter.maydell/qemu-netbsd/tests/vm/../../python/qemu/machine/machine.py",
line 533, in _do_shutdown
    from exc
qemu.machine.machine.AbnormalShutdown: Could not perform graceful shutdown
/home/peter.maydell/qemu-netbsd/tests/vm/Makefile.include:105: recipe
for target 'vm-build-netbsd' failed
make: *** [vm-build-netbsd] Error 2
make: Leaving directory '/home/peter.maydell/qemu-netbsd/build'

-- PMM
John Snow Jan. 17, 2022, 11:08 p.m. UTC | #5
On Mon, Jan 17, 2022 at 3:49 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
>
> > Can you please try applying this temporary patch and running `./check
> > -qcow2 040 041` until you see a breakage and show me the output from
> > that?
>
> With this temporary patch the VM doesn't launch at all:

"Works for me", but I found out why.

>
> peter.maydell@hackbox2.linaro.org:~/qemu-netbsd$ make -C build/
> vm-build-netbsd J=8 V=1 2>&1 | tee netbsd.log
> make: Entering directory '/home/peter.maydell/qemu-netbsd/build'
> /usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/meson/meson.py
> introspect --targets --tests --benchmarks | /usr/bin/python3 -B
> scripts/mtest2m
> ake.py > Makefile.mtest
> { \
>   echo 'ninja-targets = \'; \
>   /usr/bin/ninja -t targets all | sed 's/:.*//; $!s/$/ \\/'; \
>   echo 'build-files = \'; \
>   /usr/bin/ninja -t query build.ninja | sed -n '1,/^  input:/d; /^
> outputs:/q; s/$/ \\/p'; \
> } > Makefile.ninja.tmp && mv Makefile.ninja.tmp Makefile.ninja
> (GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
> update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
> tests/fp/berkeley-softfloat-3 dtc capstone slirp)
> (GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
> update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
> tests/fp/berkeley-softfloat-3 dtc capstone slirp)
> /usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/tests/vm/netbsd
> --debug  --jobs 8 --verbose    --image
> "/home/peter.maydell/.cache/qemu-vm/images/netbsd.img"  --snapshot
> --build-qemu /home/peter.maydell/qemu-netbsd --
> DEBUG:root:Creating archive
> /home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar
> for src_dir dir: /home/peter.maydell/qemu-netbsd
> DEBUG:root:QEMU args: -nodefaults -m 4G -cpu max -netdev
> user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
> -drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
> -device virtio-blk,drive=drive0,bootindex=0 -drive
> file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
> -device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1
> DEBUG:asyncio:Using selector: EpollSelector
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Registering
> <qemu.aqmp.events.EventListener object at 0x7f3b76bfc9b0>.
> DEBUG:qemu.machine.machine:VM launch command: 'qemu-system-x86_64
> -display none -vga none -chardev
> socket,id=mon,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
> -mon chardev=mon,mode=control -machine pc -chardev
> socket,id=console,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-console.sock,server=on,wait=off
> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
> user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
> -drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
> -device virtio-blk,drive=drive0,bootindex=0 -drive
> file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
> -device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1'
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
> 'Runstate.IDLE' to 'Runstate.CONNECTING'.
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting connection
> on /var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
> ...
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Connection accepted.
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting greeting ...
> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:<-- {
>   "QMP": {
>     "version": {
>       "qemu": {
>         "micro": 1,
>         "minor": 11,
>         "major": 2
>       },
>       "package": "(Debian 1:2.11+dfsg-1ubuntu7.38)"
>     },
>     "capabilities": []
>   }
> }

Well, today I learned that:

(1) vm-build-XXX targets use your host system's QEMU to run that VM
(2) my QMP library cannot talk to QEMU 2.11.

That doesn't explain the intermittent netbsd failure yet, though.
(I guess this wasn't a failure point for you due to the aggressive
caching of the VM images? Unlucky.)

Here's another hotfix, this one I cannot easily test quickly (I don't
have 2.11 handy and it no longer builds for me),
but I think it'll fix the VM installation problem against older QEMU versions:

diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
index 8105e29fa8..6b43e1dbbe 100644
--- a/python/qemu/aqmp/qmp_client.py
+++ b/python/qemu/aqmp/qmp_client.py
@@ -292,9 +292,9 @@ async def _negotiate(self) -> None:
         """
         self.logger.debug("Negotiating capabilities ...")

-        arguments: Dict[str, List[str]] = {'enable': []}
+        arguments: Dict[str, List[str]] = {}
         if self._greeting and 'oob' in self._greeting.QMP.capabilities:
-            arguments['enable'].append('oob')
+            arguments.setdefault('enable', []).append('oob')
         msg = self.make_execute_msg('qmp_capabilities', arguments=arguments)

         # It's not safe to use execute() here, because the reader/writers
Peter Maydell Jan. 18, 2022, 10:55 a.m. UTC | #6
On Mon, 17 Jan 2022 at 23:09, John Snow <jsnow@redhat.com> wrote:
> Well, today I learned that:
>
> (1) vm-build-XXX targets use your host system's QEMU to run that VM
> (2) my QMP library cannot talk to QEMU 2.11.

Whoops, I hadn't realised I was running with that ancient a QEMU.
For the scripted runs I set the PATH to include a 5.0 QEMU, but
this by-hand command-line invocation of 'make' didn't do that, so
it got the ancient system QEMU.

-- PMM
Peter Maydell Jan. 18, 2022, 12:13 p.m. UTC | #7
On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
> I do expect this to print more information on failure than it
> currently is, though (bug somewhere in machine.py, I think).
> Can you please try applying this temporary patch and running `./check
> -qcow2 040 041` until you see a breakage and show me the output from
> that?

Having fixed my setup to not use an ancient host QEMU, here's
the relevant bit of the log:

  TEST   iotest-qcow2: 037
  TEST   iotest-qcow2: 038 [not run]
  TEST   iotest-qcow2: 039 [not run]
  TEST   iotest-qcow2: 040 [fail]
QEMU          --
"/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64"
-nodefaults -display none -accel qtest -machine virt
QEMU_IMG      --
"/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-img"
QEMU_IO       --
"/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-io"
--cache writeback --aio threads -f qcow2
QEMU_NBD      --
"/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-nbd"
IMGFMT        -- qcow2
IMGPROTO      -- file
PLATFORM      -- NetBSD/amd64 localhost 9.2
TEST_DIR      -- /home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch
SOCK_DIR      -- /tmp/tmp1h12r7ev
GDB_OPTIONS   --
VALGRIND_QEMU --
PRINT_QEMU_OUTPUT --

--- /home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040.out
+++ 040.out.bad
@@ -1,5 +1,95 @@
-.................................................................
+.......ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed to establish
connection: concurrent.futures._base.CancelledError
+ERROR:qemu.machine.machine:Error launching VM
+ERROR:qemu.machine.machine:Process was forked, waiting on it
+ERROR:qemu.machine.machine:Command:
'/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
-display none -vga none -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
chardev=mon,mode=control -qtest
unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
-nodefaults -display none -accel qtest -machine virt -drive
if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
-device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'
+ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
directory\n"
+ERROR:qemu.machine.machine:exitcode: 1
+E..........................................ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed
to establish connection: concurrent.futures._base.CancelledError
+ERROR:qemu.machine.machine:Error launching VM
+ERROR:qemu.machine.machine:Process was forked, waiting on it
+ERROR:qemu.machine.machine:Command:
'/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
-display none -vga none -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
chardev=mon,mode=control -qtest
unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
-nodefaults -display none -accel qtest -machine virt -drive
if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
-device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'
+ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
directory\n"
+ERROR:qemu.machine.machine:exitcode: 1
+E....ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed to establish
connection: concurrent.futures._base.CancelledError
+ERROR:qemu.machine.machine:Error launching VM
+ERROR:qemu.machine.machine:Process was forked, waiting on it
+ERROR:qemu.machine.machine:Command:
'/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
-display none -vga none -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
chardev=mon,mode=control -qtest
unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
-nodefaults -display none -accel qtest -machine virt -drive
if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
-device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'
+ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
directory\n"
+ERROR:qemu.machine.machine:exitcode: 1
+E.........
+======================================================================
+ERROR: test_device_not_found (__main__.TestActiveZeroLengthImage)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040", line
94, in setUp
+    self.vm.launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 399, in launch
+    self._launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 437, in _launch
+    self._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/qtest.py",
line 147, in _post_launch
+    super()._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 340, in _post_launch
+    self._qmp.accept(self._qmp_timer)
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 69, in accept
+    timeout
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 42, in _sync
+    asyncio.wait_for(future, timeout=timeout)
+  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
run_until_complete
+    return future.result()
+  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
+    raise futures.TimeoutError()
+concurrent.futures._base.TimeoutError
+
+======================================================================
+ERROR: test_commit_node (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040", line
94, in setUp
+    self.vm.launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 399, in launch
+    self._launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 437, in _launch
+    self._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/qtest.py",
line 147, in _post_launch
+    super()._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 340, in _post_launch
+    self._qmp.accept(self._qmp_timer)
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 69, in accept
+    timeout
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 42, in _sync
+    asyncio.wait_for(future, timeout=timeout)
+  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
run_until_complete
+    return future.result()
+  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
+    raise futures.TimeoutError()
+concurrent.futures._base.TimeoutError
+
+======================================================================
+ERROR: test_implicit_node (__main__.TestSingleDrive)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040", line
94, in setUp
+    self.vm.launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 399, in launch
+    self._launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 437, in _launch
+    self._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/qtest.py",
line 147, in _post_launch
+    super()._post_launch()
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/machine/machine.py",
line 340, in _post_launch
+    self._qmp.accept(self._qmp_timer)
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 69, in accept
+    timeout
+  File "/home/qemu/qemu-test.vdrI02/src/python/qemu/aqmp/legacy.py",
line 42, in _sync
+    asyncio.wait_for(future, timeout=timeout)
+  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in
run_until_complete
+    return future.result()
+  File "/usr/pkg/lib/python3.7/asyncio/tasks.py", line 449, in wait_for
+    raise futures.TimeoutError()
+concurrent.futures._base.TimeoutError
+
 ----------------------------------------------------------------------
 Ran 65 tests

-OK
+FAILED (errors=3)
  TEST   iotest-qcow2: 041
  TEST   iotest-qcow2: 042
  TEST   iotest-qcow2: 043
(etc)


thanks
-- PMM
Li Zhang Jan. 18, 2022, 1:30 p.m. UTC | #8
On 1/18/22 12:08 AM, John Snow wrote:
> On Mon, Jan 17, 2022 at 3:49 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
>>
>>> Can you please try applying this temporary patch and running `./check
>>> -qcow2 040 041` until you see a breakage and show me the output from
>>> that?
>>
>> With this temporary patch the VM doesn't launch at all:
> 
> "Works for me", but I found out why.
> 
>>
>> peter.maydell@hackbox2.linaro.org:~/qemu-netbsd$ make -C build/
>> vm-build-netbsd J=8 V=1 2>&1 | tee netbsd.log
>> make: Entering directory '/home/peter.maydell/qemu-netbsd/build'
>> /usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/meson/meson.py
>> introspect --targets --tests --benchmarks | /usr/bin/python3 -B
>> scripts/mtest2m
>> ake.py > Makefile.mtest
>> { \
>>    echo 'ninja-targets = \'; \
>>    /usr/bin/ninja -t targets all | sed 's/:.*//; $!s/$/ \\/'; \
>>    echo 'build-files = \'; \
>>    /usr/bin/ninja -t query build.ninja | sed -n '1,/^  input:/d; /^
>> outputs:/q; s/$/ \\/p'; \
>> } > Makefile.ninja.tmp && mv Makefile.ninja.tmp Makefile.ninja
>> (GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
>> update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
>> tests/fp/berkeley-softfloat-3 dtc capstone slirp)
>> (GIT="git" "/home/peter.maydell/qemu-netbsd/scripts/git-submodule.sh"
>> update ui/keycodemapdb meson tests/fp/berkeley-testfloat-3
>> tests/fp/berkeley-softfloat-3 dtc capstone slirp)
>> /usr/bin/python3 -B /home/peter.maydell/qemu-netbsd/tests/vm/netbsd
>> --debug  --jobs 8 --verbose    --image
>> "/home/peter.maydell/.cache/qemu-vm/images/netbsd.img"  --snapshot
>> --build-qemu /home/peter.maydell/qemu-netbsd --
>> DEBUG:root:Creating archive
>> /home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar
>> for src_dir dir: /home/peter.maydell/qemu-netbsd
>> DEBUG:root:QEMU args: -nodefaults -m 4G -cpu max -netdev
>> user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>> file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
>> -device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1
>> DEBUG:asyncio:Using selector: EpollSelector
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Registering
>> <qemu.aqmp.events.EventListener object at 0x7f3b76bfc9b0>.
>> DEBUG:qemu.machine.machine:VM launch command: 'qemu-system-x86_64
>> -display none -vga none -chardev
>> socket,id=mon,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
>> -mon chardev=mon,mode=control -machine pc -chardev
>> socket,id=console,path=/var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-console.sock,server=on,wait=off
>> -serial chardev:console -nodefaults -m 4G -cpu max -netdev
>> user,id=vnet,hostfwd=:127.0.0.1:0-:22,ipv6=no -device
>> virtio-net-pci,netdev=vnet -vnc 127.0.0.1:0,to=20 -smp 8 -enable-kvm
>> -drive file=/home/peter.maydell/.cache/qemu-vm/images/netbsd.img,snapshot=on,if=none,id=drive0,cache=writeback
>> -device virtio-blk,drive=drive0,bootindex=0 -drive
>> file=/home/peter.maydell/qemu-netbsd/build/vm-test-72ra6_8s.tmp/data-f706c.tar,if=none,id=data-f706c,cache=writeback,format=raw
>> -device virtio-blk,drive=data-f706c,serial=data-f706c,bootindex=1'
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Transitioning from
>> 'Runstate.IDLE' to 'Runstate.CONNECTING'.
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting connection
>> on /var/tmp/qemu-machine-0m15ou19/qemu-2335-7f3b78d7f128-monitor.sock
>> ...
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Connection accepted.
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:Awaiting greeting ...
>> DEBUG:qemu.aqmp.qmp_client.qemu-2335-7f3b78d7f128:<-- {
>>    "QMP": {
>>      "version": {
>>        "qemu": {
>>          "micro": 1,
>>          "minor": 11,
>>          "major": 2
>>        },
>>        "package": "(Debian 1:2.11+dfsg-1ubuntu7.38)"
>>      },
>>      "capabilities": []
>>    }
>> }
> 
> Well, today I learned that:
> 
> (1) vm-build-XXX targets use your host system's QEMU to run that VM
> (2) my QMP library cannot talk to QEMU 2.11.
> 
> That doesn't explain the intermittent netbsd failure yet, though.
> (I guess this wasn't a failure point for you due to the aggressive
> caching of the VM images? Unlucky.)
> 
> Here's another hotfix, this one I cannot easily test quickly (I don't
> have 2.11 handy and it no longer builds for me),
> but I think it'll fix the VM installation problem against older QEMU versions:
> 
> diff --git a/python/qemu/aqmp/qmp_client.py b/python/qemu/aqmp/qmp_client.py
> index 8105e29fa8..6b43e1dbbe 100644
> --- a/python/qemu/aqmp/qmp_client.py
> +++ b/python/qemu/aqmp/qmp_client.py
> @@ -292,9 +292,9 @@ async def _negotiate(self) -> None:
>           """
>           self.logger.debug("Negotiating capabilities ...")
> 
> -        arguments: Dict[str, List[str]] = {'enable': []}
> +        arguments: Dict[str, List[str]] = {}
>           if self._greeting and 'oob' in self._greeting.QMP.capabilities:
> -            arguments['enable'].append('oob')
> +            arguments.setdefault('enable', []).append('oob')
>           msg = self.make_execute_msg('qmp_capabilities', arguments=arguments)
> 
>           # It's not safe to use execute() here, because the reader/writers
> 

I also found that iotest 040, 041 fail sometimes (not always) on X86
when I run the test cases in my environment.

>
John Snow Jan. 18, 2022, 6:34 p.m. UTC | #9
On Tue, Jan 18, 2022 at 7:13 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
> > I do expect this to print more information on failure than it
> > currently is, though (bug somewhere in machine.py, I think).
> > Can you please try applying this temporary patch and running `./check
> > -qcow2 040 041` until you see a breakage and show me the output from
> > that?
>

Thanks for playing tele-debug.

> Having fixed my setup to not use an ancient host QEMU, here's
> the relevant bit of the log:
>
>   TEST   iotest-qcow2: 037
>   TEST   iotest-qcow2: 038 [not run]
>   TEST   iotest-qcow2: 039 [not run]
>   TEST   iotest-qcow2: 040 [fail]
> QEMU          --
> "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64"
> -nodefaults -display none -accel qtest -machine virt
> QEMU_IMG      --
> "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-img"
> QEMU_IO       --
> "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-io"
> --cache writeback --aio threads -f qcow2
> QEMU_NBD      --
> "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-nbd"
> IMGFMT        -- qcow2
> IMGPROTO      -- file
> PLATFORM      -- NetBSD/amd64 localhost 9.2
> TEST_DIR      -- /home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch
> SOCK_DIR      -- /tmp/tmp1h12r7ev
> GDB_OPTIONS   --
> VALGRIND_QEMU --
> PRINT_QEMU_OUTPUT --
>
> --- /home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040.out
> +++ 040.out.bad
> @@ -1,5 +1,95 @@
> -.................................................................
> +.......ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed to establish
> connection: concurrent.futures._base.CancelledError
> +ERROR:qemu.machine.machine:Error launching VM
> +ERROR:qemu.machine.machine:Process was forked, waiting on it
> +ERROR:qemu.machine.machine:Command:
> '/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
> -display none -vga none -chardev
> socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
> chardev=mon,mode=control -qtest
> unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
> -nodefaults -display none -accel qtest -machine virt -drive
> if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
> -device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'

> +ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
> socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
> connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
> directory\n"

... Oh. That's unpleasant. My guess is that we aren't listening on the
socket before the QEMU process gets far enough to want to connect to
it. The change to an asynchronous backend must have jostled the
timing.

> +ERROR:qemu.machine.machine:exitcode: 1

And, oh: The VM launching library only chirps about *negative* error
codes. That's why it wasn't printing anything more useful. I suppose
the thinking was that we use the VM launch utility to knowingly launch
bad command lines, so we only wanted to see failure notifications on
-errno style codes, but that obviously makes debugging unintentional
failures a lot more awful. I'll try to improve the usability and
legibility of the errors here.

Thanks,
--js
John Snow Jan. 19, 2022, 7:56 p.m. UTC | #10
On Tue, Jan 18, 2022 at 1:34 PM John Snow <jsnow@redhat.com> wrote:
>
> On Tue, Jan 18, 2022 at 7:13 AM Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 17 Jan 2022 at 20:35, John Snow <jsnow@redhat.com> wrote:
> > > I do expect this to print more information on failure than it
> > > currently is, though (bug somewhere in machine.py, I think).
> > > Can you please try applying this temporary patch and running `./check
> > > -qcow2 040 041` until you see a breakage and show me the output from
> > > that?
> >
>
> Thanks for playing tele-debug.
>
> > Having fixed my setup to not use an ancient host QEMU, here's
> > the relevant bit of the log:
> >
> >   TEST   iotest-qcow2: 037
> >   TEST   iotest-qcow2: 038 [not run]
> >   TEST   iotest-qcow2: 039 [not run]
> >   TEST   iotest-qcow2: 040 [fail]
> > QEMU          --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64"
> > -nodefaults -display none -accel qtest -machine virt
> > QEMU_IMG      --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-img"
> > QEMU_IO       --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-io"
> > --cache writeback --aio threads -f qcow2
> > QEMU_NBD      --
> > "/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-nbd"
> > IMGFMT        -- qcow2
> > IMGPROTO      -- file
> > PLATFORM      -- NetBSD/amd64 localhost 9.2
> > TEST_DIR      -- /home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch
> > SOCK_DIR      -- /tmp/tmp1h12r7ev
> > GDB_OPTIONS   --
> > VALGRIND_QEMU --
> > PRINT_QEMU_OUTPUT --
> >
> > --- /home/qemu/qemu-test.vdrI02/src/tests/qemu-iotests/040.out
> > +++ 040.out.bad
> > @@ -1,5 +1,95 @@
> > -.................................................................
> > +.......ERROR:qemu.aqmp.qmp_client.qemu-12407:Failed to establish
> > connection: concurrent.futures._base.CancelledError
> > +ERROR:qemu.machine.machine:Error launching VM
> > +ERROR:qemu.machine.machine:Process was forked, waiting on it
> > +ERROR:qemu.machine.machine:Command:
> > '/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/../../qemu-system-aarch64
> > -display none -vga none -chardev
> > socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock -mon
> > chardev=mon,mode=control -qtest
> > unix:path=/tmp/tmp1h12r7ev/qemu-12407-qtest.sock -accel qtest
> > -nodefaults -display none -accel qtest -machine virt -drive
> > if=none,id=drive0,file=/home/qemu/qemu-test.vdrI02/build/tests/qemu-iotests/scratch/test.img,format=qcow2,cache=writeback,aio=threads,node-name=top,backing.node-name=mid,backing.backing.node-name=base
> > -device virtio-scsi -device scsi-hd,id=scsi0,drive=drive0'
>
> > +ERROR:qemu.machine.machine:Output: "qemu-system-aarch64: -chardev
> > socket,id=mon,path=/tmp/tmp1h12r7ev/qemu-12407-monitor.sock: Failed to
> > connect to '/tmp/tmp1h12r7ev/qemu-12407-monitor.sock': No such file or
> > directory\n"
>
> ... Oh. That's unpleasant. My guess is that we aren't listening on the
> socket before the QEMU process gets far enough to want to connect to
> it. The change to an asynchronous backend must have jostled the
> timing.
>
> > +ERROR:qemu.machine.machine:exitcode: 1
>
> And, oh: The VM launching library only chirps about *negative* error
> codes. That's why it wasn't printing anything more useful. I suppose
> the thinking was that we use the VM launch utility to knowingly launch
> bad command lines, so we only wanted to see failure notifications on
> -errno style codes, but that obviously makes debugging unintentional
> failures a lot more awful. I'll try to improve the usability and
> legibility of the errors here.
>
> Thanks,
> --js

I've published '[PATCH v2 0/5] Python: minor fixes' and pushed to
jsnow/python. Test it if you want; otherwise I'll wait for
reviews/acks and send a PR like normal. CI is still running on the
final push, but early smoke tests looked good.

(Patch 1 fixes compatibility with QEMU 2.11, patch 3 adds better
diagnostic info to failures, patch 5 should ultimately fix the root
cause of the race condition.)

Thanks,
--js
diff mbox series

Patch

--- /home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041.out
+++ 041.out.bad
@@ -1,5 +1,32 @@ 
-...........................................................................................................
+..........................................ERROR:qemu.aqmp.qmp_client.qemu-15252:Failed
to establish connection: concurrent.futures._base.CancelledError
+E................................................................
+======================================================================
+ERROR: test_small_buffer (__main__.TestSingleBlockdev)
+----------------------------------------------------------------------
+Traceback (most recent call last):
+  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
233, in setUp
+    TestSingleDrive.setUp(self)
+  File "/home/qemu/qemu-test.MPWquy/src/tests/qemu-iotests/041", line
54, in setUp
+    self.vm.launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 399, in launch
+    self._launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 434, in _launch
+    self._post_launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/qtest.py",
line 147, in _post_launch
+    super()._post_launch()
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/machine/machine.py",
line 340, in _post_launch
+    self._qmp.accept(self._qmp_timer)
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
line 69, in accept
+    timeout
+  File "/home/qemu/qemu-test.MPWquy/src/python/qemu/aqmp/legacy.py",
line 42, in _sync
+    asyncio.wait_for(future, timeout=timeout)
+  File "/usr/pkg/lib/python3.7/asyncio/base_events.py", line 587, in