mbox series

[0/2] hw/block/nvme: handle transient dma errors

Message ID 20200629202053.1223342-1-its@irrelevant.dk (mailing list archive)
Headers show
Series hw/block/nvme: handle transient dma errors | expand

Message

Klaus Jensen June 29, 2020, 8:20 p.m. UTC
From: Klaus Jensen <k.jensen@samsung.com>

QEMU actually respects that Bus Master Enabling for a PCI device gets
flipped, so in order to succesfully pass the block/011 test ("disable
PCI device while doing I/O") the nvme device needs to know if a dma
transfer was successful or not.

Based-on: <20200629195017.1217056-1-its@irrelevant.dk>
("[PATCH 00/17] hw/block/nvme: AIO and address mapping refactoring")

Klaus Jensen (2):
  pci: pass along the return value of dma_memory_rw
  hw/block/nvme: handle dma errors

 hw/block/nvme.c       | 43 ++++++++++++++++++++++++++++++++-----------
 hw/block/trace-events |  2 ++
 include/block/nvme.h  |  2 +-
 include/hw/pci/pci.h  |  3 +--
 4 files changed, 36 insertions(+), 14 deletions(-)

Comments

no-reply@patchew.org June 29, 2020, 9:07 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200629202053.1223342-1-its@irrelevant.dk/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

--- /tmp/qemu-test/src/tests/qemu-iotests/040.out       2020-06-29 20:12:10.000000000 +0000
+++ /tmp/qemu-test/build/tests/qemu-iotests/040.out.bad 2020-06-29 20:58:48.288790818 +0000
@@ -1,3 +1,5 @@
+WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
+WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
 ...........................................................
 ----------------------------------------------------------------------
 Ran 59 tests
---
Not run: 259
Failures: 040
Failed 1 of 119 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs....
  TEST    check-qtest-aarch64: tests/qtest/qos-test
Traceback (most recent call last):
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=da25eaa8bdd04cb783e2c427c6a5aa94', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-98l7koy2/src/docker-src.2020-06-29-16.51.46.20742:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=da25eaa8bdd04cb783e2c427c6a5aa94
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-98l7koy2/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    15m57.590s
user    0m9.240s


The full log is available at
http://patchew.org/logs/20200629202053.1223342-1-its@irrelevant.dk/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Klaus Jensen June 29, 2020, 9:34 p.m. UTC | #2
On Jun 29 14:07, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200629202053.1223342-1-its@irrelevant.dk/
> 
> 
> 
> Hi,
> 
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
> 
> --- /tmp/qemu-test/src/tests/qemu-iotests/040.out       2020-06-29 20:12:10.000000000 +0000
> +++ /tmp/qemu-test/build/tests/qemu-iotests/040.out.bad 2020-06-29 20:58:48.288790818 +0000
> @@ -1,3 +1,5 @@
> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest


Hmm, I can't seem to reproduce this locally and the test succeeded on
the next series[1] that is based on this.

Is this a flaky test? Or a bad test runner? I'm of course worried when
a qcow2 test fails and I touch something else than the nvme device ;)


  [1]: https://patchew.org/QEMU/20200629203155.1236860-1-its@irrelevant.dk/
Philippe Mathieu-Daudé July 1, 2020, 12:58 p.m. UTC | #3
On 6/29/20 11:34 PM, Klaus Jensen wrote:
> On Jun 29 14:07, no-reply@patchew.org wrote:
>> Patchew URL: https://patchew.org/QEMU/20200629202053.1223342-1-its@irrelevant.dk/

>> --- /tmp/qemu-test/src/tests/qemu-iotests/040.out       2020-06-29 20:12:10.000000000 +0000
>> +++ /tmp/qemu-test/build/tests/qemu-iotests/040.out.bad 2020-06-29 20:58:48.288790818 +0000
>> @@ -1,3 +1,5 @@
>> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
>> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest

Kevin, Max, can iotests/040 be affected by this change?

> 
> 
> Hmm, I can't seem to reproduce this locally and the test succeeded on
> the next series[1] that is based on this.
> 
> Is this a flaky test? Or a bad test runner? I'm of course worried when
> a qcow2 test fails and I touch something else than the nvme device ;)
> 
> 
>   [1]: https://patchew.org/QEMU/20200629203155.1236860-1-its@irrelevant.dk/
>
Kevin Wolf July 3, 2020, 7:50 a.m. UTC | #4
Am 01.07.2020 um 14:58 hat Philippe Mathieu-Daudé geschrieben:
> On 6/29/20 11:34 PM, Klaus Jensen wrote:
> > On Jun 29 14:07, no-reply@patchew.org wrote:
> >> Patchew URL: https://patchew.org/QEMU/20200629202053.1223342-1-its@irrelevant.dk/
> 
> >> --- /tmp/qemu-test/src/tests/qemu-iotests/040.out       2020-06-29 20:12:10.000000000 +0000
> >> +++ /tmp/qemu-test/build/tests/qemu-iotests/040.out.bad 2020-06-29 20:58:48.288790818 +0000
> >> @@ -1,3 +1,5 @@
> >> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
> >> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
> 
> Kevin, Max, can iotests/040 be affected by this change?

The diffstat of this series looks like it doesn't touch anything outside
of the nvme emuation, which isn't used by this test, so at least I'd say
it's not the fault of the patch series.

I think test cases use SIGKILL primarily in timeout handlers, so maybe
the test host was overloaded and didn't shutdown QEMU in time so it was
killed. There is no actually failing test case:

 ...........................................................
 ----------------------------------------------------------------------
 Ran 59 tests

You would have 'F' or 'E' for fail/error instead of '.' otherwise.

Kevin

> > 
> > 
> > Hmm, I can't seem to reproduce this locally and the test succeeded on
> > the next series[1] that is based on this.
> > 
> > Is this a flaky test? Or a bad test runner? I'm of course worried when
> > a qcow2 test fails and I touch something else than the nvme device ;)
> > 
> > 
> >   [1]: https://patchew.org/QEMU/20200629203155.1236860-1-its@irrelevant.dk/
> > 
>
Philippe Mathieu-Daudé July 3, 2020, 7:55 a.m. UTC | #5
On 7/3/20 9:50 AM, Kevin Wolf wrote:
> Am 01.07.2020 um 14:58 hat Philippe Mathieu-Daudé geschrieben:
>> On 6/29/20 11:34 PM, Klaus Jensen wrote:
>>> On Jun 29 14:07, no-reply@patchew.org wrote:
>>>> Patchew URL: https://patchew.org/QEMU/20200629202053.1223342-1-its@irrelevant.dk/
>>
>>>> --- /tmp/qemu-test/src/tests/qemu-iotests/040.out       2020-06-29 20:12:10.000000000 +0000
>>>> +++ /tmp/qemu-test/build/tests/qemu-iotests/040.out.bad 2020-06-29 20:58:48.288790818 +0000
>>>> @@ -1,3 +1,5 @@
>>>> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
>>>> +WARNING:qemu.machine:qemu received signal 9: /tmp/qemu-test/build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64 -display none -vga none -chardev socket,id=mon,path=/tmp/tmp.Jdol0fPScQ/qemu-21749-monitor.sock -mon chardev=mon,mode=control -qtest unix:path=/tmp/tmp.Jdol0fPScQ/qemu-21749-qtest.sock -accel qtest -nodefaults -display none -accel qtest
>>
>> Kevin, Max, can iotests/040 be affected by this change?
> 
> The diffstat of this series looks like it doesn't touch anything outside
> of the nvme emuation, which isn't used by this test, so at least I'd say
> it's not the fault of the patch series.
> 
> I think test cases use SIGKILL primarily in timeout handlers, so maybe
> the test host was overloaded and didn't shutdown QEMU in time so it was
> killed. There is no actually failing test case:
> 
>  ...........................................................
>  ----------------------------------------------------------------------
>  Ran 59 tests
> 
> You would have 'F' or 'E' for fail/error instead of '.' otherwise.

TIL how to read that line :)

Thanks for your analysis Kevin!

> 
> Kevin
> 
>>>
>>>
>>> Hmm, I can't seem to reproduce this locally and the test succeeded on
>>> the next series[1] that is based on this.
>>>
>>> Is this a flaky test? Or a bad test runner? I'm of course worried when
>>> a qcow2 test fails and I touch something else than the nvme device ;)
>>>
>>>
>>>   [1]: https://patchew.org/QEMU/20200629203155.1236860-1-its@irrelevant.dk/
>>>
>>
>