mbox series

[v3,for-5.1,0/2] Fix crash due to NBD export leak

Message ID 20200714162234.13113-1-vsementsov@virtuozzo.com (mailing list archive)
Headers show
Series Fix crash due to NBD export leak | expand

Message

Vladimir Sementsov-Ogievskiy July 14, 2020, 4:22 p.m. UTC
Hi all!

We've faced crash bug, which is reproducing on master branch as well.
The case is described in 01, where fix is suggested.
New iotest in 02 crashes without that fix.

v3: resend for convenience, as all preparatory patches are merged.
01-02: add Eric's r-b and t-b marks

====

This is a crash-fix, so it would be good to fix in 5.1. Still neither
Eric nor I are sure in patch 01: is AIO_WAIT_WHILE used correctly?

====

Side note: this AIO_WAIT_WHILE may be long, if nbd reconnect is enabled
and connection failed recently. Still it's another story: I think we
actually should disable reconnect in bdrv_close, before drain.

Vladimir Sementsov-Ogievskiy (2):
  nbd: make nbd_export_close_all() synchronous
  iotests: test shutdown when bitmap is exported through NBD

 nbd/server.c               |  8 +++++
 tests/qemu-iotests/299     | 65 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/299.out | 10 ++++++
 tests/qemu-iotests/group   |  1 +
 4 files changed, 84 insertions(+)
 create mode 100644 tests/qemu-iotests/299
 create mode 100644 tests/qemu-iotests/299.out

Comments

no-reply@patchew.org July 14, 2020, 4:49 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200714162234.13113-1-vsementsov@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 022
  TEST    check-unit: tests/test-char
**
ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL
ERROR test-char - Bail out! ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL
make: *** [check-unit] Error 1
make: *** Waiting for unfinished jobs....
  TEST    iotest-qcow2: 024
  TEST    iotest-qcow2: 025
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-fzvixkio/src/docker-src.2020-07-14-12.32.53.19697:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fzvixkio/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    16m48.396s
user    0m8.741s


The full log is available at
http://patchew.org/logs/20200714162234.13113-1-vsementsov@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Vladimir Sementsov-Ogievskiy July 14, 2020, 4:59 p.m. UTC | #2
14.07.2020 19:49, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200714162234.13113-1-vsementsov@virtuozzo.com/
> 
> 
> 
> Hi,
> 
> This series failed the docker-quick@centos7 build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
> 
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
> 
>    TEST    iotest-qcow2: 022
>    TEST    check-unit: tests/test-char
> **
> ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL
> ERROR test-char - Bail out! ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL
> make: *** [check-unit] Error 1
> make: *** Waiting for unfinished jobs....
>    TEST    iotest-qcow2: 024
>    TEST    iotest-qcow2: 025
> ---
>      raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-fzvixkio/src/docker-src.2020-07-14-12.32.53.19697:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
> filter=--filter=label=com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb
> make[1]: *** [docker-run] Error 1
> make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fzvixkio/src'
> make: *** [docker-run-test-quick@centos7] Error 2
> 
> real    16m48.396s
> user    0m8.741s
> 
> 
> The full log is available at
> http://patchew.org/logs/20200714162234.13113-1-vsementsov@virtuozzo.com/testing.docker-quick@centos7/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
> 

something unrelated..
Philippe Mathieu-Daudé July 14, 2020, 5:26 p.m. UTC | #3
Cc'ing Marc-André

On 7/14/20 6:49 PM, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200714162234.13113-1-vsementsov@virtuozzo.com/
...
> 
>   TEST    iotest-qcow2: 022
>   TEST    check-unit: tests/test-char
> **
> ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL
> ERROR test-char - Bail out! ERROR:/tmp/qemu-test/src/tests/test-char.c:1204:char_serial_test: 'chr' should not be NULL

Seems related to latest chardev-pull-request.

> make: *** [check-unit] Error 1
> make: *** Waiting for unfinished jobs....
>   TEST    iotest-qcow2: 024
>   TEST    iotest-qcow2: 025
> ---
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-fzvixkio/src/docker-src.2020-07-14-12.32.53.19697:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
> filter=--filter=label=com.qemu.instance.uuid=9dc2c7a6e3eb458c88fdd6be6a03c6eb
> make[1]: *** [docker-run] Error 1
> make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-fzvixkio/src'
> make: *** [docker-run-test-quick@centos7] Error 2
> 
> real    16m48.396s
> user    0m8.741s
> 
> 
> The full log is available at
> http://patchew.org/logs/20200714162234.13113-1-vsementsov@virtuozzo.com/testing.docker-quick@centos7/?type=message.
Kevin Wolf July 17, 2020, 12:01 p.m. UTC | #4
Am 14.07.2020 um 18:22 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
> 
> We've faced crash bug, which is reproducing on master branch as well.
> The case is described in 01, where fix is suggested.
> New iotest in 02 crashes without that fix.
> 
> v3: resend for convenience, as all preparatory patches are merged.
> 01-02: add Eric's r-b and t-b marks
> 
> ====
> 
> This is a crash-fix, so it would be good to fix in 5.1. Still neither
> Eric nor I are sure in patch 01: is AIO_WAIT_WHILE used correctly?

Anything specific you had doubts about?

At first sight it looks good to me. It's always called in the main loop
and we don't hold any AioContext locks, so using NULL as the context is
fine. You also made sure to use aio_wait_kick() so that we won't hang
even though the condition has become false.

I'm applying this to my block branch now. If your doubts were about
something more subtle that I missed, we can unstage/revert the patch.

Kevin
Vladimir Sementsov-Ogievskiy July 17, 2020, 3 p.m. UTC | #5
17.07.2020 15:01, Kevin Wolf wrote:
> Am 14.07.2020 um 18:22 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> We've faced crash bug, which is reproducing on master branch as well.
>> The case is described in 01, where fix is suggested.
>> New iotest in 02 crashes without that fix.
>>
>> v3: resend for convenience, as all preparatory patches are merged.
>> 01-02: add Eric's r-b and t-b marks
>>
>> ====
>>
>> This is a crash-fix, so it would be good to fix in 5.1. Still neither
>> Eric nor I are sure in patch 01: is AIO_WAIT_WHILE used correctly?
> 
> Anything specific you had doubts about?
> 
> At first sight it looks good to me. It's always called in the main loop
> and we don't hold any AioContext locks, so using NULL as the context is
> fine. You also made sure to use aio_wait_kick() so that we won't hang
> even though the condition has become false.
> 
> I'm applying this to my block branch now. If your doubts were about
> something more subtle that I missed, we can unstage/revert the patch.
> 
> Kevin
> 

Nothing specific, thanks!