Message ID | 20230517152834.277483-4-kwolf@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block/graph-lock: Disable locking for now | expand |
On Wed, May 17, 2023 at 05:28:34PM +0200, Kevin Wolf wrote: > This tests exercises graph locking, draining, and graph modifications > with AioContext switches a lot. Amongst others, it serves as a > regression test for bdrv_graph_wrlock() deadlocking because it is called > with a locked AioContext and for AioContext handling in the NBD server. > > Signed-off-by: Kevin Wolf <kwolf@redhat.com> I've now confirmed the following setups with just './check graph-changes-while-io -qcow2': patch 3 alone => test fails with wrlock assertion (good, we're catching the 8.0 regression where the new assertion failure is tripping) patch 1 and 3 => test fails differently than patch 3 alone (good, we're exposing the fact that NBD had a pre-existing bug, regardless of whether the added rwlock made it easier to spot) patch 2 and 3 => test passes (good, patch 2 appears to have fixed this particular bug, and when we are ready to revert patch 1 because we get rid of AioContext locking we'll still be okay) patch 1, 2, and 3 => test passes (good, we fixed the NBD bug, and have regression testing in place for a scenario that previously wasn't getting good testing) As such, I'm happy to supply: Tested-by: Eric Blake <eblake@redhat.com> Now on to the patch itself... > --- > tests/qemu-iotests/iotests.py | 4 ++ > .../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++-- > .../tests/graph-changes-while-io.out | 4 +- > 3 files changed, 58 insertions(+), 6 deletions(-) > > @@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase): > > bench_thr.join() > > + def test_commit_while_io(self) -> None: > + # Run qemu-img bench in the background > + bench_thr = Thread(target=do_qemu_img_bench, args=(200000, )) TIL - you can create a 1-item tuple in Python. It caught me off-guard, but makes sense now that I've re-read it. > + > + # While qemu-img bench is running, repeatedly commit overlay to node0 > + while bench_thr.is_alive(): > + result = self.qsd.qmp('block-commit', { > + 'job-id': 'job0', > + 'device': 'overlay', > + }) > + self.assert_qmp(result, 'return', {}) > + > + result = self.qsd.qmp('block-job-cancel', { > + 'device': 'job0', > + }) > + self.assert_qmp(result, 'return', {}) > + > + cancelled = False > + while not cancelled: > + for event in self.qsd.get_qmp().get_events(wait=10.0): The updated test took about 34 seconds on my machine; long enough that it is rightfully not part of './check -g quick', but still reasonable that I had no problems reproducing the issue while the test was running. > + if event['event'] != 'JOB_STATUS_CHANGE': > + continue > + if event['data']['status'] == 'null': > + cancelled = True > + > + bench_thr.join() > + It feels a bit odd that the test is skipped during './check -nbd', yet it IS utilizing nbd, and the fix in patch 2 is indeed in NBD code. But that's not a flaw in the test itself, just a limitation of what images we need in order to set up the NBD service in a way to trigger the problem. Reviewed-by: Eric Blake <eblake@redhat.com> I'm in a spot where I can quickly queue this through my NBD tree so we can get it backported to 8.0.1; pull request coming up, provided the full series passes a few more tests on my end.
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index 3e82c634cf..7073579a7d 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -462,6 +462,10 @@ def qmp(self, cmd: str, args: Optional[Dict[str, object]] = None) \ assert self._qmp is not None return self._qmp.cmd(cmd, args) + def get_qmp(self) -> QEMUMonitorProtocol: + assert self._qmp is not None + return self._qmp + def stop(self, kill_signal=15): self._p.send_signal(kill_signal) self._p.wait() diff --git a/tests/qemu-iotests/tests/graph-changes-while-io b/tests/qemu-iotests/tests/graph-changes-while-io index 7664f33689..750e7d4d38 100755 --- a/tests/qemu-iotests/tests/graph-changes-while-io +++ b/tests/qemu-iotests/tests/graph-changes-while-io @@ -22,19 +22,19 @@ import os from threading import Thread import iotests -from iotests import imgfmt, qemu_img, qemu_img_create, QMPTestCase, \ - QemuStorageDaemon +from iotests import imgfmt, qemu_img, qemu_img_create, qemu_io, \ + QMPTestCase, QemuStorageDaemon top = os.path.join(iotests.test_dir, 'top.img') nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock') -def do_qemu_img_bench() -> None: +def do_qemu_img_bench(count: int = 2000000) -> None: """ Do some I/O requests on `nbd_sock`. """ - qemu_img('bench', '-f', 'raw', '-c', '2000000', + qemu_img('bench', '-f', 'raw', '-c', str(count), f'nbd+unix:///node0?socket={nbd_sock}') @@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase): bench_thr.join() + def test_commit_while_io(self) -> None: + # Run qemu-img bench in the background + bench_thr = Thread(target=do_qemu_img_bench, args=(200000, )) + bench_thr.start() + + qemu_io('-c', 'write 0 64k', top) + qemu_io('-c', 'write 128k 64k', top) + + result = self.qsd.qmp('blockdev-add', { + 'driver': imgfmt, + 'node-name': 'overlay', + 'backing': None, + 'file': { + 'driver': 'file', + 'filename': top + } + }) + self.assert_qmp(result, 'return', {}) + + result = self.qsd.qmp('blockdev-snapshot', { + 'node': 'node0', + 'overlay': 'overlay', + }) + self.assert_qmp(result, 'return', {}) + + # While qemu-img bench is running, repeatedly commit overlay to node0 + while bench_thr.is_alive(): + result = self.qsd.qmp('block-commit', { + 'job-id': 'job0', + 'device': 'overlay', + }) + self.assert_qmp(result, 'return', {}) + + result = self.qsd.qmp('block-job-cancel', { + 'device': 'job0', + }) + self.assert_qmp(result, 'return', {}) + + cancelled = False + while not cancelled: + for event in self.qsd.get_qmp().get_events(wait=10.0): + if event['event'] != 'JOB_STATUS_CHANGE': + continue + if event['data']['status'] == 'null': + cancelled = True + + bench_thr.join() + if __name__ == '__main__': # Format must support raw backing files iotests.main(supported_fmts=['qcow', 'qcow2', 'qed'], diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out b/tests/qemu-iotests/tests/graph-changes-while-io.out index ae1213e6f8..fbc63e62f8 100644 --- a/tests/qemu-iotests/tests/graph-changes-while-io.out +++ b/tests/qemu-iotests/tests/graph-changes-while-io.out @@ -1,5 +1,5 @@ -. +.. ---------------------------------------------------------------------- -Ran 1 tests +Ran 2 tests OK
This tests exercises graph locking, draining, and graph modifications with AioContext switches a lot. Amongst others, it serves as a regression test for bdrv_graph_wrlock() deadlocking because it is called with a locked AioContext and for AioContext handling in the NBD server. Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- tests/qemu-iotests/iotests.py | 4 ++ .../qemu-iotests/tests/graph-changes-while-io | 56 +++++++++++++++++-- .../tests/graph-changes-while-io.out | 4 +- 3 files changed, 58 insertions(+), 6 deletions(-)