Message ID | 1490717566-25516-1-git-send-email-den@openvz.org |
---|---|
State | New, archived |
On Tue, 03/28 19:12, Denis V. Lunev wrote:
[...]
> This patch adds missed lock.

Nit: reading the subject, I thought it was an unbalanced acquire/release, but it is actually a missing pair.

In bdrv_unref() we should have asserted that we have acquired the AioContext; that way you wouldn't have been bitten by this bug.

Reviewed-by: Fam Zheng <famz@redhat.com>
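Fam's suggestion of asserting in bdrv_unref() that the caller holds the AioContext can be sketched outside QEMU. This is a minimal, self-contained illustration under stated assumptions: `ToyAioContext`, `ToyBDS`, the `toy_ctx_*` helpers and `toy_bdrv_unref()` are hypothetical stand-ins, not QEMU APIs. The context lock is modeled as a recursive pthread mutex that records its owning thread, so a function can cheaply check "do I hold this lock?" and fail loudly instead of hanging later in a poll loop.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: each thread's copy of this variable has a distinct
 * address, which serves as a race-free thread identity. */
static _Thread_local char toy_thread_token;

/* Hypothetical stand-in for an AioContext lock: recursive, owner-tracked. */
typedef struct {
    pthread_mutex_t mutex;   /* recursive mutex doing the actual locking */
    _Atomic(void *) owner;   /* &toy_thread_token of the holder, or NULL */
    int depth;               /* nesting depth; touched only by the holder */
} ToyAioContext;

static void toy_ctx_init(ToyAioContext *ctx)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    atomic_init(&ctx->owner, NULL);
    ctx->depth = 0;
}

/* True only if the calling thread currently holds the context. */
static bool toy_ctx_held_by_me(ToyAioContext *ctx)
{
    return atomic_load(&ctx->owner) == (void *)&toy_thread_token;
}

static void toy_ctx_acquire(ToyAioContext *ctx)
{
    pthread_mutex_lock(&ctx->mutex);
    atomic_store(&ctx->owner, (void *)&toy_thread_token);
    ctx->depth++;
}

static void toy_ctx_release(ToyAioContext *ctx)
{
    assert(toy_ctx_held_by_me(ctx));   /* releasing an unheld lock is a bug */
    if (--ctx->depth == 0) {
        atomic_store(&ctx->owner, NULL);
    }
    pthread_mutex_unlock(&ctx->mutex);
}

/* Hypothetical refcounted node standing in for a BlockDriverState. */
typedef struct {
    ToyAioContext *ctx;
    int refcnt;
} ToyBDS;

/* The suggested check, sketched: a caller that forgot to acquire the
 * context trips the assertion here instead of hanging later. */
static void toy_bdrv_unref(ToyBDS *bs)
{
    assert(toy_ctx_held_by_me(bs->ctx));
    bs->refcnt--;
}
```

With this check in place, calling `toy_bdrv_unref()` without a prior `toy_ctx_acquire()` aborts immediately, which is far easier to diagnose than a thread parked in ppoll().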
On 28.03.2017 18:12, Denis V. Lunev wrote:
[...]
> ---
>  hw/core/qdev-properties-system.c | 4 ++++
>  1 file changed, 4 insertions(+)

Since this file is unmaintained and this patch is mostly a matter of the block layer (and the subject starts with "block" :-)):

Thanks, applied to my block tree:
https://github.com/xanClic/qemu/commits/block

Max
    diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
    index c34be1c..e885e65 100644
    --- a/hw/core/qdev-properties-system.c
    +++ b/hw/core/qdev-properties-system.c
    @@ -124,8 +124,12 @@ static void release_drive(Object *obj, const char *name, void *opaque)
         BlockBackend **ptr = qdev_get_prop_ptr(dev, prop);
     
         if (*ptr) {
    +        AioContext *ctx = blk_get_aio_context(*ptr);
    +
    +        aio_context_acquire(ctx);
             blockdev_auto_del(*ptr);
             blk_detach_dev(*ptr, dev);
    +        aio_context_release(ctx);
         }
     }
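The ordering that the patch establishes in release_drive() can be modeled with mock types. In this sketch every `Mock*` and `mock_*` name is hypothetical; only the sequence of calls mirrors the diff. As in the patch, the context pointer is fetched once before teardown and reused for the release, presumably so that the release does not depend on `*ptr` still being usable after blockdev_auto_del() and blk_detach_dev().

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-ins; only the call order mirrors the patch. */
typedef struct { pthread_mutex_t lock; } MockAioContext;
typedef struct { MockAioContext *ctx; int attached; } MockBlockBackend;

/* Records the order in which the mocked calls happen. */
static char call_log[128];

static void log_call(const char *name)
{
    if (call_log[0] != '\0') {
        strcat(call_log, ",");
    }
    strcat(call_log, name);
}

static MockAioContext *mock_blk_get_aio_context(MockBlockBackend *blk)
{
    return blk->ctx;
}

static void mock_aio_context_acquire(MockAioContext *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    log_call("acquire");
}

static void mock_aio_context_release(MockAioContext *ctx)
{
    log_call("release");
    pthread_mutex_unlock(&ctx->lock);
}

static void mock_blockdev_auto_del(MockBlockBackend *blk)
{
    (void)blk;
    log_call("auto_del");
}

static void mock_blk_detach_dev(MockBlockBackend *blk)
{
    blk->attached = 0;
    log_call("detach");
}

/* Shape of release_drive() after the patch: teardown runs with the
 * backend's context held, and acquire/release come as a pair. */
static void mock_release_drive(MockBlockBackend **ptr)
{
    if (*ptr) {
        MockAioContext *ctx = mock_blk_get_aio_context(*ptr);

        mock_aio_context_acquire(ctx);
        mock_blockdev_auto_del(*ptr);
        mock_blk_detach_dev(*ptr);
        mock_aio_context_release(ctx);
    }
}
```

A plain mutex stands in for the context lock here; QEMU's AioContext lock is recursive, which this mock does not model.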
Recently we experienced a hang with iothreads enabled, with the following call trace:

    Thread 1 (Thread 0x7fa95efebc80 (LWP 177117)):
    0  ppoll () from /lib64/libc.so.6
    2  qemu_poll_ns () at qemu-timer.c:313
    3  aio_poll () at aio-posix.c:457
    4  bdrv_flush () at block/io.c:2641
    5  bdrv_close () at block.c:2143
    6  bdrv_delete () at block.c:2352
    7  bdrv_unref () at block.c:3429
    8  blk_remove_bs () at block/block-backend.c:427
    9  blk_delete () at block/block-backend.c:178
    10 blk_unref () at block/block-backend.c:226
    11 object_property_del_all () at qom/object.c:399
    12 object_finalize () at qom/object.c:461
    13 object_unref () at qom/object.c:898
    14 object_property_del_child () at qom/object.c:422
    15 qmp_marshal_device_del () at qmp-marshal.c:1145
    16 handle_qmp_command () at /usr/src/debug/qemu-2.6.0/monitor.c:3929

Technically, bdrv_flush() gets stuck in

    while (rwco.ret == NOT_DONE) {
        aio_poll(aio_context, true);
    }

but rwco.ret is equal to 0, so we have missed a wakeup. Code investigation reveals that we have not performed aio_context_acquire() on this call stack.

This patch adds the missing lock.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>

---
 hw/core/qdev-properties-system.c | 4 ++++
 1 file changed, 4 insertions(+)
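The missed wakeup described in the commit message follows a general pattern: a waiter checks a condition and then sleeps, and a completion that lands between the check and the sleep, or in a thread that does not coordinate with the waiter, leaves it blocked even though the result is already there. The following is only a loose pthreads analogy, not a model of aio_poll() itself; `ToyRequest`, `toy_complete()`, `toy_wait()` and `toy_flush_demo()` are hypothetical, and `NOT_DONE` is simply a sentinel echoing `rwco.ret`. In the same spirit as the patch, the waiter and the completer serialize on one lock, so the check and the sleep cannot be separated by the completion.

```c
#include <assert.h>
#include <pthread.h>

enum { NOT_DONE = 0x7fffffff };  /* hypothetical sentinel, echoing rwco.ret */

typedef struct {
    pthread_mutex_t lock;   /* plays the role of the context lock */
    pthread_cond_t done;    /* plays the role of the wakeup */
    int ret;                /* NOT_DONE until the request completes */
} ToyRequest;

/* Completion side, running in another thread (think "iothread"). */
static void *toy_complete(void *opaque)
{
    ToyRequest *req = opaque;

    pthread_mutex_lock(&req->lock);
    req->ret = 0;                      /* publish the result ...          */
    pthread_cond_signal(&req->done);   /* ... and wake the waiter, locked */
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/* Waiting side: the check and the sleep both happen under the lock, so
 * the completion cannot slip in between them and its wakeup is not lost. */
static int toy_wait(ToyRequest *req)
{
    pthread_mutex_lock(&req->lock);
    while (req->ret == NOT_DONE) {
        pthread_cond_wait(&req->done, &req->lock);
    }
    int ret = req->ret;
    pthread_mutex_unlock(&req->lock);
    return ret;
}

/* Starts a completer thread and waits for it, like a synchronous flush. */
static int toy_flush_demo(void)
{
    ToyRequest req = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                       NOT_DONE };
    pthread_t thread;

    if (pthread_create(&thread, NULL, toy_complete, &req) != 0) {
        return -1;
    }
    int ret = toy_wait(&req);
    pthread_join(thread, NULL);
    return ret;
}
```

If `toy_wait()` instead tested `req->ret` without holding the lock, a signal sent between that test and `pthread_cond_wait()` would be lost, and the waiter could sleep forever with `ret` already 0. That is the same symptom as in the trace above.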