Message ID | 20240116190042.1363717-1-stefanha@redhat.com (mailing list archive) |
---|---|
Series | monitor: only run coroutine commands in qemu_aio_context |

16.01.2024 22:00, Stefan Hajnoczi wrote:
> Several bugs have been reported related to how QMP commands are rescheduled in
> qemu_aio_context:
> - https://gitlab.com/qemu-project/qemu/-/issues/1933
> - https://issues.redhat.com/browse/RHEL-17369
> - https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> - https://bugzilla.redhat.com/show_bug.cgi?id=2214985
>
> The first instance of the bug interacted with drain_call_rcu() temporarily
> dropping the BQL and resulted in vCPU threads entering device emulation code
> simultaneously (something that should never happen). I set out to make
> drain_call_rcu() safe to use in this environment, but Paolo and Kevin discussed
> the possibility of avoiding rescheduling the monitor_qmp_dispatcher_co()
> coroutine for non-coroutine commands. This would prevent monitor commands from
> running during vCPU thread aio_poll() entirely and addresses the root cause.
>
> This patch series implements this idea. qemu-iotests is sensitive to the exact
> order in which QMP events and responses are emitted. Running QMP handlers in
> the iohandler AioContext causes some QMP events to be ordered differently than
> before. It is therefore necessary to adjust the reference output in many test
> cases. The actual QMP code change is small and everything else is just to make
> qemu-iotests happy.

This seems to be -stable material too, once it is applied to master.

It's difficult for me to catch the issue (#1933 @gitlab) locally, but
I did have some success there, and it always works after this patch is
applied. It also seems to work fine on 7.2, not only on 8.2, fwiw.

Thanks,

/mjt

On Tue, 16 Jan 2024 at 19:01, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Several bugs have been reported related to how QMP commands are rescheduled in
> qemu_aio_context:
> - https://gitlab.com/qemu-project/qemu/-/issues/1933
> - https://issues.redhat.com/browse/RHEL-17369
> - https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> - https://bugzilla.redhat.com/show_bug.cgi?id=2214985
>
> The first instance of the bug interacted with drain_call_rcu() temporarily
> dropping the BQL and resulted in vCPU threads entering device emulation code
> simultaneously (something that should never happen). I set out to make
> drain_call_rcu() safe to use in this environment, but Paolo and Kevin discussed
> the possibility of avoiding rescheduling the monitor_qmp_dispatcher_co()
> coroutine for non-coroutine commands. This would prevent monitor commands from
> running during vCPU thread aio_poll() entirely and addresses the root cause.
>
> This patch series implements this idea. qemu-iotests is sensitive to the exact
> order in which QMP events and responses are emitted. Running QMP handlers in
> the iohandler AioContext causes some QMP events to be ordered differently than
> before. It is therefore necessary to adjust the reference output in many test
> cases. The actual QMP code change is small and everything else is just to make
> qemu-iotests happy.

Hi; we have a suspicion that this change has resulted in a flaky-CI
test: iotest-144 sometimes fails, apparently because a "return"
result from QMP isn't always returned at the same place in relation
to other QMP events. Could you have a look at it?

https://gitlab.com/qemu-project/qemu/-/issues/2126

thanks
-- PMM

It's easy for me to hit the
"../block/qcow2.c:5263: ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *, Error **): Assertion `false' failed"
issue during 1Q vhost-user interface + RT VM + post-copy migration.

After applying this patch, the issue no longer reproduces, even when I
repeat the same migration test 60 times.

Tested-by: Yanghang Liu <yanghliu@redhat.com>

Best Regards,
YangHang Liu

On Mon, Jan 29, 2024 at 7:39 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Tue, 16 Jan 2024 at 19:01, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Several bugs have been reported related to how QMP commands are rescheduled in
> > qemu_aio_context:
> > - https://gitlab.com/qemu-project/qemu/-/issues/1933
> > - https://issues.redhat.com/browse/RHEL-17369
> > - https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> > - https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> >
> > The first instance of the bug interacted with drain_call_rcu() temporarily
> > dropping the BQL and resulted in vCPU threads entering device emulation code
> > simultaneously (something that should never happen). I set out to make
> > drain_call_rcu() safe to use in this environment, but Paolo and Kevin discussed
> > the possibility of avoiding rescheduling the monitor_qmp_dispatcher_co()
> > coroutine for non-coroutine commands. This would prevent monitor commands from
> > running during vCPU thread aio_poll() entirely and addresses the root cause.
> >
> > This patch series implements this idea. qemu-iotests is sensitive to the exact
> > order in which QMP events and responses are emitted. Running QMP handlers in
> > the iohandler AioContext causes some QMP events to be ordered differently than
> > before. It is therefore necessary to adjust the reference output in many test
> > cases. The actual QMP code change is small and everything else is just to make
> > qemu-iotests happy.
>
> Hi; we have a suspicion that this change has resulted in a flaky-CI
> test: iotest-144 sometimes fails, apparently because a "return"
> result from QMP isn't always returned at the same place in relation
> to other QMP events. Could you have a look at it?
>
> https://gitlab.com/qemu-project/qemu/-/issues/2126
>
> thanks
> -- PMM
>
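
For readers following the thread, the dispatch rule described in the cover letter (only coroutine commands are moved to qemu_aio_context; other handlers run in the iohandler AioContext) can be pictured with a toy model. This is only a sketch under stated assumptions, not the code from the series: `ToyContext`, `ToyCommand`, and `toy_dispatch_context()` are hypothetical names invented for illustration and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two event loops named in the cover letter. */
typedef enum {
    CTX_IOHANDLER,  /* models the iohandler AioContext */
    CTX_QEMU_AIO    /* models qemu_aio_context */
} ToyContext;

/* Hypothetical command descriptor; only the coroutine flag matters here. */
typedef struct {
    const char *name;
    bool is_coroutine;  /* true if the handler is written as a coroutine */
} ToyCommand;

/*
 * Toy version of the rule from the series title: only coroutine commands
 * are run in qemu_aio_context. Non-coroutine commands stay in the
 * iohandler context, so the dispatcher is not rescheduled for them.
 */
ToyContext toy_dispatch_context(const ToyCommand *cmd)
{
    assert(cmd != NULL);
    return cmd->is_coroutine ? CTX_QEMU_AIO : CTX_IOHANDLER;
}
```

In this model, a command with `is_coroutine == false` maps to `CTX_IOHANDLER`, which is also why the relative order of QMP events and responses can change compared to running every handler in qemu_aio_context.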