| Message ID | 20210318133550.13120-1-s.reiter@proxmox.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB |
On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
> If OOB is disabled, events received in monitor_qmp_event will be handled
> in the main context. Thus, we must not acquire a qmp_queue_lock there,
> as the dispatcher coroutine holds one over a yield point, where it
> expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
> event is received just then, it can race and block the main thread by
> waiting on the queue lock.
>
> Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
> thread, so the main thread can always make progress during the
> reschedule.
>
> The delaying of the cleanup is safe, since the dispatcher always moves
> back to the iothread afterward, and thus the cleanup will happen before
> it gets to its next iteration.
>
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---

This is a tough one. It *may* be fine, but I wonder if we can approach
this differently:

From what I can gather we have the following call stacks & contexts:

Guarded lock (lock & release):
* monitor_qmp_cleanup_queue_and_resume
    by monitor_qmp_event
    by file handler (from I/O loop)
  ^ iohandler_context (assuming that's where the file handling happens...)
    (after this patch as BH though)

* handle_qmp_command
  a) by the json parser (which is also re-initialized by
     monitor_qmp_event btw., haven't checked if that can also
     "trigger" its methods immediately)
  b) by monitor_qmp_read
       by file handler (from I/O loop)
  ^ iohandler_context

Lock-"returning":
* monitor_qmp_requests_pop_any_with_lock
    by coroutine_fn monitor_qmp_dispatcher_co
  ^ iohandler_context

Lock-releasing:
* coroutine_fn monitor_qmp_dispatcher_co
  ^ qemu_aio_context

The only *weird* thing that immediately pops out here is
`monitor_qmp_requests_pop_any_with_lock()` keeping a lock while
switching contexts.

This is done in order to allow `AIO_WAIT_WHILE` to work while making
progress on the events, but do we actually already need to be in this
context for the OOB `monitor_resume()` call, or can we defer the
context switch to after having done that and released the lock?
`monitor_resume()` itself seems to simply schedule a BH, which should
work regardless, if I'm not mistaken. There's also a
`readline_show_prompt()` call, but that *looks* harmless?

`monitor_resume()` is also called without the lock later on, so even if
it needs to be in this context at that point for whatever reason, does
it need the lock?
On 3/22/21 12:08 PM, Wolfgang Bumiller wrote:
> On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
>> [...]
>
> This is a tough one. It *may* be fine, but I wonder if we can approach
> this differently:
>
> From what I can gather we have the following call stacks & contexts:
>
> [...]
>
> The only *weird* thing that immediately pops out here is
> `monitor_qmp_requests_pop_any_with_lock()` keeping a lock while
> switching contexts.

monitor_qmp_dispatcher_co? _pop_any_ doesn't switch contexts... But
yes, that is weird, as I mentioned in my original mail too.

> This is done in order to allow `AIO_WAIT_WHILE` to work while making
> progress on the events, but do we actually already need to be in this
> context for the OOB `monitor_resume()` call or can we defer the context
> switch to after having done that and released the lock?
> `monitor_resume()` itself seems to simply schedule a BH which should
> work regardless if I'm not mistaken. There's also a
> `readline_show_prompt()` call, but that *looks* harmless?

The BH should indeed be harmless, since we don't schedule on
qemu_get_current_aio_context, and the readline_show_prompt call we can
ignore here, since it's guarded with "!monitor_is_qmp(mon)".

> `monitor_resume()` is also called without the lock later on, so even if
> it needs to be in this context at that point for whatever reason, does
> it need the lock?

It doesn't access the queue, so I don't see why it'd need the lock. And
as you said, it currently works without it, too - actually, before
commit 88daf0996c ("qmp: Resume OOB-enabled monitor before processing
the request") it always did.

I'll cobble together a v2 with this in mind.
Wolfgang Bumiller <w.bumiller@proxmox.com> writes:

> On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
>> [...]
>
> This is a tough one. It *may* be fine, but I wonder if we can approach
> this differently:

You guys make my head hurt.

I understand we're talking about a bug. Is it a recent regression, or
an older bug? How badly does the bug affect users?

I'm about to vanish for my Easter break... If the bug must be fixed for
6.0, just waiting for me to come back seems unadvisable.

[...]
On 3/26/21 3:48 PM, Markus Armbruster wrote:
> Wolfgang Bumiller <w.bumiller@proxmox.com> writes:
>
>> On Thu, Mar 18, 2021 at 02:35:50PM +0100, Stefan Reiter wrote:
>>> [...]
>>
>> This is a tough one. It *may* be fine, but I wonder if we can approach
>> this differently:
>
> You guys make my head hurt.

That makes three of us, I think.

> I understand we're talking about a bug. Is it a recent regression, or
> an older bug? How badly does the bug affect users?

It's a regression introduced with the coroutinization of QMP in 5.2. It
only occurs when OOB is disabled - in our downstream we have it
disabled unconditionally, as it caused some issues in the past.

It affected quite a lot of our users, some when the host was under CPU
load, some seemingly at random. When it happened, it usually hit
multiple VMs at once, completely hanging them.

Just for reference, our forum has the full story:
https://forum.proxmox.com/threads/all-vms-locking-up-after-latest-pve-update.85397/

> I'm about to vanish for my Easter break... If the bug must be fixed for
> 6.0, just waiting for me to come back seems unadvisable.

Since it doesn't happen with OOB enabled (so, by default), I don't
think it's that urgent.

BTW, I've sent a v2 as well:
https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07590.html

That one we have shipped to our users for now, with mostly good
success, though a few reports that something still hangs - which could
be people failing to upgrade, or some other issue still unsolved...

And happy Easter break :)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 2b0308f933..83eb440b29 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -74,8 +74,9 @@ static void monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
     }
 }
 
-static void monitor_qmp_cleanup_queue_and_resume(MonitorQMP *mon)
+static void monitor_qmp_cleanup_queue_and_resume(void *opaque)
 {
+    MonitorQMP *mon = opaque;
     QEMU_LOCK_GUARD(&mon->qmp_queue_lock);
 
     /*
@@ -453,8 +454,18 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
          * backend is still open.  For example, when the backend is
          * stdio, it's possible that stdout is still open when stdin
          * is closed.
+         *
+         * monitor_qmp_cleanup_queue_and_resume must not run in main
+         * context, as it acquires a qmp_queue_lock, which is held by
+         * the dispatcher coroutine during a reschedule/yield to the
+         * main context, and could thus lead to a deadlock.
+         *
+         * This is safe to delay, since the dispatcher also moves
+         * back to the iohandler context before reacquiring any
+         * queue locks.
          */
-        monitor_qmp_cleanup_queue_and_resume(mon);
+        aio_bh_schedule_oneshot(iohandler_get_aio_context(),
+                                monitor_qmp_cleanup_queue_and_resume, mon);
         json_message_parser_destroy(&mon->parser);
         json_message_parser_init(&mon->parser, handle_qmp_command, mon, NULL);
If OOB is disabled, events received in monitor_qmp_event will be handled
in the main context. Thus, we must not acquire a qmp_queue_lock there,
as the dispatcher coroutine holds one over a yield point, where it
expects to be rescheduled from the main context. If a CHR_EVENT_CLOSED
event is received just then, it can race and block the main thread by
waiting on the queue lock.

Run monitor_qmp_cleanup_queue_and_resume in a BH on the iohandler
thread, so the main thread can always make progress during the
reschedule.

The delaying of the cleanup is safe, since the dispatcher always moves
back to the iothread afterward, and thus the cleanup will happen before
it gets to its next iteration.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---

We've had sporadic reports on our forum of (usually multiple) total VM
hangs, and finally got our hands on some stack traces:

Thread 1 (Thread 0x7fa59d476340 (LWP 1306954)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
#1  0x00007fa5a8335714 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55722bf3a848) at ../nptl/pthread_mutex_lock.c:80
#2  0x000055722a475e39 in qemu_mutex_lock_impl (mutex=0x55722bf3a848, file=0x55722a5d7149 "../monitor/qmp.c", line=80) at ../util/qemu-thread-posix.c:79
#3  0x000055722a3fb686 in monitor_qmp_cleanup_queue_and_resume (mon=0x55722bf3a730) at ../monitor/qmp.c:80
#4  monitor_qmp_event (opaque=0x55722bf3a730, event=<optimized out>) at ../monitor/qmp.c:421
#5  0x000055722a3f9505 in tcp_chr_disconnect_locked (chr=0x55722bc68fa0) at ../chardev/char-socket.c:507
#6  0x000055722a3f9550 in tcp_chr_disconnect (chr=0x55722bc68fa0) at ../chardev/char-socket.c:517
#7  0x000055722a3f959e in tcp_chr_hup (channel=<optimized out>, cond=<optimized out>, opaque=<optimized out>) at ../chardev/char-socket.c:557
#8  0x00007fa5a9c2edd8 in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x000055722a47a848 in glib_pollfds_poll () at ../util/main-loop.c:221
#10 os_host_main_loop_wait (timeout=<optimized out>) at ../util/main-loop.c:244
#11 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:520
#12 0x000055722a2570a1 in qemu_main_loop () at ../softmmu/vl.c:1678
#13 0x0000557229fc178e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

I managed to reproduce it reliably by adding a 'sleep(1)' to the
beginning of handle_qmp_command, then running this:

./build/qemu-system-x86_64 \
  -chardev 'socket,id=qmp,path=./qmp.socket,server=on,wait=off' \
  -mon 'chardev=qmp,mode=control'

yes "query-version" | timeout 2s ./scripts/qmp/qmp-shell ./qmp.socket

When the timeout hits, the main loop always ends up in the trace above,
with the dispatch coroutine having just yielded to reschedule itself.

Holding a QemuMutex over a yield point seems pretty dangerous in
general, but it seems to be working for now...

 monitor/qmp.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)