Message ID: 20191025170505.2834957-1-anthony.perard@citrix.com
Series: Fix: libxl workaround, multiple connection to single QMP socket
On 25/10/2019 19:05, Anthony PERARD wrote:
> Patch series available in this git branch:
> https://xenbits.xen.org/git-http/people/aperard/xen-unstable.git br.fix-ev_qmp-multi-connect-v1
>
> Hi,
>
> QEMU's QMP socket doesn't allow multiple concurrent connections. It also
> listen()s on the socket with a `backlog' of only 1. On Linux at least, once
> that backlog is filled, connect() will return EAGAIN if the socket fd is
> non-blocking. libxl may make many concurrent connect() attempts if, for
> example, a guest is started with several PCI passthrough devices, and a
> connect() failure leads to a failure to start the guest.

Hi Anthony,

Just tested with the patch series and it fixes my issue with starting a
guest with several PCI passthrough devices.

Thanks,

Sander

> Since we can't change the listen() `backlog' that QEMU uses, we need other
> ways to work around the issue. This patch series introduces a lock to be
> acquired before attempting to connect() to the QMP socket. Since the lock
> might be held for too long, the series also introduces a way to cancel the
> acquisition of the lock, which means killing the process that is trying to
> take it.
>
> Alternatively to this craziness, it might be possible to increase the
> `backlog' value by having libxl open the QMP socket on behalf of QEMU. But
> this is only possible with a recent version of QEMU (2.12 or newer, released
> in Apr 2018, or qemu-xen-4.12 or newer). It would involve discovering QEMU's
> capabilities before we start the DM, which libxl isn't capable of yet.
>
> Cheers,
>
> Anthony PERARD (4):
>   libxl: Introduce libxl__ev_child_kill
>   libxl: Introduce libxl__ev_qmplock
>   libxl: libxl__ev_qmp_send now takes an egc
>   libxl_qmp: Have a lock for QMP socket access
>
>  tools/libxl/libxl_disk.c        |  6 +--
>  tools/libxl/libxl_dm.c          |  8 ++--
>  tools/libxl/libxl_dom_save.c    |  2 +-
>  tools/libxl/libxl_dom_suspend.c |  2 +-
>  tools/libxl/libxl_domain.c      |  8 ++--
>  tools/libxl/libxl_event.c       |  3 +-
>  tools/libxl/libxl_fork.c        | 55 ++++++++++++++++++++++++
>  tools/libxl/libxl_internal.c    | 31 +++++++++++++-
>  tools/libxl/libxl_internal.h    | 53 +++++++++++++++++------
>  tools/libxl/libxl_pci.c         |  8 ++--
>  tools/libxl/libxl_qmp.c         | 75 +++++++++++++++++++++++++--------
>  tools/libxl/libxl_usb.c         | 28 ++++++------
>  12 files changed, 219 insertions(+), 60 deletions(-)
Hi. Thanks for tackling this swamp. All very unfortunate.

Anthony PERARD writes ("[RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket"):
> Alternatively to this craziness, it might be possible to increase
> the `backlog' value by having libxl opening the QMP socket on behalf
> of QEMU. But this is only possible with a recent version of QEMU
> (2.12 or newer, released in Apr 2018, or qemu-xen-4.12 or newer). It
> would involve to discover QEMU's capability before we start the DM,
> which libxl isn't capable yet.

I have an ancient unapplied patch somewhere which runs qemu --help and greps
the output. If you would like, I can dig it out.

But one problem with that approach is this: without that feature in qemu,
what would we do? Live with the bug where domain creation fails? Bodge it
by serialising within domain create (awkwardating the code)?

I have some other suggestions which ought to be considered:

1. Send a patch to qemu upstream to allow specifying the socket listen
queue.

1(a) Expect distros to apply that patch to older qemus, if they ship older
qemus. Have libxl unconditionally specify that argument.

1(b) grep the help output (as I propose above) and, if the patch is not
present, use LD_PRELOAD to wrap listen(2).

2. Send a patch to qemu upstream to change the fixed queue length from 1 to
10000. Expect distros to apply that patch to older qemus (even, perhaps, if
it is not accepted upstream!). Change libxl to detect EAGAIN from the QMP
connect() and print a message explaining what patch is missing.

Since you have provided an implementation of the fork/lock strategy, I'll
now go and do a detailed review of that.

Thanks,
Ian.
On Mon, Oct 28, 2019 at 11:25:26AM +0000, Ian Jackson wrote:
> Hi. Thanks for tackling this swamp. All very unfortunate.
>
> I have an ancient unapplied patch somewhere which runs qemu --help
> and greps the output. If you would like, I can dig it out.
>
> But one problem with that approach is this: without that feature in
> qemu, what would we do? Live with the bug where domain creation
> fails? Bodge it by serialising within domain create (awkwardating
> the code)?
>
> I have some other suggestions which ought to be considered:
>
> 1. Send a patch to qemu upstream to allow specifying the socket listen
> queue.
>
> 1(a) Expect distros to apply that patch to older qemus, if they ship
> older qemus. Have libxl unconditionally specify that argument.
>
> 1(b) grep the help output (as I propose above) and, if the patch is not
> present, use LD_PRELOAD to wrap listen(2).
>
> 2. Send a patch to qemu upstream to change the fixed queue length from
> 1 to 10000. Expect distros to apply that patch to older qemus (even,
> perhaps, if it is not accepted upstream!). Change libxl to detect
> EAGAIN from the QMP connect() and print a message explaining what
> patch is missing.

Those suggestions are interesting ideas, but I would prefer libxl to be able
to deal with any version of QEMU, i.e. without having to patch QEMU. Besides
serialising QMP access in the code, the fork/lock strategy might be the only
other way. (Well, there is also fork/connect with a blocking fd, but we
already have code for fork/lock.)

So I'll keep working on the fork/lock strategy.

> Since you have provided an implementation of the fork/lock strategy,
> I'll now go and do a detailed review of that.

Thanks,
Anthony PERARD writes ("Re: [RFC XEN PATCH for-4.13 0/4] Fix: libxl workaround, multiple connection to single QMP socket"):
> Those suggestions are interesting ideas, but I would prefer libxl to
> be able to deal with any version of QEMU, i.e. without having to
> patch QEMU. Besides serialising QMP access in the code, the fork/lock
> strategy might be the only other way. (Well, there is also
> fork/connect with a blocking fd, but we already have code for
> fork/lock.)
>
> So I'll keep working on the fork/lock strategy.

OK. Thanks for the detailed reply, which makes sense to me.

Ian.