Message ID | 20180823143125.16767-1-marcandre.lureau@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | Fix socket chardev regression | expand |
On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote: > Hi, > > In commit 25679e5d58e "chardev: tcp: postpone async connection setup" > (and its follow up 99f2f54174a59), Peter moved chardev socket > connection to machine_done event. However, chardev created later will > no longer attempt to connect, and chardev created in tests do not have > machine_done event (breaking some of vhost-user-test). > > The goal was to move the "connect" source to the chardev frontend > context (the monitor thread context in his case). chr->gcontext is set > with qemu_chr_fe_set_handlers(). But there is no guarantee that the > function will be called in general, so we can't delay connection until > then: the chardev should still attempt to connect during open(), using > the main context. > > An alternative would be to specify the iothread during chardev > creation. Setting up monitor OOB would be quite different too, it > would take the same iothread as argument. > > 99f2f54174a595e is also a bit problematic, since it will behave > differently before and after machine_done (the first case gives a > chance to use a different context reliably, the second looks racy) > > In the end, I am not sure this is all necessary, as chardev callbacks > are called after qemu_chr_fe_set_handlers(), at which point the > context of sources are updated. In "char-socket: update all ioc > handlers when changing context", I moved also the hup handler to the > updated context. So unless the main thread is already stuck, we can > setup a different context for the chardev at that time. Or not? > > v2: > - fix a random socket chardev test failure I have no problem on patch 3-5, but aren't patch 1-2 still breaking the context switch of chardev? Or did I misunderstood? Regards,
Hi On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote: > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote: >> Hi, >> >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup" >> (and its follow up 99f2f54174a59), Peter moved chardev socket >> connection to machine_done event. However, chardev created later will >> no longer attempt to connect, and chardev created in tests do not have >> machine_done event (breaking some of vhost-user-test). >> >> The goal was to move the "connect" source to the chardev frontend >> context (the monitor thread context in his case). chr->gcontext is set >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the >> function will be called in general, so we can't delay connection until >> then: the chardev should still attempt to connect during open(), using >> the main context. >> >> An alternative would be to specify the iothread during chardev >> creation. Setting up monitor OOB would be quite different too, it >> would take the same iothread as argument. >> >> 99f2f54174a595e is also a bit problematic, since it will behave >> differently before and after machine_done (the first case gives a >> chance to use a different context reliably, the second looks racy) >> >> In the end, I am not sure this is all necessary, as chardev callbacks >> are called after qemu_chr_fe_set_handlers(), at which point the >> context of sources are updated. In "char-socket: update all ioc >> handlers when changing context", I moved also the hup handler to the >> updated context. So unless the main thread is already stuck, we can >> setup a different context for the chardev at that time. Or not? >> >> v2: >> - fix a random socket chardev test failure > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking > the context switch of chardev? Or did I misunderstood? Switching context is broken/racy regardless. My current plan is to suspend the context to switch to, and update all sources before thawing the target context. If you prefer we can delay the revert, but I would rather have them because the regression seems much worse to me. > > Regards, > > -- > Peter Xu
On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote: > Hi > > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote: > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote: > >> Hi, > >> > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup" > >> (and its follow up 99f2f54174a59), Peter moved chardev socket > >> connection to machine_done event. However, chardev created later will > >> no longer attempt to connect, and chardev created in tests do not have > >> machine_done event (breaking some of vhost-user-test). > >> > >> The goal was to move the "connect" source to the chardev frontend > >> context (the monitor thread context in his case). chr->gcontext is set > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the > >> function will be called in general, so we can't delay connection until > >> then: the chardev should still attempt to connect during open(), using > >> the main context. > >> > >> An alternative would be to specify the iothread during chardev > >> creation. Setting up monitor OOB would be quite different too, it > >> would take the same iothread as argument. > >> > >> 99f2f54174a595e is also a bit problematic, since it will behave > >> differently before and after machine_done (the first case gives a > >> chance to use a different context reliably, the second looks racy) > >> > >> In the end, I am not sure this is all necessary, as chardev callbacks > >> are called after qemu_chr_fe_set_handlers(), at which point the > >> context of sources are updated. In "char-socket: update all ioc > >> handlers when changing context", I moved also the hup handler to the > >> updated context. So unless the main thread is already stuck, we can > >> setup a different context for the chardev at that time. Or not? > >> > >> v2: > >> - fix a random socket chardev test failure > > > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking > > the context switch of chardev? Or did I misunderstood? > > Switching context is broken/racy regardless. My current plan is to > suspend the context to switch to, and update all sources before > thawing the target context. > > If you prefer we can delay the revert, but I would rather have them > because the regression seems much worse to me. Have you started working on above? I started to code up some RFC patch to let the chardev maintain the iothreads (now it'll only have a monitor iothread) then we don't need to switch context for chardev any more. I plan to post it today for a quick look. With that, this series will not break anything then. > > > > > Regards, > > > > -- > > Peter Xu Regards,
Hi On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote: > > On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote: > > Hi > > > > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote: > > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote: > > >> Hi, > > >> > > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup" > > >> (and its follow up 99f2f54174a59), Peter moved chardev socket > > >> connection to machine_done event. However, chardev created later will > > >> no longer attempt to connect, and chardev created in tests do not have > > >> machine_done event (breaking some of vhost-user-test). > > >> > > >> The goal was to move the "connect" source to the chardev frontend > > >> context (the monitor thread context in his case). chr->gcontext is set > > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the > > >> function will be called in general, so we can't delay connection until > > >> then: the chardev should still attempt to connect during open(), using > > >> the main context. > > >> > > >> An alternative would be to specify the iothread during chardev > > >> creation. Setting up monitor OOB would be quite different too, it > > >> would take the same iothread as argument. > > >> > > >> 99f2f54174a595e is also a bit problematic, since it will behave > > >> differently before and after machine_done (the first case gives a > > >> chance to use a different context reliably, the second looks racy) > > >> > > >> In the end, I am not sure this is all necessary, as chardev callbacks > > >> are called after qemu_chr_fe_set_handlers(), at which point the > > >> context of sources are updated. In "char-socket: update all ioc > > >> handlers when changing context", I moved also the hup handler to the > > >> updated context. So unless the main thread is already stuck, we can > > >> setup a different context for the chardev at that time. Or not? > > >> > > >> v2: > > >> - fix a random socket chardev test failure > > > > > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking > > > the context switch of chardev? Or did I misunderstood? > > > > Switching context is broken/racy regardless. My current plan is to > > suspend the context to switch to, and update all sources before > > thawing the target context. > > > > If you prefer we can delay the revert, but I would rather have them > > because the regression seems much worse to me. > > Have you started working on above? I started to code up some RFC > patch to let the chardev maintain the iothreads (now it'll only have a > monitor iothread) then we don't need to switch context for chardev any > more. I plan to post it today for a quick look. With that, this > series will not break anything then. No, I haven't really started the code, but spent some time thinking... Ok, if you manage to put together that proposal, that sounds interesting. Nevertheless, I can work on improving/fixing the race when switching contexts (already used by colo) > > > > > > > > > Regards, > > > > > > -- > > > Peter Xu > > Regards, > > -- > Peter Xu >
On Fri, Aug 24, 2018 at 10:51:18AM +0200, Marc-André Lureau wrote: > Hi > > On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote: > > > > On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote: > > > Hi > > > > > > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote: > > > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote: > > > >> Hi, > > > >> > > > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup" > > > >> (and its follow up 99f2f54174a59), Peter moved chardev socket > > > >> connection to machine_done event. However, chardev created later will > > > >> no longer attempt to connect, and chardev created in tests do not have > > > >> machine_done event (breaking some of vhost-user-test). > > > >> > > > >> The goal was to move the "connect" source to the chardev frontend > > > >> context (the monitor thread context in his case). chr->gcontext is set > > > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the > > > >> function will be called in general, so we can't delay connection until > > > >> then: the chardev should still attempt to connect during open(), using > > > >> the main context. > > > >> > > > >> An alternative would be to specify the iothread during chardev > > > >> creation. Setting up monitor OOB would be quite different too, it > > > >> would take the same iothread as argument. > > > >> > > > >> 99f2f54174a595e is also a bit problematic, since it will behave > > > >> differently before and after machine_done (the first case gives a > > > >> chance to use a different context reliably, the second looks racy) > > > >> > > > >> In the end, I am not sure this is all necessary, as chardev callbacks > > > >> are called after qemu_chr_fe_set_handlers(), at which point the > > > >> context of sources are updated. In "char-socket: update all ioc > > > >> handlers when changing context", I moved also the hup handler to the > > > >> updated context. So unless the main thread is already stuck, we can > > > >> setup a different context for the chardev at that time. Or not? > > > >> > > > >> v2: > > > >> - fix a random socket chardev test failure > > > > > > > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking > > > > the context switch of chardev? Or did I misunderstood? > > > > > > Switching context is broken/racy regardless. My current plan is to > > > suspend the context to switch to, and update all sources before > > > thawing the target context. > > > > > > If you prefer we can delay the revert, but I would rather have them > > > because the regression seems much worse to me. > > > > Have you started working on above? I started to code up some RFC > > patch to let the chardev maintain the iothreads (now it'll only have a > > monitor iothread) then we don't need to switch context for chardev any > > more. I plan to post it today for a quick look. With that, this > > series will not break anything then. > > No, I haven't really started the code, but spent some time thinking... > > Ok, if you manage to put together that proposal, that sounds interesting. Not really your iothread proposal; I tried to keep the interface unchanged, but move the iothread ownership from the monitor code to chardev code. Please have a look at: [RFC 0/3] chardev: introduce chardev contexts > > Nevertheless, I can work on improving/fixing the race when switching > contexts (already used by colo) Ouch, is colo changing context too? I hope that series won't break it. (Ok if so I believe it'll very possible to break it...) Regards,