mbox series

[v2,0/5] Fix socket chardev regression

Message ID 20180823143125.16767-1-marcandre.lureau@redhat.com (mailing list archive)
Headers show
Series Fix socket chardev regression | expand

Message

Marc-André Lureau Aug. 23, 2018, 2:31 p.m. UTC
Hi,

In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
(and its follow up 99f2f54174a59), Peter moved chardev socket
connection to machine_done event. However, chardev created later will
no longer attempt to connect, and chardev created in tests do not have
machine_done event (breaking some of vhost-user-test).

The goal was to move the "connect" source to the chardev frontend
context (the monitor thread context in his case). chr->gcontext is set
with qemu_chr_fe_set_handlers(). But there is no guarantee that the
function will be called in general, so we can't delay connection until
then: the chardev should still attempt to connect during open(), using
the main context.

An alternative would be to specify the iothread during chardev
creation. Setting up monitor OOB would be quite different too, it
would take the same iothread as argument.

99f2f54174a595e is also a bit problematic, since it will behave
differently before and after machine_done (the first case gives a
chance to use a different context reliably, the second looks racy)

In the end, I am not sure this is all necessary, as chardev callbacks
are called after qemu_chr_fe_set_handlers(), at which point the
context of sources are updated. In "char-socket: update all ioc
handlers when changing context", I moved also the hup handler to the
updated context. So unless the main thread is already stuck, we can
setup a different context for the chardev at that time. Or not?

v2:
- fix a random socket chardev test failure

Marc-André Lureau (5):
  Revert "chardev: tcp: postpone TLS work until machine done"
  Revert "chardev: tcp: postpone async connection setup"
  char-socket: update all ioc handlers when changing context
  test-char: fix random socket test failure
  test-char: add socket reconnect test

 chardev/char-socket.c | 86 ++++++++++++++++++-------------------------
 tests/test-char.c     | 25 +++++++++++--
 2 files changed, 57 insertions(+), 54 deletions(-)

Comments

Peter Xu Aug. 24, 2018, 3:45 a.m. UTC | #1
On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> Hi,
> 
> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> (and its follow up 99f2f54174a59), Peter moved chardev socket
> connection to machine_done event. However, chardev created later will
> no longer attempt to connect, and chardev created in tests do not have
> machine_done event (breaking some of vhost-user-test).
> 
> The goal was to move the "connect" source to the chardev frontend
> context (the monitor thread context in his case). chr->gcontext is set
> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> function will be called in general, so we can't delay connection until
> then: the chardev should still attempt to connect during open(), using
> the main context.
> 
> An alternative would be to specify the iothread during chardev
> creation. Setting up monitor OOB would be quite different too, it
> would take the same iothread as argument.
> 
> 99f2f54174a595e is also a bit problematic, since it will behave
> differently before and after machine_done (the first case gives a
> chance to use a different context reliably, the second looks racy)
> 
> In the end, I am not sure this is all necessary, as chardev callbacks
> are called after qemu_chr_fe_set_handlers(), at which point the
> context of sources are updated. In "char-socket: update all ioc
> handlers when changing context", I moved also the hup handler to the
> updated context. So unless the main thread is already stuck, we can
> setup a different context for the chardev at that time. Or not?
> 
> v2:
> - fix a random socket chardev test failure

I have no problem on patch 3-5, but aren't patch 1-2 still breaking
the context switch of chardev?  Or did I misunderstood?

Regards,
Marc-André Lureau Aug. 24, 2018, 8:32 a.m. UTC | #2
Hi

On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
>> Hi,
>>
>> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
>> (and its follow up 99f2f54174a59), Peter moved chardev socket
>> connection to machine_done event. However, chardev created later will
>> no longer attempt to connect, and chardev created in tests do not have
>> machine_done event (breaking some of vhost-user-test).
>>
>> The goal was to move the "connect" source to the chardev frontend
>> context (the monitor thread context in his case). chr->gcontext is set
>> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
>> function will be called in general, so we can't delay connection until
>> then: the chardev should still attempt to connect during open(), using
>> the main context.
>>
>> An alternative would be to specify the iothread during chardev
>> creation. Setting up monitor OOB would be quite different too, it
>> would take the same iothread as argument.
>>
>> 99f2f54174a595e is also a bit problematic, since it will behave
>> differently before and after machine_done (the first case gives a
>> chance to use a different context reliably, the second looks racy)
>>
>> In the end, I am not sure this is all necessary, as chardev callbacks
>> are called after qemu_chr_fe_set_handlers(), at which point the
>> context of sources are updated. In "char-socket: update all ioc
>> handlers when changing context", I moved also the hup handler to the
>> updated context. So unless the main thread is already stuck, we can
>> setup a different context for the chardev at that time. Or not?
>>
>> v2:
>> - fix a random socket chardev test failure
>
> I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> the context switch of chardev?  Or did I misunderstood?

Switching context is broken/racy regardless. My current plan is to
suspend the context to switch to, and update all sources before
thawing the target context.

If you prefer we can delay the revert, but I would rather have them
because the regression seems much worse to me.

>
> Regards,
>
> --
> Peter Xu
Peter Xu Aug. 24, 2018, 8:44 a.m. UTC | #3
On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> >> Hi,
> >>
> >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> >> connection to machine_done event. However, chardev created later will
> >> no longer attempt to connect, and chardev created in tests do not have
> >> machine_done event (breaking some of vhost-user-test).
> >>
> >> The goal was to move the "connect" source to the chardev frontend
> >> context (the monitor thread context in his case). chr->gcontext is set
> >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> >> function will be called in general, so we can't delay connection until
> >> then: the chardev should still attempt to connect during open(), using
> >> the main context.
> >>
> >> An alternative would be to specify the iothread during chardev
> >> creation. Setting up monitor OOB would be quite different too, it
> >> would take the same iothread as argument.
> >>
> >> 99f2f54174a595e is also a bit problematic, since it will behave
> >> differently before and after machine_done (the first case gives a
> >> chance to use a different context reliably, the second looks racy)
> >>
> >> In the end, I am not sure this is all necessary, as chardev callbacks
> >> are called after qemu_chr_fe_set_handlers(), at which point the
> >> context of sources are updated. In "char-socket: update all ioc
> >> handlers when changing context", I moved also the hup handler to the
> >> updated context. So unless the main thread is already stuck, we can
> >> setup a different context for the chardev at that time. Or not?
> >>
> >> v2:
> >> - fix a random socket chardev test failure
> >
> > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > the context switch of chardev?  Or did I misunderstood?
> 
> Switching context is broken/racy regardless. My current plan is to
> suspend the context to switch to, and update all sources before
> thawing the target context.
> 
> If you prefer we can delay the revert, but I would rather have them
> because the regression seems much worse to me.

Have you started working on above?  I started to code up some RFC
patch to let the chardev maintain the iothreads (now it'll only have a
monitor iothread) then we don't need to switch context for chardev any
more.  I plan to post it today for a quick look.  With that, this
series will not break anything then.

> 
> >
> > Regards,
> >
> > --
> > Peter Xu

Regards,
Marc-André Lureau Aug. 24, 2018, 8:51 a.m. UTC | #4
Hi

On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote:
>
> On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> > Hi
> >
> > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> > >> Hi,
> > >>
> > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> > >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> > >> connection to machine_done event. However, chardev created later will
> > >> no longer attempt to connect, and chardev created in tests do not have
> > >> machine_done event (breaking some of vhost-user-test).
> > >>
> > >> The goal was to move the "connect" source to the chardev frontend
> > >> context (the monitor thread context in his case). chr->gcontext is set
> > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> > >> function will be called in general, so we can't delay connection until
> > >> then: the chardev should still attempt to connect during open(), using
> > >> the main context.
> > >>
> > >> An alternative would be to specify the iothread during chardev
> > >> creation. Setting up monitor OOB would be quite different too, it
> > >> would take the same iothread as argument.
> > >>
> > >> 99f2f54174a595e is also a bit problematic, since it will behave
> > >> differently before and after machine_done (the first case gives a
> > >> chance to use a different context reliably, the second looks racy)
> > >>
> > >> In the end, I am not sure this is all necessary, as chardev callbacks
> > >> are called after qemu_chr_fe_set_handlers(), at which point the
> > >> context of sources are updated. In "char-socket: update all ioc
> > >> handlers when changing context", I moved also the hup handler to the
> > >> updated context. So unless the main thread is already stuck, we can
> > >> setup a different context for the chardev at that time. Or not?
> > >>
> > >> v2:
> > >> - fix a random socket chardev test failure
> > >
> > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > > the context switch of chardev?  Or did I misunderstood?
> >
> > Switching context is broken/racy regardless. My current plan is to
> > suspend the context to switch to, and update all sources before
> > thawing the target context.
> >
> > If you prefer we can delay the revert, but I would rather have them
> > because the regression seems much worse to me.
>
> Have you started working on above?  I started to code up some RFC
> patch to let the chardev maintain the iothreads (now it'll only have a
> monitor iothread) then we don't need to switch context for chardev any
> more.  I plan to post it today for a quick look.  With that, this
> series will not break anything then.

No, I haven't really started the code, but spent some time thinking...

Ok, if you manage to put together that proposal, that sounds interesting.

Nevertheless, I can work on improving/fixing the race when switching
contexts (already used by colo)

>
> >
> > >
> > > Regards,
> > >
> > > --
> > > Peter Xu
>
> Regards,
>
> --
> Peter Xu
>
Peter Xu Aug. 24, 2018, 9:12 a.m. UTC | #5
On Fri, Aug 24, 2018 at 10:51:18AM +0200, Marc-André Lureau wrote:
> Hi
> 
> On Fri, Aug 24, 2018 at 10:45 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > On Fri, Aug 24, 2018 at 10:32:58AM +0200, Marc-André Lureau wrote:
> > > Hi
> > >
> > > On Fri, Aug 24, 2018 at 5:45 AM, Peter Xu <peterx@redhat.com> wrote:
> > > > On Thu, Aug 23, 2018 at 04:31:20PM +0200, Marc-André Lureau wrote:
> > > >> Hi,
> > > >>
> > > >> In commit 25679e5d58e "chardev: tcp: postpone async connection setup"
> > > >> (and its follow up 99f2f54174a59), Peter moved chardev socket
> > > >> connection to machine_done event. However, chardev created later will
> > > >> no longer attempt to connect, and chardev created in tests do not have
> > > >> machine_done event (breaking some of vhost-user-test).
> > > >>
> > > >> The goal was to move the "connect" source to the chardev frontend
> > > >> context (the monitor thread context in his case). chr->gcontext is set
> > > >> with qemu_chr_fe_set_handlers(). But there is no guarantee that the
> > > >> function will be called in general, so we can't delay connection until
> > > >> then: the chardev should still attempt to connect during open(), using
> > > >> the main context.
> > > >>
> > > >> An alternative would be to specify the iothread during chardev
> > > >> creation. Setting up monitor OOB would be quite different too, it
> > > >> would take the same iothread as argument.
> > > >>
> > > >> 99f2f54174a595e is also a bit problematic, since it will behave
> > > >> differently before and after machine_done (the first case gives a
> > > >> chance to use a different context reliably, the second looks racy)
> > > >>
> > > >> In the end, I am not sure this is all necessary, as chardev callbacks
> > > >> are called after qemu_chr_fe_set_handlers(), at which point the
> > > >> context of sources are updated. In "char-socket: update all ioc
> > > >> handlers when changing context", I moved also the hup handler to the
> > > >> updated context. So unless the main thread is already stuck, we can
> > > >> setup a different context for the chardev at that time. Or not?
> > > >>
> > > >> v2:
> > > >> - fix a random socket chardev test failure
> > > >
> > > > I have no problem on patch 3-5, but aren't patch 1-2 still breaking
> > > > the context switch of chardev?  Or did I misunderstood?
> > >
> > > Switching context is broken/racy regardless. My current plan is to
> > > suspend the context to switch to, and update all sources before
> > > thawing the target context.
> > >
> > > If you prefer we can delay the revert, but I would rather have them
> > > because the regression seems much worse to me.
> >
> > Have you started working on above?  I started to code up some RFC
> > patch to let the chardev maintain the iothreads (now it'll only have a
> > monitor iothread) then we don't need to switch context for chardev any
> > more.  I plan to post it today for a quick look.  With that, this
> > series will not break anything then.
> 
> No, I haven't really started the code, but spent some time thinking...
> 
> Ok, if you manage to put together that proposal, that sounds interesting.

Not really your iothread proposal; I tried to keep the interface
unchanged, but move the iothread ownership from the monitor code to
chardev code.  Please have a look at:

  [RFC 0/3] chardev: introduce chardev contexts

> 
> Nevertheless, I can work on improving/fixing the race when switching
> contexts (already used by colo)

Ouch, is colo changing context too?  I hope that series won't break it.

(Ok if so I believe it'll very possible to break it...)

Regards,