[v2,1/1] qemu-char: socket backend: disconnect on write error

Message ID 1486041504-23627-1-git-send-email-den@openvz.org (mailing list archive)
State New, archived

Commit Message

Denis V. Lunev Feb. 2, 2017, 1:18 p.m. UTC
From: Anton Nefedov <anton.nefedov@virtuozzo.com>

Socket backend read handler should normally perform a disconnect; however,
the read handler may not get a chance to run if the frontend is not ready
(qemu_chr_be_can_write() == 0).

This means that, in the virtio-serial frontend case, if
 - the host has disconnected (giving EPIPE on socket write)
 - and the guest has disconnected (-> frontend not ready -> backend
   will not read)
 - and there is still data (frontend->backend) to flush (this takes really
   tricky timing, but nevertheless we have observed the case in production)

This results in virtio-serial continuously trying to flush this data,
forming a busy loop.

Solution: react to the write error in the socket write handler.
errno is not reliable after qio_channel_writev_full(), so we may not see
the exact EPIPE; instead, disconnect on any error except
QIO_CHANNEL_ERR_BLOCK, which io_channel_send_full() converts to errno
EAGAIN.
We must not disconnect right away, though; there may still be data to read
(see commit 4bf1cb0).

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
CC: Marc-André Lureau <marcandre.lureau@redhat.com>
---
Changes from v1:
- we no longer rely on EPIPE; the socket should be closed on all errors
  except EAGAIN

 chardev/char-socket.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
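
To make the failure mode concrete, here is a minimal sketch of the busy
loop described above (illustrative C, not actual QEMU code;
backend_write() is a hypothetical stand-in for the socket backend's
write path):

    /* Hypothetical sketch of the reported busy loop. */
    static void flush_to_backend(const uint8_t *buf, int len)
    {
        for (;;) {
            ssize_t ret = backend_write(buf, len); /* host gone: fails, e.g. EPIPE */
            if (ret == len) {
                break; /* data flushed, done */
            }
            /* The guest is gone too, so qemu_chr_be_can_write() == 0 and
             * the read handler (the code that would notice EOF and
             * disconnect) never runs.  Nothing else tears the connection
             * down, so the frontend immediately retries the same flush: a
             * busy loop.  The patch breaks the loop by disconnecting right
             * here on any write error other than EAGAIN. */
        }
    }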

Comments

Denis V. Lunev Feb. 2, 2017, 2:07 p.m. UTC | #1
On 02/02/2017 04:18 PM, Denis V. Lunev wrote:
> [...]
Please disregard, I am very sorry. This patch was made against an old
version and will not compile.

Den
Marc-André Lureau Feb. 24, 2017, 12:59 p.m. UTC | #2
Hi

On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <den@openvz.org> wrote:

> [...]
>
> +        if (ret < 0 && errno != EAGAIN) {
> +            if (tcp_chr_read_poll(chr) <= 0) {
> +                tcp_chr_disconnect(chr);
> +                return len;
>

This change breaks a number of assumptions in vhost-user code. Until now,
a vhost-user function assumed that dev->vhost_ops would remain valid as
long as the function is running, so it may call vhost_ops callbacks
several times, which may eventually fail to do I/O, but no crash would
occur. The disconnection would be handled later by the HUP handler. Now,
vhost_ops may be cleared during a write (chr_disconnect ->
CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly reproduced
with vhost-user-test -p /x86_64/vhost-user/flags-mismatch/subprocess
(broken pipe, qemu crashes).

I am trying to fix this, but it isn't simple (races, many code paths
etc.), so I just wanted to inform you about it.


--
Marc-André Lureau
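
(Aside: the crash pattern described above can be sketched as follows.
This is illustrative only; the function is hypothetical and the callbacks
are simplified stand-ins for the vhost-user backend ops.)

    /* Hypothetical sketch of the vhost-user hazard. */
    static int vhost_user_do_something(struct vhost_dev *dev, uint64_t features)
    {
        /* This callback writes to the chardev.  With the patch applied, a
         * write error can now disconnect immediately, and the
         * CHR_EVENT_CLOSED handler (net_vhost_user_event) clears
         * dev->vhost_ops while this function is still running. */
        dev->vhost_ops->vhost_set_features(dev, features);

        /* A subsequent callback then dereferences a NULL vhost_ops and
         * qemu crashes.  Before the patch, the disconnect was only handled
         * later by the HUP handler, so vhost_ops stayed valid for the
         * whole function. */
        return dev->vhost_ops->vhost_set_owner(dev);
    }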
Markus Armbruster Feb. 24, 2017, 1:42 p.m. UTC | #3
Marc-André Lureau <marcandre.lureau@gmail.com> writes:

> Hi
>
> On Thu, Feb 2, 2017 at 5:20 PM Denis V. Lunev <den@openvz.org> wrote:
>
>> [...]
>
> This change breaks a number of assumptions in vhost-user code. Until
> now, a vhost-user function assumed that dev->vhost_ops would remain
> valid as long as the function is running, so it may call vhost_ops
> callbacks several times, which may eventually fail to do I/O, but no
> crash would occur. The disconnection would be handled later by the HUP
> handler. Now, vhost_ops may be cleared during a write (chr_disconnect ->
> CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly
> reproduced with vhost-user-test -p
> /x86_64/vhost-user/flags-mismatch/subprocess (broken pipe, qemu crashes).

It hangs for me with annoying frequency.  I'm glad you found the culprit!

> I am trying to fix this, but it isn't simple (races, many code paths
> etc.), so I just wanted to inform you about it.

Can we revert this patch until we have a fix?
Marc-André Lureau Feb. 24, 2017, 2:46 p.m. UTC | #4
Hi

On Fri, Feb 24, 2017 at 5:42 PM Markus Armbruster <armbru@redhat.com> wrote:

> Marc-André Lureau <marcandre.lureau@gmail.com> writes:
>
> > [...]
>
> It hangs for me with annoying frequency.  I'm glad you found the culprit!
>
>
Yeah, when reverted, it seems it never hangs here. But the test exercises
the code in interesting ways, with some unpredictable situations (device
initialization while the backend disconnects/reconnects), so I wouldn't be
surprised if we find more hidden issues. (vhost-user disconnect handling
still isn't close to being safe.)

> > I am trying to fix this, but it isn't simple (races, many code paths etc.),
> > so I just wanted to inform you about it.
>
> Can we revert this patch until we have a fix?
>

That or disable the test for now.

But I wonder if this patch is a good solution to the problem. And I don't
like the socket backend behaving differently from other backends.
Paolo Bonzini Feb. 24, 2017, 3:31 p.m. UTC | #5
On 24/02/2017 15:46, Marc-André Lureau wrote:
> 
>>> +        if (ret < 0 && errno != EAGAIN) {
>>> +            if (tcp_chr_read_poll(chr) <= 0) {
>>> +                tcp_chr_disconnect(chr);
>>> +                return len;
>>>
>>
>> This change breaks a number of assumptions in vhost-user code. Until
>> now, a vhost-user function assumed that dev->vhost_ops would remain
>> valid as long as the function is running, so it may call vhost_ops
>> callbacks several times, which may eventually fail to do I/O, but no
>> crash would occur. The disconnection would be handled later by the HUP
>> handler. Now, vhost_ops may be cleared during a write (chr_disconnect ->
>> CHR_EVENT_CLOSED in net_vhost_user_event). This can be randomly
>> reproduced with vhost-user-test -p
>> /x86_64/vhost-user/flags-mismatch/subprocess (broken pipe, qemu crashes).

Would it work to call tcp_chr_disconnect from an idle source (and
destroy the source on the next connect)?

Paolo
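
(Aside: a minimal sketch of Paolo's idle-source idea, using GLib's
one-shot idle callback. The disconnect_idle_tag field and the call sites
are hypothetical; g_idle_add(), g_source_remove() and tcp_chr_disconnect()
are the real primitives it would build on.)

    /* Defer the disconnect so it runs from the main loop rather than
     * from inside the failing write. */
    static gboolean tcp_chr_disconnect_idle(gpointer opaque)
    {
        Chardev *chr = opaque;

        tcp_chr_disconnect(chr);
        return G_SOURCE_REMOVE; /* one-shot: the source destroys itself */
    }

    /* In tcp_chr_write()'s error path, instead of disconnecting inline: */
    s->disconnect_idle_tag = g_idle_add(tcp_chr_disconnect_idle, chr);

    /* ...and on the next connect, destroy any still-pending source: */
    if (s->disconnect_idle_tag) {
        g_source_remove(s->disconnect_idle_tag);
        s->disconnect_idle_tag = 0;
    }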
Denis V. Lunev Feb. 24, 2017, 3:57 p.m. UTC | #6
On 02/24/2017 04:31 PM, Paolo Bonzini wrote:
>
> On 24/02/2017 15:46, Marc-André Lureau wrote:
>>> [...]
> Would it work to call tcp_chr_disconnect from an idle source (and
> destroy the source on the next connect)?
>
> Paolo
I think not, but I will think more on Monday. Unfortunately
today is an official holiday in Russia :(

Right now the code loops forever. Maybe we should notify the
guest that the problem has really happened.

Den
Marc-André Lureau Feb. 24, 2017, 4 p.m. UTC | #7
Hi

On Fri, Feb 24, 2017 at 7:31 PM Paolo Bonzini <pbonzini@redhat.com> wrote:

> [...]
>
> Would it work to call tcp_chr_disconnect from an idle source (and
> destroy the source on the next connect)?
>

Yes, I have a solution in this direction: https://da.gd/NUZf7

Note that the vhost code could reach disconnect-on-read and SEGV in
similar ways even if this commit were reverted. Delaying the cleanup would
solve that case too. Otherwise, significant work needs to be done to make
vhost more robust on disconnect...

Patch

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 4068dc5..e2137aa 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -97,6 +97,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
                                GIOCondition cond,
                                void *opaque);
 
+static int tcp_chr_read_poll(void *opaque);
+static void tcp_chr_disconnect(CharDriverState *chr);
+
 /* Called with chr_write_lock held.  */
 static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 {
@@ -114,6 +117,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
             s->write_msgfds_num = 0;
         }
 
+        if (ret < 0 && errno != EAGAIN) {
+            if (tcp_chr_read_poll(chr) <= 0) {
+                tcp_chr_disconnect(chr);
+                return len;
+            } /* else let the read handler finish it properly */
+        }
+
         return ret;
     } else {
         /* XXX: indicate an error ? */