diff mbox

[1/1] qemu_char: socket backend: disconnect on EPIPE

Message ID 1485954928-22287-1-git-send-email-den@openvz.org (mailing list archive)
State New, archived
Headers show

Commit Message

Denis V. Lunev Feb. 1, 2017, 1:15 p.m. UTC
From: Anton Nefedov <anton.nefedov@virtuozzo.com>

Socket backend read handler should normally perform a disconnect, however
the read handler may not get a chance to run if the frontend is not ready
(qemu_chr_be_can_write == 0).

This means that in virtio-serial frontend case if
 - the host has disconnected (giving EPIPE on socket write)
 - and the guest has disconnected (-> frontend not ready -> backend
   will not read)
 - and there is still data (frontend->backend) to flush (has to be a really
   tricky timing but nevertheless, we have observed the case in production)

This results in virtio-serial trying to flush this data continiously forming
a busy loop.

Solution: react on EPIPE in the socket write handler. We must not disconnect
right away though, there still may be data to read (see 4bf1cb0).

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Daniel P. Berrange <berrange@redhat.com>
---
 qemu-char.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Daniel P. Berrangé Feb. 1, 2017, 1:53 p.m. UTC | #1
On Wed, Feb 01, 2017 at 04:15:28PM +0300, Denis V. Lunev wrote:
> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
> 
> Socket backend read handler should normally perform a disconnect, however
> the read handler may not get a chance to run if the frontend is not ready
> (qemu_chr_be_can_write == 0).
> 
> This means that in virtio-serial frontend case if
>  - the host has disconnected (giving EPIPE on socket write)
>  - and the guest has disconnected (-> frontend not ready -> backend
>    will not read)
>  - and there is still data (frontend->backend) to flush (has to be a really
>    tricky timing but nevertheless, we have observed the case in production)
> 
> This results in virtio-serial trying to flush this data continiously forming
> a busy loop.
> 
> Solution: react on EPIPE in the socket write handler. We must not disconnect
> right away though, there still may be data to read (see 4bf1cb0).
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Daniel P. Berrange <berrange@redhat.com>
> ---
>  qemu-char.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-char.c b/qemu-char.c
> index 6b4a299..f1f7a07 100644
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -1202,7 +1202,7 @@ static int io_channel_send_full(QIOChannel *ioc,
>              errno = EAGAIN;
>              return -1;
>          } else if (ret < 0) {
> -            errno = EINVAL;
> +            errno = errno == EPIPE ? EPIPE : EINVAL;

You can't rely on 'errno' being valid after qio_channel_writev_full().

The 'Error **errp' arg to that function is the only error reporting
mechanism that's supported. In particular since the TCP channel can
be wrapped in TLS, the errno can easily have been clobbered by time
you get here.

> @@ -2854,6 +2854,9 @@ static gboolean tcp_chr_accept(QIOChannel *chan,
>                                 GIOCondition cond,
>                                 void *opaque);
>  
> +static int tcp_chr_read_poll(void *opaque);
> +static void tcp_chr_disconnect(Chardev *chr);
> +
>  /* Called with chr_write_lock held.  */
>  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>  {
> @@ -2871,6 +2874,13 @@ static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
>              s->write_msgfds_num = 0;
>          }
>  
> +        if (ret < 0 && errno == EPIPE) {

IMHO you should just remove "&& errno == EPIPE" and it'd be fine.

> +            if (tcp_chr_read_poll(chr) <= 0) {
> +                tcp_chr_disconnect(chr);
> +                return len;
> +            } /* else let the read handler finish it properly */
> +        }
> +
>          return ret;
>      } else {
>          /* XXX: indicate an error ? */

Regards,
Daniel
Paolo Bonzini Feb. 2, 2017, 12:54 a.m. UTC | #2
On 01/02/2017 05:53, Daniel P. Berrange wrote:
>>  
>> +        if (ret < 0 && errno == EPIPE) {
> IMHO you should just remove "&& errno == EPIPE" and it'd be fine.

It should be errno != EAGAIN.

Paolo

>> +            if (tcp_chr_read_poll(chr) <= 0) {
>> +                tcp_chr_disconnect(chr);
>> +                return len;
>> +            } /* else let the read handler finish it properly */
>> +        }
Denis V. Lunev Feb. 2, 2017, 8:59 a.m. UTC | #3
On 02/01/2017 04:53 PM, Daniel P. Berrange wrote:
> On Wed, Feb 01, 2017 at 04:15:28PM +0300, Denis V. Lunev wrote:
>> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>
>> Socket backend read handler should normally perform a disconnect, however
>> the read handler may not get a chance to run if the frontend is not ready
>> (qemu_chr_be_can_write == 0).
>>
>> This means that in virtio-serial frontend case if
>>  - the host has disconnected (giving EPIPE on socket write)
>>  - and the guest has disconnected (-> frontend not ready -> backend
>>    will not read)
>>  - and there is still data (frontend->backend) to flush (has to be a really
>>    tricky timing but nevertheless, we have observed the case in production)
>>
>> This results in virtio-serial trying to flush this data continiously forming
>> a busy loop.
>>
>> Solution: react on EPIPE in the socket write handler. We must not disconnect
>> right away though, there still may be data to read (see 4bf1cb0).
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Paolo Bonzini <pbonzini@redhat.com>
>> CC: Daniel P. Berrange <berrange@redhat.com>
>> ---
>>  qemu-char.c | 12 +++++++++++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/qemu-char.c b/qemu-char.c
>> index 6b4a299..f1f7a07 100644
>> --- a/qemu-char.c
>> +++ b/qemu-char.c
>> @@ -1202,7 +1202,7 @@ static int io_channel_send_full(QIOChannel *ioc,
>>              errno = EAGAIN;
>>              return -1;
>>          } else if (ret < 0) {
>> -            errno = EINVAL;
>> +            errno = errno == EPIPE ? EPIPE : EINVAL;
> You can't rely on 'errno' being valid after qio_channel_writev_full().
>
> The 'Error **errp' arg to that function is the only error reporting
> mechanism that's supported. In particular since the TCP channel can
> be wrapped in TLS, the errno can easily have been clobbered by time
> you get here.
can we return errno directly from this call and pass exit code to the upper
layer? This will be fair and much more straightforward.

In this case we will have to check ret directly below.

Den
Daniel P. Berrangé Feb. 2, 2017, 9:47 a.m. UTC | #4
On Thu, Feb 02, 2017 at 11:59:22AM +0300, Denis V. Lunev wrote:
> On 02/01/2017 04:53 PM, Daniel P. Berrange wrote:
> > On Wed, Feb 01, 2017 at 04:15:28PM +0300, Denis V. Lunev wrote:
> >> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>
> >> Socket backend read handler should normally perform a disconnect, however
> >> the read handler may not get a chance to run if the frontend is not ready
> >> (qemu_chr_be_can_write == 0).
> >>
> >> This means that in virtio-serial frontend case if
> >>  - the host has disconnected (giving EPIPE on socket write)
> >>  - and the guest has disconnected (-> frontend not ready -> backend
> >>    will not read)
> >>  - and there is still data (frontend->backend) to flush (has to be a really
> >>    tricky timing but nevertheless, we have observed the case in production)
> >>
> >> This results in virtio-serial trying to flush this data continiously forming
> >> a busy loop.
> >>
> >> Solution: react on EPIPE in the socket write handler. We must not disconnect
> >> right away though, there still may be data to read (see 4bf1cb0).
> >>
> >> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >> CC: Paolo Bonzini <pbonzini@redhat.com>
> >> CC: Daniel P. Berrange <berrange@redhat.com>
> >> ---
> >>  qemu-char.c | 12 +++++++++++-
> >>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/qemu-char.c b/qemu-char.c
> >> index 6b4a299..f1f7a07 100644
> >> --- a/qemu-char.c
> >> +++ b/qemu-char.c
> >> @@ -1202,7 +1202,7 @@ static int io_channel_send_full(QIOChannel *ioc,
> >>              errno = EAGAIN;
> >>              return -1;
> >>          } else if (ret < 0) {
> >> -            errno = EINVAL;
> >> +            errno = errno == EPIPE ? EPIPE : EINVAL;
> > You can't rely on 'errno' being valid after qio_channel_writev_full().
> >
> > The 'Error **errp' arg to that function is the only error reporting
> > mechanism that's supported. In particular since the TCP channel can
> > be wrapped in TLS, the errno can easily have been clobbered by time
> > you get here.
> can we return errno directly from this call and pass exit code to the upper
> layer? This will be fair and much more straightforward.

No, the qio APIs explicitly do *not* use errno because their implementations
may be calling APIs which in turn do not provide errnos. errno is only a
useful concept if your always calling into system calls. As soon as you need
to deal with higher level libraries, errno is woefully inadequate as a
concept, hence using Error ** instead.

Regards,
Daniel
Daniel P. Berrangé Feb. 2, 2017, 9:49 a.m. UTC | #5
On Wed, Feb 01, 2017 at 04:54:28PM -0800, Paolo Bonzini wrote:
> 
> 
> On 01/02/2017 05:53, Daniel P. Berrange wrote:
> >>  
> >> +        if (ret < 0 && errno == EPIPE) {
> > IMHO you should just remove "&& errno == EPIPE" and it'd be fine.
> 
> It should be errno != EAGAIN.

Oh, yes, thats right.

Regards,
Daniel
Denis V. Lunev Feb. 2, 2017, 11:43 a.m. UTC | #6
On 02/02/2017 12:47 PM, Daniel P. Berrange wrote:
> On Thu, Feb 02, 2017 at 11:59:22AM +0300, Denis V. Lunev wrote:
>> On 02/01/2017 04:53 PM, Daniel P. Berrange wrote:
>>> On Wed, Feb 01, 2017 at 04:15:28PM +0300, Denis V. Lunev wrote:
>>>> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>>
>>>> Socket backend read handler should normally perform a disconnect, however
>>>> the read handler may not get a chance to run if the frontend is not ready
>>>> (qemu_chr_be_can_write == 0).
>>>>
>>>> This means that in virtio-serial frontend case if
>>>>  - the host has disconnected (giving EPIPE on socket write)
>>>>  - and the guest has disconnected (-> frontend not ready -> backend
>>>>    will not read)
>>>>  - and there is still data (frontend->backend) to flush (has to be a really
>>>>    tricky timing but nevertheless, we have observed the case in production)
>>>>
>>>> This results in virtio-serial trying to flush this data continiously forming
>>>> a busy loop.
>>>>
>>>> Solution: react on EPIPE in the socket write handler. We must not disconnect
>>>> right away though, there still may be data to read (see 4bf1cb0).
>>>>
>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Paolo Bonzini <pbonzini@redhat.com>
>>>> CC: Daniel P. Berrange <berrange@redhat.com>
>>>> ---
>>>>  qemu-char.c | 12 +++++++++++-
>>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/qemu-char.c b/qemu-char.c
>>>> index 6b4a299..f1f7a07 100644
>>>> --- a/qemu-char.c
>>>> +++ b/qemu-char.c
>>>> @@ -1202,7 +1202,7 @@ static int io_channel_send_full(QIOChannel *ioc,
>>>>              errno = EAGAIN;
>>>>              return -1;
>>>>          } else if (ret < 0) {
>>>> -            errno = EINVAL;
>>>> +            errno = errno == EPIPE ? EPIPE : EINVAL;
>>> You can't rely on 'errno' being valid after qio_channel_writev_full().
>>>
>>> The 'Error **errp' arg to that function is the only error reporting
>>> mechanism that's supported. In particular since the TCP channel can
>>> be wrapped in TLS, the errno can easily have been clobbered by time
>>> you get here.
>> can we return errno directly from this call and pass exit code to the upper
>> layer? This will be fair and much more straightforward.
> No, the qio APIs explicitly do *not* use errno because their implementations
> may be calling APIs which in turn do not provide errnos. errno is only a
> useful concept if your always calling into system calls. As soon as you need
> to deal with higher level libraries, errno is woefully inadequate as a
> concept, hence using Error ** instead.
>
> Regards,
> Daniel
But the problem is that Error does not have error code field inside.
Should we play with ErrorClass enum?

Den
Daniel P. Berrangé Feb. 2, 2017, 11:46 a.m. UTC | #7
On Thu, Feb 02, 2017 at 02:43:18PM +0300, Denis V. Lunev wrote:
> On 02/02/2017 12:47 PM, Daniel P. Berrange wrote:
> > On Thu, Feb 02, 2017 at 11:59:22AM +0300, Denis V. Lunev wrote:
> >> On 02/01/2017 04:53 PM, Daniel P. Berrange wrote:
> >>> On Wed, Feb 01, 2017 at 04:15:28PM +0300, Denis V. Lunev wrote:
> >>>> From: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>>
> >>>> Socket backend read handler should normally perform a disconnect, however
> >>>> the read handler may not get a chance to run if the frontend is not ready
> >>>> (qemu_chr_be_can_write == 0).
> >>>>
> >>>> This means that in virtio-serial frontend case if
> >>>>  - the host has disconnected (giving EPIPE on socket write)
> >>>>  - and the guest has disconnected (-> frontend not ready -> backend
> >>>>    will not read)
> >>>>  - and there is still data (frontend->backend) to flush (has to be a really
> >>>>    tricky timing but nevertheless, we have observed the case in production)
> >>>>
> >>>> This results in virtio-serial trying to flush this data continiously forming
> >>>> a busy loop.
> >>>>
> >>>> Solution: react on EPIPE in the socket write handler. We must not disconnect
> >>>> right away though, there still may be data to read (see 4bf1cb0).
> >>>>
> >>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
> >>>> CC: Paolo Bonzini <pbonzini@redhat.com>
> >>>> CC: Daniel P. Berrange <berrange@redhat.com>
> >>>> ---
> >>>>  qemu-char.c | 12 +++++++++++-
> >>>>  1 file changed, 11 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/qemu-char.c b/qemu-char.c
> >>>> index 6b4a299..f1f7a07 100644
> >>>> --- a/qemu-char.c
> >>>> +++ b/qemu-char.c
> >>>> @@ -1202,7 +1202,7 @@ static int io_channel_send_full(QIOChannel *ioc,
> >>>>              errno = EAGAIN;
> >>>>              return -1;
> >>>>          } else if (ret < 0) {
> >>>> -            errno = EINVAL;
> >>>> +            errno = errno == EPIPE ? EPIPE : EINVAL;
> >>> You can't rely on 'errno' being valid after qio_channel_writev_full().
> >>>
> >>> The 'Error **errp' arg to that function is the only error reporting
> >>> mechanism that's supported. In particular since the TCP channel can
> >>> be wrapped in TLS, the errno can easily have been clobbered by time
> >>> you get here.
> >> can we return errno directly from this call and pass exit code to the upper
> >> layer? This will be fair and much more straightforward.
> > No, the qio APIs explicitly do *not* use errno because their implementations
> > may be calling APIs which in turn do not provide errnos. errno is only a
> > useful concept if your always calling into system calls. As soon as you need
> > to deal with higher level libraries, errno is woefully inadequate as a
> > concept, hence using Error ** instead.
> >
> > Regards,
> > Daniel
> But the problem is that Error does not have error code field inside.
> Should we play with ErrorClass enum?

No, don't try to distinguish different types of errors. As mentioned
elsewhere just use the same logic for all errors, except when seeing
QIO_ERR_WOULD_BLOCK

Regards,
Daniel
diff mbox

Patch

diff --git a/qemu-char.c b/qemu-char.c
index 6b4a299..f1f7a07 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1202,7 +1202,7 @@  static int io_channel_send_full(QIOChannel *ioc,
             errno = EAGAIN;
             return -1;
         } else if (ret < 0) {
-            errno = EINVAL;
+            errno = errno == EPIPE ? EPIPE : EINVAL;
             return -1;
         }
 
@@ -2854,6 +2854,9 @@  static gboolean tcp_chr_accept(QIOChannel *chan,
                                GIOCondition cond,
                                void *opaque);
 
+static int tcp_chr_read_poll(void *opaque);
+static void tcp_chr_disconnect(Chardev *chr);
+
 /* Called with chr_write_lock held.  */
 static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
 {
@@ -2871,6 +2874,13 @@  static int tcp_chr_write(Chardev *chr, const uint8_t *buf, int len)
             s->write_msgfds_num = 0;
         }
 
+        if (ret < 0 && errno == EPIPE) {
+            if (tcp_chr_read_poll(chr) <= 0) {
+                tcp_chr_disconnect(chr);
+                return len;
+            } /* else let the read handler finish it properly */
+        }
+
         return ret;
     } else {
         /* XXX: indicate an error ? */