mbox series

[00/10] vhost: stick to -errno error return convention

Message ID 20211111153354.18807-1-rvkagan@yandex-team.ru (mailing list archive)
Headers show
Series vhost: stick to -errno error return convention | expand

Message

Roman Kagan Nov. 11, 2021, 3:33 p.m. UTC
Error propagation between the generic vhost code and the specific backends is
not quite consistent: some places follow "return -1 and set errno" convention,
while others assume "return negated errno".  Furthermore, not enough care is
taken not to clobber errno.

As a result, on certain code paths the errno resulting from a failure may get
overridden by another function call, and then that zero errno inidicating
success is propagated up the stack, leading to failures being lost.  In
particular, we've seen errors in the communication with a vhost-user-blk slave
not trigger an immediate connection drop and reconnection, leaving it in a
broken state.

Rework error propagation to always return negated errno on errors and
correctly pass it up the stack.

Roman Kagan (10):
  vhost-user-blk: reconnect on any error during realize
  chardev/char-socket: tcp_chr_recv: don't clobber errno
  chardev/char-socket: tcp_chr_sync_read: don't clobber errno
  chardev/char-fe: don't allow EAGAIN from blocking read
  vhost-backend: avoid overflow on memslots_limit
  vhost-backend: stick to -errno error return convention
  vhost-vdpa: stick to -errno error return convention
  vhost-user: stick to -errno error return convention
  vhost: stick to -errno error return convention
  vhost-user-blk: propagate error return from generic vhost

 chardev/char-fe.c         |   7 +-
 chardev/char-socket.c     |  17 +-
 hw/block/vhost-user-blk.c |   4 +-
 hw/virtio/vhost-backend.c |   4 +-
 hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
 hw/virtio/vhost-vdpa.c    |  37 ++--
 hw/virtio/vhost.c         |  98 +++++-----
 7 files changed, 307 insertions(+), 261 deletions(-)

Comments

Michael S. Tsirkin Nov. 11, 2021, 8:14 p.m. UTC | #1
On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> Error propagation between the generic vhost code and the specific backends is
> not quite consistent: some places follow "return -1 and set errno" convention,
> while others assume "return negated errno".  Furthermore, not enough care is
> taken not to clobber errno.
> 
> As a result, on certain code paths the errno resulting from a failure may get
> overridden by another function call, and then that zero errno inidicating
> success is propagated up the stack, leading to failures being lost.  In
> particular, we've seen errors in the communication with a vhost-user-blk slave
> not trigger an immediate connection drop and reconnection, leaving it in a
> broken state.
> 
> Rework error propagation to always return negated errno on errors and
> correctly pass it up the stack.

Looks like something we want post release. I'll tag it
but pls ping me after the release to help make sure
it's not lost.


> Roman Kagan (10):
>   vhost-user-blk: reconnect on any error during realize
>   chardev/char-socket: tcp_chr_recv: don't clobber errno
>   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
>   chardev/char-fe: don't allow EAGAIN from blocking read
>   vhost-backend: avoid overflow on memslots_limit
>   vhost-backend: stick to -errno error return convention
>   vhost-vdpa: stick to -errno error return convention
>   vhost-user: stick to -errno error return convention
>   vhost: stick to -errno error return convention
>   vhost-user-blk: propagate error return from generic vhost
> 
>  chardev/char-fe.c         |   7 +-
>  chardev/char-socket.c     |  17 +-
>  hw/block/vhost-user-blk.c |   4 +-
>  hw/virtio/vhost-backend.c |   4 +-
>  hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
>  hw/virtio/vhost-vdpa.c    |  37 ++--
>  hw/virtio/vhost.c         |  98 +++++-----
>  7 files changed, 307 insertions(+), 261 deletions(-)
> 
> -- 
> 2.33.1
>
Roman Kagan Nov. 12, 2021, 8:04 a.m. UTC | #2
On Thu, Nov 11, 2021 at 03:14:56PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> > Error propagation between the generic vhost code and the specific backends is
> > not quite consistent: some places follow "return -1 and set errno" convention,
> > while others assume "return negated errno".  Furthermore, not enough care is
> > taken not to clobber errno.
> > 
> > As a result, on certain code paths the errno resulting from a failure may get
> > overridden by another function call, and then that zero errno inidicating
> > success is propagated up the stack, leading to failures being lost.  In
> > particular, we've seen errors in the communication with a vhost-user-blk slave
> > not trigger an immediate connection drop and reconnection, leaving it in a
> > broken state.
> > 
> > Rework error propagation to always return negated errno on errors and
> > correctly pass it up the stack.
> 
> Looks like something we want post release. I'll tag it
> but pls ping me after the release to help make sure
> it's not lost.

It doesn't introduce new features so I guess it might qualify for rc0,
but the churn is somewhat too big indeed.

OK I'll reiterate once 6.2 is out; meanwhile if anyone has spare cycles
to review it, it'll be much appreciated.

Thanks,
Roman.

> 
> 
> > Roman Kagan (10):
> >   vhost-user-blk: reconnect on any error during realize
> >   chardev/char-socket: tcp_chr_recv: don't clobber errno
> >   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
> >   chardev/char-fe: don't allow EAGAIN from blocking read
> >   vhost-backend: avoid overflow on memslots_limit
> >   vhost-backend: stick to -errno error return convention
> >   vhost-vdpa: stick to -errno error return convention
> >   vhost-user: stick to -errno error return convention
> >   vhost: stick to -errno error return convention
> >   vhost-user-blk: propagate error return from generic vhost
> > 
> >  chardev/char-fe.c         |   7 +-
> >  chardev/char-socket.c     |  17 +-
> >  hw/block/vhost-user-blk.c |   4 +-
> >  hw/virtio/vhost-backend.c |   4 +-
> >  hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
> >  hw/virtio/vhost-vdpa.c    |  37 ++--
> >  hw/virtio/vhost.c         |  98 +++++-----
> >  7 files changed, 307 insertions(+), 261 deletions(-)
> > 
> > -- 
> > 2.33.1
> > 
>
Michael S. Tsirkin Nov. 28, 2021, 9:47 p.m. UTC | #3
On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> Error propagation between the generic vhost code and the specific backends is
> not quite consistent: some places follow "return -1 and set errno" convention,
> while others assume "return negated errno".  Furthermore, not enough care is
> taken not to clobber errno.
> 
> As a result, on certain code paths the errno resulting from a failure may get
> overridden by another function call, and then that zero errno inidicating
> success is propagated up the stack, leading to failures being lost.  In
> particular, we've seen errors in the communication with a vhost-user-blk slave
> not trigger an immediate connection drop and reconnection, leaving it in a
> broken state.
> 
> Rework error propagation to always return negated errno on errors and
> correctly pass it up the stack.

Hi Roman,
if there are bugfixes here I'll be happy to take them right now.
The wholesale rework seems inappropriate for 6.2, I'll be
happy to tag it for after 6.2. Pls ping me aftre release to help
make sure it's not lost.


> Roman Kagan (10):
>   vhost-user-blk: reconnect on any error during realize
>   chardev/char-socket: tcp_chr_recv: don't clobber errno
>   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
>   chardev/char-fe: don't allow EAGAIN from blocking read
>   vhost-backend: avoid overflow on memslots_limit
>   vhost-backend: stick to -errno error return convention
>   vhost-vdpa: stick to -errno error return convention
>   vhost-user: stick to -errno error return convention
>   vhost: stick to -errno error return convention
>   vhost-user-blk: propagate error return from generic vhost
> 
>  chardev/char-fe.c         |   7 +-
>  chardev/char-socket.c     |  17 +-
>  hw/block/vhost-user-blk.c |   4 +-
>  hw/virtio/vhost-backend.c |   4 +-
>  hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
>  hw/virtio/vhost-vdpa.c    |  37 ++--
>  hw/virtio/vhost.c         |  98 +++++-----
>  7 files changed, 307 insertions(+), 261 deletions(-)
> 
> -- 
> 2.33.1
>
Roman Kagan Nov. 29, 2021, 9:44 p.m. UTC | #4
On Sun, Nov 28, 2021 at 04:47:20PM -0500, Michael S. Tsirkin wrote:
> On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> > Error propagation between the generic vhost code and the specific backends is
> > not quite consistent: some places follow "return -1 and set errno" convention,
> > while others assume "return negated errno".  Furthermore, not enough care is
> > taken not to clobber errno.
> > 
> > As a result, on certain code paths the errno resulting from a failure may get
> > overridden by another function call, and then that zero errno inidicating
> > success is propagated up the stack, leading to failures being lost.  In
> > particular, we've seen errors in the communication with a vhost-user-blk slave
> > not trigger an immediate connection drop and reconnection, leaving it in a
> > broken state.
> > 
> > Rework error propagation to always return negated errno on errors and
> > correctly pass it up the stack.
> 
> Hi Roman,
> if there are bugfixes here I'll be happy to take them right now.
> The wholesale rework seems inappropriate for 6.2, I'll be
> happy to tag it for after 6.2. Pls ping me aftre release to help
> make sure it's not lost.

All these patches are bugfixes in one way or another.  That said, none
of the problems being addressed are recent regressions.  OTOH the
patches introduce non-zero churn and change behavior on some error
paths, so I'd suggest to postpone the whole series till after 6.2 is
out.

Thanks,
Roman.
Michael S. Tsirkin Jan. 6, 2022, 9:57 a.m. UTC | #5
On Thu, Nov 11, 2021 at 06:33:44PM +0300, Roman Kagan wrote:
> Error propagation between the generic vhost code and the specific backends is
> not quite consistent: some places follow "return -1 and set errno" convention,
> while others assume "return negated errno".  Furthermore, not enough care is
> taken not to clobber errno.
> 
> As a result, on certain code paths the errno resulting from a failure may get
> overridden by another function call, and then that zero errno inidicating
> success is propagated up the stack, leading to failures being lost.  In
> particular, we've seen errors in the communication with a vhost-user-blk slave
> not trigger an immediate connection drop and reconnection, leaving it in a
> broken state.
> 
> Rework error propagation to always return negated errno on errors and
> correctly pass it up the stack.
> 
> Roman Kagan (10):
>   vhost-user-blk: reconnect on any error during realize
>   chardev/char-socket: tcp_chr_recv: don't clobber errno
>   chardev/char-socket: tcp_chr_sync_read: don't clobber errno
>   chardev/char-fe: don't allow EAGAIN from blocking read

So I dropped this one. If you are so inclined, pls work on
this separately.

>   vhost-backend: avoid overflow on memslots_limit
>   vhost-backend: stick to -errno error return convention
>   vhost-vdpa: stick to -errno error return convention
>   vhost-user: stick to -errno error return convention
>   vhost: stick to -errno error return convention
>   vhost-user-blk: propagate error return from generic vhost
> 
>  chardev/char-fe.c         |   7 +-
>  chardev/char-socket.c     |  17 +-
>  hw/block/vhost-user-blk.c |   4 +-
>  hw/virtio/vhost-backend.c |   4 +-
>  hw/virtio/vhost-user.c    | 401 +++++++++++++++++++++-----------------
>  hw/virtio/vhost-vdpa.c    |  37 ++--
>  hw/virtio/vhost.c         |  98 +++++-----
>  7 files changed, 307 insertions(+), 261 deletions(-)
> 
> -- 
> 2.33.1
>