
[v5,0/7] Introduce 'yank' oob qmp command to recover from hanging qemu

Message ID cover.1592923201.git.lukasstraub2@web.de

Message

Lukas Straub June 23, 2020, 2:42 p.m. UTC
Hello Everyone,
In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
to some other server and that server dies or hangs, qemu hangs too.
These patches introduce the new 'yank' out-of-band qmp command to recover from
these kinds of hangs. The different subsystems register callbacks, which are
executed by the yank command. For example, a callback can shutdown() a
socket. This is intended for the COLO use case, but it can of course be used
for other things too.
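A minimal sketch of the callback mechanism described above, in plain C. It assumes instances are identified by name strings and that the callback for a socket simply calls shutdown(); the demo_yank_* names are hypothetical and are not the API in include/qemu/yank.h. Because the real 'yank' command is out-of-band and may run while the main loop is stuck, a real registry would also need a lock, which this single-threaded sketch leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical sketch of a yank registry; not the API of this series. */

#define DEMO_MAX_INSTANCES 8
#define DEMO_MAX_FUNCS 4

typedef void YankFn(void *opaque);

struct demo_yank_func {
    YankFn *func;
    void *opaque;
};

struct demo_yank_instance {
    const char *name;                      /* illustrative, e.g. "chardev:demo" */
    struct demo_yank_func funcs[DEMO_MAX_FUNCS];
    size_t nfuncs;
};

static struct demo_yank_instance instances[DEMO_MAX_INSTANCES];
static size_t ninstances;

static struct demo_yank_instance *demo_find_instance(const char *name)
{
    for (size_t i = 0; i < ninstances; i++) {
        if (strcmp(instances[i].name, name) == 0) {
            return &instances[i];
        }
    }
    return NULL;
}

/* A subsystem announces a yankable instance, e.g. when it opens a connection. */
bool demo_yank_register_instance(const char *name)
{
    if (ninstances == DEMO_MAX_INSTANCES || demo_find_instance(name)) {
        return false;
    }
    instances[ninstances].name = name;
    instances[ninstances].nfuncs = 0;
    ninstances++;
    return true;
}

/* Attach a callback to an instance; registering on an unknown one is a bug. */
void demo_yank_register_function(const char *name, YankFn *func, void *opaque)
{
    struct demo_yank_instance *inst = demo_find_instance(name);
    assert(inst != NULL && inst->nfuncs < DEMO_MAX_FUNCS);
    inst->funcs[inst->nfuncs++] = (struct demo_yank_func){ func, opaque };
}

/*
 * Handle a yank request naming a list of instances. All names are checked
 * first, so a request containing an unknown name runs no callbacks at all.
 */
bool demo_yank(const char *const *names, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!demo_find_instance(names[i])) {
            return false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        struct demo_yank_instance *inst = demo_find_instance(names[i]);
        for (size_t j = 0; j < inst->nfuncs; j++) {
            inst->funcs[j].func(inst->funcs[j].opaque);
        }
    }
    return true;
}

/* Example callback: shut the socket down so blocked I/O on it returns. */
void demo_yank_shutdown_socket(void *opaque)
{
    int fd = *(int *)opaque;
    (void)shutdown(fd, SHUT_RDWR);   /* errors ignored: we only want to unblock */
}
```

A subsystem would register its instance and callback when it sets up a connection. After the callback has run, a recv() on that socket that would otherwise block forever typically returns 0 (end of file) instead, which lets the stuck code path notice the failure and clean up.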

Regards,
Lukas Straub

v5:
 -move yank.c to util/
 -move yank.h to include/qemu/
 -add license to yank.h
 -use const char*
 -nbd: use atomic_store_release and atomic_load_acquire
 -io-channel: ensure thread-safety and document it
 -add myself as maintainer for yank

v4:
 -fix build errors...

v3:
 -don't touch softmmu/vl.c, use __constructor__ attribute instead (Paolo Bonzini)
 -fix build errors
 -rewrite migration patch so it actually passes all tests

v2:
 -don't touch io/ code anymore
 -always register yank functions
 -'yank' now takes a list of instances to yank
 -'query-yank' returns a list of yankable instances
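The v3 and v5 notes rely on two C techniques: a function marked with the constructor attribute, which runs before main() (so no call from softmmu/vl.c is needed), and release/acquire ordering between threads (QEMU's atomic_store_release and atomic_load_acquire). The generic sketch below shows both, using GCC/Clang's __attribute__((constructor)) and C11 <stdatomic.h> as stand-ins for QEMU's own macros; the demo_* names and the connection-state example are hypothetical and are not the code in util/yank.c or block/nbd.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch: generic patterns only, not code from this series. */

static bool demo_registry_ready;

/* Runs before main(), so the startup code needs no explicit init call. */
static void __attribute__((constructor)) demo_registry_init(void)
{
    /* A real registry would initialize its lock and instance list here. */
    demo_registry_ready = true;
}

enum demo_conn_state { DEMO_CONNECTING, DEMO_CONNECTED, DEMO_QUIT };

struct demo_conn {
    int fd;             /* plain field, written before the state is published */
    _Atomic int state;  /* written by the I/O thread, read by the yank thread */
};

/* I/O thread: the release store orders earlier writes (such as fd) before it. */
void demo_conn_set_state(struct demo_conn *c, enum demo_conn_state s)
{
    atomic_store_explicit(&c->state, s, memory_order_release);
}

/*
 * Yank thread: the acquire load pairs with the release store, so a reader
 * that observes DEMO_CONNECTED also observes the fd written before it.
 */
bool demo_conn_should_shutdown(struct demo_conn *c)
{
    return atomic_load_explicit(&c->state, memory_order_acquire) == DEMO_CONNECTED;
}
```

In a real program demo_conn_set_state() would be called from the connection's own thread and demo_conn_should_shutdown() from the thread handling the yank request; the sketch only shows the pairing, not the threads themselves.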

Lukas Straub (7):
  Introduce yank feature
  block/nbd.c: Add yank feature
  chardev/char-socket.c: Add yank feature
  migration: Add yank feature
  io/channel-tls.c: make qio_channel_tls_shutdown thread-safe
  io: Document thread-safety of qio_channel_shutdown
  MAINTAINERS: Add myself as maintainer for yank feature

 MAINTAINERS                   |  13 +++
 block/nbd.c                   | 101 ++++++++++++-------
 chardev/char-socket.c         |  24 +++++
 include/io/channel.h          |   2 +
 include/qemu/yank.h           |  79 +++++++++++++++
 io/channel-tls.c              |   6 +-
 migration/channel.c           |  12 +++
 migration/migration.c         |  18 +++-
 migration/multifd.c           |  10 ++
 migration/qemu-file-channel.c |   6 ++
 migration/savevm.c            |   2 +
 qapi/misc.json                |  45 +++++++++
 tests/Makefile.include        |   2 +-
 util/Makefile.objs            |   1 +
 util/yank.c                   | 179 ++++++++++++++++++++++++++++++++++
 15 files changed, 459 insertions(+), 41 deletions(-)
 create mode 100644 include/qemu/yank.h
 create mode 100644 util/yank.c

--
2.20.1

Comments

Lukas Straub June 24, 2020, 7:47 p.m. UTC | #1
On Tue, 23 Jun 2020 16:42:30 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

Forgot to cc Stefan Hajnoczi...

Lukas Straub July 5, 2020, 9:35 a.m. UTC | #2
On Wed, 24 Jun 2020 21:47:46 +0200
Lukas Straub <lukasstraub2@web.de> wrote:

> Forgot to cc Stefan Hajnoczi...

Ping...

Stefan Hajnoczi July 28, 2020, 10:29 a.m. UTC | #3
On Tue, Jun 23, 2020 at 04:42:30PM +0200, Lukas Straub wrote:
> In many cases, if qemu has a network connection (qmp, migration, chardev, etc.)
> to some other server and that server dies or hangs, qemu hangs too.
> These patches introduce the new 'yank' out-of-band qmp command to recover from
> these kinds of hangs. The different subsystems register callbacks which get
> executed with the yank command. For example the callback can shutdown() a
> socket. This is intended for the colo use-case, but it can be used for other
> things too of course.

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>