[V2,0/4] Introduce Advanced Watch Dog module

Message ID 20191101024850.20808-1-chen.zhang@intel.com (mailing list archive)

Message

Zhang, Chen Nov. 1, 2019, 2:48 a.m. UTC
From: Zhang Chen <chen.zhang@intel.com>

Advanced Watch Dog (AWD) is a general-purpose monitoring module on the VMM
side. It can be used to detect a network outage (VMM to guest, VMM to VMM, or
VMM to another remote server) and then perform a previously configured
operation. The current AWD patch simply accepts any input as the signal to
refresh the watchdog timer; a more specific interactive protocol could be
defined here later. For the output, the user can pre-write commands or
messages in the AWD opt-script.

We noticed that there is currently no way for VMMs to communicate with each
other directly. Some people may think this is unnecessary, since upper-layer
software such as OpenStack can handle it. However, in our work with real
customers we found that in some cases they need a lightweight and efficient
mechanism to solve practical problems, and OpenStack is too heavy for that.
For example, when AWD detects that the connection to the paired node is lost,
it can send a message to the admin, notify another VMM, or send a QMP command
to QEMU to perform an operation such as restarting the VM. It can also be used
to build a VMM heartbeat system.

This gives users basic VM/host network monitoring tools and a basic fault
tolerance and recovery solution.
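
To make the timer behaviour concrete, here is a minimal sketch in Python. It is not the code in net/awd.c; the class name `Watchdog`, the methods `feed` and `tick`, and the callback standing in for the opt-script are all hypothetical. It only models the rule stated above: any input refreshes the timer, and silence longer than the timeout triggers the configured action.

```python
class Watchdog:
    """Toy model: any input refreshes the timer, silence triggers an action."""

    def __init__(self, timeout_ms, on_timeout):
        self.timeout_ms = timeout_ms
        self.on_timeout = on_timeout  # stands in for running the opt-script
        self.last_input_ms = 0
        self.fired = False

    def feed(self, now_ms, data=b""):
        # The content of `data` is ignored: any input counts as a heartbeat.
        self.last_input_ms = now_ms
        self.fired = False

    def tick(self, now_ms):
        # Run the action once per silent period, then wait for new input.
        if not self.fired and now_ms - self.last_input_ms >= self.timeout_ms:
            self.fired = True
            self.on_timeout()
        return self.fired


# Usage: a 5000 ms timeout, with time passed in explicitly.
actions = []
wd = Watchdog(5000, lambda: actions.append("run opt-script"))
wd.feed(now_ms=1000)   # input arrives, timer refreshed
wd.tick(now_ms=3000)   # still within the timeout, nothing happens
wd.tick(now_ms=6000)   # 5000 ms of silence, the action runs
```

Passing the clock in as an argument keeps the sketch deterministic; a real implementation would use a timer on the iothread instead.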

Demo usage(for COLO heartbeat service):

In primary node:

-chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
-chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
-object iothread,id=iothread2
-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

In secondary node:

-monitor tcp::4445,server,nowait
-chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
-chardev socket,id=heart1,host=3.3.3.8,port=4445
-object iothread,id=iothread1
-object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000
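
The options above refer to each other by id: awd_node and notification_node name chardevs, and iothread names an iothread object. The following Python sketch checks that wiring for the secondary node. It is only an illustration: `parse_props` and `check_wiring` are hypothetical helpers, the parsing is much simpler than QEMU's real option parser (no escaped commas, bare flags are recorded without interpretation), and the assumption that these three properties are id references is read off the demo itself.

```python
def parse_props(optarg):
    """Split 'type,key=value,...' into (type, dict). Simplified parsing."""
    head, *rest = optarg.split(",")
    props = {}
    for item in rest:
        key, sep, value = item.partition("=")
        # Bare flags such as 'server' are kept as True, not interpreted.
        props[key] = value if sep else True
    return head, props


def check_wiring(options):
    """Return (property, id) pairs that an advanced-watchdog references
    but that no -chardev / iothread object in `options` defines."""
    parsed = [(kind, *parse_props(arg)) for kind, arg in options]
    chardevs = {p["id"] for kind, _, p in parsed if kind == "chardev"}
    iothreads = {p["id"] for kind, head, p in parsed
                 if kind == "object" and head == "iothread"}
    missing = []
    for kind, head, p in parsed:
        if kind == "object" and head == "advanced-watchdog":
            for key in ("awd_node", "notification_node"):
                if p.get(key) not in chardevs:
                    missing.append((key, p.get(key)))
            if p.get("iothread") not in iothreads:
                missing.append(("iothread", p.get("iothread")))
    return missing


# The secondary-node options from the demo above.
secondary = [
    ("chardev", "socket,id=h1,host=3.3.3.3,port=9009,reconnect=1"),
    ("chardev", "socket,id=heart1,host=3.3.3.8,port=4445"),
    ("object", "iothread,id=iothread1"),
    ("object", "advanced-watchdog,id=heart1,server=off,awd_node=h1,"
               "notification_node=heart1,"
               "opt_script=colo_secondary_opt_script,"
               "iothread=iothread1,timeout=10000"),
]
unresolved = check_wiring(secondary)
```

An empty result means every id referenced by the advanced-watchdog object is defined among the listed options.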


V2:
 - Addressed Philippe's comments: added a configure selector for AWD.

Initial:
 - Initial version.

Zhang Chen (4):
  net/awd.c: Introduce Advanced Watch Dog module framework
  net/awd.c: Initailize input/output chardev
  net/awd.c: Load advanced watch dog worker thread job
  vl.c: Make Advanced Watch Dog delayed initialization

 configure         |   9 +
 net/Makefile.objs |   1 +
 net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |   6 +
 vl.c              |   7 +
 5 files changed, 514 insertions(+)
 create mode 100644 net/awd.c

Comments

Zhang, Chen Nov. 8, 2019, 3:03 a.m. UTC | #1
Hi all,

Ping. Does anyone have time to review this series? I would appreciate more comments.

Thanks
Zhang Chen

Markus Armbruster Nov. 27, 2019, 3:48 p.m. UTC | #2
"Zhang, Chen" <chen.zhang@intel.com> writes:

> Hi~ All~ 
>
> Ping.... Anyone have time to review this series? I need more comments~

Any takers?

Zhang, Chen Nov. 28, 2019, 3:15 a.m. UTC | #3
> -----Original Message-----
> From: Markus Armbruster <armbru@redhat.com>
> Sent: Wednesday, November 27, 2019 11:49 PM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Jason Wang <jasowang@redhat.com>; Paolo Bonzini
> <pbonzini@redhat.com>; Philippe Mathieu-Daudé <philmd@redhat.com>;
> qemu-dev <qemu-devel@nongnu.org>; Zhang Chen <zhangckid@gmail.com>
> Subject: Re: [PATCH V2 0/4] Introduce Advanced Watch Dog module
> 
> "Zhang, Chen" <chen.zhang@intel.com> writes:
> 
> > Hi~ All~
> >
> > Ping.... Anyone have time to review this series? I need more comments~
> 
> Any takers?

Hi Markus,

Thank you for your attention.
This is a very simple module for error detection and automatic handling.
I have written up the detailed reasons why we need it in real environments in the commit log.
Here is the latest patch:
https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02872.html

Thanks
Zhang Chen