Message ID | 20191119123016.13740-1-chen.zhang@intel.com (mailing list archive) |
---|---|
Series | Introduce Advanced Watch Dog module |
Hi All~

No news for a long time.
Please give me more comments about this series.

Thanks
Zhang Chen

On 11/19/2019 8:30 PM, Zhang, Chen wrote:
> From: Zhang Chen <chen.zhang@intel.com>
>
> Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side. It
> can be used to detect a network outage (VMM to guest, VMM to VMM, VMM to
> another remote server) and then perform a previously configured operation.
> The current AWD patch accepts any input as the signal to refresh the
> watchdog timer; a specific interactive protocol could also be defined here.
> For the output, the user can pre-write commands or messages in the AWD
> opt-script.
>
> We noticed that there is no way for VMMs to communicate directly. Some
> people may think we don't need such a thing (upper-layer software like
> OpenStack can handle it), but in working with real customers we found that
> in some cases they need a lightweight and efficient mechanism to solve
> practical problems (OpenStack is too heavy). For example, when AWD detects
> a lost connection with the paired node, it can send a message to the admin,
> notify another VMM, send a QMP command to QEMU to perform an operation such
> as restarting the VM, build a VMM heartbeat system, etc. It gives users
> basic VM/host network monitoring tools and a basic fault tolerance and
> recovery solution.
>
> Demo usage (for COLO heartbeat service):
>
> In primary node:
>
> -chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
> -chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
> -object iothread,id=iothread2
> -object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000
>
> In secondary node:
>
> -monitor tcp::4445,server,nowait
> -chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
> -chardev socket,id=heart1,host=3.3.3.8,port=4445
> -object iothread,id=iothread1
> -object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000
>
>
> V3:
>  - Rebased on Qemu 4.2.0-rc1 code.
>  - Fix commit message issue.
>
> V2:
>  - Addressed Philippe's comments: add configure selector for AWD.
>
> Initial:
>  - Initial version.
>
>
> Zhang Chen (4):
>   net/awd.c: Introduce Advanced Watch Dog module framework
>   net/awd.c: Initailize input/output chardev
>   net/awd.c: Load advanced watch dog worker thread job
>   vl.c: Make Advanced Watch Dog delayed initialization
>
>  configure         |   9 +
>  net/Makefile.objs |   1 +
>  net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-options.hx   |   6 +
>  vl.c              |   7 +
>  5 files changed, 514 insertions(+)
>  create mode 100644 net/awd.c
>
On 08/12/19 18:52, Zhang, Chen wrote:
> Hi All~
>
> No news for a long time.
>
> Please give me more comments about this series.

Sorry, people were probably busy with the QEMU release candidates.

Even before looking at the code, the series is completely missing
documentation on how to use it and on the chardev protocol. The
documentation should go in docs/ and should be written as restructuredText.

The qemu-options.hx patches also lack documentation about the properties
accepted by the new object.

In particular:

>> -chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
>> -chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
>> -object iothread,id=iothread2
>> -object
>> advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

What are the two sockets for, and what should be in colo_opt_script_path?

>>
>> In secondary node:
>>
>> -monitor tcp::4445,server,nowait
>> -chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
>> -chardev socket,id=heart1,host=3.3.3.8,port=4445
>> -object iothread,id=iothread1
>> -object
>> advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000

Same here.

Paolo
On 12/9/2019 5:08 PM, Paolo Bonzini wrote:
> On 08/12/19 18:52, Zhang, Chen wrote:
>> Hi All~
>>
>> No news for a long time.
>>
>> Please give me more comments about this series.
> Sorry, people were probably busy with the QEMU release candidates.
>
> Even before looking at the code, the series is completely missing
> documentation on how to use it and on the chardev protocol. The
> documentation should go in docs/ and should be written as restructuredText.
>
> The qemu-options.hx patches also lack documentation about the properties
> accepted by the new object.

OK, I will add documentation in docs/ and qemu-options.hx in the next version.

For the chardev protocol part, the current implementation just uses plain
text, which makes it easy to connect AWD with other user-defined nodes. I am
not very familiar with this part; do you have any suggestions here? Would a
general-purpose protocol be better? Or does Jason have any suggestions?

>
> In particular:
>
>>> -chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
>>> -chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
>>> -object iothread,id=iothread2
>>> -object
>>> advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000
> What are the two sockets for, and what should be in colo_opt_script_path?

Anything the user wants to send on timeout. For example, if a timeout is
detected, AWD sends the quit command to QEMU:

colo_opt_script_path=/tmp/qemu-qmp-quit.script

------------------------------------
qemu-qmp-quit.script:
{ "execute": "quit" }
------------------------------------

Thanks
Zhang Chen

>
>>> In secondary node:
>>>
>>> -monitor tcp::4445,server,nowait
>>> -chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
>>> -chardev socket,id=heart1,host=3.3.3.8,port=4445
>>> -object iothread,id=iothread1
>>> -object
>>> advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000
> Same here.
>
> Paolo
>
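The behaviour described in the reply above (any input refreshes the timer; on timeout the opt-script contents are sent out) can be modelled roughly as follows. This is only an illustrative sketch, not the code in net/awd.c: the class name WatchdogSketch and its methods are hypothetical, the timeout is assumed to be in milliseconds, and firing only once per timeout is an assumption as well. Time is passed in explicitly so the logic can be followed without sockets or threads.

```python
import json


class WatchdogSketch:
    """Hypothetical model of the AWD timer semantics described in this thread."""

    def __init__(self, timeout_ms, opt_script_text):
        self.timeout_ms = timeout_ms            # assumed unit: milliseconds
        self.opt_script_text = opt_script_text  # payload to send on timeout
        self.deadline = None                    # not armed until start()
        self.fired = False

    def start(self, now_ms):
        """Arm the timer."""
        self.deadline = now_ms + self.timeout_ms
        self.fired = False

    def on_input(self, data, now_ms):
        """Any non-empty input on the watched node refreshes the timer."""
        if data:
            self.deadline = now_ms + self.timeout_ms

    def poll(self, now_ms):
        """Return the opt-script text once when the deadline has passed, else None."""
        if self.deadline is None or self.fired:
            return None
        if now_ms >= self.deadline:
            self.fired = True  # assumption: notify once per timeout
            return self.opt_script_text
        return None


if __name__ == "__main__":
    wd = WatchdogSketch(timeout_ms=5000, opt_script_text='{ "execute": "quit" }')
    wd.start(now_ms=0)
    wd.on_input(b"anything", now_ms=3000)  # refresh: new deadline is 8000
    print(wd.poll(now_ms=7999))            # still alive, nothing to send
    payload = wd.poll(now_ms=8000)         # deadline reached
    print(json.loads(payload))             # the QMP command from the example script
```

In this model the content of the input is never inspected, which matches the "accept any input" description; a stricter protocol would replace the check in on_input.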
From: Zhang Chen <chen.zhang@intel.com>

Advanced Watch Dog (AWD) is a universal monitoring module on the VMM side. It
can be used to detect a network outage (VMM to guest, VMM to VMM, VMM to
another remote server) and then perform a previously configured operation.
The current AWD patch accepts any input as the signal to refresh the
watchdog timer; a specific interactive protocol could also be defined here.
For the output, the user can pre-write commands or messages in the AWD
opt-script.

We noticed that there is no way for VMMs to communicate directly. Some
people may think we don't need such a thing (upper-layer software like
OpenStack can handle it), but in working with real customers we found that
in some cases they need a lightweight and efficient mechanism to solve
practical problems (OpenStack is too heavy). For example, when AWD detects
a lost connection with the paired node, it can send a message to the admin,
notify another VMM, send a QMP command to QEMU to perform an operation such
as restarting the VM, build a VMM heartbeat system, etc. It gives users
basic VM/host network monitoring tools and a basic fault tolerance and
recovery solution.

Demo usage (for COLO heartbeat service):

In primary node:

-chardev socket,id=h1,host=3.3.3.3,port=9009,server,nowait
-chardev socket,id=heartbeat0,host=3.3.3.3,port=4445
-object iothread,id=iothread2
-object advanced-watchdog,id=heart1,server=on,awd_node=h1,notification_node=heartbeat0,opt_script=colo_opt_script_path,iothread=iothread1,pulse_interval=1000,timeout=5000

In secondary node:

-monitor tcp::4445,server,nowait
-chardev socket,id=h1,host=3.3.3.3,port=9009,reconnect=1
-chardev socket,id=heart1,host=3.3.3.8,port=4445
-object iothread,id=iothread1
-object advanced-watchdog,id=heart1,server=off,awd_node=h1,notification_node=heart1,opt_script=colo_secondary_opt_script,iothread=iothread1,timeout=10000


V3:
 - Rebased on Qemu 4.2.0-rc1 code.
 - Fix commit message issue.

V2:
 - Addressed Philippe's comments: add configure selector for AWD.

Initial:
 - Initial version.

Zhang Chen (4):
  net/awd.c: Introduce Advanced Watch Dog module framework
  net/awd.c: Initailize input/output chardev
  net/awd.c: Load advanced watch dog worker thread job
  vl.c: Make Advanced Watch Dog delayed initialization

 configure         |   9 +
 net/Makefile.objs |   1 +
 net/awd.c         | 491 ++++++++++++++++++++++++++++++++++++++++++++++
 qemu-options.hx   |   6 +
 vl.c              |   7 +
 5 files changed, 514 insertions(+)
 create mode 100644 net/awd.c