[v2,0/3] add MEMORY_FAILURE event

Message ID 20200922095630.394893-1-pizhenwei@bytedance.com (mailing list archive)

Message

zhenwei pi Sept. 22, 2020, 9:56 a.m. UTC
v1->v2:
As suggested by Peter Maydell, rename the events to make them
architecture-neutral:
'PC-RAM' -> 'guest-memory'
'guest-triple-fault' -> 'guest-mce-fatal'

As suggested by Paolo, add more fields to the event:
'action-required': boolean, distinguishes whether a 'guest-mce' is
             AR (action required) or AO (action optional).
'recursive': boolean, set to true if another AO MCE occurs while a
             previous MCE is still being processed in the guest.

v1:
Although QEMU can catch SIGBUS to handle a hardware memory
corruption event, it currently just prints a short log message and
tries to fix the problem silently.

These patches introduce a 'MEMORY_FAILURE' event that reports which of
4 actions QEMU took, so the upper layer can know what situation QEMU
hit and what it did. As a further step: if a host server hits
'hypervisor-ignore' or 'guest-mce', the scheduler could migrate the VM
to another host; if it hits 'hypervisor-stop' or 'guest-triple-fault'
(renamed 'guest-mce-fatal' in v2), the scheduler could select other
healthy servers to launch the VM.
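
To illustrate the scheduler policy described above, here is a minimal
sketch of how an upper layer might map the event to a decision. It is
not part of the series. The outer "event"/"data" wrapper follows the
usual QMP event layout, but the "action" key inside "data", the
Decision enum and the decide() helper are assumptions made for
illustration only. The action names are the ones listed in this cover
letter, using the v2 name 'guest-mce-fatal'.

```python
from enum import Enum


class Decision(Enum):
    MIGRATE = "migrate"    # move the running VM to another host
    RELAUNCH = "relaunch"  # start the VM again on a healthy host
    NONE = "none"          # not a memory failure, or unknown action


# Action names as listed in the cover letter (v2 naming).
MIGRATE_ACTIONS = {"hypervisor-ignore", "guest-mce"}
RELAUNCH_ACTIONS = {"hypervisor-stop", "guest-mce-fatal"}


def decide(event: dict) -> Decision:
    """Map a MEMORY_FAILURE event to a scheduler decision.

    ASSUMPTION: the event looks like
    {"event": "MEMORY_FAILURE", "data": {"action": "<name>"}}.
    The "action" key is hypothetical; check qapi/run-state.json
    for the real field names.
    """
    if event.get("event") != "MEMORY_FAILURE":
        return Decision.NONE
    action = event.get("data", {}).get("action")
    if action in MIGRATE_ACTIONS:
        return Decision.MIGRATE
    if action in RELAUNCH_ACTIONS:
        return Decision.RELAUNCH
    return Decision.NONE
```

Unknown actions fall through to Decision.NONE, so a consumer written
this way would not act on event values it does not recognise.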

Zhenwei Pi (3):
  target-i386: seperate MCIP & MCE_MASK error reason
  qapi/run-state.json: introduce memory failure event
  target-i386: post memory failure event to uplayer

 qapi/run-state.json  | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 target/i386/helper.c | 40 +++++++++++++++++++++++++------
 target/i386/kvm.c    |  7 +++++-
 3 files changed, 106 insertions(+), 8 deletions(-)

Comments

no-reply@patchew.org Sept. 22, 2020, 3:40 p.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/20200922095630.394893-1-pizhenwei@bytedance.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

C linker for the host machine: cc ld.bfd 2.27-43
Host machine cpu family: x86_64
Host machine cpu: x86_64
../src/meson.build:10: WARNING: Module unstable-keyval has no backwards or forwards compatibility and might not exist in future releases.
Program sh found: YES
Program python3 found: YES (/usr/bin/python3)
Configuring ninjatool using configuration
---
Not run: 259
Failures: 192
Failed 1 of 121 iotests
make: *** [check-block] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 709, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--rm', '--label', 'com.qemu.instance.uuid=c2bcee7055544144a0155e97d2f7a118', '-u', '1001', '--security-opt', 'seccomp=unconfined', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-prfzegjw/src/docker-src.2020-09-22-11.22.51.1488:/var/tmp/qemu:z,ro', 'qemu/centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=c2bcee7055544144a0155e97d2f7a118
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-prfzegjw/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    17m39.310s
user    0m20.428s


The full log is available at
http://patchew.org/logs/20200922095630.394893-1-pizhenwei@bytedance.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

zhenwei pi Sept. 28, 2020, 12:01 p.m. UTC | #2
PING
