
[2/3] qapi/run-state.json: introduce memory failure event

Message ID 20200914134321.958079-3-pizhenwei@bytedance.com
State New, archived
Series add MEMORY_FAILURE event

Commit Message

zhenwei pi Sept. 14, 2020, 1:43 p.m. UTC
Introduce four memory failure events for a guest, so that the upper
layer can know when, why, and what happened to a guest when it hits
a hardware memory failure.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 qapi/run-state.json | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

Comments

Peter Maydell Sept. 21, 2020, 12:48 p.m. UTC | #1
On Mon, 14 Sep 2020 at 14:53, zhenwei pi <pizhenwei@bytedance.com> wrote:
>
> Introduce 4 memory failure events for a guest. Then uplayer could
> know when/why/what happened to a guest during hitting a hardware
> memory failure.
>
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
> +##
> +# @MemoryFailureAction:
> +#
> +# Host memory failure occurs, handled by QEMU.
> +#
> +# @hypervisor-ignore: action optional memory failure at QEMU process
> +#                     addressspace (none PC-RAM), QEMU could ignore this
> +#                     hardware memory failure.
> +#
> +# @hypervisor-stop: action required memory failure at QEMU process address
> +#                   space (none PC-RAM), QEMU has to stop itself.

I'm not entirely clear what the descriptions here are trying to say.
These would be for memory failure events which are reported by the
host and which are not in guest RAM but are in the memory QEMU itself
is using? ("PC-RAM" is a bit x86-specific.)

> +#
> +# @guest-mce: action required memory failure at PC-RAM, and guest enables MCE
> +#             handling, QEMU injects MCE to guest.
> +#
> +# @guest-triple-fault: action required memory failure at PC-RAM, but guest does
> +#                      not enable MCE handling. QEMU raises triple fault and
> +#                      shutdown/reset. Also see detailed info in QEMU log.

"triple fault" sounds rather x86-specific; other architectures
also have support for host memory failure notifications, so we
should design the QAPI events to have architecture-neutral
definitions and descriptions.

I think the four cases you're trying to distinguish here are:
 (1) action-optional memory failure in memory used by the hypervisor
     (which QEMU has ignored other than to report this event)
 (2) action-required memory failure in memory used by the hypervisor
     (QEMU is stopping)
 (3) action-required memory failure in guest memory, which QEMU
     has reported to the guest
 (4) action-required memory failure in guest memory, but the
     guest OS does not support a mechanism for reporting it

Is that right?

Anyway, I think we should try to find names for the failure
types that are not x86-specific.

thanks
-- PMM
zhenwei pi Sept. 21, 2020, 1:10 p.m. UTC | #2
On 9/21/20 8:48 PM, Peter Maydell wrote:
> I think the four cases you're trying to distinguish here are:
>   (1) action-optional memory failure in memory used by the hypervisor
>       (which QEMU has ignored other than to report this event)
>   (2) action-required memory failure in memory used by the hypervisor
>       (QEMU is stopping)
>   (3) action-required memory failure in guest memory, which QEMU
>       has reported to the guest
>   (4) action-required memory failure in guest memory, but the
>       guest OS does not support a mechanism for reporting it
> 
> Is that right?
> 
> Anyway, I think we should try to find names for the failure
> types that are not x86-specific.
Right. To make the names architecture-neutral, how about these changes:
'PC-RAM' -> 'guest-memory'
'guest-mce' -> 'guest-mce-inject'
'guest-triple-fault' -> 'guest-mce-fault'
Paolo Bonzini Sept. 22, 2020, 7:11 a.m. UTC | #3
On 21/09/20 15:10, zhenwei pi wrote:
>>
> Right, to make architecture-neutral, how about these changes:
> 'PC-RAM' -> 'guest-memory'
> 'guest-mce' -> 'guest-mce-inject'
> 'guest-triple-fault' -> 'guest-mce-fault'

Perhaps we should have three fields:

1) recipient: 'hypervisor' or 'guest'

2) action: 'ignore', 'inject', 'fatal'

3) kind: 'action-optional' or 'action-required'

And possibly:

4) recursive: true or false

On x86, "recursive" would be set if MCIP=1 (the Machine Check In
Progress bit of MCG_STATUS).

Paolo
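To see how Paolo's fields relate to the four values in this patch, here is a minimal C sketch that maps each proposed MemoryFailureAction value onto a recipient/action/kind triple plus a recursive flag. The mapping follows Peter's reading of the four cases and is an interpretation, not code from the patch or from QEMU: all type, constant, and function names are hypothetical, and the mcg_status argument is assumed to be the affected vCPU's x86 MCG_STATUS value, whose bit 2 is MCIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The four MemoryFailureAction values proposed in this patch. */
typedef enum {
    PATCH_HYPERVISOR_IGNORE,
    PATCH_HYPERVISOR_STOP,
    PATCH_GUEST_MCE,
    PATCH_GUEST_TRIPLE_FAULT,
} PatchMemoryFailureAction;

/* Hypothetical types for the three fields suggested in review. */
typedef enum { MF_RECIPIENT_HYPERVISOR, MF_RECIPIENT_GUEST } MfRecipient;
typedef enum { MF_ACTION_IGNORE, MF_ACTION_INJECT, MF_ACTION_FATAL } MfAction;
typedef enum { MF_KIND_ACTION_OPTIONAL, MF_KIND_ACTION_REQUIRED } MfKind;

typedef struct {
    MfRecipient recipient;
    MfAction action;
    MfKind kind;
    bool recursive; /* the optional fourth field */
} MfEvent;

/* x86 MCG_STATUS bit 2: Machine Check In Progress (MCIP). */
#define MF_MCG_STATUS_MCIP (1ULL << 2)

/* Map one patch value, plus the vCPU's MCG_STATUS, onto the split fields. */
static MfEvent mf_event_from_patch(PatchMemoryFailureAction old,
                                   uint64_t mcg_status)
{
    /* Members not named in the initializer start zeroed. */
    MfEvent ev = { .recursive = (mcg_status & MF_MCG_STATUS_MCIP) != 0 };

    switch (old) {
    case PATCH_HYPERVISOR_IGNORE:  /* case (1): QEMU's own memory, ignored */
        ev.recipient = MF_RECIPIENT_HYPERVISOR;
        ev.action = MF_ACTION_IGNORE;
        ev.kind = MF_KIND_ACTION_OPTIONAL;
        break;
    case PATCH_HYPERVISOR_STOP:    /* case (2): QEMU's own memory, QEMU stops */
        ev.recipient = MF_RECIPIENT_HYPERVISOR;
        ev.action = MF_ACTION_FATAL;
        ev.kind = MF_KIND_ACTION_REQUIRED;
        break;
    case PATCH_GUEST_MCE:          /* case (3): guest memory, reported to guest */
        ev.recipient = MF_RECIPIENT_GUEST;
        ev.action = MF_ACTION_INJECT;
        ev.kind = MF_KIND_ACTION_REQUIRED;
        break;
    case PATCH_GUEST_TRIPLE_FAULT: /* case (4): guest memory, guest cannot take it */
        ev.recipient = MF_RECIPIENT_GUEST;
        ev.action = MF_ACTION_FATAL;
        ev.kind = MF_KIND_ACTION_REQUIRED;
        break;
    default:
        assert(!"unknown PatchMemoryFailureAction value");
        break;
    }
    return ev;
}
```

Read this way, the four original values are combinations of orthogonal fields, and none of the resulting field values names an x86-specific mechanism such as MCE injection or triple fault; only the derivation of the recursive flag from MCIP is x86-specific.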

Patch

diff --git a/qapi/run-state.json b/qapi/run-state.json
index 7cc9f96a5b..fdc39ce262 100644
--- a/qapi/run-state.json
+++ b/qapi/run-state.json
@@ -475,3 +475,49 @@ 
            'psw-mask': 'uint64',
            'psw-addr': 'uint64',
            'reason': 'S390CrashReason' } }
+
+##
+# @MEMORY_FAILURE:
+#
+# Emitted when a memory failure occurs on host side.
+#
+# @action: action that has been taken. action is defined as @MemoryFailureAction.
+#
+# Since: 5.2
+#
+# Example:
+#
+# <- { "event": "MEMORY_FAILURE",
+#      "data": { "action": "guest-mce" } }
+#
+##
+{ 'event': 'MEMORY_FAILURE',
+  'data': { 'action': 'MemoryFailureAction'} }
+
+##
+# @MemoryFailureAction:
+#
+# Host memory failure occurs, handled by QEMU.
+#
+# @hypervisor-ignore: action optional memory failure at QEMU process
+#                     addressspace (none PC-RAM), QEMU could ignore this
+#                     hardware memory failure.
+#
+# @hypervisor-stop: action required memory failure at QEMU process address
+#                   space (none PC-RAM), QEMU has to stop itself.
+#
+# @guest-mce: action required memory failure at PC-RAM, and guest enables MCE
+#             handling, QEMU injects MCE to guest.
+#
+# @guest-triple-fault: action required memory failure at PC-RAM, but guest does
+#                      not enable MCE handling. QEMU raises triple fault and
+#                      shutdown/reset. Also see detailed info in QEMU log.
+#
+# Since: 5.2
+#
+##
+{ 'enum': 'MemoryFailureAction',
+  'data': [ 'hypervisor-ignore',
+            'hypervisor-stop',
+            'guest-mce',
+            'guest-triple-fault' ] }