Message ID | 20220923073333.23381-3-chenyi.qiang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Enable notify VM exit |
On 9/23/22 09:33, Chenyi Qiang wrote:
> Because there are some concerns, e.g. a notify VM exit may happen with
> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
> that would set this bit), which means VM context is corrupted. To avoid
> the false positive and a well-behaved guest gets killed, make this
> feature disabled by default. Users can enable the feature by a new
> machine property:
>      qemu -machine notify_vmexit=on,notify_window=0 ...

Some comments on the interface:

- the argument should be one of "run" (i.e. do nothing and continue, the
  default), "internal-error" (i.e. raise a KVM internal error), "disable"
  (i.e. do not enable the capability). You can add the enum to
  qapi/runstate.json and use object_class_property_add_enum to define
  the QOM property.

- properties should have a dash ("-") in the name, not an underscore

- the property should be added to "-accel kvm,..." (on x86 only). See
  after my signature for a preparatory patch that adds a new
  kvm_arch_accel_class_init hook.

The default would be either "run" or "disable". Honestly I think it
should be "run", otherwise there's no point in adding the feature;
if it is not enabled by default, it is very likely that no one would
use it.

> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
> inform the fatal case. Then user space can inject a SHUTDOWN event to
> the target vcpu. This is implemented by injecting a sythesized triple
> fault event.

I don't think a triple fault is a good match for an event that "should
not happen" and is the fault of the processor rather than the guest.
This should be a KVM internal error. The workaround is to disable the
notify vmexit.

> +        warn_report_once("KVM: encounter a notify exit with %svalid context in"
> +                         " guest. It means there can be possible misbehaves in"
> +                         " guest, please have a look.",
> +                         ctx_invalid ? "in" : "");

The warning should be unconditional if the context is invalid.

> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",

uint32 (not uint32_t)

> +                              x86_machine_get_notify_window,
> +                              x86_machine_set_notify_window, NULL, NULL);
> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
> +            "Set the notify window required by notify VM exit");

"Clock cycles without an event window after which a notification VM exit
occurs"

Thanks,

Paolo

From a5cb704991cfcda19a33c622833b69a8f6928530 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Tue, 27 Sep 2022 15:20:16 +0200
Subject: [PATCH] kvm: allow target-specific accelerator properties

Several hypervisor capabilities in KVM are target-specific. When exposed
to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
should not be available for all targets.

Add a hook for targets to add their own properties to -accel kvm; for
now no such property is defined.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 5acab1767f..f90c5cb285 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -3737,6 +3737,8 @@ static void kvm_accel_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "dirty-ring-size",
         "Size of KVM dirty page ring buffer (default: 0, i.e. use bitmap)");
+
+    kvm_arch_accel_class_init(oc);
 }
 
 static const TypeInfo kvm_accel_type = {
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index efd6dee818..50868ebf60 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -353,6 +353,8 @@ bool kvm_device_supported(int vmfd, uint64_t type);
 
 extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
 
+void kvm_arch_accel_class_init(ObjectClass *oc);
+
 void kvm_arch_pre_run(CPUState *cpu, struct kvm_run *run);
 MemTxAttrs kvm_arch_post_run(CPUState *cpu, struct kvm_run *run);
 
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index e5c1bd50d2..d21603cf28 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1056,3 +1056,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 21880836a6..22b3b37193 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -5472,3 +5472,7 @@ void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask)
         mask &= ~BIT_ULL(bit);
     }
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/mips/kvm.c b/target/mips/kvm.c
index caf70decd2..bcb8e06b2c 100644
--- a/target/mips/kvm.c
+++ b/target/mips/kvm.c
@@ -1294,3 +1294,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index 466d0d2f4c..7c25348b7b 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -2966,3 +2966,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/riscv/kvm.c b/target/riscv/kvm.c
index 70b4cff06f..30f21453d6 100644
--- a/target/riscv/kvm.c
+++ b/target/riscv/kvm.c
@@ -532,3 +532,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
diff --git a/target/s390x/kvm/kvm.c b/target/s390x/kvm/kvm.c
index 7bd8db0e7b..840af34576 100644
--- a/target/s390x/kvm/kvm.c
+++ b/target/s390x/kvm/kvm.c
@@ -2574,3 +2574,7 @@ bool kvm_arch_cpu_check_are_resettable(void)
 {
     return true;
 }
+
+void kvm_arch_accel_class_init(ObjectClass *oc)
+{
+}
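
To make the proposed three-valued interface concrete, here is a minimal standalone sketch of mapping the strings "run", "internal-error" and "disable" to an enum. It is only an illustration: the type and function names (NotifyVmexitOption, notify_vmexit_parse) are hypothetical, and inside QEMU the enum and its lookup table would presumably be generated from the QAPI schema and wired up with object_class_property_add_enum rather than parsed by hand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from the patch. */
typedef enum {
    NOTIFY_VMEXIT_RUN,            /* enable the capability, keep running */
    NOTIFY_VMEXIT_INTERNAL_ERROR, /* enable it, raise a KVM internal error */
    NOTIFY_VMEXIT_DISABLE,        /* do not enable the capability */
    NOTIFY_VMEXIT__MAX
} NotifyVmexitOption;

/* String form of each option; note the dash, not an underscore. */
static const char *const notify_vmexit_names[NOTIFY_VMEXIT__MAX] = {
    [NOTIFY_VMEXIT_RUN]            = "run",
    [NOTIFY_VMEXIT_INTERNAL_ERROR] = "internal-error",
    [NOTIFY_VMEXIT_DISABLE]        = "disable",
};

/* Returns 0 and stores the option on success, -1 for an unknown string. */
int notify_vmexit_parse(const char *s, NotifyVmexitOption *out)
{
    if (s == NULL || out == NULL) {
        return -1;
    }
    for (int i = 0; i < NOTIFY_VMEXIT__MAX; i++) {
        if (strcmp(s, notify_vmexit_names[i]) == 0) {
            *out = (NotifyVmexitOption)i;
            return 0;
        }
    }
    return -1;
}
```

A caller would reject the option string whenever notify_vmexit_parse returns -1, which is also what happens for the underscore spelling "internal_error".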
On 9/27/2022 9:43 PM, Paolo Bonzini wrote:
> On 9/23/22 09:33, Chenyi Qiang wrote:
>> Because there are some concerns, e.g. a notify VM exit may happen with
>> VM_CONTEXT_INVALID set in exit qualification (no cases are anticipated
>> that would set this bit), which means VM context is corrupted. To avoid
>> the false positive and a well-behaved guest gets killed, make this
>> feature disabled by default. Users can enable the feature by a new
>> machine property:
>>      qemu -machine notify_vmexit=on,notify_window=0 ...
>
> Some comments on the interface:
>
> - the argument should be one of "run" (i.e. do nothing and continue, the
>   default), "internal-error" (i.e. raise a KVM internal error), "disable"
>   (i.e. do not enable the capability). You can add the enum to
>   qapi/runstate.json and use object_class_property_add_enum to define
>   the QOM property.
>

So, IIUC, the three options of notify-vmexit would be:
1. run (enable the capability but do nothing if the vmexit happens)
2. internal-error (enable the capability and raise a KVM internal error
   if it happens)
3. disable (do not enable the capability)

For the invalid context case, exit and raise a KVM internal error
unconditionally.

> - properties should have a dash ("-") in the name, not an underscore
>
> - the property should be added to "-accel kvm,..." (on x86 only). See
>   after my signature for a preparatory patch that adds a new
>   kvm_arch_accel_class_init hook.
>
> The default would be either "run" or "disable". Honestly I think it
> should be "run", otherwise there's no point in adding the feature;
> if it is not enabled by default, it is very likely that no one would
> use it.
>

Yeah, personally speaking, I also prefer to enable it by default. In the
previous KVM patch discussion, we were worried that buggy silicon could
cause the invalid context case, which would kill a benign VM. But since
that is unlikely, and we can't tell whether it is a false positive when
it happens, I think defaulting to "run" is acceptable.

>> A new KVM exit reason KVM_EXIT_NOTIFY is defined for notify VM exit. If
>> it happens with VM_INVALID_CONTEXT, hypervisor exits to user space to
>> inform the fatal case. Then user space can inject a SHUTDOWN event to
>> the target vcpu. This is implemented by injecting a sythesized triple
>> fault event.
>
> I don't think a triple fault is a good match for an event that "should
> not happen" and is the fault of the processor rather than the guest.
> This should be a KVM internal error. The workaround is to disable the
> notify vmexit.
>
>> +        warn_report_once("KVM: encounter a notify exit with %svalid context in"
>> +                         " guest. It means there can be possible misbehaves in"
>> +                         " guest, please have a look.",
>> +                         ctx_invalid ? "in" : "");
>
> The warning should be unconditional if the context is invalid.
>

In the valid context case, the warning can also notify the admin that
the guest is misbehaving. Is it necessary to remove it?

>> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
>
> uint32 (not uint32_t)
> ...
>> +                              x86_machine_get_notify_window,
>> +                              x86_machine_set_notify_window, NULL, NULL);
>> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
>> +            "Set the notify window required by notify VM exit");
>
> "Clock cycles without an event window after which a notification VM exit
> occurs"
>

Will fix it. Thanks a lot!

> Thanks,
>
> Paolo
>
> From a5cb704991cfcda19a33c622833b69a8f6928530 Mon Sep 17 00:00:00 2001
> From: Paolo Bonzini <pbonzini@redhat.com>
> Date: Tue, 27 Sep 2022 15:20:16 +0200
> Subject: [PATCH] kvm: allow target-specific accelerator properties
>
> Several hypervisor capabilities in KVM are target-specific. When exposed
> to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
> should not be available for all targets.
>
> Add a hook for targets to add their own properties to -accel kvm; for
> now no such property is defined.
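
The behaviour summarized in this reply (three options, plus an unconditional internal error when the context is invalid) can be written down as a small decision function. This is a standalone sketch under that reading of the thread, not the code that was eventually merged; all identifiers (NotifyOpt, NotifyAction, notify_cap_enabled, notify_exit_action) are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical option and action types, named for this sketch only. */
typedef enum { OPT_RUN, OPT_INTERNAL_ERROR, OPT_DISABLE } NotifyOpt;
typedef enum { ACT_CONTINUE, ACT_INTERNAL_ERROR } NotifyAction;

/* "disable" is the only option that leaves the capability off. */
bool notify_cap_enabled(NotifyOpt opt)
{
    return opt != OPT_DISABLE;
}

/*
 * What to do when a notify VM exit reaches user space:
 *  - an invalid VM context is always fatal (internal error);
 *  - otherwise "internal-error" stops the guest and "run" continues.
 */
NotifyAction notify_exit_action(NotifyOpt opt, bool ctx_invalid)
{
    if (ctx_invalid) {
        return ACT_INTERNAL_ERROR;
    }
    return opt == OPT_INTERNAL_ERROR ? ACT_INTERNAL_ERROR : ACT_CONTINUE;
}
```

With "disable" no notify exit should be delivered at all, so the second function only matters for the other two options.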
On Wed, Sep 28, 2022, 04:21 Chenyi Qiang <chenyi.qiang@intel.com> wrote:

> >> +        warn_report_once("KVM: encounter a notify exit with %svalid context in"
> >> +                         " guest. It means there can be possible misbehaves in"
> >> +                         " guest, please have a look.",
> >> +                         ctx_invalid ? "in" : "");
> >
> > The warning should be unconditional if the context is invalid.
> >
>
> In valid context case, the warning can also notify the admin that the
> guest misbehaves. Is it necessary to remove it?
>

You can keep it, but it should be separated so that the invalid context
case uses warn_report instead of warn_report_once.

Paolo

> >> +    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
> >
> > uint32 (not uint32_t)
> >
> > ...
> >
> >> +                              x86_machine_get_notify_window,
> >> +                              x86_machine_set_notify_window, NULL, NULL);
> >> +    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
> >> +            "Set the notify window required by notify VM exit");
> >
> > "Clock cycles without an event window after which a notification VM exit
> > occurs"
> >
>
> Will Fix it. Thanks a lot!
>
> > Thanks,
> >
> > Paolo
> >
> > From a5cb704991cfcda19a33c622833b69a8f6928530 Mon Sep 17 00:00:00 2001
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > Date: Tue, 27 Sep 2022 15:20:16 +0200
> > Subject: [PATCH] kvm: allow target-specific accelerator properties
> >
> > Several hypervisor capabilities in KVM are target-specific. When exposed
> > to QEMU users as accelerator properties (i.e. -accel kvm,prop=value), they
> > should not be available for all targets.
> >
> > Add a hook for targets to add their own properties to -accel kvm; for
> > now no such property is defined.
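
The split asked for here (warn every time for an invalid context, warn only once for a valid one) could look roughly like the following standalone model. It does not use QEMU's warn_report or warn_report_once; the helpers warn_always and warn_once, the counter, and the message wording are all stand-ins invented for this sketch so that it compiles on its own.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counter so the difference between the two paths can be observed. */
static int warnings_emitted;

/* Stand-in for an unconditional warning. */
static void warn_always(const char *msg)
{
    fprintf(stderr, "warning: %s\n", msg);
    warnings_emitted++;
}

/* Stand-in for a once-only warning: a flag remembers the first call. */
static void warn_once(bool *printed, const char *msg)
{
    if (*printed) {
        return;
    }
    *printed = true;
    warn_always(msg);
}

/* Invalid context: warn on every exit. Valid context: warn a single time. */
void report_notify_exit(bool ctx_invalid)
{
    static bool valid_ctx_warned;

    if (ctx_invalid) {
        warn_always("KVM: notify exit with invalid context in guest");
    } else {
        warn_once(&valid_ctx_warned,
                  "KVM: notify exit with valid context in guest");
    }
}

int notify_warning_count(void)
{
    return warnings_emitted;
}
```

Keeping the two cases in separate branches also removes the need for the "%svalid" format trick in the posted patch.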
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 050eedc0c8..1eccbd3deb 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1379,6 +1379,37 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static bool x86_machine_get_notify_vmexit(Object *obj, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    return x86ms->notify_vmexit;
+}
+
+static void x86_machine_set_notify_vmexit(Object *obj, bool value, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    x86ms->notify_vmexit = value;
+}
+
+static void x86_machine_get_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    uint32_t notify_window = x86ms->notify_window;
+
+    visit_type_uint32(v, name, &notify_window, errp);
+}
+
+static void x86_machine_set_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    visit_type_uint32(v, name, &x86ms->notify_window, errp);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1392,6 +1423,8 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
     x86ms->above_4g_mem_start = 4 * GiB;
+    x86ms->notify_vmexit = false;
+    x86ms->notify_window = 0;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1461,6 +1494,18 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "sgx-epc",
         "SGX EPC device");
+
+    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "uint32_t",
+                              x86_machine_get_notify_window,
+                              x86_machine_set_notify_window, NULL, NULL);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+            "Set the notify window required by notify VM exit");
+
+    object_class_property_add_bool(oc, X86_MACHINE_NOTIFY_VMEXIT,
+                                   x86_machine_get_notify_vmexit,
+                                   x86_machine_set_notify_vmexit);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_VMEXIT,
+            "Enable notify VM exit");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 62fa5774f8..5707329fa7 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -85,6 +85,9 @@ struct X86MachineState {
      * which means no limitation on the guest's bus locks.
      */
     uint64_t bus_lock_ratelimit;
+
+    bool notify_vmexit;
+    uint32_t notify_window;
 };
 
 #define X86_MACHINE_SMM              "smm"
@@ -94,6 +97,8 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID           "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID     "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT  "bus-lock-ratelimit"
+#define X86_MACHINE_NOTIFY_VMEXIT     "notify-vmexit"
+#define X86_MACHINE_NOTIFY_WINDOW     "notify-window"
 
 #define TYPE_X86_MACHINE   MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/qemu-options.hx b/qemu-options.hx
index d8b5ce5b43..1fa0fd8f1a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -37,7 +37,8 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
     "                memory-encryption=@var{} memory encryption object to use (default=none)\n"
     "                hmat=on|off controls ACPI HMAT support (default=off)\n"
     "                memory-backend='backend-id' specifies explicitly provided backend for main RAM (default=none)\n"
-    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n",
+    "                cxl-fmw.0.targets.0=firsttarget,cxl-fmw.0.targets.1=secondtarget,cxl-fmw.0.size=size[,cxl-fmw.0.interleave-granularity=granularity]\n"
+    "                notify_vmexit=on|off,notify_window=n controls notify VM exit support (default=off) and specifies the notify window size (default=0)\n",
     QEMU_ARCH_ALL)
 SRST
 ``-machine [type=]name[,prop=value[,...]]``
@@ -157,6 +158,13 @@ SRST
         ::
 
             -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.targets.1=cxl.1,cxl-fmw.0.size=128G,cxl-fmw.0.interleave-granularity=512k
+
+    ``notify_vmexit=on|off,notify_window=n``
+        Enables or disables Notify VM exit support on x86 host and specify
+        the corresponding notify window to trigger the VM exit if enabled.
+        This feature can mitigate the CPU stuck issue due to event windows
+        don't open up for a specified of time (notify window).
+        The default is off.
 ERST
 
 DEF("M", HAS_ARG, QEMU_OPTION_M,
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 3838827134..dd2d33f994 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2597,6 +2597,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
             ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
                                 x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
         }
+
+        if (x86ms->notify_vmexit &&
+            kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+            uint64_t notify_window_flags =
+                ((uint64_t)x86ms->notify_window << 32) |
+                KVM_X86_NOTIFY_VMEXIT_ENABLED |
+                KVM_X86_NOTIFY_VMEXIT_USER;
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
+                                    notify_window_flags);
+            if (ret < 0) {
+                error_report("kvm: Failed to enable notify vmexit cap: %s",
+                             strerror(-ret));
+                return ret;
+            }
+        }
     }
 
     return 0;
@@ -5141,6 +5156,8 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
     X86CPU *cpu = X86_CPU(cs);
     uint64_t code;
     int ret;
+    struct kvm_vcpu_events events = {};
+    bool ctx_invalid;
 
     switch (run->exit_reason) {
     case KVM_EXIT_HLT:
@@ -5196,6 +5213,23 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_NOTIFY:
+        ctx_invalid = !!(run->notify.flags & KVM_NOTIFY_CONTEXT_INVALID);
+        ret = 0;
+        warn_report_once("KVM: encounter a notify exit with %svalid context in"
+                         " guest. It means there can be possible misbehaves in"
+                         " guest, please have a look.",
+                         ctx_invalid ? "in" : "");
+        if (ctx_invalid) {
+            if (has_triple_fault_event) {
+                events.flags |= KVM_VCPUEVENT_VALID_TRIPLE_FAULT;
+                events.triple_fault.pending = true;
+                ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+            } else {
+                ret = -1;
+            }
+        }
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;