Message ID | 20220310090205.10645-3-chenyi.qiang@intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Enable notify VM exit |
On Thu, Mar 10, 2022 at 05:02:05PM +0800, Chenyi Qiang wrote:
> There are cases that malicious virtual machine can cause CPU stuck (due
> to event windows don't open up), e.g., infinite loop in microcode when
> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
> IRQ) can be delivered. It leads the CPU to be unavailable to host or
> other VMs. Notify VM exit is introduced to mitigate such kind of
> attacks, which will generate a VM exit if no event window occurs in VM
> non-root mode for a specified amount of time (notify window).
>
> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
> so that the user can query the capability and set the expected notify
> window when creating VMs.
>
> If notify VM exit happens with VM_INVALID_CONTEXT, hypervisor should
> exit to user space with the exit reason KVM_EXIT_NOTIFY to inform the
> fatal case. Then user space can inject a SHUTDOWN event to the target
> vcpu. This is implemented by defining a new bit in flags field of
> kvm_vcpu_event in KVM_SET_VCPU_EVENTS ioctl.
>
> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> ---
>  hw/i386/x86.c         | 24 ++++++++++++++++++
>  include/hw/i386/x86.h |  3 +++
>  target/i386/kvm/kvm.c | 58 ++++++++++++++++++++++++++++---------------
>  3 files changed, 65 insertions(+), 20 deletions(-)
>
> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> index b84840a1bb..25e6c50b1e 100644
> --- a/hw/i386/x86.c
> +++ b/hw/i386/x86.c
> @@ -1309,6 +1309,23 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>      qapi_free_SgxEPCList(list);
>  }
>
> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
> +                                const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +    int32_t notify_window = x86ms->notify_window;
> +
> +    visit_type_int32(v, name, &notify_window, errp);
> +}
> +
> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
> +                                const char *name, void *opaque, Error **errp)
> +{
> +    X86MachineState *x86ms = X86_MACHINE(obj);
> +
> +    visit_type_int32(v, name, &x86ms->notify_window, errp);
> +}
> +
>  static void x86_machine_initfn(Object *obj)
>  {
>      X86MachineState *x86ms = X86_MACHINE(obj);
> @@ -1319,6 +1336,7 @@ static void x86_machine_initfn(Object *obj)
>      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>      x86ms->bus_lock_ratelimit = 0;
> +    x86ms->notify_window = -1;
>  }

IIUC from the kernel patch, this negative value leaves the protection
disabled, and thus the host remains vulnerable to the CVE. I would
expect this ought to set a suitable default value to fix the flaw.

Regards,
Daniel
On 3/10/2022 5:17 PM, Daniel P. Berrangé wrote:
> On Thu, Mar 10, 2022 at 05:02:05PM +0800, Chenyi Qiang wrote:
>> There are cases that malicious virtual machine can cause CPU stuck (due
>> to event windows don't open up), e.g., infinite loop in microcode when
>> nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
>> IRQ) can be delivered. It leads the CPU to be unavailable to host or
>> other VMs. Notify VM exit is introduced to mitigate such kind of
>> attacks, which will generate a VM exit if no event window occurs in VM
>> non-root mode for a specified amount of time (notify window).
>>
>> A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
>> so that the user can query the capability and set the expected notify
>> window when creating VMs.
>>
>> If notify VM exit happens with VM_INVALID_CONTEXT, hypervisor should
>> exit to user space with the exit reason KVM_EXIT_NOTIFY to inform the
>> fatal case. Then user space can inject a SHUTDOWN event to the target
>> vcpu. This is implemented by defining a new bit in flags field of
>> kvm_vcpu_event in KVM_SET_VCPU_EVENTS ioctl.
>>
>> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
>> ---
>>  hw/i386/x86.c         | 24 ++++++++++++++++++
>>  include/hw/i386/x86.h |  3 +++
>>  target/i386/kvm/kvm.c | 58 ++++++++++++++++++++++++++++---------------
>>  3 files changed, 65 insertions(+), 20 deletions(-)
>>
>> diff --git a/hw/i386/x86.c b/hw/i386/x86.c
>> index b84840a1bb..25e6c50b1e 100644
>> --- a/hw/i386/x86.c
>> +++ b/hw/i386/x86.c
>> @@ -1309,6 +1309,23 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
>>      qapi_free_SgxEPCList(list);
>>  }
>>
>> +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
>> +                                const char *name, void *opaque, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +    int32_t notify_window = x86ms->notify_window;
>> +
>> +    visit_type_int32(v, name, &notify_window, errp);
>> +}
>> +
>> +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
>> +                                const char *name, void *opaque, Error **errp)
>> +{
>> +    X86MachineState *x86ms = X86_MACHINE(obj);
>> +
>> +    visit_type_int32(v, name, &x86ms->notify_window, errp);
>> +}
>> +
>>  static void x86_machine_initfn(Object *obj)
>>  {
>>      X86MachineState *x86ms = X86_MACHINE(obj);
>> @@ -1319,6 +1336,7 @@ static void x86_machine_initfn(Object *obj)
>>      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
>>      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
>>      x86ms->bus_lock_ratelimit = 0;
>> +    x86ms->notify_window = -1;
>>  }
>
> IIUC from the kernel patch, this negative value leaves the protection
> disabled, and thus the host remains vulnerable to the CVE. I would
> expect this ought to set a suitable default value to fix the flaw.
>

Hum, I missed some explanation in commit message.
We had some discussion about the default behavior of this feature. There are
some concerns. e.g.
There's a possibility, however small, that a notify VM exit happens
with VM_CONTEXT_INVALID set in exit qualification, which means VM
context is corrupted. To avoid the false positive and a well-behaved
guest gets killed, we decide to make this feature opt-in.

> Regards,
> Daniel
On Thu, Mar 10, 2022 at 05:53:05PM +0800, Chenyi Qiang wrote:
>
>
> On 3/10/2022 5:17 PM, Daniel P. Berrangé wrote:
> > On Thu, Mar 10, 2022 at 05:02:05PM +0800, Chenyi Qiang wrote:
> > > There are cases that malicious virtual machine can cause CPU stuck (due
> > > to event windows don't open up), e.g., infinite loop in microcode when
> > > nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
> > > IRQ) can be delivered. It leads the CPU to be unavailable to host or
> > > other VMs. Notify VM exit is introduced to mitigate such kind of
> > > attacks, which will generate a VM exit if no event window occurs in VM
> > > non-root mode for a specified amount of time (notify window).
> > >
> > > A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
> > > so that the user can query the capability and set the expected notify
> > > window when creating VMs.
> > >
> > > If notify VM exit happens with VM_INVALID_CONTEXT, hypervisor should
> > > exit to user space with the exit reason KVM_EXIT_NOTIFY to inform the
> > > fatal case. Then user space can inject a SHUTDOWN event to the target
> > > vcpu. This is implemented by defining a new bit in flags field of
> > > kvm_vcpu_event in KVM_SET_VCPU_EVENTS ioctl.
> > >
> > > Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
> > > ---
> > >  hw/i386/x86.c         | 24 ++++++++++++++++++
> > >  include/hw/i386/x86.h |  3 +++
> > >  target/i386/kvm/kvm.c | 58 ++++++++++++++++++++++++++++---------------
> > >  3 files changed, 65 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/hw/i386/x86.c b/hw/i386/x86.c
> > > index b84840a1bb..25e6c50b1e 100644
> > > --- a/hw/i386/x86.c
> > > +++ b/hw/i386/x86.c
> > > @@ -1309,6 +1309,23 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
> > >      qapi_free_SgxEPCList(list);
> > >  }
> > > +static void x86_machine_get_notify_window(Object *obj, Visitor *v,
> > > +                                const char *name, void *opaque, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +    int32_t notify_window = x86ms->notify_window;
> > > +
> > > +    visit_type_int32(v, name, &notify_window, errp);
> > > +}
> > > +
> > > +static void x86_machine_set_notify_window(Object *obj, Visitor *v,
> > > +                                const char *name, void *opaque, Error **errp)
> > > +{
> > > +    X86MachineState *x86ms = X86_MACHINE(obj);
> > > +
> > > +    visit_type_int32(v, name, &x86ms->notify_window, errp);
> > > +}
> > > +
> > >  static void x86_machine_initfn(Object *obj)
> > >  {
> > >      X86MachineState *x86ms = X86_MACHINE(obj);
> > > @@ -1319,6 +1336,7 @@ static void x86_machine_initfn(Object *obj)
> > >      x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
> > >      x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
> > >      x86ms->bus_lock_ratelimit = 0;
> > > +    x86ms->notify_window = -1;
> > >  }
> >
> > IIUC from the kernel patch, this negative value leaves the protection
> > disabled, and thus the host remains vulnerable to the CVE. I would
> > expect this ought to set a suitable default value to fix the flaw.
> >
>
> Hum, I missed some explanation in commit message.
> We had some discussion about the default behavior of this feature. There are
> some concerns. e.g.
> There's a possibility, however small, that a notify VM exit happens
> with VM_CONTEXT_INVALID set in exit qualification, which means VM
> context is corrupted. To avoid the false positive and a well-behaved
> guest gets killed, we decide to make this feature opt-in.

That is unfortunate. It is not going to be much benefit to add this
feature, if users are discouraged from using it because of the danger
of it killing non-malicious guests :-(

Regards,
Daniel
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b84840a1bb..25e6c50b1e 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1309,6 +1309,23 @@ static void machine_set_sgx_epc(Object *obj, Visitor *v, const char *name,
     qapi_free_SgxEPCList(list);
 }
 
+static void x86_machine_get_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+    int32_t notify_window = x86ms->notify_window;
+
+    visit_type_int32(v, name, &notify_window, errp);
+}
+
+static void x86_machine_set_notify_window(Object *obj, Visitor *v,
+                                const char *name, void *opaque, Error **errp)
+{
+    X86MachineState *x86ms = X86_MACHINE(obj);
+
+    visit_type_int32(v, name, &x86ms->notify_window, errp);
+}
+
 static void x86_machine_initfn(Object *obj)
 {
     X86MachineState *x86ms = X86_MACHINE(obj);
@@ -1319,6 +1336,7 @@ static void x86_machine_initfn(Object *obj)
     x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
     x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
     x86ms->bus_lock_ratelimit = 0;
+    x86ms->notify_window = -1;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
@@ -1375,6 +1393,12 @@ static void x86_machine_class_init(ObjectClass *oc, void *data)
         NULL, NULL);
     object_class_property_set_description(oc, "sgx-epc",
         "SGX EPC device");
+
+    object_class_property_add(oc, X86_MACHINE_NOTIFY_WINDOW, "int32_t",
+                                x86_machine_get_notify_window,
+                                x86_machine_set_notify_window, NULL, NULL);
+    object_class_property_set_description(oc, X86_MACHINE_NOTIFY_WINDOW,
+            "Set the notify window required by notify VM exit");
 }
 
 static const TypeInfo x86_machine_info = {
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index a145a30370..2a4ca21a94 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -82,6 +82,8 @@ struct X86MachineState {
      * which means no limitation on the guest's bus locks.
      */
     uint64_t bus_lock_ratelimit;
+
+    int32_t notify_window;
 };
 
 #define X86_MACHINE_SMM "smm"
@@ -89,6 +91,7 @@ struct X86MachineState {
 #define X86_MACHINE_OEM_ID "x-oem-id"
 #define X86_MACHINE_OEM_TABLE_ID "x-oem-table-id"
 #define X86_MACHINE_BUS_LOCK_RATELIMIT "bus-lock-ratelimit"
+#define X86_MACHINE_NOTIFY_WINDOW "notify-window"
 
 #define TYPE_X86_MACHINE MACHINE_TYPE_NAME("x86")
 OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, X86_MACHINE)
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 83d0988302..65ad370652 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2299,6 +2299,10 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    X86MachineState *x86ms;
+
+    assert(object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE));
+    x86ms = X86_MACHINE(ms);
 
     /*
      * Initialize SEV context, if required
@@ -2394,8 +2398,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     }
 
     if (kvm_check_extension(s, KVM_CAP_X86_SMM) &&
-        object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE) &&
-        x86_machine_is_smm_enabled(X86_MACHINE(ms))) {
+        x86_machine_is_smm_enabled(x86ms)) {
         smram_machine_done.notify = register_smram_listener;
         qemu_add_machine_init_done_notifier(&smram_machine_done);
     }
@@ -2423,25 +2426,31 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
-    if (object_dynamic_cast(OBJECT(ms), TYPE_X86_MACHINE)) {
-        X86MachineState *x86ms = X86_MACHINE(ms);
+    if (x86ms->bus_lock_ratelimit > 0) {
+        ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
+        if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
+            error_report("kvm: bus lock detection unsupported");
+            return -ENOTSUP;
+        }
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
+                                KVM_BUS_LOCK_DETECTION_EXIT);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable bus lock detection cap: %s",
+                         strerror(-ret));
+            return ret;
+        }
+        ratelimit_init(&bus_lock_ratelimit_ctrl);
+        ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
+                            x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
+    }
 
-        if (x86ms->bus_lock_ratelimit > 0) {
-            ret = kvm_check_extension(s, KVM_CAP_X86_BUS_LOCK_EXIT);
-            if (!(ret & KVM_BUS_LOCK_DETECTION_EXIT)) {
-                error_report("kvm: bus lock detection unsupported");
-                return -ENOTSUP;
-            }
-            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_BUS_LOCK_EXIT, 0,
-                                    KVM_BUS_LOCK_DETECTION_EXIT);
-            if (ret < 0) {
-                error_report("kvm: Failed to enable bus lock detection cap: %s",
-                             strerror(-ret));
-                return ret;
-            }
-            ratelimit_init(&bus_lock_ratelimit_ctrl);
-            ratelimit_set_speed(&bus_lock_ratelimit_ctrl,
-                                x86ms->bus_lock_ratelimit, BUS_LOCK_SLICE_TIME);
+    if (kvm_check_extension(s, KVM_CAP_X86_NOTIFY_VMEXIT)) {
+        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_NOTIFY_VMEXIT, 0,
+                                x86ms->notify_window);
+        if (ret < 0) {
+            error_report("kvm: Failed to enable notify vmexit cap: %s",
+                         strerror(-ret));
+            return ret;
         }
     }
 
@@ -4856,6 +4865,7 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
     X86CPU *cpu = X86_CPU(cs);
     uint64_t code;
     int ret;
+    struct kvm_vcpu_events events = {};
 
     switch (run->exit_reason) {
     case KVM_EXIT_HLT:
@@ -4911,6 +4921,14 @@ int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
         /* already handled in kvm_arch_post_run */
         ret = 0;
         break;
+    case KVM_EXIT_NOTIFY:
+        ret = 0;
+        if (run->notify.data & KVM_NOTIFY_CONTEXT_INVALID) {
+            warn_report("KVM: invalid context due to notify vmexit");
+            events.flags |= KVM_VCPUEVENT_SHUTDOWN;
+            ret = kvm_vcpu_ioctl(cs, KVM_SET_VCPU_EVENTS, &events);
+        }
+        break;
     default:
         fprintf(stderr, "KVM: unknown exit reason %d\n", run->exit_reason);
         ret = -1;
There are cases that malicious virtual machine can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs. Notify VM exit is introduced to mitigate such kind of
attacks, which will generate a VM exit if no event window occurs in VM
non-root mode for a specified amount of time (notify window).

A new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT is exposed to user space
so that the user can query the capability and set the expected notify
window when creating VMs.

If notify VM exit happens with VM_INVALID_CONTEXT, hypervisor should
exit to user space with the exit reason KVM_EXIT_NOTIFY to inform the
fatal case. Then user space can inject a SHUTDOWN event to the target
vcpu. This is implemented by defining a new bit in flags field of
kvm_vcpu_event in KVM_SET_VCPU_EVENTS ioctl.

Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
---
 hw/i386/x86.c         | 24 ++++++++++++++++++
 include/hw/i386/x86.h |  3 +++
 target/i386/kvm/kvm.c | 58 ++++++++++++++++++++++++++++---------------
 3 files changed, 65 insertions(+), 20 deletions(-)
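
For readers following the thread, the two decisions under discussion can be sketched in isolation: whether a given notify window leaves the feature enabled, and what user space does when a notify exit arrives. This is only an illustrative model, not the QEMU or kernel code. All `sketch_*` and `SKETCH_*` names are hypothetical, the bit value chosen for the "context invalid" flag is an assumption (the real `KVM_NOTIFY_CONTEXT_INVALID` comes from the kernel headers), and the "negative window means disabled" rule follows Daniel's reading of the kernel patch rather than anything verified here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for KVM_NOTIFY_CONTEXT_INVALID; the bit position
 * is an assumption made for this sketch only. */
#define SKETCH_NOTIFY_CONTEXT_INVALID (UINT64_C(1) << 0)

enum sketch_action {
    SKETCH_RESUME_GUEST,    /* notify exit with intact context: re-enter guest */
    SKETCH_INJECT_SHUTDOWN  /* VM context invalid: inject a SHUTDOWN event */
};

/* Assumed rule from the thread: a negative window (the patch defaults to -1)
 * leaves the protection disabled, so the feature is opt-in. */
static bool sketch_notify_enabled(int32_t notify_window)
{
    return notify_window >= 0;
}

/* Mirrors the shape of the KVM_EXIT_NOTIFY case in the patch: only an
 * invalid VM context is treated as fatal for the vCPU. */
static enum sketch_action sketch_handle_notify(uint64_t notify_data)
{
    if (notify_data & SKETCH_NOTIFY_CONTEXT_INVALID) {
        return SKETCH_INJECT_SHUTDOWN;
    }
    return SKETCH_RESUME_GUEST;
}
```

Under these assumptions, `sketch_notify_enabled(-1)` is false, which is the default Daniel objects to, and `sketch_handle_notify()` only asks for a shutdown when the invalid-context bit is set, which is the false-positive risk Chenyi cites as the reason for making the feature opt-in.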