[2/2] KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write

Message ID 20160808181623.12132-3-rkrcmar@redhat.com (mailing list archive)
State New, archived

Commit Message

Radim Krčmář Aug. 8, 2016, 6:16 p.m. UTC
If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
write with vmcs02 as the current VMCS.
This incorrectly applies modifications intended for vmcs01 to vmcs02,
and L2 can exploit it to gain access to L0's x2APIC registers by disabling
virtualized x2APIC while the MSR bitmap still assumes that it is enabled.

Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
current VMCS.  An alternative solution would temporarily make vmcs01 the
current VMCS, but it requires more care.

Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
---
 arch/x86/kvm/vmx.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
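
The postponement described in the commit message can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: every `toy_*` name is hypothetical, a VMCS is reduced to the single control bit under discussion, and the real function's other preconditions (such as APICv being active) are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a VMCS: only the one control bit discussed here. */
struct toy_vmcs {
	bool virtual_x2apic_mode;
};

/* Toy stand-in for a vCPU with nested state; all names are hypothetical. */
struct toy_vcpu {
	struct toy_vmcs vmcs01;        /* used by L0 to run L1 */
	struct toy_vmcs vmcs02;        /* used by L0 to run L2 */
	struct toy_vmcs *current_vmcs; /* whichever VMCS is loaded right now */
	bool guest_mode;               /* true while L2 is running */
	bool x2apic_enabled;           /* models apic_base & X2APIC_ENABLE */
	bool change_vmcs01_pending;    /* models the flag added by the patch */
};

static void toy_vcpu_init(struct toy_vcpu *v)
{
	v->vmcs01.virtual_x2apic_mode = true;
	v->vmcs02.virtual_x2apic_mode = true;
	v->current_vmcs = &v->vmcs01;
	v->guest_mode = false;
	v->x2apic_enabled = true;
	v->change_vmcs01_pending = false;
}

/* Models the control update: defer it while vmcs02 is the current VMCS. */
static void toy_set_virtual_x2apic_mode(struct toy_vcpu *v, bool set)
{
	if (v->guest_mode) {
		v->change_vmcs01_pending = true;
		return;
	}
	/* Outside guest mode the current VMCS is expected to be vmcs01. */
	assert(v->current_vmcs == &v->vmcs01);
	v->current_vmcs->virtual_x2apic_mode = set;
}

/* Models an APIC_BASE write that L0 handles because vmcs12 does not intercept it. */
static void toy_write_apic_base(struct toy_vcpu *v, bool x2apic_enable)
{
	v->x2apic_enabled = x2apic_enable;
	toy_set_virtual_x2apic_mode(v, x2apic_enable);
}

static void toy_nested_vmentry(struct toy_vcpu *v)
{
	v->guest_mode = true;
	v->current_vmcs = &v->vmcs02;
}

/* Models the nested VM exit: switch back to vmcs01, then replay the update. */
static void toy_nested_vmexit(struct toy_vcpu *v)
{
	v->guest_mode = false;
	v->current_vmcs = &v->vmcs01;
	if (v->change_vmcs01_pending) {
		v->change_vmcs01_pending = false;
		toy_set_virtual_x2apic_mode(v, v->x2apic_enabled);
	}
}

/*
 * Walks through the scenario: L2 disables x2APIC via APIC_BASE while running.
 * Returns true if neither VMCS changed while L2 ran and only vmcs01 changed
 * after the nested VM exit.
 */
static bool toy_run_scenario(void)
{
	struct toy_vcpu v;
	bool untouched;

	toy_vcpu_init(&v);
	toy_nested_vmentry(&v);
	toy_write_apic_base(&v, false);
	untouched = v.vmcs01.virtual_x2apic_mode && v.vmcs02.virtual_x2apic_mode;
	toy_nested_vmexit(&v);
	return untouched && !v.vmcs01.virtual_x2apic_mode &&
	       v.vmcs02.virtual_x2apic_mode;
}
```

Calling `toy_run_scenario()` from a test program's `main()` exercises the whole sequence. The model replays the update from the latest APIC base value rather than from the value seen at write time, mirroring the way the patch passes `vcpu->arch.apic_base & X2APIC_ENABLE` on exit.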

Comments

Wanpeng Li Aug. 12, 2016, 6:07 a.m. UTC | #1
2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
> write with vmcs02 as the current VMCS.
> This will incorrectly apply modifications intended for vmcs01 to vmcs02
> and L2 can use it to gain access to L0's x2APIC registers by disabling
> virtualized x2APIC while using msr bitmap that assumes enabled.
>
> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
> current VMCS.  An alternative solution would temporarily make vmcs01 the
> current VMCS, but it requires more care.

There is a scenario where both L1 and L2 are running in x2apic mode and
L1 doesn't own the APIC_BASE writes; L2 then intends to disable x2apic
mode, but your logic will also disable x2apic mode for L1.

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Aug. 12, 2016, 9:44 a.m. UTC | #2
2016-08-12 14:07+0800, Wanpeng Li:
> 2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
>> write with vmcs02 as the current VMCS.
>> This will incorrectly apply modifications intended for vmcs01 to vmcs02
>> and L2 can use it to gain access to L0's x2APIC registers by disabling
>> virtualized x2APIC while using msr bitmap that assumes enabled.
>>
>> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
>> current VMCS.  An alternative solution would temporarily make vmcs01 the
>> current VMCS, but it requires more care.
> 
> There is a scenario both L1 and L2 are running on x2apic mode, L1
> don't own the APIC_BASE writes, then L2 is intended to disable x2apic
> mode, however, your logic will also disable x2apic mode for L1.

You mean a case where L1 does intercept APIC_BASE?

That case is not affected, because it should cause a nested VM exit, so
vmx_set_virtual_x2apic_mode() won't be called in the first place.
--
Wanpeng Li Aug. 12, 2016, 10:14 a.m. UTC | #3
2016-08-12 17:44 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2016-08-12 14:07+0800, Wanpeng Li:
>> 2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
>>> write with vmcs02 as the current VMCS.
>>> This will incorrectly apply modifications intended for vmcs01 to vmcs02
>>> and L2 can use it to gain access to L0's x2APIC registers by disabling
>>> virtualized x2APIC while using msr bitmap that assumes enabled.
>>>
>>> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
>>> current VMCS.  An alternative solution would temporarily make vmcs01 the
>>> current VMCS, but it requires more care.
>>
>> There is a scenario both L1 and L2 are running on x2apic mode, L1
>> don't own the APIC_BASE writes, then L2 is intended to disable x2apic
>> mode, however, your logic will also disable x2apic mode for L1.
>
> You mean a case where L1 does intercept APIC_BASE?
>
> That case is not affected, because it should cause a nested VM exit, so
> vmx_set_virtual_x2apic_mode() won't be called in the first place.

I mean L1 doesn't intercept APIC_BASE.

Regards,
Wanpeng Li
--
Radim Krčmář Aug. 12, 2016, 11:39 a.m. UTC | #4
2016-08-12 18:14+0800, Wanpeng Li:
> 2016-08-12 17:44 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> 2016-08-12 14:07+0800, Wanpeng Li:
>>> 2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>>> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
>>>> write with vmcs02 as the current VMCS.
>>>> This will incorrectly apply modifications intended for vmcs01 to vmcs02
>>>> and L2 can use it to gain access to L0's x2APIC registers by disabling
>>>> virtualized x2APIC while using msr bitmap that assumes enabled.
>>>>
>>>> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
>>>> current VMCS.  An alternative solution would temporarily make vmcs01 the
>>>> current VMCS, but it requires more care.
>>>
>>> There is a scenario both L1 and L2 are running on x2apic mode, L1
>>> don't own the APIC_BASE writes, then L2 is intended to disable x2apic
>>> mode, however, your logic will also disable x2apic mode for L1.
>>
>> You mean a case where L1 does intercept APIC_BASE?
>>
>> That case is not affected, because it should cause a nested VM exit, so
>> vmx_set_virtual_x2apic_mode() won't be called in the first place.
> 
> I mean L1 doesn't intercept APIC_BASE.

Then L2's write to APIC_BASE should only affect L1.
L2 is buggy if it intended to disable its x2APIC with the write
or L1 set up intercepts incorrectly for the intended L2.

In the non-nested case, if we didn't intercept APIC_BASE in KVM, then
the guest wouldn't change either;  only the host would change, so I
think it is correct to disable x2APIC mode in L1 only.
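
The routing argued for above can be sketched as a tiny model. This is an illustration only, with hypothetical `toy_*` names, and it assumes that an intercepting L1 faithfully emulates the write for L2:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of who observes an APIC_BASE write issued by L2. */
struct toy_state {
	bool l1_x2apic;          /* x2APIC mode of the APIC that L0 emulates for L1 */
	bool l2_x2apic;          /* x2APIC mode of the APIC that L1 emulates for L2 */
	unsigned nested_vmexits; /* number of exits reflected from L2 to L1 */
};

static void toy_l2_writes_apic_base(struct toy_state *s, bool l1_intercepts,
				    bool x2apic_enable)
{
	assert(s); /* the caller must pass a valid state */

	if (l1_intercepts) {
		/* Nested VM exit: L1 sees the write and updates L2's APIC. */
		s->nested_vmexits++;
		s->l2_x2apic = x2apic_enable;
	} else {
		/* No nested exit: L0 handles the write against L1's APIC. */
		s->l1_x2apic = x2apic_enable;
	}
}
```

With `l1_intercepts` set to false, only `l1_x2apic` changes, which corresponds to the non-nested analogy in which only the host would change.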
--
Wanpeng Li Aug. 15, 2016, 5:19 a.m. UTC | #5
2016-08-12 19:39 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2016-08-12 18:14+0800, Wanpeng Li:
>> 2016-08-12 17:44 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>> 2016-08-12 14:07+0800, Wanpeng Li:
>>>> 2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>>>> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
>>>>> write with vmcs02 as the current VMCS.
>>>>> This will incorrectly apply modifications intended for vmcs01 to vmcs02
>>>>> and L2 can use it to gain access to L0's x2APIC registers by disabling
>>>>> virtualized x2APIC while using msr bitmap that assumes enabled.
>>>>>
>>>>> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
>>>>> current VMCS.  An alternative solution would temporarily make vmcs01 the
>>>>> current VMCS, but it requires more care.
>>>>
>>>> There is a scenario both L1 and L2 are running on x2apic mode, L1
>>>> don't own the APIC_BASE writes, then L2 is intended to disable x2apic
>>>> mode, however, your logic will also disable x2apic mode for L1.
>>>
>>> You mean a case where L1 does intercept APIC_BASE?
>>>
>>> That case is not affected, because it should cause a nested VM exit, so
>>> vmx_set_virtual_x2apic_mode() won't be called in the first place.
>>
>> I mean L1 doesn't intercept APIC_BASE.
>
> Then L2's write to APIC_BASE should only affect L1.
> L2 is buggy if it intended to disable its x2APIC with the write
> or L1 set up intercepts incorrectly for the intended L2.

Do you mean that an OS disabling x2APIC while it is running is buggy?

> In the non-nested case, if we didn't intercept APIC_BASE in KVM, then
> the guest wouldn't change either;  only the host would change, so I
> think it is correct to disable x2APIC mode in L1 only.

Agreed. :)

Regards,
Wanpeng Li
--
Radim Krčmář Aug. 15, 2016, 2:31 p.m. UTC | #6
2016-08-15 13:19+0800, Wanpeng Li:
> 2016-08-12 19:39 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> 2016-08-12 18:14+0800, Wanpeng Li:
>>> 2016-08-12 17:44 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>>> 2016-08-12 14:07+0800, Wanpeng Li:
>>>>> 2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>>>>>> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
>>>>>> write with vmcs02 as the current VMCS.
>>>>>> This will incorrectly apply modifications intended for vmcs01 to vmcs02
>>>>>> and L2 can use it to gain access to L0's x2APIC registers by disabling
>>>>>> virtualized x2APIC while using msr bitmap that assumes enabled.
>>>>>>
>>>>>> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
>>>>>> current VMCS.  An alternative solution would temporarily make vmcs01 the
>>>>>> current VMCS, but it requires more care.
>>>>>
>>>>> There is a scenario both L1 and L2 are running on x2apic mode, L1
>>>>> don't own the APIC_BASE writes, then L2 is intended to disable x2apic
>>>>> mode, however, your logic will also disable x2apic mode for L1.
>>>>
>>>> You mean a case where L1 does intercept APIC_BASE?
>>>>
>>>> That case is not affected, because it should cause a nested VM exit, so
>>>> vmx_set_virtual_x2apic_mode() won't be called in the first place.
>>>
>>> I mean L1 doesn't intercept APIC_BASE.
>>
>> Then L2's write to APIC_BASE should only affect L1.
>> L2 is buggy if it intended to disable its x2APIC with the write
>> or L1 set up intercepts incorrectly for the intended L2.
> 
> Do you mean OS disable x2APIC during its running is buggy?

Not in general, but if L1 doesn't intercept APIC_BASE and L2 writes to
it in order to disable its (L2's) x2APIC, then there is a bug in L2 or
L1.

If L1 intended to intercept, then it's a clear L1 bug; otherwise L2
should have known that L1 is a special hypervisor that doesn't intercept
APIC_BASE, and the bug is on L2's side or with the user who ran an
unsuspecting L2 on that L1.
--
Wanpeng Li Aug. 16, 2016, 2:43 a.m. UTC | #7
2016-08-09 2:16 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> If vmcs12 does not intercept APIC_BASE writes, then KVM will handle the
> write with vmcs02 as the current VMCS.
> This will incorrectly apply modifications intended for vmcs01 to vmcs02
> and L2 can use it to gain access to L0's x2APIC registers by disabling
> virtualized x2APIC while using msr bitmap that assumes enabled.
>
> Postpone execution of vmx_set_virtual_x2apic_mode until vmcs01 is the
> current VMCS.  An alternative solution would temporarily make vmcs01 the
> current VMCS, but it requires more care.
>
> Fixes: 8d14695f9542 ("x86, apicv: add virtual x2apic support")
> Reported-by: Jim Mattson <jmattson@google.com>
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/vmx.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c66ac2c70d22..ae111a07acc4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -422,6 +422,7 @@ struct nested_vmx {
>         struct list_head vmcs02_pool;
>         int vmcs02_num;
>         u64 vmcs01_tsc_offset;
> +       bool change_vmcs01_virtual_x2apic_mode;
>         /* L2 must run next, and mustn't decide to exit to L1. */
>         bool nested_run_pending;
>         /*
> @@ -8424,6 +8425,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>  {
>         u32 sec_exec_control;
>
> +       /* Postpone execution until vmcs01 is the current VMCS. */
> +       if (is_guest_mode(vcpu)) {
> +               to_vmx(vcpu)->nested.change_vmcs01_virtual_x2apic_mode = true;
> +               return;
> +       }
> +
>         /*
>          * There is not point to enable virtualize x2apic without enable
>          * apicv
> @@ -10749,6 +10756,12 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>                 vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
>                               PIN_BASED_VMX_PREEMPTION_TIMER);
>
> +       if (vmx->nested.change_vmcs01_virtual_x2apic_mode) {
> +               vmx->nested.change_vmcs01_virtual_x2apic_mode = false;
> +               vmx_set_virtual_x2apic_mode(vcpu,
> +                               vcpu->arch.apic_base & X2APIC_ENABLE);
> +       }
> +
>         /* This is needed for same reason as it was needed in prepare_vmcs02 */
>         vmx->host_rsp = 0;
--

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c66ac2c70d22..ae111a07acc4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -422,6 +422,7 @@  struct nested_vmx {
 	struct list_head vmcs02_pool;
 	int vmcs02_num;
 	u64 vmcs01_tsc_offset;
+	bool change_vmcs01_virtual_x2apic_mode;
 	/* L2 must run next, and mustn't decide to exit to L1. */
 	bool nested_run_pending;
 	/*
@@ -8424,6 +8425,12 @@  static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
 {
 	u32 sec_exec_control;
 
+	/* Postpone execution until vmcs01 is the current VMCS. */
+	if (is_guest_mode(vcpu)) {
+		to_vmx(vcpu)->nested.change_vmcs01_virtual_x2apic_mode = true;
+		return;
+	}
+
 	/*
 	 * There is not point to enable virtualize x2apic without enable
 	 * apicv
@@ -10749,6 +10756,12 @@  static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
 			      PIN_BASED_VMX_PREEMPTION_TIMER);
 
+	if (vmx->nested.change_vmcs01_virtual_x2apic_mode) {
+		vmx->nested.change_vmcs01_virtual_x2apic_mode = false;
+		vmx_set_virtual_x2apic_mode(vcpu,
+				vcpu->arch.apic_base & X2APIC_ENABLE);
+	}
+
 	/* This is needed for same reason as it was needed in prepare_vmcs02 */
 	vmx->host_rsp = 0;