[v2] KVM: nVMX: Fix attempting to emulate "Acknowledge interrupt on exit" when there is no interrupt which L1 requires to inject to L2

Message ID 20170801195859.GB1437@flask (mailing list archive)
State New, archived

Commit Message

Radim Krčmář Aug. 1, 2017, 7:59 p.m. UTC
2017-07-31 19:25-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
> 
> ------------[ cut here ]------------
>  WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>  CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>  RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> Call Trace:
>   vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>   ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>   kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>   ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>   ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>   kvm_vcpu_ioctl+0x340/0x700 [kvm]
>   ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>   ? __fget+0xfc/0x210
>   do_vfs_ioctl+0xa4/0x6a0
>   ? __fget+0x11d/0x210
>   SyS_ioctl+0x79/0x90
>   do_syscall_64+0x8f/0x750
>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>   entry_SYSCALL64_slow_path+0x25/0x25
> 
> This can be reproduced by booting the L1 guest with the 'noapic' grub
> parameter, which tells the kernel not to make use of any IOAPICs that may be
> present in the system.
> 
> Actually, the external_intr parameter of vmx_check_nested_events() is the
> req_int_win variable passed from vcpu_enter_guest(), which means that L0's
> userspace requests an irq window. I observed that the scenario
> (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an irq window) can be
> true. In that case there is no interrupt which L1 requires to inject to L2, so
> we should not attempt to emulate "Acknowledge interrupt on exit" for the irq
> window request.
> 
> This patch fixes it by not attempting to emulate "Acknowledge interrupt on
> exit" if there is no L1 requirement to inject an interrupt to L2.
> 
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v1 -> v2:
>  * update patch description
>  * check nested_exit_intr_ack_set() first 
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>  
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>  
> -	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> -	    && nested_exit_intr_ack_set(vcpu)) {
> +	if (nested_exit_intr_ack_set(vcpu) &&
> +		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> +		kvm_cpu_has_interrupt(vcpu)) {

This would work as a solution, but I don't think it's the correct
behavior.

SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
interrupt information (valid interrupt) is always set to 1 on
EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
expecting an interrupt in that case, so we should do a userspace VM exit
when the window is open and then inject the userspace interrupt with a
VM exit.

The simplest thing that came to my mind is to:


(It doesn't prevent malicious userspace from hitting the WARN, though.)

Thanks.
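
As a reading aid for the SDM argument above, here is a minimal standalone sketch of how the VM-exit interruption-information field is laid out for an external interrupt: vector in bits 7:0, type in bits 10:8 (0 for an external interrupt), valid flag in bit 31. The macro names mirror the kernel's but are redefined locally, and encode_ext_intr_info() is a hypothetical helper, not a KVM function. It refuses to build the field when there is no pending vector, which appears to be the situation the WARN in the trace is guarding against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local redefinitions for illustration; layout per the Intel SDM. */
#define INTR_INFO_VECTOR_MASK 0x000000ffu /* bits 7:0: vector */
#define INTR_TYPE_EXT_INTR    (0u << 8)   /* bits 10:8: type 0 = external */
#define INTR_INFO_VALID_MASK  0x80000000u /* bit 31: field is valid */

/*
 * Hypothetical helper: build the interruption-information value that an
 * L1 hypervisor using "acknowledge interrupt on exit" would expect to see.
 * irq is a vector in 0..255, or negative when nothing is pending.
 * Returns false (and leaves *info untouched) if there is no vector,
 * because a valid field cannot be produced in that case.
 */
static bool encode_ext_intr_info(int irq, uint32_t *info)
{
	if (irq < 0 || irq > 255)
		return false;

	*info = ((uint32_t)irq & INTR_INFO_VECTOR_MASK) |
		INTR_TYPE_EXT_INTR | INTR_INFO_VALID_MASK;
	return true;
}
```

With a pending vector the valid bit is always set; with no vector there is nothing sensible to report, which is why taking this exit purely for a userspace irq-window request does not fit the SDM's definition.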

Comments

Wanpeng Li Aug. 1, 2017, 10:42 p.m. UTC | #1
2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-07-31 19:25-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> ------------[ cut here ]------------
>>  WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>>  CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>>  RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>> Call Trace:
>>   vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>>   ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>>   kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>>   ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>>   ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>>   kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>   ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>   ? __fget+0xfc/0x210
>>   do_vfs_ioctl+0xa4/0x6a0
>>   ? __fget+0x11d/0x210
>>   SyS_ioctl+0x79/0x90
>>   do_syscall_64+0x8f/0x750
>>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>>   entry_SYSCALL64_slow_path+0x25/0x25
>>
>> This can be reproduced by booting the L1 guest with the 'noapic' grub
>> parameter, which tells the kernel not to make use of any IOAPICs that may be
>> present in the system.
>>
>> Actually, the external_intr parameter of vmx_check_nested_events() is the
>> req_int_win variable passed from vcpu_enter_guest(), which means that L0's
>> userspace requests an irq window. I observed that the scenario
>> (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an irq window) can be
>> true. In that case there is no interrupt which L1 requires to inject to L2, so
>> we should not attempt to emulate "Acknowledge interrupt on exit" for the irq
>> window request.
>>
>> This patch fixes it by not attempting to emulate "Acknowledge interrupt on
>> exit" if there is no L1 requirement to inject an interrupt to L2.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v1 -> v2:
>>  * update patch description
>>  * check nested_exit_intr_ack_set() first
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>
>>       vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>
>> -     if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> -         && nested_exit_intr_ack_set(vcpu)) {
>> +     if (nested_exit_intr_ack_set(vcpu) &&
>> +             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>> +             kvm_cpu_has_interrupt(vcpu)) {
>
> This would work as a solution, but I don't think it's the correct
> behavior.
>
> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
> interrupt information (valid interrupt) is always set to 1 on
> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
> expecting an interrupt in that case, so we should do a userspace VM exit
> when the window is open and then inject the userspace interrupt with a
> VM exit.

Agreed.

>
> The simplest thing that came to my mind is to:
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..9ad0c882c4f5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>                 return 0;
>         }
>
> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
> +       if ((kvm_cpu_has_interrupt(vcpu) ||
> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>             nested_exit_on_intr(vcpu)) {
>                 if (vmx->nested.nested_run_pending)
>                         return -EBUSY;
>

Agreed.

> but I think it could break more ... actually, why was the window closed?
>
> kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
> to decide whether the window is needed, but vmx_check_nested_events()
> doesn't care about that at all, so the window might just appear closed.
> Would the following hunk help too?

In addition, the irq window can be requested by L0's userspace
(kvm_arch_pre_run), and the hunk below still doesn't fix the problem in my
testing.

Regards,
Wanpeng Li

>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 39a6222bf968..7e6caa9c225d 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>
>  static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>  {
> -       return (!to_vmx(vcpu)->nested.nested_run_pending &&
> -               vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
> +       if (is_guest_mode(vcpu))
> +               return !to_vmx(vcpu)->nested.nested_run_pending;
> +
> +       return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
>                 !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>                         (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
>  }
>
> (It doesn't prevent malicious userspace from hitting the WARN, though.)
>
> Thanks.

Wanpeng Li Aug. 2, 2017, 8:05 a.m. UTC | #2
2017-08-02 6:42 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>:
> 2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
>> 2017-07-31 19:25-0700, Wanpeng Li:
>>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>> ------------[ cut here ]------------
>>>  WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>>>  CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>>>  RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>>> Call Trace:
>>>   vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>>>   ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>>>   kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>>>   ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>>>   ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>>>   kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>>   ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>>>   ? __fget+0xfc/0x210
>>>   do_vfs_ioctl+0xa4/0x6a0
>>>   ? __fget+0x11d/0x210
>>>   SyS_ioctl+0x79/0x90
>>>   do_syscall_64+0x8f/0x750
>>>   ? trace_hardirqs_on_thunk+0x1a/0x1c
>>>   entry_SYSCALL64_slow_path+0x25/0x25
>>>
>>> This can be reproduced by booting the L1 guest with the 'noapic' grub
>>> parameter, which tells the kernel not to make use of any IOAPICs that may be
>>> present in the system.
>>>
>>> Actually, the external_intr parameter of vmx_check_nested_events() is the
>>> req_int_win variable passed from vcpu_enter_guest(), which means that L0's
>>> userspace requests an irq window. I observed that the scenario
>>> (!kvm_cpu_has_interrupt(vcpu) && L0's userspace requests an irq window) can be
>>> true. In that case there is no interrupt which L1 requires to inject to L2, so
>>> we should not attempt to emulate "Acknowledge interrupt on exit" for the irq
>>> window request.
>>>
>>> This patch fixes it by not attempting to emulate "Acknowledge interrupt on
>>> exit" if there is no L1 requirement to inject an interrupt to L2.
>>>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>> ---
>>> v1 -> v2:
>>>  * update patch description
>>>  * check nested_exit_intr_ack_set() first
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>>
>>>       vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>>
>>> -     if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>>> -         && nested_exit_intr_ack_set(vcpu)) {
>>> +     if (nested_exit_intr_ack_set(vcpu) &&
>>> +             exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>>> +             kvm_cpu_has_interrupt(vcpu)) {
>>
>> This would work as a solution, but I don't think it's the correct
>> behavior.
>>
>> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
>> interrupt information (valid interrupt) is always set to 1 on
>> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
>> expecting an interrupt in that case, so we should do a userspace VM exit
>> when the window is open and then inject the userspace interrupt with a
>> VM exit.
>
> Agreed.
>
>>
>> The simplest thing that came to my mind is to:
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 39a6222bf968..9ad0c882c4f5 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>                 return 0;
>>         }
>>
>> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>> +       if ((kvm_cpu_has_interrupt(vcpu) ||
>> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>>             nested_exit_on_intr(vcpu)) {
>>                 if (vmx->nested.nested_run_pending)
>>                         return -EBUSY;
>>
>
> Agreed.

What's your opinion, Paolo? :) Actually, I considered the above idea
before; it matches what the SDM defines.

Regards,
Wanpeng Li

>
>> but I think it could break more ... actually, why was the window closed?
>>
>> kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
>> to decide whether the window is needed, but vmx_check_nested_events()
>> doesn't care about that at all, so the window might just appear closed.
>> Would the following hunk help too?
>
> In addition, the irq window can be requested by L0's userspace
> (kvm_arch_pre_run), and the hunk below still doesn't fix the problem in my
> testing.
>
> Regards,
> Wanpeng Li
>
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 39a6222bf968..7e6caa9c225d 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
>>
>>  static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>>  {
>> -       return (!to_vmx(vcpu)->nested.nested_run_pending &&
>> -               vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>> +       if (is_guest_mode(vcpu))
>> +               return !to_vmx(vcpu)->nested.nested_run_pending;
>> +
>> +       return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
>>                 !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>>                         (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
>>  }
>>
>> (It doesn't prevent malicious userspace from hitting the WARN, though.)
>>
>> Thanks.

Paolo Bonzini Aug. 2, 2017, 8:13 a.m. UTC | #3
On 02/08/2017 10:05, Wanpeng Li wrote:
>>>
>>> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit
>>> interrupt information (valid interrupt) is always set to 1 on
>>> EXIT_REASON_EXTERNAL_INTERRUPT.  We don't want to break hypervisors
>>> expecting an interrupt in that case, so we should do a userspace VM exit
>>> when the window is open and then inject the userspace interrupt with a
>>> VM exit.
>> Agreed.
>>
>>> The simplest thing that came to my mind is to:
>>>
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index 39a6222bf968..9ad0c882c4f5 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
>>>                 return 0;
>>>         }
>>>
>>> -       if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
>>> +       if ((kvm_cpu_has_interrupt(vcpu) ||
>>> +            (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
>>>             nested_exit_on_intr(vcpu)) {
>>>                 if (vmx->nested.nested_run_pending)
>>>                         return -EBUSY;
>>>
>> Agreed.
>
> What's your opinion, Paolo? :) Actually, I considered the above idea
> before; it matches what the SDM defines.

Radim and I always agree. :)

Paolo

Patch

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 39a6222bf968..9ad0c882c4f5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -10687,7 +10687,8 @@  static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr)
 		return 0;
 	}
 
-	if ((kvm_cpu_has_interrupt(vcpu) || external_intr) &&
+	if ((kvm_cpu_has_interrupt(vcpu) ||
+	     (external_intr && !nested_exit_intr_ack_set(vcpu))) &&
 	    nested_exit_on_intr(vcpu)) {
 		if (vmx->nested.nested_run_pending)
 			return -EBUSY;

but I think it could break more ... actually, why was the window closed?

kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed()
to decide whether the window is needed, but vmx_check_nested_events()
doesn't care about that at all, so the window might just appear closed.
Would the following hunk help too?

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 39a6222bf968..7e6caa9c225d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5567,8 +5567,10 @@  static int vmx_nmi_allowed(struct kvm_vcpu *vcpu)
 
 static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
 {
-	return (!to_vmx(vcpu)->nested.nested_run_pending &&
-		vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
+	if (is_guest_mode(vcpu))
+		return !to_vmx(vcpu)->nested.nested_run_pending;
+
+	return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF &&
 		!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
 			(GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS));
 }
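
To make the effect of the first hunk (the vmx_check_nested_events() change) easier to see, the condition can be modelled outside the kernel as a pure boolean predicate. This is only a sketch: struct nested_state and both function names are hypothetical, and each field stands in for the result of the corresponding kernel helper (kvm_cpu_has_interrupt(), the external_intr argument, nested_exit_intr_ack_set(), nested_exit_on_intr()).

```c
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the check; not a KVM structure. */
struct nested_state {
	bool has_interrupt;  /* stands in for kvm_cpu_has_interrupt(vcpu) */
	bool external_intr;  /* L0 userspace asked for an irq window */
	bool intr_ack_set;   /* stands in for nested_exit_intr_ack_set(vcpu) */
	bool exit_on_intr;   /* stands in for nested_exit_on_intr(vcpu) */
};

/* Condition before the hunk: a window request alone triggers the exit. */
static bool wants_intr_exit_old(const struct nested_state *s)
{
	return (s->has_interrupt || s->external_intr) && s->exit_on_intr;
}

/*
 * Condition after the hunk: a window request only triggers the exit when
 * L1 did not ask for "acknowledge interrupt on exit", so the exit path is
 * never asked to acknowledge an interrupt that does not exist.
 */
static bool wants_intr_exit_new(const struct nested_state *s)
{
	return (s->has_interrupt ||
		(s->external_intr && !s->intr_ack_set)) &&
	       s->exit_on_intr;
}
```

The two predicates differ in exactly one situation: no pending interrupt, a userspace window request, and "acknowledge interrupt on exit" enabled. That is the combination reported in the trace. When an interrupt is pending, or when L1 has not enabled the acknowledge control, both versions agree.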