Message ID | 1501554327-3608-1-git-send-email-wanpeng.li@hotmail.com (mailing list archive) |
---|---|
State | New, archived |

2017-07-31 19:25-0700, Wanpeng Li:
> From: Wanpeng Li <wanpeng.li@hotmail.com>
>
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
> Call Trace:
> vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
> ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
> kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
> ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
> ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
> kvm_vcpu_ioctl+0x340/0x700 [kvm]
> ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
> ? __fget+0xfc/0x210
> do_vfs_ioctl+0xa4/0x6a0
> ? __fget+0x11d/0x210
> SyS_ioctl+0x79/0x90
> do_syscall_64+0x8f/0x750
> ? trace_hardirqs_on_thunk+0x1a/0x1c
> entry_SYSCALL64_slow_path+0x25/0x25
>
> This can be reproduced by booting the L1 guest with the 'noapic' grub parameter,
> which tells the kernel not to make use of any IOAPICs that may be present
> in the system.
>
> The external_intr variable in nested_vmx_vmexit() is actually the req_int_win
> variable passed from vcpu_enter_guest(), which means that L0's userspace
> requests an irq window. I observed that the scenario (!kvm_cpu_has_interrupt(vcpu) &&
> L0's userspace requests an irq window) is true, so there is no interrupt which
> L1 requires to inject to L2, and we should not attempt to emulate "Acknowledge
> interrupt on exit" for the irq window requirement in this scenario.
>
> This patch fixes it by not attempting to emulate "Acknowledge interrupt on exit"
> if there is no L1 requirement to inject an interrupt to L2.
>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
> ---
> v1 -> v2:
> * update patch description
> * check nested_exit_intr_ack_set() first
>
> arch/x86/kvm/vmx.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 2737343..c5a0ab5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>
> -	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
> -	    && nested_exit_intr_ack_set(vcpu)) {

I've added a TODO comment so it's clearer that we should not be here if
there is no interrupt.

> +	if (nested_exit_intr_ack_set(vcpu) &&
> +		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
> +		kvm_cpu_has_interrupt(vcpu)) {
>  		int irq = kvm_cpu_get_interrupt(vcpu);
>  		WARN_ON(irq < 0);
>  		vmcs12->vm_exit_intr_info = irq |

Changed the indentation to the original alignment.
Please don't use 1 tab -- the condition and body meld, which makes it
harder to read. (2 tabs would be ok too.)

And the subject was way too long, so I changed it to

  KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"

Applied as it results in better behavior, even if it still is incorrect,
thanks.

2017-08-03 21:46 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>:
> 2017-07-31 19:25-0700, Wanpeng Li:
>> From: Wanpeng Li <wanpeng.li@hotmail.com>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
>> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
>> Call Trace:
>> vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>> ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
>> kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
>> ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
>> ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
>> kvm_vcpu_ioctl+0x340/0x700 [kvm]
>> ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
>> ? __fget+0xfc/0x210
>> do_vfs_ioctl+0xa4/0x6a0
>> ? __fget+0x11d/0x210
>> SyS_ioctl+0x79/0x90
>> do_syscall_64+0x8f/0x750
>> ? trace_hardirqs_on_thunk+0x1a/0x1c
>> entry_SYSCALL64_slow_path+0x25/0x25
>>
>> This can be reproduced by booting the L1 guest with the 'noapic' grub parameter,
>> which tells the kernel not to make use of any IOAPICs that may be present
>> in the system.
>>
>> The external_intr variable in nested_vmx_vmexit() is actually the req_int_win
>> variable passed from vcpu_enter_guest(), which means that L0's userspace
>> requests an irq window. I observed that the scenario (!kvm_cpu_has_interrupt(vcpu) &&
>> L0's userspace requests an irq window) is true, so there is no interrupt which
>> L1 requires to inject to L2, and we should not attempt to emulate "Acknowledge
>> interrupt on exit" for the irq window requirement in this scenario.
>>
>> This patch fixes it by not attempting to emulate "Acknowledge interrupt on exit"
>> if there is no L1 requirement to inject an interrupt to L2.
>>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> ---
>> v1 -> v2:
>> * update patch description
>> * check nested_exit_intr_ack_set() first
>>
>> arch/x86/kvm/vmx.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 2737343..c5a0ab5 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
>>
>>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>>
>> -	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
>> -	    && nested_exit_intr_ack_set(vcpu)) {
>
> I've added a TODO comment so it's clearer that we should not be here if
> there is no interrupt.
>
>> +	if (nested_exit_intr_ack_set(vcpu) &&
>> +		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
>> +		kvm_cpu_has_interrupt(vcpu)) {
>>  		int irq = kvm_cpu_get_interrupt(vcpu);
>>  		WARN_ON(irq < 0);
>>  		vmcs12->vm_exit_intr_info = irq |
>
> Changed the indentation to the original alignment.
> Please don't use 1 tab -- the condition and body meld, which makes it
> harder to read. (2 tabs would be ok too.)
>
> And the subject was way too long, so I changed it to
>
>   KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
>
> Applied as it results in better behavior, even if it still is incorrect,
> thanks.

Thanks Radim. :) In addition, I will think more about it and figure out
a final solution.

Regards,
Wanpeng Li

On Thu, Aug 3, 2017 at 6:23 PM Wanpeng Li <kernellwp@gmail.com> wrote:

> Thanks Radim. :) In addition, I will think more about it and figure
> out a final solution.

Have you had any thoughts on a final solution? We're seeing incorrect
behavior with an L1 hypervisor running under qemu with "-machine
q35,kernel-irqchip=split", and I believe this may be the cause. In
particular, VMCS12 has ACK_INTERRUPT_ON_EXIT set, but L1 is seeing an
L2 exit for "external interrupt" with the VMCS12 VM-exit interruption
information cleared to 0.
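
To make the reported symptom concrete: the VM-exit interruption information field carries the vector in bits 7:0 and a valid flag in bit 31, so a value of 0 gives L1 no vector to dispatch even though it asked for "Acknowledge interrupt on exit". The sketch below is only an illustration of that layout, not KVM code. The `MODEL_*` macros and the three helper functions are hypothetical names chosen for this example; the mask values are assumed to match the architectural field layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch; values follow the field layout. */
#define MODEL_INTR_INFO_VECTOR_MASK 0x000000ffu /* bits 7:0: vector */
#define MODEL_INTR_TYPE_EXT_INTR    0x00000000u /* bits 10:8 == 0: external interrupt */
#define MODEL_INTR_INFO_VALID_MASK  0x80000000u /* bit 31: field is valid */

/* Build the value an acknowledged external interrupt would produce. */
static uint32_t make_ext_intr_info(uint8_t vector)
{
	return (uint32_t)vector | MODEL_INTR_TYPE_EXT_INTR |
	       MODEL_INTR_INFO_VALID_MASK;
}

/* True if the field reports a delivered event at all. */
static bool intr_info_valid(uint32_t info)
{
	return (info & MODEL_INTR_INFO_VALID_MASK) != 0;
}

/* Extract the vector; only meaningful when intr_info_valid() is true. */
static uint8_t intr_info_vector(uint32_t info)
{
	return (uint8_t)(info & MODEL_INTR_INFO_VECTOR_MASK);
}
```

With these helpers, `intr_info_valid(0)` is false, which is the state described in the report above, while `make_ext_intr_info(0x30)` yields a valid field whose vector is 0x30.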

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2737343..c5a0ab5 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
 
-	if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT)
-	    && nested_exit_intr_ack_set(vcpu)) {
+	if (nested_exit_intr_ack_set(vcpu) &&
+		exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT &&
+		kvm_cpu_has_interrupt(vcpu)) {
 		int irq = kvm_cpu_get_interrupt(vcpu);
 		WARN_ON(irq < 0);
 		vmcs12->vm_exit_intr_info = irq |
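
The effect of the changed condition can be checked outside the kernel with a small stand-alone model. This is a hedged sketch and not KVM code: `struct model_vcpu`, `should_ack_old()`, and `should_ack_new()` are hypothetical names, and the three boolean fields merely stand in for the results of `nested_exit_intr_ack_set(vcpu)`, the `exit_reason` comparison, and `kvm_cpu_has_interrupt(vcpu)`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the condition reads. */
struct model_vcpu {
	bool ack_intr_on_exit; /* models nested_exit_intr_ack_set(vcpu) */
	bool exit_is_ext_intr; /* models exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT */
	bool has_interrupt;    /* models kvm_cpu_has_interrupt(vcpu) */
};

/* Condition before the patch: does not ask whether an interrupt is pending. */
static bool should_ack_old(const struct model_vcpu *v)
{
	return v->exit_is_ext_intr && v->ack_intr_on_exit;
}

/* Condition after the patch: also requires a pending interrupt. */
static bool should_ack_new(const struct model_vcpu *v)
{
	return v->ack_intr_on_exit && v->exit_is_ext_intr && v->has_interrupt;
}
```

For the reported scenario (irq window requested, "Acknowledge interrupt on exit" set, nothing pending), `should_ack_old()` returns true, which is the path that reached `WARN_ON(irq < 0)`, while `should_ack_new()` returns false. As noted in the review, skipping the acknowledge path avoids the warning but leaves `vm_exit_intr_info` without a vector, so the model captures the improved behavior rather than a complete fix.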