Message ID | 20170801195859.GB1437@flask (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>: > 2017-07-31 19:25-0700, Wanpeng Li: >> From: Wanpeng Li <wanpeng.li@hotmail.com> >> >> ------------[ cut here ]------------ >> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] >> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 >> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] >> Call Trace: >> vmx_check_nested_events+0x131/0x1f0 [kvm_intel] >> ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] >> kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] >> ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] >> ? kvm_arch_vcpu_load+0x62/0x230 [kvm] >> kvm_vcpu_ioctl+0x340/0x700 [kvm] >> ? kvm_vcpu_ioctl+0x340/0x700 [kvm] >> ? __fget+0xfc/0x210 >> do_vfs_ioctl+0xa4/0x6a0 >> ? __fget+0x11d/0x210 >> SyS_ioctl+0x79/0x90 >> do_syscall_64+0x8f/0x750 >> ? trace_hardirqs_on_thunk+0x1a/0x1c >> entry_SYSCALL64_slow_path+0x25/0x25 >> >> This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which >> means that tells the kernel to not make use of any IOAPICs that may be present >> in the system. >> >> Actually external_intr variable in nested_vmx_vmexit() is the req_int_win >> variable passed from vcpu_enter_guest() which means that the L0's userspace >> requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && >> L0's userspace reqeusts an irq window) is true, so there is no interrupt which >> L1 requires to inject to L2, we should not attempt to emualte "Acknowledge >> interrupt on exit" for the irq window requirement in this scenario. >> >> This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" >> if there is no L1 requirement to inject an interrupt to L2. >> >> Cc: Paolo Bonzini <pbonzini@redhat.com> >> Cc: Radim Krčmář <rkrcmar@redhat.com> >> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> >> --- >> v1 -> v2: >> * update patch description >> * check nested_exit_intr_ack_set() first >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, >> >> vmx_switch_vmcs(vcpu, &vmx->vmcs01); >> >> - if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) >> - && nested_exit_intr_ack_set(vcpu)) { >> + if (nested_exit_intr_ack_set(vcpu) && >> + exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT && >> + kvm_cpu_has_interrupt(vcpu)) { > > This would work as a solution, but I don't think it's the correct > behavior. > > SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit > interrupt information (valid interrupt) is always set to 1 on > EXIT_REASON_EXTERNAL_INTERRUPT. We don't want to break hypervisors > expecting an interrupt in that case, so we should do a userspace VM exit > when the window is open and then inject the userspace interrupt with a > VM exit. Agreed. > > The simplest thing that came to my mind is to: > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 39a6222bf968..9ad0c882c4f5 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) > return 0; > } > > - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && > + if ((kvm_cpu_has_interrupt(vcpu) || > + (external_intr && !nested_exit_intr_ack_set(vcpu))) && > nested_exit_on_intr(vcpu)) { > if (vmx->nested.nested_run_pending) > return -EBUSY; > Agreed. > but I think it could break more ... actually, why was the window closed? > > kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed() > in order to decide need for the window, but vmx_check_nested_events() > doesn't care about that at all, so the window might just appear closed. > Would the following hunk help too? In addition, the request window can be requested by L0's userspace (kvm_arch_pre_run), and the idea below still can't fix in my testing. Regards, Wanpeng Li > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 39a6222bf968..7e6caa9c225d 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) > > static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) > { > - return (!to_vmx(vcpu)->nested.nested_run_pending && > - vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && > + if (is_guest_mode(vcpu)) > + return !to_vmx(vcpu)->nested.nested_run_pending; > + > + return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF && > !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & > (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); > } > > (It doesn't prevent malicious userspace from hitting the WARN, though.) > > Thanks.
2017-08-02 6:42 GMT+08:00 Wanpeng Li <kernellwp@gmail.com>: > 2017-08-02 3:59 GMT+08:00 Radim Krčmář <rkrcmar@redhat.com>: >> 2017-07-31 19:25-0700, Wanpeng Li: >>> From: Wanpeng Li <wanpeng.li@hotmail.com> >>> >>> ------------[ cut here ]------------ >>> WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] >>> CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7 >>> RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel] >>> Call Trace: >>> vmx_check_nested_events+0x131/0x1f0 [kvm_intel] >>> ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel] >>> kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm] >>> ? vmx_vcpu_load+0x1be/0x220 [kvm_intel] >>> ? kvm_arch_vcpu_load+0x62/0x230 [kvm] >>> kvm_vcpu_ioctl+0x340/0x700 [kvm] >>> ? kvm_vcpu_ioctl+0x340/0x700 [kvm] >>> ? __fget+0xfc/0x210 >>> do_vfs_ioctl+0xa4/0x6a0 >>> ? __fget+0x11d/0x210 >>> SyS_ioctl+0x79/0x90 >>> do_syscall_64+0x8f/0x750 >>> ? trace_hardirqs_on_thunk+0x1a/0x1c >>> entry_SYSCALL64_slow_path+0x25/0x25 >>> >>> This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which >>> means that tells the kernel to not make use of any IOAPICs that may be present >>> in the system. >>> >>> Actually external_intr variable in nested_vmx_vmexit() is the req_int_win >>> variable passed from vcpu_enter_guest() which means that the L0's userspace >>> requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) && >>> L0's userspace reqeusts an irq window) is true, so there is no interrupt which >>> L1 requires to inject to L2, we should not attempt to emualte "Acknowledge >>> interrupt on exit" for the irq window requirement in this scenario. >>> >>> This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit" >>> if there is no L1 requirement to inject an interrupt to L2. >>> >>> Cc: Paolo Bonzini <pbonzini@redhat.com> >>> Cc: Radim Krčmář <rkrcmar@redhat.com> >>> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> >>> --- >>> v1 -> v2: >>> * update patch description >>> * check nested_exit_intr_ack_set() first >>> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>> @@ -11118,8 +11118,9 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason, >>> >>> vmx_switch_vmcs(vcpu, &vmx->vmcs01); >>> >>> - if ((exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT) >>> - && nested_exit_intr_ack_set(vcpu)) { >>> + if (nested_exit_intr_ack_set(vcpu) && >>> + exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT && >>> + kvm_cpu_has_interrupt(vcpu)) { >> >> This would work as a solution, but I don't think it's the correct >> behavior. >> >> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit >> interrupt information (valid interrupt) is always set to 1 on >> EXIT_REASON_EXTERNAL_INTERRUPT. We don't want to break hypervisors >> expecting an interrupt in that case, so we should do a userspace VM exit >> when the window is open and then inject the userspace interrupt with a >> VM exit. > > Agreed. > >> >> The simplest thing that came to my mind is to: >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 39a6222bf968..9ad0c882c4f5 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) >> return 0; >> } >> >> - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && >> + if ((kvm_cpu_has_interrupt(vcpu) || >> + (external_intr && !nested_exit_intr_ack_set(vcpu))) && >> nested_exit_on_intr(vcpu)) { >> if (vmx->nested.nested_run_pending) >> return -EBUSY; >> > > Agreed. What's your opinion, Paolo? :) Actually I considered the above idea before, it is what SDM defined. Regards, Wanpeng Li > >> but I think it could break more ... actually, why was the window closed? >> >> kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed() >> in order to decide need for the window, but vmx_check_nested_events() >> doesn't care about that at all, so the window might just appear closed. >> Would the following hunk help too? > > In addition, the request window can be requested by L0's userspace > (kvm_arch_pre_run), and the idea below still can't fix in my testing. > > Regards, > Wanpeng Li > >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >> index 39a6222bf968..7e6caa9c225d 100644 >> --- a/arch/x86/kvm/vmx.c >> +++ b/arch/x86/kvm/vmx.c >> @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) >> >> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) >> { >> - return (!to_vmx(vcpu)->nested.nested_run_pending && >> - vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && >> + if (is_guest_mode(vcpu)) >> + return !to_vmx(vcpu)->nested.nested_run_pending; >> + >> + return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF && >> !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & >> (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); >> } >> >> (It doesn't prevent malicious userspace from hitting the WARN, though.) >> >> Thanks.
On 02/08/2017 10:05, Wanpeng Li wrote: >>> >>> SDM says that with acknowledge interrupt on exit, bit 31 of the VM-exit >>> interrupt information (valid interrupt) is always set to 1 on >>> EXIT_REASON_EXTERNAL_INTERRUPT. We don't want to break hypervisors >>> expecting an interrupt in that case, so we should do a userspace VM exit >>> when the window is open and then inject the userspace interrupt with a >>> VM exit. >> Agreed. >> >>> The simplest thing that came to my mind is to: >>> >>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>> index 39a6222bf968..9ad0c882c4f5 100644 >>> --- a/arch/x86/kvm/vmx.c >>> +++ b/arch/x86/kvm/vmx.c >>> @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) >>> return 0; >>> } >>> >>> - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && >>> + if ((kvm_cpu_has_interrupt(vcpu) || >>> + (external_intr && !nested_exit_intr_ack_set(vcpu))) && >>> nested_exit_on_intr(vcpu)) { >>> if (vmx->nested.nested_run_pending) >>> return -EBUSY; >>> >> Agreed. > > What's your opinion, Paolo? :) Actually I considered the above idea > before, it is what SDM defined. Radim and I always agree. :) Paolo
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 39a6222bf968..9ad0c882c4f5 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -10687,7 +10687,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) return 0; } - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && + if ((kvm_cpu_has_interrupt(vcpu) || + (external_intr && !nested_exit_intr_ack_set(vcpu))) && nested_exit_on_intr(vcpu)) { if (vmx->nested.nested_run_pending) return -EBUSY; but I think it could break more ... actually, why was the window closed? kvm_vcpu_ready_for_interrupt_injection() checks vmx_interrupt_allowed() in order to decide need for the window, but vmx_check_nested_events() doesn't care about that at all, so the window might just appear closed. Would the following hunk help too? diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 39a6222bf968..7e6caa9c225d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5567,8 +5567,10 @@ static int vmx_nmi_allowed(struct kvm_vcpu *vcpu) static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu) { - return (!to_vmx(vcpu)->nested.nested_run_pending && - vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) && + if (is_guest_mode(vcpu)) + return !to_vmx(vcpu)->nested.nested_run_pending; + + return vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF && !(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) & (GUEST_INTR_STATE_STI | GUEST_INTR_STATE_MOV_SS)); }