Message ID | 20160913190145.GE15680@potion (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 13/09/2016 21:01, Radim Krcmar wrote: > kvm_handle_interrupt() does > > interrupt_request |= CPU_INTERRUPT_HARD > > which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but > that function uses stale information from APIC and injects 62 again. > If we synchronized the APIC, then the test would #GP, because there > would be no injectable interrupt in LAPIC or PIC, so pic_read_irq() > would return 15, thinking it was spurious. > > I think the bug starts in pic_irq_request(), which should not touch > LAPIC. The patch below makes it work (just the second hunk is > sufficient), but it's still far from sane code ... This makes sense. Most of the functions exported by hw/intc/apic.c should only be used with a userspace APIC: 0000000000000b70 T apic_accept_pic_intr 00000000000010f0 T apic_deliver_irq 00000000000011e0 T apic_deliver_pic_intr 0000000000001310 T apic_get_interrupt 0000000000001270 T apic_poll_irq 0000000000000a40 T apic_sipi The patch is okay, though for consistency with other code I'd use !kvm_irqchip_in_kernel() rather than !kvm_irqchip_is_split(). Wanpeng, can you do that, and change hw/intc/apic.c to use a new casting macro APICCommonState *s = APIC(dev); instead of APIC_COMMON? Thanks, Paolo > diff --git a/hw/i386/pc.c b/hw/i386/pc.c > index 47593b741a5b..6983e9f13813 100644 > --- a/hw/i386/pc.c > +++ b/hw/i386/pc.c > @@ -161,13 +161,16 @@ int cpu_get_pic_interrupt(CPUX86State *env) > X86CPU *cpu = x86_env_get_cpu(env); > int intno; > > - intno = apic_get_interrupt(cpu->apic_state); > - if (intno >= 0) { > - return intno; > - } > - /* read the irq from the PIC */ > - if (!apic_accept_pic_intr(cpu->apic_state)) { > - return -1; > + if (!kvm_irqchip_is_split()) { > + /* XXX: why query APIC at all? */ > + intno = apic_get_interrupt(cpu->apic_state); > + if (intno >= 0) { > + return intno; > + } > + /* read the irq from the PIC */ > + if (!apic_accept_pic_intr(cpu->apic_state)) { > + return -1; > + } > } > > intno = pic_read_irq(isa_pic); > @@ -180,7 +183,7 @@ static void pic_irq_request(void *opaque, int irq, int level) > X86CPU *cpu = X86_CPU(cs); > > DPRINTF("pic_irqs: %s irq %d\n", level? "raise" : "lower", irq); > - if (cpu->apic_state) { > + if (cpu->apic_state && !kvm_irqchip_is_split()) { > CPU_FOREACH(cs) { > cpu = X86_CPU(cs); > if (apic_accept_pic_intr(cpu->apic_state)) { > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-09-14 3:01 GMT+08:00 Radim Krcmar <rkrcmar@redhat.com>: > 2016-09-13 19:06+0800, Wanpeng Li: >> Add -kernel_irqchip=split >> ./x86-run x86/eventinj.flat >> >> qemu-system-x86_64 -enable-kvm -machine kernel_irqchip=split -cpu host >> -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc >> none -serial stdio -device pci-testdev -kernel x86/eventinj.flat >> enabling apic >> paging enabled >> cr0 = 80010011 >> cr3 = 7fff000 >> cr4 = 20 >> Sending vec 33 and 62 and mask one with TPR >> irq1 running >> irq1 running >> After 33/62 TPR test >> FAIL: TPR >> irq0 running >> irq0 running >> >> Both irq1 and irq0 are executing twice. >> >> qemu-system-x86-22794 [001] d..2 34591.708476: kvm_entry: vcpu 0 >> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_exit: reason >> MSR_WRITE rip 0x401f33 info 0 0 >> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_apic: apic_write >> APIC_EOI = 0x0 >> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_eoi: apicid 0 vector 62 >> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_msr: msr_write 80b = 0x0 >> qemu-system-x86-22794 [001] d..2 34591.708480: kvm_entry: vcpu 0 >> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_exit: reason >> PENDING_INTERRUPT rip 0x401f35 info 0 0 >> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_userspace_exit: >> reason KVM_EXIT_IRQ_WINDOW_OPEN (7) >> qemu-system-x86-22794 [001] ...1 34591.708491: kvm_inj_virq: irq 62 >> qemu-system-x86-22794 [001] d..2 34591.708492: kvm_entry: vcpu 0 >> qemu-system-x86-22794 [001] ...1 34591.708493: kvm_exit: reason >> IO_INSTRUCTION rip 0x4016ec info 3fd0008 0 >> >> From the trace we can see there is an interrupt window exit after the >> first interrupt EOI(irq 62), and the same irq(62) is injected >> duplicately after the interrupt window. >> >> The bug can disappear if kernel_irqchip is on or -x2apic, the virtual >> x2apic mode will not be set due to commit (8d14695f9542 x86, apicv: >> add virtual x2apic support), so that tpr shadow in the x2apic doesn't >> work and wrmsr TPR register will trigger vmexit, and then kvmvapic >> will be used to optimize flexpriority=N or AMD. The >> report_trp_access() which is called in kvm_lapic_reg_write() will >> trigger a userspace exit. >> >> TPR report access callbacks in qemu, kvm_handle_tpr_access() -> >> apic_handle_tpr_access_report() -> vapic_report_tpr_access() -> >> cpu_synchronize_state() will get apic register states from kvm. >> >> Later, kvm_arch_pre_run -> cpu_get_pic_interrupt(if there is a pic >> interrupt) -> apic_get_interrupt, it is a pic interrupt, however it >> gets the stale irq from apic register sync by report tpr access and >> KVM_INTERRUPT the second duplicate interrupt. >> >> Paolo pointed out it is not the TPR associated bug, and we should >> figure out why there is an interrupt window exit after the first EOI. > > Yeah, seems like QEMU bug. QEMU does KVM_INTERRUPT(62) ioctl after KVM > exits with KVM_EXIT_IRQ_WINDOW_OPEN, which QEMU requested while the > guest was printing. The printing calls > > serial_update_irq() -> qemu_irq_lower() -> qemu_set_irq() -> > gsi_handler() -> qemu_set_irq() -> pic_irq_request() -> > apic_deliver_pic_intr() -> kvm_handle_interrupt() > > kvm_handle_interrupt() does > > interrupt_request |= CPU_INTERRUPT_HARD > > which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but > that function uses stale information from APIC and injects 62 again. > If we synchronized the APIC, then the test would #GP, because there > would be no injectable interrupt in LAPIC or PIC, so pic_read_irq() > would return 15, thinking it was spurious. > > I think the bug starts in pic_irq_request(), which should not touch > LAPIC. The patch below makes it work (just the second hunk is > sufficient), but it's still far from sane code ... Thanks for the analysis, Radim! :) Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-09-14 4:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: > > > On 13/09/2016 21:01, Radim Krcmar wrote: >> kvm_handle_interrupt() does >> >> interrupt_request |= CPU_INTERRUPT_HARD >> >> which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but >> that function uses stale information from APIC and injects 62 again. >> If we synchronized the APIC, then the test would #GP, because there >> would be no injectable interrupt in LAPIC or PIC, so pic_read_irq() >> would return 15, thinking it was spurious. >> >> I think the bug starts in pic_irq_request(), which should not touch >> LAPIC. The patch below makes it work (just the second hunk is >> sufficient), but it's still far from sane code ... > > This makes sense. Most of the functions exported by hw/intc/apic.c > should only be used with a userspace APIC: > > 0000000000000b70 T apic_accept_pic_intr > 00000000000010f0 T apic_deliver_irq > 00000000000011e0 T apic_deliver_pic_intr > 0000000000001310 T apic_get_interrupt > 0000000000001270 T apic_poll_irq > 0000000000000a40 T apic_sipi > > The patch is okay, though for consistency with other code I'd use > !kvm_irqchip_in_kernel() rather than !kvm_irqchip_is_split(). > > Wanpeng, can you do that, Yeah, I just sent out a patch to fix the bug. Thanks for the long discussion with me and thanks Radim's proposal. :) > and change hw/intc/apic.c to use a new casting > macro > > APICCommonState *s = APIC(dev); > > instead of APIC_COMMON? > I'm not familiar with QOM too much, what APIC macro do you like? Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 14/09/2016 05:57, Wanpeng Li wrote: > 2016-09-14 4:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: >> >> >> On 13/09/2016 21:01, Radim Krcmar wrote: >>> kvm_handle_interrupt() does >>> >>> interrupt_request |= CPU_INTERRUPT_HARD >>> >>> which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but >>> that function uses stale information from APIC and injects 62 again. >>> If we synchronized the APIC, then the test would #GP, because there >>> would be no injectable interrupt in LAPIC or PIC, so pic_read_irq() >>> would return 15, thinking it was spurious. >>> >>> I think the bug starts in pic_irq_request(), which should not touch >>> LAPIC. The patch below makes it work (just the second hunk is >>> sufficient), but it's still far from sane code ... >> >> This makes sense. Most of the functions exported by hw/intc/apic.c >> should only be used with a userspace APIC: >> >> 0000000000000b70 T apic_accept_pic_intr >> 00000000000010f0 T apic_deliver_irq >> 00000000000011e0 T apic_deliver_pic_intr >> 0000000000001310 T apic_get_interrupt >> 0000000000001270 T apic_poll_irq >> 0000000000000a40 T apic_sipi >> >> The patch is okay, though for consistency with other code I'd use >> !kvm_irqchip_in_kernel() rather than !kvm_irqchip_is_split(). >> >> Wanpeng, can you do that, > > Yeah, I just sent out a patch to fix the bug. Thanks for the long > discussion with me and thanks Radim's proposal. :) > >> and change hw/intc/apic.c to use a new casting >> macro >> >> APICCommonState *s = APIC(dev); >> >> instead of APIC_COMMON? >> > > I'm not familiar with QOM too much, what APIC macro do you like? The macro would be defined like #define TYPE_APIC "apic" #define APIC(obj) \ OBJECT_CHECK(APICCommonState, (obj), TYPE_APIC) With this change and without your other patch, you should get an assertion failure in eventinj.flat (which would have pointed immediately at the bug!). Thanks, Paolo > Regards, > Wanpeng Li > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2016-09-14 17:42 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: > > > On 14/09/2016 05:57, Wanpeng Li wrote: >> 2016-09-14 4:43 GMT+08:00 Paolo Bonzini <pbonzini@redhat.com>: >>> >>> >>> On 13/09/2016 21:01, Radim Krcmar wrote: >>>> kvm_handle_interrupt() does >>>> >>>> interrupt_request |= CPU_INTERRUPT_HARD >>>> >>>> which later calls cpu_get_pic_interrupt() in kvm_arch_pre_run(), but >>>> that function uses stale information from APIC and injects 62 again. >>>> If we synchronized the APIC, then the test would #GP, because there >>>> would be no injectable interrupt in LAPIC or PIC, so pic_read_irq() >>>> would return 15, thinking it was spurious. >>>> >>>> I think the bug starts in pic_irq_request(), which should not touch >>>> LAPIC. The patch below makes it work (just the second hunk is >>>> sufficient), but it's still far from sane code ... >>> >>> This makes sense. Most of the functions exported by hw/intc/apic.c >>> should only be used with a userspace APIC: >>> >>> 0000000000000b70 T apic_accept_pic_intr >>> 00000000000010f0 T apic_deliver_irq >>> 00000000000011e0 T apic_deliver_pic_intr >>> 0000000000001310 T apic_get_interrupt >>> 0000000000001270 T apic_poll_irq >>> 0000000000000a40 T apic_sipi >>> >>> The patch is okay, though for consistency with other code I'd use >>> !kvm_irqchip_in_kernel() rather than !kvm_irqchip_is_split(). >>> >>> Wanpeng, can you do that, >> >> Yeah, I just sent out a patch to fix the bug. Thanks for the long >> discussion with me and thanks Radim's proposal. :) >> >>> and change hw/intc/apic.c to use a new casting >>> macro >>> >>> APICCommonState *s = APIC(dev); >>> >>> instead of APIC_COMMON? >>> >> >> I'm not familiar with QOM too much, what APIC macro do you like? > > The macro would be defined like > > #define TYPE_APIC "apic" > #define APIC(obj) \ > OBJECT_CHECK(APICCommonState, (obj), TYPE_APIC) > > With this change and without your other patch, you should get an > assertion failure in eventinj.flat (which would have pointed immediately > at the bug!). Good point, I just sent out the patch, it catches the bug as you expected. :) Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 47593b741a5b..6983e9f13813 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -161,13 +161,16 @@ int cpu_get_pic_interrupt(CPUX86State *env) X86CPU *cpu = x86_env_get_cpu(env); int intno; - intno = apic_get_interrupt(cpu->apic_state); - if (intno >= 0) { - return intno; - } - /* read the irq from the PIC */ - if (!apic_accept_pic_intr(cpu->apic_state)) { - return -1; + if (!kvm_irqchip_is_split()) { + /* XXX: why query APIC at all? */ + intno = apic_get_interrupt(cpu->apic_state); + if (intno >= 0) { + return intno; + } + /* read the irq from the PIC */ + if (!apic_accept_pic_intr(cpu->apic_state)) { + return -1; + } } intno = pic_read_irq(isa_pic); @@ -180,7 +183,7 @@ static void pic_irq_request(void *opaque, int irq, int level) X86CPU *cpu = X86_CPU(cs); DPRINTF("pic_irqs: %s irq %d\n", level? "raise" : "lower", irq); - if (cpu->apic_state) { + if (cpu->apic_state && !kvm_irqchip_is_split()) { CPU_FOREACH(cs) { cpu = X86_CPU(cs); if (apic_accept_pic_intr(cpu->apic_state)) {