From patchwork Tue Sep 13 19:01:46 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?b?UmFkaW0gS3LEjW3DocWZ?=
X-Patchwork-Id: 9329839
Date: Tue, 13 Sep 2016 21:01:46 +0200
From: Radim Krcmar
To: Wanpeng Li
Cc: kvm, "linux-kernel@vger.kernel.org", Paolo Bonzini
Subject: Re: kvm-unit-test fail for split irqchip
Message-ID: <20160913190145.GE15680@potion>
Content-Disposition: inline
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: kvm@vger.kernel.org

2016-09-13 19:06+0800, Wanpeng Li:
> Add -kernel_irqchip=split
> ./x86-run x86/eventinj.flat
>
> qemu-system-x86_64 -enable-kvm -machine kernel_irqchip=split -cpu host
> -device pc-testdev -device isa-debug-exit,iobase=0xf4,iosize=0x4 -vnc
> none -serial stdio -device pci-testdev -kernel x86/eventinj.flat
> enabling apic
> paging enabled
> cr0 = 80010011
> cr3 = 7fff000
> cr4 = 20
> Sending vec 33 and 62 and mask one with TPR
> irq1 running
> irq1 running
> After 33/62 TPR test
> FAIL: TPR
> irq0 running
> irq0 running
>
> Both irq1 and irq0 are executing twice.
>
> qemu-system-x86-22794 [001] d..2 34591.708476: kvm_entry: vcpu 0
> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_exit: reason MSR_WRITE rip 0x401f33 info 0 0
> qemu-system-x86-22794 [001] ...1 34591.708478: kvm_apic: apic_write APIC_EOI = 0x0
> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_eoi: apicid 0 vector 62
> qemu-system-x86-22794 [001] ...1 34591.708479: kvm_msr: msr_write 80b = 0x0
> qemu-system-x86-22794 [001] d..2 34591.708480: kvm_entry: vcpu 0
> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_exit: reason PENDING_INTERRUPT rip 0x401f35 info 0 0
> qemu-system-x86-22794 [001] ...1 34591.708482: kvm_userspace_exit: reason KVM_EXIT_IRQ_WINDOW_OPEN (7)
> qemu-system-x86-22794 [001] ...1 34591.708491: kvm_inj_virq: irq 62
> qemu-system-x86-22794 [001] d..2 34591.708492: kvm_entry: vcpu 0
> qemu-system-x86-22794 [001] ...1 34591.708493: kvm_exit: reason IO_INSTRUCTION rip 0x4016ec info 3fd0008 0
>
> From the trace we can see there is an interrupt window exit after the
> first interrupt EOI(irq 62), and the same irq(62) is injected
> duplicately after the interrupt window.
>
> The bug can disappear if kernel_irqchip is on or -x2apic, the virtual
> x2apic mode will not be set due to commit (8d14695f9542 x86, apicv:
> add virtual x2apic support), so that tpr shadow in the x2apic doesn't
> work and wrmsr TPR register will trigger vmexit, and then kvmvapic
> will be used to optimize flexpriority=N or AMD. The
> report_tpr_access() which is called in kvm_lapic_reg_write() will
> trigger a userspace exit.
>
> TPR report access callbacks in qemu, kvm_handle_tpr_access() ->
> apic_handle_tpr_access_report() -> vapic_report_tpr_access() ->
> cpu_synchronize_state() will get apic register states from kvm.
>
> Later, kvm_arch_pre_run -> cpu_get_pic_interrupt(if there is a pic
> interrupt) -> apic_get_interrupt, it is a pic interrupt, however it
> gets the stale irq from apic register sync by report tpr access and
> KVM_INTERRUPT the second duplicate interrupt.
>
> Paolo pointed out it is not the TPR associated bug, and we should
> figure out why there is an interrupt window exit after the first EOI.

Yeah, this seems like a QEMU bug.

QEMU issues the KVM_INTERRUPT(62) ioctl after KVM exits with
KVM_EXIT_IRQ_WINDOW_OPEN, which QEMU requested while the guest was
printing.  The printing calls

  serial_update_irq() -> qemu_irq_lower() -> qemu_set_irq() ->
  gsi_handler() -> qemu_set_irq() -> pic_irq_request() ->
  apic_deliver_pic_intr() -> kvm_handle_interrupt()

kvm_handle_interrupt() does

  interrupt_request |= CPU_INTERRUPT_HARD

which later makes kvm_arch_pre_run() call cpu_get_pic_interrupt(), but
that function uses stale information from the APIC and injects 62 again.
If we synchronized the APIC, then the test would #GP, because there
would be no injectable interrupt in the LAPIC or PIC, so pic_read_irq()
would return 15, thinking it was spurious.

I think the bug starts in pic_irq_request(), which should not touch the
LAPIC.  The patch below makes it work (just the second hunk is
sufficient), but it's still far from sane code ...

---
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 47593b741a5b..6983e9f13813 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -161,13 +161,16 @@ int cpu_get_pic_interrupt(CPUX86State *env)
     X86CPU *cpu = x86_env_get_cpu(env);
     int intno;
 
-    intno = apic_get_interrupt(cpu->apic_state);
-    if (intno >= 0) {
-        return intno;
-    }
-    /* read the irq from the PIC */
-    if (!apic_accept_pic_intr(cpu->apic_state)) {
-        return -1;
+    if (!kvm_irqchip_is_split()) {
+        /* XXX: why query APIC at all? */
+        intno = apic_get_interrupt(cpu->apic_state);
+        if (intno >= 0) {
+            return intno;
+        }
+        /* read the irq from the PIC */
+        if (!apic_accept_pic_intr(cpu->apic_state)) {
+            return -1;
+        }
     }
 
     intno = pic_read_irq(isa_pic);
@@ -180,7 +183,7 @@ static void pic_irq_request(void *opaque, int irq, int level)
     X86CPU *cpu = X86_CPU(cs);
 
     DPRINTF("pic_irqs: %s irq %d\n", level? "raise" : "lower", irq);
-    if (cpu->apic_state) {
+    if (cpu->apic_state && !kvm_irqchip_is_split()) {
         CPU_FOREACH(cs) {
             cpu = X86_CPU(cs);
             if (apic_accept_pic_intr(cpu->apic_state)) {
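
To make the double injection easier to follow, here is a rough toy model
in C.  It is not QEMU or KVM code: struct toy_cpu, toy_pic_irq_request()
and toy_pre_run() are hypothetical names invented for illustration, and
the model assumes that the userspace APIC copy still shows vector 62 as
pending after the guest has already EOI'd it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU or KVM. */
struct toy_cpu {
    int  stale_apic_vector;   /* userspace APIC copy; -1 means nothing pending */
    int  pic_vector;          /* vector the PIC would deliver; -1 means none */
    bool hard_irq_requested;  /* stands in for CPU_INTERRUPT_HARD */
};

/*
 * Models a PIC line change.  With consult_apic set (the pre-patch path),
 * a vector still pending in the stale APIC copy requests an injection
 * even though the PIC line was lowered.  Otherwise only the level counts.
 */
void toy_pic_irq_request(struct toy_cpu *cpu, int level, bool consult_apic)
{
    assert(level == 0 || level == 1);
    if (consult_apic && cpu->stale_apic_vector >= 0) {
        cpu->hard_irq_requested = true;
    } else {
        cpu->hard_irq_requested = (level != 0);
    }
}

/*
 * Models the pre-run step: returns the vector that would be injected,
 * or -1 if no injection was requested.
 */
int toy_pre_run(struct toy_cpu *cpu, bool consult_apic)
{
    if (!cpu->hard_irq_requested) {
        return -1;
    }
    cpu->hard_irq_requested = false;
    if (consult_apic && cpu->stale_apic_vector >= 0) {
        return cpu->stale_apic_vector;  /* may already have been handled */
    }
    return cpu->pic_vector;
}
```

Lowering the PIC line with consult_apic set makes toy_pre_run() hand
back the stale 62; with consult_apic clear, which is roughly what the
second hunk does for split irqchip, nothing is requested and the result
is -1.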