From patchwork Wed Apr 12 11:57:52 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 9677263 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id C30B860325 for ; Wed, 12 Apr 2017 11:58:07 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B35452830A for ; Wed, 12 Apr 2017 11:58:07 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A7DA8285E0; Wed, 12 Apr 2017 11:58:07 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D4EFE2830A for ; Wed, 12 Apr 2017 11:58:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753308AbdDLL6A (ORCPT ); Wed, 12 Apr 2017 07:58:00 -0400 Received: from mx1.redhat.com ([209.132.183.28]:36262 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752147AbdDLL56 (ORCPT ); Wed, 12 Apr 2017 07:57:58 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D4B222E604C for ; Wed, 12 Apr 2017 11:57:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com D4B222E604C Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx05.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=peterx@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com D4B222E604C Received: from pxdev.xzpeter.org (ovpn-8-31.pek2.redhat.com [10.72.8.31]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 9BEBC77EAD; Wed, 12 Apr 2017 11:57:55 +0000 (UTC) Date: Wed, 12 Apr 2017 19:57:52 +0800 From: Peter Xu To: Ladi Prosek Cc: KVM list , Radim Krcmar Subject: Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support Message-ID: <20170412115752.GJ16464@pxdev.xzpeter.org> References: <20170411141115.4314-1-lprosek@redhat.com> <20170412064029.GD16464@pxdev.xzpeter.org> <20170412090644.GG16464@pxdev.xzpeter.org> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.24 (2015-08-30) X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.29]); Wed, 12 Apr 2017 11:57:58 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote: > On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek wrote: > > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu wrote: > >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote: > >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu wrote: > >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote: > >>> >> If the guest takes advantage of the directed EOI feature by setting > >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to > >>> >> the EOI register of the respective IOAPIC. > >>> >> > >>> >> From Intel's x2APIC Specification: > >>> >> "following the EOI to the local x2APIC unit for a level triggered > >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the > >>> >> interrupt by writing to its EOI register." > >>> >> > >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation") > >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC > >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code > >>> >> was later removed with the rest of IA64 support. > >>> >> > >>> >> The bug has gone undetected for a long time because Linux writes to > >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't > >>> >> seem to perform such a check. > >>> > > >>> > Hi, Ladi, > >>> > >>> Hi Peter, > >>> > >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a > >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is > >>> > another feature for APIC. They are not the same feature, so it may not > >>> > be required to have them all together. IIUC current x86 kvm is just > >>> > the case - it supports EOI broadcast suppression on APIC, but it does > >>> > not support direct EOI on kernel IOAPIC. > >>> > >>> Thanks, that makes perfect sense and explains why Linux behaves the > >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c). > >>> > >>> This document makes it look like suppress EOI-broadcast always implies > >>> directed EOI, though: > >>> > >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf > >>> > >>> NB "The support for Directed EOI capability can be detected by means > >>> of bit 24 in the Local APIC Version Register. " > >>> > >>> There is no mention of APIC version or any other detection mechanism > >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20 > >>> but I don't see that in the document either. > >>> > >>> I suspect that Microsoft implemented EOI by following this same spec. > >>> Level-triggered interrupts don't work right on Windows Server 2016 > >>> with Hyper-V enabled without this patch. > >> > >> Yes, the documents for IOAPIC is always hard to find, at least for > >> me... > >> > >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here: > >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf > >> > >> However I see nothing related to how the IOAPIC version is defined. In > >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :) > >> > >>> > >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even > >>> > if IOAPIC does not support direct EOI (the guest can know it by > >>> > probing IOAPIC version). Please correct if I'm wrong. > >>> > >>> Yes, I think that the guest is to blame here. We might add that to the > >>> commit message. > >> > >> Agreed. > >> > >>> > >>> >> > >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of > >>> >> __kvm_ioapic_update_eoi. > >>> >> > >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation") > >>> >> Signed-off-by: Ladi Prosek > >>> >> --- > >>> >> arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------ > >>> >> arch/x86/kvm/ioapic.h | 1 + > >>> >> 2 files changed, 29 insertions(+), 18 deletions(-) > >>> >> > >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > >>> >> index 289270a..8df1c6c 100644 > >>> >> --- a/arch/x86/kvm/ioapic.c > >>> >> +++ b/arch/x86/kvm/ioapic.c > >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work) > >>> >> #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000 > >>> >> > >>> >> static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > >>> >> - struct kvm_ioapic *ioapic, int vector, int trigger_mode) > >>> >> + struct kvm_ioapic *ioapic, int vector, int trigger_mode, > >>> >> + bool directed) > >>> >> { > >>> >> struct dest_map *dest_map = &ioapic->rtc_status.dest_map; > >>> >> struct kvm_lapic *apic = vcpu->arch.apic; > >>> >> int i; > >>> >> > >>> >> /* RTC special handling */ > >>> >> - if (test_bit(vcpu->vcpu_id, dest_map->map) && > >>> >> + if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) && > >>> >> vector == dest_map->vectors[vcpu->vcpu_id]) > >>> >> rtc_irq_eoi(ioapic, vcpu); > >>> >> > >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > >>> >> if (ent->fields.vector != vector) > >>> >> continue; > >>> >> > >>> >> - /* > >>> >> - * We are dropping lock while calling ack notifiers because ack > >>> >> - * notifier callbacks for assigned devices call into IOAPIC > >>> >> - * recursively. Since remote_irr is cleared only after call > >>> >> - * to notifiers if the same vector will be delivered while lock > >>> >> - * is dropped it will be put into irr and will be delivered > >>> >> - * after ack notifier returns. > >>> >> - */ > >>> >> - spin_unlock(&ioapic->lock); > >>> >> - kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); > >>> >> - spin_lock(&ioapic->lock); > >>> >> - > >>> >> - if (trigger_mode != IOAPIC_LEVEL_TRIG || > >>> >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > >>> >> - continue; > >>> >> + if (!directed) { > >>> > > >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI > >>> > register of IOAPIC? > >>> > >>> Because it's already been done as part of the local EOI. With directed > >>> EOI we hit this function twice, first time when doing the local EOI > >>> and then the newly added code path for IOAPIC EOI with directed=true. > >>> > >>> I, again, followed the above mentioned document which explicitly > >>> dictates the sequence. And I mechanically split the function to the > >>> "local part' - what it had been doing up to the continue statement - > >>> and the "directed part" - what it had been skipping. I'll admit that > >>> my familiarity with this code is limited and there may be a better way > >>> to do this. > >> > >> Instead of the "!directed" flag (which is imho duplicated with what > >> APIC_SPIV_DIRECTED_EOI means), do you like below fix? > >> > >> -----8<----- > >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c > >> index 6e219e5..78d3ec8 100644 > >> --- a/arch/x86/kvm/ioapic.c > >> +++ b/arch/x86/kvm/ioapic.c > >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > >> kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); > >> spin_lock(&ioapic->lock); > >> > >> - if (trigger_mode != IOAPIC_LEVEL_TRIG || > >> - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > >> + if (trigger_mode != IOAPIC_LEVEL_TRIG) > >> continue; > >> > >> ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); > >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, > >> } > >> } > >> > >> +/* This should only be triggered by APIC EOI broadcast */ > >> void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode) > >> { > >> struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; > >> > >> + /* If we'll be using direct EOI, skip broadcast */ > >> + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) > >> + return; > >> + > > > > I've only seen the direct EOI sent for level irqs so I'm afraid that > > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the > > APIC_SPIV_DIRECTED_EOI flag is set. Yes, if without your patch, it is problematic. But if with your patch, __kvm_ioapic_update_eoi() will be called in ioapic mmio write then. > > > > Other than that it looks reasonable. > > Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to > suppress the broadcast but then does EOI by writing to the IOAPIC > routing entry? You kind of indicated that this would be a valid use of > the feature. That's exactly how I understand it. :) > This is what __eoi_ioapic_pin does for version<0x20 and > on the host side we reset the remote_irr in ioapic_write_indirect if > I'm reading the code correctly. Wouldn't we want to deliver the > notification via kvm_notify_acked_irq in this case also? I think in that case (EOI sent via "direct EOI" of ioapic mmio register) the remote_irr is not cleaned in ioapic_write_indirect(), but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the line: ent->fields.remote_irr = 0; Right? > > Thanks! > > >> spin_lock(&ioapic->lock); > >> __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode); > >> spin_unlock(&ioapic->lock); > >> ---->8---- > >> > >> This patch along will break kvm_notify_acked_irq() in some way I > >> guess, but if with your patch (though will possibly need to boost > >> IOAPIC version to 0x20 as well), it should work fine as long as guest > >> remembers to send the direct EOI. > > > > Not sure about the version boost, especially since we don't have a > > good spec to define what the version means. Maybe only if it helps > > Linux performance. In theory __eoi_ioapic_pin should be causing fewer > > vmexits with version>=0x20. I think at least from the comment it means only ioapic version 0x20 has that EOI register, and that's how I understand it. Btw, I would even optimize my above fix to the following, which I like it better: ---->8---- ----8<---- Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi() will make sure kvm will not send this useless EOI broadcast even the ioapic is in userspace. Thanks, diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c index 6e219e5..67d0849 100644 --- a/arch/x86/kvm/ioapic.c +++ b/arch/x86/kvm/ioapic.c @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i); spin_lock(&ioapic->lock); - if (trigger_mode != IOAPIC_LEVEL_TRIG || - kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) + if (trigger_mode != IOAPIC_LEVEL_TRIG) continue; ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index bad6a25..11ee9b7 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector) if (!kvm_ioapic_handles_vector(apic, vector)) return; + if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) + return; + /* Request a KVM exit to inform the userspace IOAPIC. */ if (irqchip_split(apic->vcpu->kvm)) { apic->vcpu->arch.pending_ioapic_eoi = vector;