From patchwork Tue Jan 19 19:18:03 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 8064751
From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Tue, 19 Jan 2016 12:18:03 -0700
Message-ID: <20160119191803.19659.19245.stgit@gimli.home>
In-Reply-To: <20160119191704.19659.31099.stgit@gimli.home>
References: <20160119191704.19659.31099.stgit@gimli.home>
Subject: [Qemu-devel] [PULL 2/2] vfio/pci: Lazy PBA emulation

The PCI spec recommends devices use additional alignment for MSI-X
data structures to allow software to map them to separate processor
pages. One advantage of doing this is that we can emulate those data
structures without a significant performance impact to the operation
of the device. Some devices fail to implement that suggestion and
assigned device performance suffers.

One such case is the Mellanox MT27500 series, ConnectX-3 VF, where the
MSI-X vector table and PBA are aligned on separate 4K pages. If PBA
emulation is enabled, performance suffers. It's not clear how much
value we get from PBA emulation, but the solution here is to only
lazily enable the emulated PBA when a masked MSI-X vector fires. We
then attempt to more aggressively disable the PBA memory region any
time a vector is unmasked.
The expectation is then that a typical VM will run entirely with PBA
emulation disabled, and only when used is that emulation re-enabled.

Reported-by: Shyam Kaushik
Tested-by: Shyam Kaushik
Signed-off-by: Alex Williamson
---
 hw/vfio/pci.c |   39 +++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h |    1 +
 trace-events  |    2 ++
 3 files changed, 42 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1fb868c..e66c47f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -356,6 +356,13 @@ static void vfio_msi_interrupt(void *opaque)
     if (vdev->interrupt == VFIO_INT_MSIX) {
         get_msg = msix_get_message;
         notify = msix_notify;
+
+        /* A masked vector firing needs to use the PBA, enable it */
+        if (msix_is_masked(&vdev->pdev, nr)) {
+            set_bit(nr, vdev->msix->pending);
+            memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, true);
+            trace_vfio_msix_pba_enable(vdev->vbasedev.name);
+        }
     } else if (vdev->interrupt == VFIO_INT_MSI) {
         get_msg = msi_get_message;
         notify = msi_notify;
@@ -535,6 +542,14 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
         }
     }
 
+    /* Disable PBA emulation when nothing more is pending. */
+    clear_bit(nr, vdev->msix->pending);
+    if (find_first_bit(vdev->msix->pending,
+                       vdev->nr_vectors) == vdev->nr_vectors) {
+        memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+        trace_vfio_msix_pba_disable(vdev->vbasedev.name);
+    }
+
     return 0;
 }
 
@@ -738,6 +753,9 @@ static void vfio_msix_disable(VFIOPCIDevice *vdev)
 
     vfio_msi_disable_common(vdev);
 
+    memset(vdev->msix->pending, 0,
+           BITS_TO_LONGS(vdev->msix->entries) * sizeof(unsigned long));
+
     trace_vfio_msix_disable(vdev->vbasedev.name);
 }
 
@@ -1251,6 +1269,8 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
 {
     int ret;
 
+    vdev->msix->pending = g_malloc0(BITS_TO_LONGS(vdev->msix->entries) *
+                                    sizeof(unsigned long));
     ret = msix_init(&vdev->pdev, vdev->msix->entries,
                     &vdev->bars[vdev->msix->table_bar].region.mem,
                     vdev->msix->table_bar, vdev->msix->table_offset,
@@ -1264,6 +1284,24 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos)
         return ret;
     }
 
+    /*
+     * The PCI spec suggests that devices provide additional alignment for
+     * MSI-X structures and avoid overlapping non-MSI-X related registers.
+     * For an assigned device, this hopefully means that emulation of MSI-X
+     * structures does not affect the performance of the device. If devices
+     * fail to provide that alignment, a significant performance penalty may
+     * result, for instance Mellanox MT27500 VFs:
+     * http://www.spinics.net/lists/kvm/msg125881.html
+     *
+     * The PBA is simply not that important for such a serious regression and
+     * most drivers do not appear to look at it. The solution for this is to
+     * disable the PBA MemoryRegion unless it's being used. We disable it
+     * here and only enable it if a masked vector fires through QEMU. As the
+     * vector-use notifier is called, which occurs on unmask, we test whether
+     * PBA emulation is needed and again disable if not.
+     */
+    memory_region_set_enabled(&vdev->pdev.msix_pba_mmio, false);
+
     return 0;
 }
 
@@ -1275,6 +1313,7 @@ static void vfio_teardown_msi(VFIOPCIDevice *vdev)
         msix_uninit(&vdev->pdev,
                     &vdev->bars[vdev->msix->table_bar].region.mem,
                     &vdev->bars[vdev->msix->pba_bar].region.mem);
+        g_free(vdev->msix->pending);
     }
 }
 
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f004d52..6256587 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -95,6 +95,7 @@ typedef struct VFIOMSIXInfo {
     uint32_t pba_offset;
     MemoryRegion mmap_mem;
     void *mmap;
+    unsigned long *pending;
 } VFIOMSIXInfo;
 
 typedef struct VFIOPCIDevice {
diff --git a/trace-events b/trace-events
index 934a7b6..c9ac144 100644
--- a/trace-events
+++ b/trace-events
@@ -1631,6 +1631,8 @@ vfio_msi_interrupt(const char *name, int index, uint64_t addr, int data) " (%s)
 vfio_msix_vector_do_use(const char *name, int index) " (%s) vector %d used"
 vfio_msix_vector_release(const char *name, int index) " (%s) vector %d released"
 vfio_msix_enable(const char *name) " (%s)"
+vfio_msix_pba_disable(const char *name) " (%s)"
+vfio_msix_pba_enable(const char *name) " (%s)"
 vfio_msix_disable(const char *name) " (%s)"
 vfio_msi_enable(const char *name, int nr_vectors) " (%s) Enabled %d MSI vectors"
 vfio_msi_disable(const char *name) " (%s)"
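
For readers who want to see the enable/disable logic outside of QEMU,
the following is a minimal standalone sketch of the lazy PBA state
machine described above. It is not QEMU code. Every sketch_* name, the
64-vector limit, and the pba_enabled flag (standing in for the enabled
state of the PBA MemoryRegion) are assumptions made for illustration;
the real patch uses QEMU's own bitmap helpers and
memory_region_set_enabled().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical names throughout; none of these exist in QEMU. */
#define SKETCH_MAX_VECTORS   64
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS \
    ((SKETCH_MAX_VECTORS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

struct sketch_dev {
    unsigned long pending[SKETCH_LONGS]; /* vectors that fired while masked */
    unsigned long masked[SKETCH_LONGS];  /* guest-visible mask state */
    unsigned int nr_vectors;
    bool pba_enabled;                    /* models the PBA region being enabled */
};

static inline void sketch_set_bit(unsigned long *map, unsigned int nr)
{
    map[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

static inline void sketch_clear_bit(unsigned long *map, unsigned int nr)
{
    map[nr / SKETCH_BITS_PER_LONG] &= ~(1UL << (nr % SKETCH_BITS_PER_LONG));
}

static inline bool sketch_test_bit(const unsigned long *map, unsigned int nr)
{
    return (map[nr / SKETCH_BITS_PER_LONG] >> (nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Setup: PBA emulation starts disabled, nothing pending or masked. */
static inline void sketch_init(struct sketch_dev *d, unsigned int nr_vectors)
{
    assert(nr_vectors <= SKETCH_MAX_VECTORS);
    memset(d, 0, sizeof(*d));
    d->nr_vectors = nr_vectors;
    d->pba_enabled = false;
}

static inline void sketch_mask(struct sketch_dev *d, unsigned int nr)
{
    assert(nr < d->nr_vectors);
    sketch_set_bit(d->masked, nr);
}

/* Interrupt path: a masked vector firing needs the PBA, so enable it. */
static inline void sketch_interrupt(struct sketch_dev *d, unsigned int nr)
{
    assert(nr < d->nr_vectors);
    if (sketch_test_bit(d->masked, nr)) {
        sketch_set_bit(d->pending, nr);
        d->pba_enabled = true;
    }
}

/* Unmask path: drop this vector's pending bit, disable PBA if none remain. */
static inline void sketch_unmask(struct sketch_dev *d, unsigned int nr)
{
    unsigned int i;

    assert(nr < d->nr_vectors);
    sketch_clear_bit(d->masked, nr);
    sketch_clear_bit(d->pending, nr);
    for (i = 0; i < d->nr_vectors; i++) {
        if (sketch_test_bit(d->pending, i)) {
            return; /* something is still pending, keep PBA enabled */
        }
    }
    d->pba_enabled = false;
}
```

In this model an unmasked vector firing never touches pba_enabled, so
a guest that does not mask vectors stays on the fast path. The flag
only flips to true when a masked vector fires, and it returns to false
once the last pending vector has been unmasked.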