From patchwork Fri Feb 22 05:33:49 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 10825241 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 525BB6C2 for ; Fri, 22 Feb 2019 05:35:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B7B12E25B for ; Fri, 22 Feb 2019 05:35:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2D47A2EEA9; Fri, 22 Feb 2019 05:35:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id B1D612EE1A for ; Fri, 22 Feb 2019 05:35:02 +0000 (UTC) Received: from localhost ([127.0.0.1]:44509 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gx3UD-0002nO-F6 for patchwork-qemu-devel@patchwork.kernel.org; Fri, 22 Feb 2019 00:35:01 -0500 Received: from eggs.gnu.org ([209.51.188.92]:49749) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gx3TI-0002Ei-3X for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:05 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gx3TG-0004kQ-Qn for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:03 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38172) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gx3TG-0004gx-DD for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:02 -0500 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id E34873091D80 for ; Fri, 22 Feb 2019 05:33:51 +0000 (UTC) Received: from gimli.home (ovpn-116-24.phx2.redhat.com [10.3.116.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id A74A167676; Fri, 22 Feb 2019 05:33:49 +0000 (UTC) From: Alex Williamson To: qemu-devel@nongnu.org Date: Thu, 21 Feb 2019 22:33:49 -0700 Message-ID: <155081362929.23160.15724111710078454465.stgit@gimli.home> In-Reply-To: <155081340903.23160.4034617687275790161.stgit@gimli.home> References: <155081340903.23160.4034617687275790161.stgit@gimli.home> User-Agent: StGit/0.19-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.42]); Fri, 22 Feb 2019 05:33:51 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 1/2] vfio/common: Work around kernel overflow bug in DMA unmap X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which adds a test for address space wrap-around in the vfio DMA unmap path. Unfortunately due to overflow, the kernel detects an unmap of the last page in the 64-bit address space as a wrap-around. In QEMU, a Q35 guest with VT-d emulation and guest IOMMU enabled will attempt to make such an unmap request during VM system reset, triggering an error: qemu-kvm: VFIO_UNMAP_DMA: -22 qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument) Here the IOVA start address (0xfef00000) and the size parameter (0xffffffff01100000) add to exactly 2^64, triggering the bug. A kernel fix is queued for the Linux v5.0 release to address this. This patch implements a workaround to retry the unmap, excluding the final page of the range when we detect an unmap failing which matches the requirements for this issue. This is expected to be a safe and complete workaround as the VT-d address space does not extend to the full 64-bit space and therefore the last page should never be mapped. This workaround can be removed once all kernels with this bug are sufficiently deprecated. Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291 Reported-by: Pei Zhang Debugged-by: Peter Xu Reviewed-by: Peter Xu Reviewed-by: Cornelia Huck Signed-off-by: Alex Williamson --- hw/vfio/common.c | 20 +++++++++++++++++++- hw/vfio/trace-events | 1 + 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 4262b80c4450..9c3796e7db43 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container, .size = size, }; - if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) { + while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) { + /* + * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c + * v4.15) where an overflow in its wrap-around check prevents us from + * unmapping the last page of the address space. Test for the error + * condition and re-try the unmap excluding the last page. The + * expectation is that we've never mapped the last page anyway and this + * unmap request comes via vIOMMU support which also makes it unlikely + * that this page is used. This bug was introduced well after type1 v2 + * support was introduced, so we shouldn't need to test for v1. A fix + * is queued for kernel v5.0 so this workaround can be removed once + * affected kernels are sufficiently deprecated. + */ + if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) && + container->iommu_type == VFIO_TYPE1v2_IOMMU) { + trace_vfio_dma_unmap_overflow_workaround(); + unmap.size -= 1ULL << ctz64(container->pgsizes); + continue; + } error_report("VFIO_UNMAP_DMA: %d", -errno); return -errno; } diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index f41ca96160bf..ed2f333ad726 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries" vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]" vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8" +vfio_dma_unmap_overflow_workaround(void) "" # hw/vfio/platform.c vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d" From patchwork Fri Feb 22 05:33:57 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 10825247 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5C5721805 for ; Fri, 22 Feb 2019 05:36:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 458DA28876 for ; Fri, 22 Feb 2019 05:36:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 384353042E; Fri, 22 Feb 2019 05:36:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 9BC8828876 for ; Fri, 22 Feb 2019 05:36:42 +0000 (UTC) Received: from localhost ([127.0.0.1]:44571 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gx3Vp-0003nj-VP for patchwork-qemu-devel@patchwork.kernel.org; Fri, 22 Feb 2019 00:36:41 -0500 Received: from eggs.gnu.org ([209.51.188.92]:49879) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gx3TL-0002FV-BN for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:08 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gx3TJ-0004oH-Ug for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:07 -0500 Received: from mx1.redhat.com ([209.132.183.28]:48896) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gx3TJ-0004if-A5 for qemu-devel@nongnu.org; Fri, 22 Feb 2019 00:34:05 -0500 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9AF1A3688B for ; Fri, 22 Feb 2019 05:33:57 +0000 (UTC) Received: from gimli.home (ovpn-116-24.phx2.redhat.com [10.3.116.24]) by smtp.corp.redhat.com (Postfix) with ESMTP id 5DA2B19C58; Fri, 22 Feb 2019 05:33:57 +0000 (UTC) From: Alex Williamson To: qemu-devel@nongnu.org Date: Thu, 21 Feb 2019 22:33:57 -0700 Message-ID: <155081363698.23160.8018986798435862933.stgit@gimli.home> In-Reply-To: <155081340903.23160.4034617687275790161.stgit@gimli.home> References: <155081340903.23160.4034617687275790161.stgit@gimli.home> User-Agent: StGit/0.19-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 22 Feb 2019 05:33:57 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PULL 2/2] hw/vfio/common: Refactor container initialization X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP From: Eric Auger We introduce the vfio_init_container_type() helper. It computes the highest usable iommu type and then set the container and the iommu type. Its usage in vfio_connect_container() makes the code ready for addition of new iommu types. Signed-off-by: Eric Auger Reviewed-by: Greg Kurz Signed-off-by: Alex Williamson --- hw/vfio/common.c | 114 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 70 insertions(+), 44 deletions(-) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 9c3796e7db43..df2b4721bffb 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1054,6 +1054,60 @@ static void vfio_put_address_space(VFIOAddressSpace *space) } } +/* + * vfio_get_iommu_type - selects the richest iommu_type (v2 first) + */ +static int vfio_get_iommu_type(VFIOContainer *container, + Error **errp) +{ + int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU, + VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU }; + int i; + + for (i = 0; i < ARRAY_SIZE(iommu_types); i++) { + if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) { + return iommu_types[i]; + } + } + error_setg(errp, "No available IOMMU models"); + return -EINVAL; +} + +static int vfio_init_container(VFIOContainer *container, int group_fd, + Error **errp) +{ + int iommu_type, ret; + + iommu_type = vfio_get_iommu_type(container, errp); + if (iommu_type < 0) { + return iommu_type; + } + + ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd); + if (ret) { + error_setg_errno(errp, errno, "Failed to set group container"); + return -errno; + } + + while (ioctl(container->fd, VFIO_SET_IOMMU, iommu_type)) { + if (iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) { + /* + * On sPAPR, despite the IOMMU subdriver always advertises v1 and + * v2, the running platform may not support v2 and there is no + * way to guess it until an IOMMU group gets added to the container. + * So in case it fails with v2, try v1 as a fallback. + */ + iommu_type = VFIO_SPAPR_TCE_IOMMU; + continue; + } + error_setg_errno(errp, errno, "Failed to set iommu for container"); + return -errno; + } + + container->iommu_type = iommu_type; + return 0; +} + static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, Error **errp) { @@ -1119,25 +1173,17 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, container->fd = fd; QLIST_INIT(&container->giommu_list); QLIST_INIT(&container->hostwin_list); - if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) || - ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) { - bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU); - struct vfio_iommu_type1_info info; - ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd); - if (ret) { - error_setg_errno(errp, errno, "failed to set group container"); - ret = -errno; - goto free_container_exit; - } + ret = vfio_init_container(container, group->fd, errp); + if (ret) { + goto free_container_exit; + } - container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU; - ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); - if (ret) { - error_setg_errno(errp, errno, "failed to set iommu for container"); - ret = -errno; - goto free_container_exit; - } + switch (container->iommu_type) { + case VFIO_TYPE1v2_IOMMU: + case VFIO_TYPE1_IOMMU: + { + struct vfio_iommu_type1_info info; /* * FIXME: This assumes that a Type1 IOMMU can map any 64-bit @@ -1155,30 +1201,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, } vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes); container->pgsizes = info.iova_pgsizes; - } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) || - ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) { + break; + } + case VFIO_SPAPR_TCE_v2_IOMMU: + case VFIO_SPAPR_TCE_IOMMU: + { struct vfio_iommu_spapr_tce_info info; - bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU); - - ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd); - if (ret) { - error_setg_errno(errp, errno, "failed to set group container"); - ret = -errno; - goto free_container_exit; - } - container->iommu_type = - v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU; - ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); - if (ret) { - container->iommu_type = VFIO_SPAPR_TCE_IOMMU; - v2 = false; - ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type); - } - if (ret) { - error_setg_errno(errp, errno, "failed to set iommu for container"); - ret = -errno; - goto free_container_exit; - } + bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU; /* * The host kernel code implementing VFIO_IOMMU_DISABLE is called @@ -1240,10 +1269,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as, info.dma32_window_size - 1, 0x1000); } - } else { - error_setg(errp, "No available IOMMU models"); - ret = -EINVAL; - goto free_container_exit; + } } vfio_kvm_device_add_group(group);