From patchwork Mon Jul 30 23:14:37 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 10549699 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 565DB15E2 for ; Mon, 30 Jul 2018 23:14:49 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3249429FDA for ; Mon, 30 Jul 2018 23:14:49 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 26A1829FFB; Mon, 30 Jul 2018 23:14:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 82A0229FDA for ; Mon, 30 Jul 2018 23:14:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732163AbeGaAwD (ORCPT ); Mon, 30 Jul 2018 20:52:03 -0400 Received: from mx1.redhat.com ([209.132.183.28]:52918 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732027AbeGaAwD (ORCPT ); Mon, 30 Jul 2018 20:52:03 -0400 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 6A4254E907; Mon, 30 Jul 2018 23:14:46 +0000 (UTC) Received: from gimli.home (ovpn-116-35.phx2.redhat.com [10.3.116.35]) by smtp.corp.redhat.com (Postfix) with ESMTP id A04FB16C1C; Mon, 30 Jul 2018 23:14:37 +0000 (UTC) Subject: [PATCH v2 4/4] vfio/ccw/pci: Allow devices to opt-in for ballooning From: Alex Williamson To: qemu-devel@nongnu.org Cc: kvm@vger.kernel.org, peterx@redhat.com, cohuck@redhat.com, mst@redhat.com, david@redhat.com Date: Mon, 30 Jul 2018 17:14:37 -0600 Message-ID: <153299247733.14411.6837517320997223920.stgit@gimli.home> In-Reply-To: <153299204130.14411.11438396195753743913.stgit@gimli.home> References: <153299204130.14411.11438396195753743913.stgit@gimli.home> User-Agent: StGit/0.18-136-gffd7-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Mon, 30 Jul 2018 23:14:46 +0000 (UTC) Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson --- hw/vfio/ccw.c | 9 +++++++++ hw/vfio/common.c | 23 ++++++++++++++++++++++- hw/vfio/pci.c | 26 +++++++++++++++++++++++++- hw/vfio/trace-events | 1 + include/hw/vfio/vfio-common.h | 2 ++ 5 files changed, 59 insertions(+), 2 deletions(-) diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c index 351b305e1ae7..40e7b5623e69 100644 --- a/hw/vfio/ccw.c +++ b/hw/vfio/ccw.c @@ -349,6 +349,15 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev, } } + /* + * All vfio-ccw devices are believed to operate compatibly with memory + * ballooning, ie. pages pinned in the host are in the current working + * set of the guest driver and therefore never overlap with pages + * available to the guest balloon driver. This needs to be set before + * vfio_get_device() for vfio common to handle the balloon inhibitor. + */ + vcdev->vdev.balloon_allowed = true; + if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) { goto out_err; } diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 4881b691a659..ef5f4b77548a 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1356,7 +1356,9 @@ void vfio_put_group(VFIOGroup *group) return; } - qemu_balloon_inhibit(false); + if (!group->balloon_allowed) { + qemu_balloon_inhibit(false); + } vfio_kvm_device_del_group(group); vfio_disconnect_container(group); QLIST_REMOVE(group, next); @@ -1392,6 +1394,25 @@ int vfio_get_device(VFIOGroup *group, const char *name, return ret; } + /* + * Clear the balloon inhibitor for this group if the driver knows the + * device operates compatibly with ballooning. Setting must be consistent + * per group, but since compatibility is really only possible with mdev + * currently, we expect singleton groups. + */ + if (vbasedev->balloon_allowed != group->balloon_allowed) { + if (!QLIST_EMPTY(&group->device_list)) { + error_setg(errp, + "Inconsistent device balloon setting within group"); + return -1; + } + + if (!group->balloon_allowed) { + group->balloon_allowed = true; + qemu_balloon_inhibit(false); + } + } + vbasedev->fd = fd; vbasedev->group = group; QLIST_INSERT_HEAD(&group->device_list, vbasedev, next); diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 6cbb8fa0549d..056f3a887a8f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2804,12 +2804,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev); VFIODevice *vbasedev_iter; VFIOGroup *group; - char *tmp, group_path[PATH_MAX], *group_name; + char *tmp, *subsys, group_path[PATH_MAX], *group_name; Error *err = NULL; ssize_t len; struct stat st; int groupid; int i, ret; + bool is_mdev; if (!vdev->vbasedev.sysfsdev) { if (!(~vdev->host.domain || ~vdev->host.bus || @@ -2869,6 +2870,27 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) } } + /* + * Mediated devices *might* operate compatibly with memory ballooning, but + * we cannot know for certain, it depends on whether the mdev vendor driver + * stays in sync with the active working set of the guest driver. Prevent + * the x-balloon-allowed option unless this is minimally an mdev device. + */ + tmp = g_strdup_printf("%s/subsystem", vdev->vbasedev.sysfsdev); + subsys = realpath(tmp, NULL); + g_free(tmp); + is_mdev = (strcmp(subsys, "/sys/bus/mdev") == 0); + free(subsys); + + trace_vfio_mdev(vdev->vbasedev.name, is_mdev); + + if (vdev->vbasedev.balloon_allowed && !is_mdev) { + error_setg(errp, "x-balloon-allowed only potentially compatible " + "with mdev devices"); + vfio_put_group(group); + goto error; + } + ret = vfio_get_device(group, vdev->vbasedev.name, &vdev->vbasedev, errp); if (ret) { vfio_put_group(group); @@ -3170,6 +3192,8 @@ static Property vfio_pci_dev_properties[] = { DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features, VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false), DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false), + DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice, + vbasedev.balloon_allowed, false), DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false), DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false), DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false), diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index d2a74952e389..a85e8662eadb 100644 --- a/hw/vfio/trace-events +++ b/hw/vfio/trace-events @@ -39,6 +39,7 @@ vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: % vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n size: 0x%lx, offset: 0x%lx, flags: 0x%lx" vfio_populate_device_get_irq_info_failure(void) "VFIO_DEVICE_GET_IRQ_INFO failure: %m" vfio_realize(const char *name, int group_id) " (%s) group %d" +vfio_mdev(const char *name, bool is_mdev) " (%s) is_mdev %d" vfio_add_ext_cap_dropped(const char *name, uint16_t cap, uint16_t offset) "%s 0x%x@0x%x" vfio_pci_reset(const char *name) " (%s)" vfio_pci_reset_flr(const char *name) "%s FLR/VFIO_DEVICE_RESET" diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index a9036929b220..15ea6c26fdbc 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -112,6 +112,7 @@ typedef struct VFIODevice { bool reset_works; bool needs_reset; bool no_mmap; + bool balloon_allowed; VFIODeviceOps *ops; unsigned int num_irqs; unsigned int num_regions; @@ -131,6 +132,7 @@ typedef struct VFIOGroup { QLIST_HEAD(, VFIODevice) device_list; QLIST_ENTRY(VFIOGroup) next; QLIST_ENTRY(VFIOGroup) container_next; + bool balloon_allowed; } VFIOGroup; typedef struct VFIODMABuf {