From patchwork Thu Aug 5 17:07:09 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421687
Subject: [PATCH 1/7] vfio: Create vfio_fs_type with inode per device
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: Jason Gunthorpe, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Thu, 05 Aug 2021 11:07:09 -0600
Message-ID: <162818322947.1511194.6035266132085405252.stgit@omen>
In-Reply-To: <162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>
X-Mailing-List: kvm@vger.kernel.org

By linking all the device fds we provide to userspace to an address
space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.
Suggested-by: Jason Gunthorpe
Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio.c  |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |    1 +
 2 files changed, 58 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 02cc51ce6891..b88de89bda31 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -21,8 +21,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -37,6 +39,14 @@
 #define DRIVER_AUTHOR "Alex Williamson "
 #define DRIVER_DESC "VFIO - User Level meta-driver"

+/*
+ * Not exposed via UAPI
+ *
+ * XXX Adopt the following when available:
+ * https://lore.kernel.org/lkml/20210309155348.974875-1-hch@lst.de/
+ */
+#define VFIO_MAGIC 0x5646494f /* "VFIO" */
+
 static struct vfio {
 	struct class *class;
 	struct list_head iommu_drivers_list;
@@ -46,6 +56,8 @@ static struct vfio {
 	struct mutex group_lock;
 	struct cdev group_cdev;
 	dev_t group_devt;
+	struct vfsmount *vfio_fs_mnt;
+	int vfio_fs_cnt;
 } vfio;

 struct vfio_iommu_driver {
@@ -519,6 +531,35 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
 	return group;
 }

+static int vfio_fs_init_fs_context(struct fs_context *fc)
+{
+	return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type vfio_fs_type = {
+	.name = "vfio",
+	.owner = THIS_MODULE,
+	.init_fs_context = vfio_fs_init_fs_context,
+	.kill_sb = kill_anon_super,
+};
+
+static struct inode *vfio_fs_inode_new(void)
+{
+	struct inode *inode;
+	int ret;
+
+	ret = simple_pin_fs(&vfio_fs_type,
+			    &vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+	if (ret)
+		return ERR_PTR(ret);
+
+	inode = alloc_anon_inode(vfio.vfio_fs_mnt->mnt_sb);
+	if (IS_ERR(inode))
+		simple_release_fs(&vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+
+	return inode;
+}
+
 /**
  * Device objects - create, release, get, put, search
  */
@@ -783,6 +824,12 @@ int vfio_register_group_dev(struct vfio_device *device)
 		return -EBUSY;
 	}

+	device->inode = vfio_fs_inode_new();
+	if (IS_ERR(device->inode)) {
+		vfio_group_put(group);
+		return PTR_ERR(device->inode);
+	}
+
 	/* Our reference on group is moved to the device */
 	device->group = group;

@@ -907,6 +954,9 @@ void vfio_unregister_group_dev(struct vfio_device *device)
 	group->dev_counter--;
 	mutex_unlock(&group->device_lock);

+	iput(device->inode);
+	simple_release_fs(&vfio.vfio_fs_mnt, &vfio.vfio_fs_cnt);
+
 	/*
	 * In order to support multiple devices per group, devices can be
	 * plucked from the group while other devices in the group are still
@@ -1411,6 +1461,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 	 */
 	filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);

+	/*
+	 * Use the pseudo fs inode on the device to link all mmaps
+	 * to the same address space, allowing us to unmap all vmas
+	 * associated to this device using unmap_mapping_range().
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	atomic_inc(&group->container_users);

 	fd_install(ret, filep);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index a2c5b30e1763..90bcc2e9c8eb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -24,6 +24,7 @@ struct vfio_device {
 	refcount_t refcount;
 	struct completion comp;
 	struct list_head group_next;
+	struct inode *inode;
 };

 /**

From patchwork Thu Aug 5 17:07:22 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421689
Subject: [PATCH 2/7] vfio: Export unmap_mapping_range() wrapper
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, jgg@nvidia.com,
    peterx@redhat.com
Date: Thu, 05 Aug 2021 11:07:22 -0600
Message-ID: <162818324222.1511194.15934590640437021149.stgit@omen>
In-Reply-To: <162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>
X-Mailing-List: kvm@vger.kernel.org

Allow bus drivers to use vfio pseudo fs mapping to zap all mmaps
across a range of their device files.
Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio.c  |    7 +++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index b88de89bda31..1e4fc69fee7d 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -560,6 +560,13 @@ static struct inode *vfio_fs_inode_new(void)
 	return inode;
 }

+void vfio_device_unmap_mapping_range(struct vfio_device *device,
+				     loff_t start, loff_t len)
+{
+	unmap_mapping_range(device->inode->i_mapping, start, len, true);
+}
+EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
+
 /**
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 90bcc2e9c8eb..712813703e5a 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -66,6 +66,8 @@ int vfio_register_group_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
+extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
+					    loff_t start, loff_t len);

 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {

From patchwork Thu Aug 5 17:07:35 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421691
Subject: [PATCH 3/7] vfio/pci: Use vfio_device_unmap_mapping_range()
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: Jason Gunthorpe, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Thu, 05 Aug 2021 11:07:35 -0600
Message-ID: <162818325518.1511194.1243290800645603609.stgit@omen>
In-Reply-To: <162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>
X-Mailing-List: kvm@vger.kernel.org

With the vfio device fd tied to the address space of the pseudo fs
inode, we can use the mm to track all vmas that might be mmap'ing
device BARs, which removes our vma_list and all the complicated lock
ordering necessary to manually zap each related vma.

Note that we can no longer store the pfn in vm_pgoff if we want to use
unmap_mapping_range() to zap a selective portion of the device fd
corresponding to BAR mappings.

This also converts our mmap fault handler to use vmf_insert_pfn()
because we no longer have a vma_list to avoid the concurrency problem
with io_remap_pfn_range().  This is a step towards removing the fault
handler entirely, at which point we'll return to using
io_remap_pfn_range().
Suggested-by: Jason Gunthorpe
Signed-off-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci.c         |  238 +++++----------------------
 drivers/vfio/pci/vfio_pci_private.h |    2
 2 files changed, 49 insertions(+), 191 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 318864d52837..c526edbf1173 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -225,7 +225,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)

 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
 static void vfio_pci_disable(struct vfio_pci_device *vdev);
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data);

 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1141,7 +1141,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		struct vfio_pci_group_info info;
 		struct vfio_devices devs = { .cur_index = 0 };
 		bool slot = false;
-		int i, group_idx, mem_idx = 0, count = 0, ret = 0;
+		int i, group_idx, count = 0, ret = 0;

 		minsz = offsetofend(struct vfio_pci_hot_reset, count);
@@ -1241,39 +1241,22 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		}

 		/*
-		 * We need to get memory_lock for each device, but devices
-		 * can share mmap_lock, therefore we need to zap and hold
-		 * the vma_lock for each device, and only then get each
-		 * memory_lock.
+		 * Try to get the memory_lock write lock for all devices and
+		 * zap all BAR mmaps.
 		 */
 		ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
-					    vfio_pci_try_zap_and_vma_lock_cb,
+					    vfio_pci_mem_trylock_and_zap_cb,
 					    &devs, slot);
-		if (ret)
-			goto hot_reset_release;
-
-		for (; mem_idx < devs.cur_index; mem_idx++) {
-			struct vfio_pci_device *tmp = devs.devices[mem_idx];
-
-			ret = down_write_trylock(&tmp->memory_lock);
-			if (!ret) {
-				ret = -EBUSY;
-				goto hot_reset_release;
-			}
-			mutex_unlock(&tmp->vma_lock);
-		}

 		/* User has access, do the reset */
-		ret = pci_reset_bus(vdev->pdev);
+		if (!ret)
+			ret = pci_reset_bus(vdev->pdev);

 hot_reset_release:
 		for (i = 0; i < devs.cur_index; i++) {
 			struct vfio_pci_device *tmp = devs.devices[i];

-			if (i < mem_idx)
-				up_write(&tmp->memory_lock);
-			else
-				mutex_unlock(&tmp->vma_lock);
+			up_write(&tmp->memory_lock);
 			vfio_device_put(&tmp->vdev);
 		}
 		kfree(devs.devices);
@@ -1424,100 +1407,18 @@ static ssize_t vfio_pci_write(struct vfio_device *core_vdev, const char __user *
 	return vfio_pci_rw(vdev, (char __user *)buf, count, ppos, true);
 }

-/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
-static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
 {
-	struct vfio_pci_mmap_vma *mmap_vma, *tmp;
-
-	/*
-	 * Lock ordering:
-	 * vma_lock is nested under mmap_lock for vm_ops callback paths.
-	 * The memory_lock semaphore is used by both code paths calling
-	 * into this function to zap vmas and the vm_ops.fault callback
-	 * to protect the memory enable state of the device.
-	 *
-	 * When zapping vmas we need to maintain the mmap_lock => vma_lock
-	 * ordering, which requires using vma_lock to walk vma_list to
-	 * acquire an mm, then dropping vma_lock to get the mmap_lock and
-	 * reacquiring vma_lock.  This logic is derived from similar
-	 * requirements in uverbs_user_mmap_disassociate().
-	 *
-	 * mmap_lock must always be the top-level lock when it is taken.
-	 * Therefore we can only hold the memory_lock write lock when
-	 * vma_list is empty, as we'd need to take mmap_lock to clear
-	 * entries.  vma_list can only be guaranteed empty when holding
-	 * vma_lock, thus memory_lock is nested under vma_lock.
-	 *
-	 * This enables the vm_ops.fault callback to acquire vma_lock,
-	 * followed by memory_lock read lock, while already holding
-	 * mmap_lock without risk of deadlock.
-	 */
-	while (1) {
-		struct mm_struct *mm = NULL;
-
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock))
-				return 0;
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		while (!list_empty(&vdev->vma_list)) {
-			mmap_vma = list_first_entry(&vdev->vma_list,
-						    struct vfio_pci_mmap_vma,
-						    vma_next);
-			mm = mmap_vma->vma->vm_mm;
-			if (mmget_not_zero(mm))
-				break;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			mm = NULL;
-		}
-		if (!mm)
-			return 1;
-		mutex_unlock(&vdev->vma_lock);
-
-		if (try) {
-			if (!mmap_read_trylock(mm)) {
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mmap_read_lock(mm);
-		}
-		if (try) {
-			if (!mutex_trylock(&vdev->vma_lock)) {
-				mmap_read_unlock(mm);
-				mmput(mm);
-				return 0;
-			}
-		} else {
-			mutex_lock(&vdev->vma_lock);
-		}
-		list_for_each_entry_safe(mmap_vma, tmp,
-					 &vdev->vma_list, vma_next) {
-			struct vm_area_struct *vma = mmap_vma->vma;
-
-			if (vma->vm_mm != mm)
-				continue;
-
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-
-			zap_vma_ptes(vma, vma->vm_start,
-				     vma->vm_end - vma->vm_start);
-		}
-		mutex_unlock(&vdev->vma_lock);
-		mmap_read_unlock(mm);
-		mmput(mm);
-	}
+	vfio_device_unmap_mapping_range(&vdev->vdev,
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
 }

 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
 {
-	vfio_pci_zap_and_vma_lock(vdev, false);
 	down_write(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+	vfio_pci_zap_bars(vdev);
 }

 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
@@ -1539,95 +1440,58 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
 	up_write(&vdev->memory_lock);
 }

-/* Caller holds vma_lock */
-static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
-			      struct vm_area_struct *vma)
+static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
+				   unsigned long *pfn)
 {
-	struct vfio_pci_mmap_vma *mmap_vma;
-
-	mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
-	if (!mmap_vma)
-		return -ENOMEM;
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct pci_dev *pdev = vdev->pdev;
+	int index;
+	u64 pgoff;

-	mmap_vma->vma = vma;
-	list_add(&mmap_vma->vma_next, &vdev->vma_list);
+	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);

-	return 0;
-}
+	if (index >= VFIO_PCI_ROM_REGION_INDEX ||
+	    !vdev->bar_mmap_supported[index] || !vdev->barmap[index])
+		return -EINVAL;

-/*
- * Zap mmaps on open so that we can fault them in on access and therefore
- * our vma_list only tracks mappings accessed since last zap.
- */
-static void vfio_pci_mmap_open(struct vm_area_struct *vma)
-{
-	zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-}
+	pgoff = vma->vm_pgoff &
+		((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);

-static void vfio_pci_mmap_close(struct vm_area_struct *vma)
-{
-	struct vfio_pci_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
+	*pfn = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

-	mutex_lock(&vdev->vma_lock);
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma) {
-			list_del(&mmap_vma->vma_next);
-			kfree(mmap_vma);
-			break;
-		}
-	}
-	mutex_unlock(&vdev->vma_lock);
+	return 0;
 }

 static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_device *vdev = vma->vm_private_data;
-	struct vfio_pci_mmap_vma *mmap_vma;
-	vm_fault_t ret = VM_FAULT_NOPAGE;
+	unsigned long vaddr, pfn;
+	vm_fault_t ret = VM_FAULT_SIGBUS;

-	mutex_lock(&vdev->vma_lock);
-	down_read(&vdev->memory_lock);
-
-	if (!__vfio_pci_memory_enabled(vdev)) {
-		ret = VM_FAULT_SIGBUS;
-		goto up_out;
-	}
-
-	/*
-	 * We populate the whole vma on fault, so we need to test whether
-	 * the vma has already been mapped, such as for concurrent faults
-	 * to the same vma.  io_remap_pfn_range() will trigger a BUG_ON if
-	 * we ask it to fill the same range again.
-	 */
-	list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
-		if (mmap_vma->vma == vma)
-			goto up_out;
-	}
+	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
		return ret;

-	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			       vma->vm_end - vma->vm_start,
-			       vma->vm_page_prot)) {
-		ret = VM_FAULT_SIGBUS;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-		goto up_out;
-	}
+	down_read(&vdev->memory_lock);

-	if (__vfio_pci_add_vma(vdev, vma)) {
-		ret = VM_FAULT_OOM;
-		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+	if (__vfio_pci_memory_enabled(vdev)) {
+		for (vaddr = vma->vm_start;
+		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {
+			ret = vmf_insert_pfn(vma, vaddr, pfn);
+			if (ret != VM_FAULT_NOPAGE) {
+				zap_vma_ptes(vma, vma->vm_start,
					     vaddr - vma->vm_start);
+				break;
+			}
+		}
 	}

-up_out:
 	up_read(&vdev->memory_lock);
-	mutex_unlock(&vdev->vma_lock);
+
 	return ret;
 }

 static const struct vm_operations_struct vfio_pci_mmap_ops = {
-	.open = vfio_pci_mmap_open,
-	.close = vfio_pci_mmap_close,
 	.fault = vfio_pci_mmap_fault,
 };

@@ -1690,7 +1554,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 	vma->vm_private_data = vdev;
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
+	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

 	/*
	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
@@ -2016,8 +1880,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->dummy_resources_list);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
-	mutex_init(&vdev->vma_lock);
-	INIT_LIST_HEAD(&vdev->vma_list);
 	init_rwsem(&vdev->memory_lock);

 	ret = vfio_pci_reflck_attach(vdev);
@@ -2261,7 +2123,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	return 0;
 }

-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
 	struct vfio_device *device;
@@ -2281,15 +2143,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)

 	vdev = container_of(device, struct vfio_pci_device, vdev);

-	/*
-	 * Locking multiple devices is prone to deadlock, runaway and
-	 * unwind if we hit contention.
-	 */
-	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
+	if (!down_write_trylock(&vdev->memory_lock)) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}

+	vfio_pci_zap_bars(vdev);
+
 	devs->devices[devs->cur_index++] = vdev;

 	return 0;
 }
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index bbc56c857ef0..0aa542fa1e26 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -140,8 +140,6 @@ struct vfio_pci_device {
 	struct list_head ioeventfds_list;
 	struct vfio_pci_vf_token *vf_token;
 	struct notifier_block nb;
-	struct mutex vma_lock;
-	struct list_head vma_list;
 	struct rw_semaphore memory_lock;
 };

From patchwork Thu Aug 5 17:07:47 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421693
Subject: [PATCH 4/7] vfio,vfio-pci: Add vma to pfn callback
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: Jason Gunthorpe, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Thu, 05 Aug 2021 11:07:47 -0600
Message-ID: <162818326742.1511194.1366505678218237973.stgit@omen>
In-Reply-To: <162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>
X-Mailing-List: kvm@vger.kernel.org

Add a new vfio_device_ops callback to allow the vfio device driver to
translate a vma mapping of a vfio device fd to a pfn.  Implementation
limited to vfio-pci here for the purpose of supporting the reverse of
unmap_mapping_range(), but expected to be implemented for all vfio
device drivers supporting DMA mapping of device memory mmaps.

Suggested-by: Jason Gunthorpe
Signed-off-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci.c |    9 ++++++---
 drivers/vfio/vfio.c         |   18 ++++++++++++++++--
 include/linux/vfio.h        |    6 ++++++
 3 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index c526edbf1173..7a9f67cfc0a2 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1440,10 +1440,12 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
 	up_write(&vdev->memory_lock);
 }

-static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
+static int vfio_pci_bar_vma_to_pfn(struct vfio_device *core_vdev,
+				   struct vm_area_struct *vma,
 				   unsigned long *pfn)
 {
-	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_device *vdev =
+		container_of(core_vdev, struct vfio_pci_device, vdev);
 	struct pci_dev *pdev = vdev->pdev;
 	int index;
 	u64 pgoff;
@@ -1469,7 +1471,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	unsigned long vaddr, pfn;
 	vm_fault_t ret = VM_FAULT_SIGBUS;

-	if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
+	if (vfio_pci_bar_vma_to_pfn(&vdev->vdev, vma, &pfn))
 		return ret;

 	down_read(&vdev->memory_lock);
@@ -1742,6 +1744,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.mmap		= vfio_pci_mmap,
 	.request	= vfio_pci_request,
 	.match		= vfio_pci_match,
+	.vma_to_pfn	= vfio_pci_bar_vma_to_pfn,
 };

 static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1e4fc69fee7d..42ca93be152a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -875,6 +875,22 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);

+static const struct file_operations vfio_device_fops;
+
+int vfio_device_vma_to_pfn(struct vfio_device *device,
+			   struct vm_area_struct *vma, unsigned long *pfn)
+{
+	if (WARN_ON(!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops ||
+		    vma->vm_file->private_data != device))
+		return -EINVAL;
+
+	if (unlikely(!device->ops->vma_to_pfn))
+		return -EPERM;
+
+	return device->ops->vma_to_pfn(device, vma, pfn);
+}
+EXPORT_SYMBOL_GPL(vfio_device_vma_to_pfn);
+
 static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
						     char *buf)
 {
@@ -1407,8 +1423,6 @@ static int vfio_group_add_container_user(struct vfio_group *group)
 	return 0;
 }

-static const struct file_operations vfio_device_fops;
-
 static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
 {
 	struct vfio_device *device;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 712813703e5a..5f07ebe0f85d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -41,6 +41,7 @@ struct vfio_device {
  * @match: Optional device name match callback (return: 0 for no-match, >0 for
  *         match, -errno for abort (ex.
match with insufficient or incorrect * additional args) + * @vma_to_pfn: Optional pfn from vma lookup against vma mapping device fd */ struct vfio_device_ops { char *name; @@ -55,6 +56,8 @@ struct vfio_device_ops { int (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma); void (*request)(struct vfio_device *vdev, unsigned int count); int (*match)(struct vfio_device *vdev, char *buf); + int (*vma_to_pfn)(struct vfio_device *vdev, + struct vm_area_struct *vma, unsigned long *pfn); }; extern struct iommu_group *vfio_iommu_group_get(struct device *dev); @@ -68,6 +71,9 @@ extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); extern void vfio_device_put(struct vfio_device *device); extern void vfio_device_unmap_mapping_range(struct vfio_device *device, loff_t start, loff_t len); +extern int vfio_device_vma_to_pfn(struct vfio_device *device, + struct vm_area_struct *vma, + unsigned long *pfn); /* events for the backend driver notify callback */ enum vfio_iommu_notify_type { From patchwork Thu Aug 5 17:08:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 12421695 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.4 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 434B2C4338F for ; Thu, 5 Aug 2021 17:08:12 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 25F7D61154 for ; Thu, 5 Aug 2021 17:08:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by 
vger.kernel.org via listexpand id S237575AbhHERIZ (ORCPT ); Thu, 5 Aug 2021 13:08:25 -0400 Received: from us-smtp-delivery-124.mimecast.com ([170.10.133.124]:28696 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237202AbhHERIZ (ORCPT ); Thu, 5 Aug 2021 13:08:25 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1628183290; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=RN93NwEe5WtbM1cvBFWuqQaTNSgmNNz3gZJhT/+o3w8=; b=CC/MDQSggD85ARI38xqp+jb4LRMn8CShVTGauNKIyvCppns+Dq+4yXKKXMxDlaWJ3Ogia3 DgMBfEGz4X4u+WF9C323SFy5kWgPUFxudXlSmWlGtTD4iRZEZHiWfIHggmYr5mgFwLR1Q3 /BTLTgrp3dmfv+OwrAxAPEt0MJ2iDlQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-590-L60JlJt9PXyetW7tYPzBKw-1; Thu, 05 Aug 2021 13:08:08 -0400 X-MC-Unique: L60JlJt9PXyetW7tYPzBKw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 4E3318799E0; Thu, 5 Aug 2021 17:08:07 +0000 (UTC) Received: from [172.30.41.16] (ovpn-113-77.phx2.redhat.com [10.3.113.77]) by smtp.corp.redhat.com (Postfix) with ESMTP id 991AD60CC9; Thu, 5 Aug 2021 17:08:00 +0000 (UTC) Subject: [PATCH 5/7] mm/interval_tree.c: Export vma interval tree iterators From: Alex Williamson To: alex.williamson@redhat.com Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com Date: Thu, 05 Aug 2021 11:08:00 -0600 Message-ID: <162818328044.1511194.11410182995960067691.stgit@omen> In-Reply-To: 
<162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>

In order to make use of vma_interval_tree_foreach() from a module we
need to export the first and next iterators.  vfio code would like to
use this foreach helper to create a remapping helper, essentially the
reverse of unmap_mapping_range() for specific vmas mapping vfio device
memory.

Cc: Andrew Morton
Cc: linux-mm@kvack.org
Signed-off-by: Alex Williamson
---
 mm/interval_tree.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index 32e390c42c53..faa50767496c 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -24,6 +24,9 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
 		     unsigned long, shared.rb_subtree_last,
 		     vma_start_pgoff, vma_last_pgoff, /* empty */,
 		     vma_interval_tree)
+EXPORT_SYMBOL_GPL(vma_interval_tree_iter_first);
+EXPORT_SYMBOL_GPL(vma_interval_tree_iter_next);
+
 /* Insert node immediately after prev in the interval tree */
 void vma_interval_tree_insert_after(struct vm_area_struct *node,
 				    struct vm_area_struct *prev,

From patchwork Thu Aug 5 17:08:12 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421697
Subject: [PATCH 6/7] vfio: Add vfio_device_io_remap_mapping_range()
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com
Date: Thu, 05 Aug 2021 11:08:12 -0600
Message-ID: <162818329235.1511194.15804833796430403640.stgit@omen>
In-Reply-To: <162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>

This provides a mirror of vfio_device_unmap_mapping_range() for vmas
mapping device memory, where the pfn is provided by
vfio_device_vma_to_pfn().

Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio.c  |   44 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 46 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 42ca93be152a..c5b3a3446dd9 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -33,6 +33,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #define DRIVER_VERSION	"0.3"
@@ -567,6 +568,49 @@ void vfio_device_unmap_mapping_range(struct vfio_device *device,
 }
 EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
 
+int vfio_device_io_remap_mapping_range(struct vfio_device *device,
+				       loff_t start, loff_t len)
+{
+	struct address_space *mapping = device->inode->i_mapping;
+	int ret = 0;
+
+	i_mmap_lock_write(mapping);
+	if (mapping_mapped(mapping)) {
+		struct rb_root_cached *root = &mapping->i_mmap;
+		pgoff_t pgstart = start >> PAGE_SHIFT;
+		pgoff_t pgend = (start + len - 1) >> PAGE_SHIFT;
+		struct vm_area_struct *vma;
+
+		vma_interval_tree_foreach(vma, root, pgstart, pgend) {
+			unsigned long pfn;
+			unsigned int flags;
+
+			ret = vfio_device_vma_to_pfn(device, vma, &pfn);
+			if (ret)
+				break;
+
+			/*
+			 * Force NOFS memory allocation context to avoid
+			 * deadlock while we hold i_mmap_rwsem.
+			 */
+			flags = memalloc_nofs_save();
+			ret = io_remap_pfn_range(vma, vma->vm_start, pfn,
+						 vma->vm_end - vma->vm_start,
+						 vma->vm_page_prot);
+			memalloc_nofs_restore(flags);
+			if (ret)
+				break;
+		}
+	}
+	i_mmap_unlock_write(mapping);
+
+	if (ret)
+		vfio_device_unmap_mapping_range(device, start, len);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_device_io_remap_mapping_range);
+
 /**
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 5f07ebe0f85d..c2c51c7a6f05 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -71,6 +71,8 @@ extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
 					    loff_t start, loff_t len);
+extern int vfio_device_io_remap_mapping_range(struct vfio_device *device,
+					      loff_t start, loff_t len);
 extern int vfio_device_vma_to_pfn(struct vfio_device *device,
 				  struct vm_area_struct *vma,
 				  unsigned long *pfn);

From patchwork Thu Aug 5 17:08:21 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12421699
Subject: [PATCH 7/7] vfio/pci: Remove map-on-fault behavior
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: Jason Gunthorpe , linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
 jgg@nvidia.com, peterx@redhat.com
Date: Thu, 05 Aug 2021 11:08:21 -0600
Message-ID: <162818330190.1511194.10498114924408843888.stgit@omen>
In-Reply-To:
<162818167535.1511194.6614962507750594786.stgit@omen>
References: <162818167535.1511194.6614962507750594786.stgit@omen>

With vfio_device_io_remap_mapping_range() we can repopulate vmas with
device mappings around manipulation of the device, rather than waiting
for an access.  This allows us to go back to the more standard use case
of io_remap_pfn_range() for device memory while still preventing access
to device memory through mmaps when the device is disabled.

Suggested-by: Jason Gunthorpe
Signed-off-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci.c         |   80 +++++++++++++++++------------------
 drivers/vfio/pci/vfio_pci_config.c  |    8 ++--
 drivers/vfio/pci/vfio_pci_private.h |    3 +
 3 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 7a9f67cfc0a2..196b8002447b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -447,6 +447,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 		kfree(dummy_res);
 	}
 
+	vdev->zapped_bars = false;
 	vdev->needs_reset = true;
 
 	/*
@@ -1057,7 +1058,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		vfio_pci_zap_and_down_write_memory_lock(vdev);
 		ret = pci_try_reset_function(vdev->pdev);
-		up_write(&vdev->memory_lock);
+		vfio_pci_test_and_up_write_memory_lock(vdev);
 
 		return ret;
@@ -1256,7 +1257,7 @@ static long vfio_pci_ioctl(struct vfio_device *core_vdev,
 		for (i = 0; i < devs.cur_index; i++) {
 			struct vfio_pci_device *tmp = devs.devices[i];
 
-			up_write(&tmp->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(tmp);
 			vfio_device_put(&tmp->vdev);
 		}
 		kfree(devs.devices);
@@ -1413,6 +1414,14 @@ static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
 		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
 		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
 		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
+
+	/*
+	 * Modified under memory_lock write semaphore.  Device handoff
+	 * with memory enabled, therefore any disable will zap and setup
+	 * a remap when re-enabled.  io_remap_pfn_range() is not forgiving
+	 * of duplicate mappings so we must track.
+	 */
+	vdev->zapped_bars = true;
 }
 
 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
@@ -1421,6 +1430,18 @@ void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
 	vfio_pci_zap_bars(vdev);
 }
 
+void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device *vdev)
+{
+	if (vdev->zapped_bars && __vfio_pci_memory_enabled(vdev)) {
+		WARN_ON(vfio_device_io_remap_mapping_range(&vdev->vdev,
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
+			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX)));
+		vdev->zapped_bars = false;
+	}
+	up_write(&vdev->memory_lock);
+}
+
 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
 {
 	u16 cmd;
@@ -1464,39 +1485,6 @@ static int vfio_pci_bar_vma_to_pfn(struct vfio_device *core_vdev,
 	return 0;
 }
 
-static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
-{
-	struct vm_area_struct *vma = vmf->vma;
-	struct vfio_pci_device *vdev = vma->vm_private_data;
-	unsigned long vaddr, pfn;
-	vm_fault_t ret = VM_FAULT_SIGBUS;
-
-	if (vfio_pci_bar_vma_to_pfn(&vdev->vdev, vma, &pfn))
-		return ret;
-
-	down_read(&vdev->memory_lock);
-
-	if (__vfio_pci_memory_enabled(vdev)) {
-		for (vaddr = vma->vm_start;
-		     vaddr < vma->vm_end; vaddr += PAGE_SIZE, pfn++) {
-			ret = vmf_insert_pfn(vma, vaddr, pfn);
-			if (ret != VM_FAULT_NOPAGE) {
-				zap_vma_ptes(vma, vma->vm_start,
-					     vaddr - vma->vm_start);
-				break;
-			}
-		}
-	}
-
-	up_read(&vdev->memory_lock);
-
-	return ret;
-}
-
-static const struct vm_operations_struct vfio_pci_mmap_ops = {
-	.fault = vfio_pci_mmap_fault,
-};
-
 static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev =
@@ -1504,6 +1492,7 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 	struct pci_dev *pdev = vdev->pdev;
 	unsigned int index;
 	u64 phys_len, req_len, pgoff, req_start;
+	unsigned long pfn;
 	int ret;
 
 	index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);
@@ -1554,18 +1543,25 @@ static int vfio_pci_mmap(struct vfio_device *core_vdev, struct vm_area_struct *v
 		}
 	}
 
-	vma->vm_private_data = vdev;
+	ret = vfio_pci_bar_vma_to_pfn(core_vdev, vma, &pfn);
+	if (ret)
+		return ret;
+
 	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
-	vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
+
+	down_read(&vdev->memory_lock);
 
 	/*
-	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
-	 * change vm_flags within the fault handler.  Set them now.
+	 * Only perform the mapping now if BAR is not in zapped state, VFs
+	 * always report memory enabled so relying on device enable state
+	 * could lead to duplicate remaps.
 	 */
-	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
-	vma->vm_ops = &vfio_pci_mmap_ops;
+	if (!vdev->zapped_bars)
+		ret = io_remap_pfn_range(vma, vma->vm_start, pfn,
+					 vma->vm_end - vma->vm_start,
+					 vma->vm_page_prot);
+	up_read(&vdev->memory_lock);
 
-	return 0;
+	return ret;
 }
 
 static void vfio_pci_request(struct vfio_device *core_vdev, unsigned int count)
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 70e28efbc51f..4220057b253c 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -605,7 +605,7 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 	count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
 	if (count < 0) {
 		if (offset == PCI_COMMAND)
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		return count;
 	}
 
@@ -619,7 +619,7 @@ static int vfio_basic_config_write(struct vfio_pci_device *vdev, int pos,
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
-		up_write(&vdev->memory_lock);
+		vfio_pci_test_and_up_write_memory_lock(vdev);
 	}
 
 	/* Emulate INTx disable */
@@ -860,7 +860,7 @@ static int vfio_exp_config_write(struct vfio_pci_device *vdev, int pos,
 		if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		}
 	}
 
@@ -942,7 +942,7 @@ static int vfio_af_config_write(struct vfio_pci_device *vdev, int pos,
 		if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
 			pci_try_reset_function(vdev->pdev);
-			up_write(&vdev->memory_lock);
+			vfio_pci_test_and_up_write_memory_lock(vdev);
 		}
 	}
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 0aa542fa1e26..9aedb78a4ae3 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -128,6 +128,7 @@ struct vfio_pci_device {
 	bool			needs_reset;
 	bool			nointx;
 	bool			needs_pm_restore;
+	bool			zapped_bars;
 	struct pci_saved_state	*pci_saved_state;
 	struct pci_saved_state	*pm_save;
 	struct vfio_pci_reflck	*reflck;
@@ -186,6 +187,8 @@ extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
 extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
 extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
 						    *vdev);
+extern void vfio_pci_test_and_up_write_memory_lock(struct vfio_pci_device
+						   *vdev);
 extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
 extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
 					       u16 cmd);