From patchwork Sun Mar 22 12:31:58 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451677 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 24CDF139A for ; Sun, 22 Mar 2020 12:27:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id F0F45207FF for ; Sun, 22 Mar 2020 12:27:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727226AbgCVM0x (ORCPT ); Sun, 22 Mar 2020 08:26:53 -0400 Received: from mga18.intel.com ([134.134.136.126]:51561 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726756AbgCVM0Z (ORCPT ); Sun, 22 Mar 2020 08:26:25 -0400 IronPort-SDR: CH2Qsc4v6+3MwLJ2gcRPep9QECMA3bPWY+ZTEga/hiigxK+rqH6/CEJfDgsSKvFOeW78hopaVH MZ7odHuL6G+w== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Mar 2020 05:26:23 -0700 IronPort-SDR: ZK/YYV802jEaczSnu2jl2XuKM9yPHRrtgpD/8bTIknWnbmThjU3Xa8I7PoPuumtk6wIWV1xt6S gxMaFk0KBPhg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.72,292,1580803200"; d="scan'208";a="239663865" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 22 Mar 2020 05:26:23 -0700 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 1/8] vfio: Add VFIO_IOMMU_PASID_REQUEST(alloc/free) 
Date: Sun, 22 Mar 2020 05:31:58 -0700 Message-Id: <1584880325-10561-2-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Liu Yi L For a long time, a device has had only one DMA address space from the platform IOMMU's point of view. This is true for both bare metal and directed access in virtualization environments, because the source ID of a DMA request in PCIe is the BDF (bus/device/function ID), which allows only device-granularity DMA isolation. However, this is changing with the latest advancements in I/O technology. More and more platform vendors are utilizing the PCIe PASID TLP prefix in DMA requests, giving devices multiple DMA address spaces identified by individual PASIDs. For example, Shared Virtual Addressing (SVA, a.k.a. Shared Virtual Memory) lets a device access multiple process virtual address spaces by binding each virtual address space to a PASID, where the PASID is allocated in software and programmed to the device in a device-specific manner. Devices which support the PASID capability are called PASID-capable devices. If such devices are passed through to VMs, guest software is also able to bind guest process virtual address spaces on them. The guest software can therefore reuse the bare metal programming model, meaning the guest will also allocate PASIDs and program them to the device directly. This is a dangerous situation, since it opens the door to PASID conflicts and unauthorized address space access. It is safer to let the host intercept the guest software's PASID allocation, so that PASIDs are managed system-wide. This patch adds the VFIO_IOMMU_PASID_REQUEST ioctl, which passes down PASID allocation/free requests from the virtual IOMMU.
Additionally, since such requests are invoked by QEMU or other applications running in userspace, a mechanism is needed to prevent a single application from abusing the available PASIDs in the system. With that in mind, this patch tracks VFIO PASID allocation per-VM. There was a discussion about making the quota per assigned device, e.g. a VM with many assigned devices would get a larger quota. However, it is unclear how many PASIDs an assigned device will actually use; a VM with multiple assigned devices might still request few PASIDs. A per-VM quota is therefore the better choice. This patch uses the struct mm pointer as the per-VM token. We also considered using the task structure pointer and the vfio_iommu structure pointer. However, the task structure is per-thread and so cannot serve the per-VM PASID tracking purpose, while the vfio_iommu structure is visible only within vfio. The struct mm pointer is therefore selected. This patch adds a structure, vfio_mm. A vfio_mm is created when the first vfio container is opened by a VM; in reverse order, the vfio_mm is freed when the last vfio container is released. Each VM is assigned a PASID quota, so that it cannot request PASIDs beyond it. This patch sets a default quota of 1000, which could be tuned by the administrator; making the PASID quota tunable is added in a later patch in this series.
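As a userspace-side illustration (not part of the patch itself): the request layout and the flag rule below mirror the uapi additions in this patch; the helper names are hypothetical, and a real caller would pass the filled request to ioctl(container_fd, VFIO_IOMMU_PASID_REQUEST, &req).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative copies of the uapi additions in this patch (not the header). */
#define VFIO_IOMMU_PASID_ALLOC	(1u << 0)
#define VFIO_IOMMU_PASID_FREE	(1u << 1)
#define VFIO_PASID_REQUEST_MASK	(VFIO_IOMMU_PASID_ALLOC | VFIO_IOMMU_PASID_FREE)

struct vfio_iommu_type1_pasid_request {
	uint32_t argsz;
	uint32_t flags;
	union {
		struct {
			uint32_t min;
			uint32_t max;
			uint32_t result;
		} alloc_pasid;
		uint32_t free_pasid;
	};
};

/* Same check the kernel performs: unknown bits, or ALLOC and FREE set
 * together, are rejected. */
static int pasid_req_flags_valid(uint32_t flags)
{
	return !((flags & ~VFIO_PASID_REQUEST_MASK) ||
		 ((flags & VFIO_IOMMU_PASID_ALLOC) &&
		  (flags & VFIO_IOMMU_PASID_FREE)));
}

/* Fill an allocation request for the PASID range [min, max]. */
static void pasid_alloc_request(struct vfio_iommu_type1_pasid_request *req,
				uint32_t min, uint32_t max)
{
	memset(req, 0, sizeof(*req));
	req->argsz = sizeof(*req);
	req->flags = VFIO_IOMMU_PASID_ALLOC;
	req->alloc_pasid.min = min;
	req->alloc_pasid.max = max;
}
```

On success the kernel writes the allocated PASID into alloc_pasid.result; a free request instead sets VFIO_IOMMU_PASID_FREE and fills free_pasid.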
Previous discussions: https://patchwork.kernel.org/patch/11209429/ Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Yi Sun Signed-off-by: Jacob Pan Reported-by: kbuild test robot --- drivers/vfio/vfio.c | 130 ++++++++++++++++++++++++++++++++++++++++ drivers/vfio/vfio_iommu_type1.c | 104 ++++++++++++++++++++++++++++++++ include/linux/vfio.h | 20 +++++++ include/uapi/linux/vfio.h | 41 +++++++++++++ 4 files changed, 295 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index c848262..d13b483 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -32,6 +32,7 @@ #include #include #include +#include #define DRIVER_VERSION "0.3" #define DRIVER_AUTHOR "Alex Williamson " @@ -46,6 +47,8 @@ static struct vfio { struct mutex group_lock; struct cdev group_cdev; dev_t group_devt; + struct list_head vfio_mm_list; + struct mutex vfio_mm_lock; wait_queue_head_t release_q; } vfio; @@ -2129,6 +2132,131 @@ int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type, EXPORT_SYMBOL(vfio_unregister_notifier); /** + * VFIO_MM objects - create, release, get, put, search + * Caller of the function should have held vfio.vfio_mm_lock. 
+ */ +static struct vfio_mm *vfio_create_mm(struct mm_struct *mm) +{ + struct vfio_mm *vmm; + struct vfio_mm_token *token; + int ret = 0; + + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL); + if (!vmm) + return ERR_PTR(-ENOMEM); + + /* Per mm IOASID set used for quota control and group operations */ + ret = ioasid_alloc_set((struct ioasid_set *) mm, + VFIO_DEFAULT_PASID_QUOTA, &vmm->ioasid_sid); + if (ret) { + kfree(vmm); + return ERR_PTR(ret); + } + + kref_init(&vmm->kref); + token = &vmm->token; + token->val = mm; + vmm->pasid_quota = VFIO_DEFAULT_PASID_QUOTA; + mutex_init(&vmm->pasid_lock); + + list_add(&vmm->vfio_next, &vfio.vfio_mm_list); + + return vmm; +} + +static void vfio_mm_unlock_and_free(struct vfio_mm *vmm) +{ + /* destroy the ioasid set */ + ioasid_free_set(vmm->ioasid_sid, true); + mutex_unlock(&vfio.vfio_mm_lock); + kfree(vmm); +} + +/* called with vfio.vfio_mm_lock held */ +static void vfio_mm_release(struct kref *kref) +{ + struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref); + + list_del(&vmm->vfio_next); + vfio_mm_unlock_and_free(vmm); +} + +void vfio_mm_put(struct vfio_mm *vmm) +{ + kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio.vfio_mm_lock); +} +EXPORT_SYMBOL_GPL(vfio_mm_put); + +/* Assume vfio_mm_lock or vfio_mm reference is held */ +static void vfio_mm_get(struct vfio_mm *vmm) +{ + kref_get(&vmm->kref); +} + +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) +{ + struct mm_struct *mm = get_task_mm(task); + struct vfio_mm *vmm; + unsigned long long val = (unsigned long long) mm; + + mutex_lock(&vfio.vfio_mm_lock); + list_for_each_entry(vmm, &vfio.vfio_mm_list, vfio_next) { + if (vmm->token.val == val) { + vfio_mm_get(vmm); + goto out; + } + } + + vmm = vfio_create_mm(mm); + if (IS_ERR(vmm)) + vmm = NULL; +out: + mutex_unlock(&vfio.vfio_mm_lock); + mmput(mm); + return vmm; +} +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task); + +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max) +{ + ioasid_t pasid; + int ret = 
-ENOSPC; + + mutex_lock(&vmm->pasid_lock); + + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL); + if (pasid == INVALID_IOASID) { + ret = -ENOSPC; + goto out_unlock; + } + + ret = pasid; +out_unlock: + mutex_unlock(&vmm->pasid_lock); + return ret; +} +EXPORT_SYMBOL_GPL(vfio_mm_pasid_alloc); + +int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid) +{ + void *pdata; + int ret = 0; + + mutex_lock(&vmm->pasid_lock); + pdata = ioasid_find(vmm->ioasid_sid, pasid, NULL); + if (IS_ERR(pdata)) { + ret = PTR_ERR(pdata); + goto out_unlock; + } + ioasid_free(pasid); + +out_unlock: + mutex_unlock(&vmm->pasid_lock); + return ret; +} +EXPORT_SYMBOL_GPL(vfio_mm_pasid_free); + +/** * Module/class support */ static char *vfio_devnode(struct device *dev, umode_t *mode) @@ -2151,8 +2279,10 @@ static int __init vfio_init(void) idr_init(&vfio.group_idr); mutex_init(&vfio.group_lock); mutex_init(&vfio.iommu_drivers_lock); + mutex_init(&vfio.vfio_mm_lock); INIT_LIST_HEAD(&vfio.group_list); INIT_LIST_HEAD(&vfio.iommu_drivers_list); + INIT_LIST_HEAD(&vfio.vfio_mm_list); init_waitqueue_head(&vfio.release_q); ret = misc_register(&vfio_dev); diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index a177bf2..331ceee 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -70,6 +70,7 @@ struct vfio_iommu { unsigned int dma_avail; bool v2; bool nesting; + struct vfio_mm *vmm; }; struct vfio_domain { @@ -2018,6 +2019,7 @@ static void vfio_iommu_type1_detach_group(void *iommu_data, static void *vfio_iommu_type1_open(unsigned long arg) { struct vfio_iommu *iommu; + struct vfio_mm *vmm = NULL; iommu = kzalloc(sizeof(*iommu), GFP_KERNEL); if (!iommu) @@ -2043,6 +2045,10 @@ static void *vfio_iommu_type1_open(unsigned long arg) iommu->dma_avail = dma_entry_limit; mutex_init(&iommu->lock); BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier); + vmm = vfio_mm_get_from_task(current); + if (!vmm) + pr_err("Failed to get vfio_mm track\n"); + 
iommu->vmm = vmm; return iommu; } @@ -2084,6 +2090,8 @@ static void vfio_iommu_type1_release(void *iommu_data) } vfio_iommu_iova_free(&iommu->iova_list); + if (iommu->vmm) + vfio_mm_put(iommu->vmm); kfree(iommu); } @@ -2172,6 +2180,55 @@ static int vfio_iommu_iova_build_caps(struct vfio_iommu *iommu, return ret; } +static bool vfio_iommu_type1_pasid_req_valid(u32 flags) +{ + return !((flags & ~VFIO_PASID_REQUEST_MASK) || + (flags & VFIO_IOMMU_PASID_ALLOC && + flags & VFIO_IOMMU_PASID_FREE)); +} + +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu, + int min, + int max) +{ + struct vfio_mm *vmm = iommu->vmm; + int ret = 0; + + mutex_lock(&iommu->lock); + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { + ret = -EFAULT; + goto out_unlock; + } + if (vmm) + ret = vfio_mm_pasid_alloc(vmm, min, max); + else + ret = -EINVAL; +out_unlock: + mutex_unlock(&iommu->lock); + return ret; +} + +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, + unsigned int pasid) +{ + struct vfio_mm *vmm = iommu->vmm; + int ret = 0; + + mutex_lock(&iommu->lock); + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { + ret = -EFAULT; + goto out_unlock; + } + + if (vmm) + ret = vfio_mm_pasid_free(vmm, pasid); + else + ret = -EINVAL; +out_unlock: + mutex_unlock(&iommu->lock); + return ret; +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -2276,6 +2333,53 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, return copy_to_user((void __user *)arg, &unmap, minsz) ? 
-EFAULT : 0; + + } else if (cmd == VFIO_IOMMU_PASID_REQUEST) { + struct vfio_iommu_type1_pasid_request req; + unsigned long offset; + + minsz = offsetofend(struct vfio_iommu_type1_pasid_request, + flags); + + if (copy_from_user(&req, (void __user *)arg, minsz)) + return -EFAULT; + + if (req.argsz < minsz || + !vfio_iommu_type1_pasid_req_valid(req.flags)) + return -EINVAL; + + if (copy_from_user((void *)&req + minsz, + (void __user *)arg + minsz, + sizeof(req) - minsz)) + return -EFAULT; + + switch (req.flags & VFIO_PASID_REQUEST_MASK) { + case VFIO_IOMMU_PASID_ALLOC: + { + int ret = 0, result; + + result = vfio_iommu_type1_pasid_alloc(iommu, + req.alloc_pasid.min, + req.alloc_pasid.max); + if (result > 0) { + offset = offsetof( + struct vfio_iommu_type1_pasid_request, + alloc_pasid.result); + ret = copy_to_user( + (void __user *) (arg + offset), + &result, sizeof(result)); + } else { + pr_debug("%s: PASID alloc failed\n", __func__); + ret = -EFAULT; + } + return ret; + } + case VFIO_IOMMU_PASID_FREE: + return vfio_iommu_type1_pasid_free(iommu, + req.free_pasid); + default: + return -EINVAL; + } } return -ENOTTY; diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e42a711..75f9f7f1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -89,6 +89,26 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops); extern void vfio_unregister_iommu_driver( const struct vfio_iommu_driver_ops *ops); +#define VFIO_DEFAULT_PASID_QUOTA 1000 +struct vfio_mm_token { + unsigned long long val; +}; + +struct vfio_mm { + struct kref kref; + struct vfio_mm_token token; + int ioasid_sid; + /* protect @pasid_quota field and pasid allocation/free */ + struct mutex pasid_lock; + int pasid_quota; + struct list_head vfio_next; +}; + +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task); +extern void vfio_mm_put(struct vfio_mm *vmm); +extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max); +extern int 
vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid); + /* * External user API */ diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 9e843a1..298ac80 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -794,6 +794,47 @@ struct vfio_iommu_type1_dma_unmap { #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15) #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16) +/* + * PASID (Process Address Space ID) is a PCIe concept which + * has been extended to support DMA isolation in fine-grain. + * With device assigned to user space (e.g. VMs), PASID alloc + * and free need to be system wide. This structure defines + * the info for pasid alloc/free between user space and kernel + * space. + * + * @flag=VFIO_IOMMU_PASID_ALLOC, refer to the @alloc_pasid + * @flag=VFIO_IOMMU_PASID_FREE, refer to @free_pasid + */ +struct vfio_iommu_type1_pasid_request { + __u32 argsz; +#define VFIO_IOMMU_PASID_ALLOC (1 << 0) +#define VFIO_IOMMU_PASID_FREE (1 << 1) + __u32 flags; + union { + struct { + __u32 min; + __u32 max; + __u32 result; + } alloc_pasid; + __u32 free_pasid; + }; +}; + +#define VFIO_PASID_REQUEST_MASK (VFIO_IOMMU_PASID_ALLOC | \ + VFIO_IOMMU_PASID_FREE) + +/** + * VFIO_IOMMU_PASID_REQUEST - _IOWR(VFIO_TYPE, VFIO_BASE + 22, + * struct vfio_iommu_type1_pasid_request) + * + * Availability of this feature depends on PASID support in the device, + * its bus, the underlying IOMMU and the CPU architecture. In VFIO, it + * is available after VFIO_SET_IOMMU. + * + * returns: 0 on success, -errno on failure. 
+ */ +#define VFIO_IOMMU_PASID_REQUEST _IO(VFIO_TYPE, VFIO_BASE + 22) + /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ /* From patchwork Sun Mar 22 12:31:59 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451673 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com,
iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 2/8] vfio/type1: Add vfio_iommu_type1 parameter for quota tuning Date: Sun, 22 Mar 2020 05:31:59 -0700 Message-Id: <1584880325-10561-3-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Liu Yi L This patch adds a module option to make the PASID quota tunable by administrator. TODO: needs to think more on how to make the tuning to be per-process. Previous discussions: https://patchwork.kernel.org/patch/11209429/ Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Reported-by: kbuild test robot --- drivers/vfio/vfio.c | 8 +++++++- drivers/vfio/vfio_iommu_type1.c | 7 ++++++- include/linux/vfio.h | 3 ++- 3 files changed, 15 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index d13b483..020a792 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -2217,13 +2217,19 @@ struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) } EXPORT_SYMBOL_GPL(vfio_mm_get_from_task); -int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max) +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int quota, int min, int max) { ioasid_t pasid; int ret = -ENOSPC; mutex_lock(&vmm->pasid_lock); + /* update quota as it is tunable by admin */ + if (vmm->pasid_quota != quota) { + vmm->pasid_quota = quota; + ioasid_adjust_set(vmm->ioasid_sid, quota); + } + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL); if (pasid == INVALID_IOASID) { ret = -ENOSPC; diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 331ceee..e40afc0 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ 
b/drivers/vfio/vfio_iommu_type1.c @@ -60,6 +60,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644); MODULE_PARM_DESC(dma_entry_limit, "Maximum number of user DMA mappings per container (65535)."); +static unsigned int pasid_quota = VFIO_DEFAULT_PASID_QUOTA; +module_param_named(pasid_quota, pasid_quota, uint, 0644); +MODULE_PARM_DESC(pasid_quota, + "Quota of user owned PASIDs per vfio-based application (1000)."); + struct vfio_iommu { struct list_head domain_list; struct list_head iova_list; @@ -2200,7 +2205,7 @@ static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu, goto out_unlock; } if (vmm) - ret = vfio_mm_pasid_alloc(vmm, min, max); + ret = vfio_mm_pasid_alloc(vmm, pasid_quota, min, max); else ret = -EINVAL; out_unlock: diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 75f9f7f1..af2ef78 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -106,7 +106,8 @@ struct vfio_mm { extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task); extern void vfio_mm_put(struct vfio_mm *vmm); -extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max); +extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, + int quota, int min, int max); extern int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid); /* From patchwork Sun Mar 22 12:32:00 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451679 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 3/8] vfio/type1: Report PASID alloc/free support to userspace Date: Sun, 22 Mar 2020 05:32:00 -0700 Message-Id: <1584880325-10561-4-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Liu Yi L This patch reports PASID alloc/free availability to userspace (e.g. QEMU) so that userspace can do a pre-check before utilizing this feature.
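As a userspace-side sketch (not part of the patch): after VFIO_IOMMU_GET_INFO returns a buffer with VFIO_IOMMU_INFO_CAPS set, an application would walk the capability chain to find the nesting capability this patch adds. The struct copies below are illustrative, assuming the standard VFIO info capability header layout; the helper name is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative copies of the uapi layouts used below. */
struct vfio_info_cap_header {
	uint16_t id;
	uint16_t version;
	uint32_t next;	/* offset of the next capability from the buffer start */
};

#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING	2
#define VFIO_IOMMU_PASID_REQS			(1u << 0)

struct vfio_iommu_type1_info_cap_nesting {
	struct vfio_info_cap_header header;
	uint32_t nesting_capabilities;
};

/* Walk the capability chain in the VFIO_IOMMU_GET_INFO buffer (cap_offset
 * is the offset of the first header) and report whether the nesting
 * capability advertises PASID alloc/free support. */
static int nesting_supports_pasid_reqs(const uint8_t *info, uint32_t cap_offset)
{
	while (cap_offset) {
		const struct vfio_info_cap_header *hdr =
			(const struct vfio_info_cap_header *)(info + cap_offset);

		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) {
			const struct vfio_iommu_type1_info_cap_nesting *cap =
				(const void *)hdr;
			return !!(cap->nesting_capabilities &
				  VFIO_IOMMU_PASID_REQS);
		}
		cap_offset = hdr->next;
	}
	return 0;	/* capability not present */
}
```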
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 28 ++++++++++++++++++++++++++++ include/uapi/linux/vfio.h | 8 ++++++++ 2 files changed, 36 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index e40afc0..ddd1ffe 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2234,6 +2234,30 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, return ret; } +static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, + struct vfio_info_cap *caps) +{ + struct vfio_info_cap_header *header; + struct vfio_iommu_type1_info_cap_nesting *nesting_cap; + + header = vfio_info_cap_add(caps, sizeof(*nesting_cap), + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); + if (IS_ERR(header)) + return PTR_ERR(header); + + nesting_cap = container_of(header, + struct vfio_iommu_type1_info_cap_nesting, + header); + + nesting_cap->nesting_capabilities = 0; + if (iommu->nesting) { + /* nesting iommu type supports PASID requests (alloc/free) */ + nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; + } + + return 0; +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -2283,6 +2307,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, if (ret) return ret; + ret = vfio_iommu_info_add_nesting_cap(iommu, &caps); + if (ret) + return ret; + if (caps.size) { info.flags |= VFIO_IOMMU_INFO_CAPS; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 298ac80..8837219 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -748,6 +748,14 @@ struct vfio_iommu_type1_info_cap_iova_range { struct vfio_iova_range iova_ranges[]; }; +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 2 + +struct vfio_iommu_type1_info_cap_nesting { + struct vfio_info_cap_header header; +#define VFIO_IOMMU_PASID_REQS (1 << 0) + __u32 nesting_capabilities; +}; + 
#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) /** From patchwork Sun Mar 22 12:32:01 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451669 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject:
[PATCH v1 4/8] vfio: Check nesting iommu uAPI version Date: Sun, 22 Mar 2020 05:32:01 -0700 Message-Id: <1584880325-10561-5-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Liu Yi L In the Linux kernel, the IOMMU nesting translation (a.k.a. dual-stage address translation) capability is abstracted in uapi/iommu.h, where uAPIs such as bind_gpasid/iommu_cache_invalidate/fault_report/pgreq_resp are defined. VFIO_TYPE1_NESTING_IOMMU stands for the vfio iommu type backed by a hardware IOMMU with dual-stage translation capability. For this vfio iommu type, userspace is able to set up dual-stage DMA translation on the host side via VFIO's ABI. However, these VFIO ABIs rely on the uAPIs defined in uapi/iommu.h, so VFIO needs to provide an API for userspace to check the uapi/iommu.h version and ensure iommu uAPI compatibility. This patch reports the iommu uAPI version to userspace through the VFIO_CHECK_EXTENSION ioctl. Applications can check the version before further setting up dual-stage translation in the host IOMMU.
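A userspace-side sketch of the check this patch enables (not part of the patch): VFIO_TYPE, VFIO_BASE, and VFIO_CHECK_EXTENSION are copied from include/uapi/linux/vfio.h; the application-side version constant and the exact-match compatibility rule are hypothetical, since the policy is up to the application.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Values copied from include/uapi/linux/vfio.h. */
#define VFIO_TYPE		(';')
#define VFIO_BASE		100
#define VFIO_CHECK_EXTENSION	_IO(VFIO_TYPE, VFIO_BASE + 1)
#define VFIO_NESTING_IOMMU_UAPI	9	/* extension added by this patch */

/* The iommu uAPI version this application was built against (hypothetical). */
#define APP_IOMMU_UAPI_VERSION	1

/* Conservative compatibility rule: exact match only. */
static int iommu_uapi_compatible(int kernel_version, int app_version)
{
	return kernel_version > 0 && kernel_version == app_version;
}

/* Query the kernel's iommu uAPI version via the container fd.
 * Returns 1 if compatible, 0 if not, -1 if VFIO is unavailable. */
static int query_and_check(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int version, ok;

	if (container < 0)
		return -1;	/* VFIO not available on this system */

	/* With this patch, CHECK_EXTENSION returns the iommu uAPI version. */
	version = ioctl(container, VFIO_CHECK_EXTENSION, VFIO_NESTING_IOMMU_UAPI);
	ok = iommu_uapi_compatible(version, APP_IOMMU_UAPI_VERSION);
	close(container);
	return ok;
}
```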
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Reported-by: kbuild test robot --- drivers/vfio/vfio_iommu_type1.c | 2 ++ include/uapi/linux/vfio.h | 9 +++++++++ 2 files changed, 11 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index ddd1ffe..9aa2a67 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2274,6 +2274,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, if (!iommu) return 0; return vfio_domains_have_iommu_cache(iommu); + case VFIO_NESTING_IOMMU_UAPI: + return iommu_get_uapi_version(); default: return 0; } diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 8837219..ed9881d 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -47,6 +47,15 @@ #define VFIO_NOIOMMU_IOMMU 8 /* + * Hardware IOMMUs with two-stage translation capability give userspace + * the ownership of stage-1 translation structures (e.g. page tables). + * VFIO exposes the two-stage IOMMU programming capability to userspace + * based on the IOMMU UAPIs. Therefore user of VFIO_TYPE1_NESTING should + * check the IOMMU UAPI version compatibility. + */ +#define VFIO_NESTING_IOMMU_UAPI 9 + +/* * The IOCTL interface is designed for extensibility by embedding the * structure length (argsz) and flags into structures passed between * kernel and userspace. 
We therefore use the _IO() macro for these From patchwork Sun Mar 22 12:32:02 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451671 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 5/8]
vfio/type1: Report 1st-level/stage-1 format to userspace Date: Sun, 22 Mar 2020 05:32:02 -0700 Message-Id: <1584880325-10561-6-git-send-email-yi.l.liu@intel.com> In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> From: Liu Yi L VFIO exposes the IOMMU nesting translation (a.k.a. dual-stage translation) capability to userspace, so that applications like QEMU can support a vIOMMU backed by the hardware's nesting translation capability for pass-through devices. Before setting up nesting translation for a pass-through device, QEMU and similar applications need to learn the supported 1st-level/stage-1 translation structure format, e.g. the page table format. Take vSVA (virtual Shared Virtual Addressing) as an example: to support vSVA for pass-through devices, QEMU sets up nesting translation for them, and the guest page tables are installed in the host as the 1st-level/stage-1 page tables. The guest format must therefore be compatible with the host side. This patch reports the supported 1st-level/stage-1 page table formats on the current platform to userspace. QEMU and similar applications should consult this format information when setting up IOMMU nesting translation on the host IOMMU.
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Reported-by: kbuild test robot --- drivers/vfio/vfio_iommu_type1.c | 56 +++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/vfio.h | 1 + 2 files changed, 57 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 9aa2a67..82a9e0b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2234,11 +2234,66 @@ static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, return ret; } +static int vfio_iommu_get_stage1_format(struct vfio_iommu *iommu, + u32 *stage1_format) +{ + struct vfio_domain *domain; + u32 format = 0, tmp_format = 0; + int ret; + + mutex_lock(&iommu->lock); + if (list_empty(&iommu->domain_list)) { + mutex_unlock(&iommu->lock); + return -EINVAL; + } + + list_for_each_entry(domain, &iommu->domain_list, next) { + if (iommu_domain_get_attr(domain->domain, + DOMAIN_ATTR_PASID_FORMAT, &format)) { + ret = -EINVAL; + format = 0; + goto out_unlock; + } + /* + * format is always non-zero (the first format is + * IOMMU_PASID_FORMAT_INTEL_VTD which is 1). For + * the reason of potential different backed IOMMU + * formats, here we expect to have identical formats + * in the domain list, no mixed formats support. + * return -EINVAL to fail the attempt of setup + * VFIO_TYPE1_NESTING_IOMMU if non-identical formats + * are detected. 
+ */ + if (tmp_format && tmp_format != format) { + ret = -EINVAL; + format = 0; + goto out_unlock; + } + + tmp_format = format; + } + ret = 0; + +out_unlock: + if (format) + *stage1_format = format; + mutex_unlock(&iommu->lock); + return ret; +} + static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, struct vfio_info_cap *caps) { struct vfio_info_cap_header *header; struct vfio_iommu_type1_info_cap_nesting *nesting_cap; + u32 formats = 0; + int ret; + + ret = vfio_iommu_get_stage1_format(iommu, &formats); + if (ret) { + pr_warn("Failed to get stage-1 format\n"); + return ret; + } header = vfio_info_cap_add(caps, sizeof(*nesting_cap), VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1); @@ -2254,6 +2309,7 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, /* nesting iommu type supports PASID requests (alloc/free) */ nesting_cap->nesting_capabilities |= VFIO_IOMMU_PASID_REQS; } + nesting_cap->stage1_formats = formats; return 0; } diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index ed9881d..ebeaf3e 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -763,6 +763,7 @@ struct vfio_iommu_type1_info_cap_nesting { struct vfio_info_cap_header header; #define VFIO_IOMMU_PASID_REQS (1 << 0) __u32 nesting_capabilities; + __u32 stage1_formats; }; #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12) From patchwork Sun Mar 22 12:32:03 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451667 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 6/8] vfio/type1: Bind guest page tables to host Date: Sun, 22 Mar 2020 05:32:03 -0700 Message-Id: <1584880325-10561-7-git-send-email-yi.l.liu@intel.com> In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> From: Liu Yi L VFIO_TYPE1_NESTING_IOMMU is an IOMMU type which is backed by hardware IOMMUs that have nesting DMA translation (a.k.a. dual-stage address translation).
For such hardware IOMMUs, there are two stages/levels of address translation, and software may let userspace/the VM own the first-level/stage-1 translation structures. An example of such usage is vSVA (virtual Shared Virtual Addressing): the VM owns the first-level/stage-1 translation structures and binds them to the host, after which the hardware IOMMU performs nesting translation for DMA from the devices behind it. This patch adds VFIO support for binding the guest translation (a.k.a. stage-1) structures to the host IOMMU. For VFIO_TYPE1_NESTING_IOMMU, binding the guest page tables is not enough on its own; an interface must also be exposed to the guest for IOMMU cache invalidation, so that the hardware can be told to flush stale IOTLB entries whenever the guest modifies the first-level/stage-1 translation structures. That interface is introduced in the next patch. In this patch, guest page table bind and unbind are performed via the flags VFIO_IOMMU_BIND_GUEST_PGTBL and VFIO_IOMMU_UNBIND_GUEST_PGTBL of the VFIO_IOMMU_BIND ioctl; the bind/unbind data are conveyed by struct iommu_gpasid_bind_data. Before binding a guest page table to the host, the VM must already have a PASID allocated by the host via VFIO_IOMMU_PASID_REQUEST. Binding the guest translation structures (here, the guest page tables) to the host is the first step in setting up vSVA (Virtual Shared Virtual Addressing).
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Jacob Pan Reported-by: kbuild test robot --- drivers/vfio/vfio_iommu_type1.c | 158 ++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/vfio.h | 46 ++++++++++++ 2 files changed, 204 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 82a9e0b..a877747 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -130,6 +130,33 @@ struct vfio_regions { #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \ (!list_empty(&iommu->domain_list)) +struct domain_capsule { + struct iommu_domain *domain; + void *data; +}; + +/* iommu->lock must be held */ +static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu, + int (*fn)(struct device *dev, void *data), + void *data) +{ + struct domain_capsule dc = {.data = data}; + struct vfio_domain *d; + struct vfio_group *g; + int ret = 0; + + list_for_each_entry(d, &iommu->domain_list, next) { + dc.domain = d->domain; + list_for_each_entry(g, &d->group_list, next) { + ret = iommu_group_for_each_dev(g->iommu_group, + &dc, fn); + if (ret) + break; + } + } + return ret; +} + static int put_pfn(unsigned long pfn, int prot); /* @@ -2314,6 +2341,88 @@ static int vfio_iommu_info_add_nesting_cap(struct vfio_iommu *iommu, return 0; } +static int vfio_bind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_bind_data *gbind_data = + (struct iommu_gpasid_bind_data *) dc->data; + + return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data); +} + +static int vfio_unbind_gpasid_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_gpasid_bind_data *gbind_data = + (struct iommu_gpasid_bind_data *) dc->data; + + return iommu_sva_unbind_gpasid(dc->domain, dev, + 
gbind_data->hpasid); +} + +/** + * Unbind a specific gpasid; the caller of this function must hold + * vfio_iommu->lock + */ +static long vfio_iommu_type1_do_guest_unbind(struct vfio_iommu *iommu, + struct iommu_gpasid_bind_data *gbind_data) +{ + return vfio_iommu_for_each_dev(iommu, + vfio_unbind_gpasid_fn, gbind_data); +} + +static long vfio_iommu_type1_bind_gpasid(struct vfio_iommu *iommu, + struct iommu_gpasid_bind_data *gbind_data) +{ + int ret = 0; + + mutex_lock(&iommu->lock); + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { + ret = -EINVAL; + goto out_unlock; + } + + ret = vfio_iommu_for_each_dev(iommu, + vfio_bind_gpasid_fn, gbind_data); + /* + * If bind failed, it may not be a total failure. Some devices + * within the iommu group may have bound successfully. Although + * we don't enable the PASID capability for non-singleton iommu + * groups, an unbind operation is helpful to ensure no partial + * binding is left for an iommu group. + */ + if (ret) + /* + * Undo all binds that already succeeded. No need to + * check the return value here, since some devices within + * the group may not have a successful bind by the time + * we get here.
+ */ + vfio_iommu_type1_do_guest_unbind(iommu, gbind_data); + +out_unlock: + mutex_unlock(&iommu->lock); + return ret; +} + +static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu, + struct iommu_gpasid_bind_data *gbind_data) +{ + int ret = 0; + + mutex_lock(&iommu->lock); + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { + ret = -EINVAL; + goto out_unlock; + } + + ret = vfio_iommu_type1_do_guest_unbind(iommu, gbind_data); + +out_unlock: + mutex_unlock(&iommu->lock); + return ret; +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -2471,6 +2580,55 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, default: return -EINVAL; } + + } else if (cmd == VFIO_IOMMU_BIND) { + struct vfio_iommu_type1_bind bind; + u32 version; + int data_size; + void *gbind_data; + int ret; + + minsz = offsetofend(struct vfio_iommu_type1_bind, flags); + + if (copy_from_user(&bind, (void __user *)arg, minsz)) + return -EFAULT; + + if (bind.argsz < minsz) + return -EINVAL; + + /* Get the version of struct iommu_gpasid_bind_data */ + if (copy_from_user(&version, + (void __user *) (arg + minsz), + sizeof(version))) + return -EFAULT; + + data_size = iommu_uapi_get_data_size( + IOMMU_UAPI_BIND_GPASID, version); + gbind_data = kzalloc(data_size, GFP_KERNEL); + if (!gbind_data) + return -ENOMEM; + + if (copy_from_user(gbind_data, + (void __user *) (arg + minsz), data_size)) { + kfree(gbind_data); + return -EFAULT; + } + + switch (bind.flags & VFIO_IOMMU_BIND_MASK) { + case VFIO_IOMMU_BIND_GUEST_PGTBL: + ret = vfio_iommu_type1_bind_gpasid(iommu, + gbind_data); + break; + case VFIO_IOMMU_UNBIND_GUEST_PGTBL: + ret = vfio_iommu_type1_unbind_gpasid(iommu, + gbind_data); + break; + default: + ret = -EINVAL; + break; + } + kfree(gbind_data); + return ret; } return -ENOTTY; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index ebeaf3e..2235bc6 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -14,6 
+14,7 @@ #include #include +#include #define VFIO_API_VERSION 0 @@ -853,6 +854,51 @@ struct vfio_iommu_type1_pasid_request { */ #define VFIO_IOMMU_PASID_REQUEST _IO(VFIO_TYPE, VFIO_BASE + 22) +/** + * Supported flags: + * - VFIO_IOMMU_BIND_GUEST_PGTBL: bind guest page tables to host for + * nesting type IOMMUs. The @data field takes a struct + * iommu_gpasid_bind_data. + * - VFIO_IOMMU_UNBIND_GUEST_PGTBL: undo a bind guest page table operation + * invoked by VFIO_IOMMU_BIND_GUEST_PGTBL. + */ +struct vfio_iommu_type1_bind { + __u32 argsz; + __u32 flags; +#define VFIO_IOMMU_BIND_GUEST_PGTBL (1 << 0) +#define VFIO_IOMMU_UNBIND_GUEST_PGTBL (1 << 1) + __u8 data[]; +}; + +#define VFIO_IOMMU_BIND_MASK (VFIO_IOMMU_BIND_GUEST_PGTBL | \ + VFIO_IOMMU_UNBIND_GUEST_PGTBL) + +/** + * VFIO_IOMMU_BIND - _IOW(VFIO_TYPE, VFIO_BASE + 23, + * struct vfio_iommu_type1_bind) + * + * Manage the address spaces of devices in this container. Initially a TYPE1 + * container can only have one address space, managed with + * VFIO_IOMMU_MAP/UNMAP_DMA. + * + * An IOMMU of type VFIO_TYPE1_NESTING_IOMMU can be managed by both MAP/UNMAP + * and BIND ioctls at the same time. MAP/UNMAP acts on the stage-2 (host) page + * tables, and BIND manages the stage-1 (guest) page tables. Other types of + * IOMMU may allow MAP/UNMAP and BIND to coexist, where MAP/UNMAP controls + * traffic that requires only single-stage translation while BIND controls + * traffic that requires nesting translation. But this depends on the + * underlying IOMMU architecture and isn't guaranteed. An example is guest SVA + * traffic, which needs nesting translation to obtain the gVA->gPA and then + * gPA->hPA translations. + * + * Availability of this feature depends on the device, its bus, the underlying + * IOMMU and the CPU architecture. + * + * returns: 0 on success, -errno on failure.
+ */ +#define VFIO_IOMMU_BIND _IO(VFIO_TYPE, VFIO_BASE + 23) + /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ /* From patchwork Sun Mar 22 12:32:04 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451665 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 7/8] vfio/type1: Add VFIO_IOMMU_CACHE_INVALIDATE Date: Sun, 22 Mar 2020 05:32:04 -0700 Message-Id: <1584880325-10561-8-git-send-email-yi.l.liu@intel.com> In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> From: Liu Yi L For VFIO IOMMUs of type VFIO_TYPE1_NESTING_IOMMU, the guest "owns" the first-level/stage-1 translation structures, so the host IOMMU driver has no knowledge of first-level/stage-1 structure cache updates unless the guest invalidation requests are trapped and propagated to the host. This patch adds a new ioctl, VFIO_IOMMU_CACHE_INVALIDATE, to propagate guest first-level/stage-1 IOMMU cache invalidations to the host and thereby ensure IOMMU cache correctness. With this patch, vSVA (Virtual Shared Virtual Addressing) can be used safely, as host IOMMU IOTLB correctness is ensured.
Cc: Kevin Tian CC: Jacob Pan Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L Signed-off-by: Eric Auger Signed-off-by: Jacob Pan --- drivers/vfio/vfio_iommu_type1.c | 49 +++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/vfio.h | 22 ++++++++++++++++++ 2 files changed, 71 insertions(+) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index a877747..937ec3f 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -2423,6 +2423,15 @@ static long vfio_iommu_type1_unbind_gpasid(struct vfio_iommu *iommu, return ret; } +static int vfio_cache_inv_fn(struct device *dev, void *data) +{ + struct domain_capsule *dc = (struct domain_capsule *)data; + struct iommu_cache_invalidate_info *cache_inv_info = + (struct iommu_cache_invalidate_info *) dc->data; + + return iommu_cache_invalidate(dc->domain, dev, cache_inv_info); +} + static long vfio_iommu_type1_ioctl(void *iommu_data, unsigned int cmd, unsigned long arg) { @@ -2629,6 +2638,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data, } kfree(gbind_data); return ret; + } else if (cmd == VFIO_IOMMU_CACHE_INVALIDATE) { + struct vfio_iommu_type1_cache_invalidate cache_inv; + u32 version; + int info_size; + void *cache_info; + int ret; + + minsz = offsetofend(struct vfio_iommu_type1_cache_invalidate, + flags); + + if (copy_from_user(&cache_inv, (void __user *)arg, minsz)) + return -EFAULT; + + if (cache_inv.argsz < minsz || cache_inv.flags) + return -EINVAL; + + /* Get the version of struct iommu_cache_invalidate_info */ + if (copy_from_user(&version, + (void __user *) (arg + minsz), sizeof(version))) + return -EFAULT; + + info_size = iommu_uapi_get_data_size( + IOMMU_UAPI_CACHE_INVAL, version); + + cache_info = kzalloc(info_size, GFP_KERNEL); + if (!cache_info) + return -ENOMEM; + + if (copy_from_user(cache_info, + (void __user *) (arg + minsz), info_size)) { + kfree(cache_info); + return -EFAULT; + } + + 
mutex_lock(&iommu->lock); + ret = vfio_iommu_for_each_dev(iommu, vfio_cache_inv_fn, + cache_info); + mutex_unlock(&iommu->lock); + kfree(cache_info); + return ret; } return -ENOTTY; diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h index 2235bc6..62ca791 100644 --- a/include/uapi/linux/vfio.h +++ b/include/uapi/linux/vfio.h @@ -899,6 +899,28 @@ struct vfio_iommu_type1_bind { */ #define VFIO_IOMMU_BIND _IO(VFIO_TYPE, VFIO_BASE + 23) +/** + * VFIO_IOMMU_CACHE_INVALIDATE - _IOW(VFIO_TYPE, VFIO_BASE + 24, + * struct vfio_iommu_type1_cache_invalidate) + * + * Propagate guest IOMMU cache invalidations to the host. The cache + * invalidation information is conveyed by @cache_info; its content + * format follows the structures defined in uapi/linux/iommu.h. Users + * should be aware that struct iommu_cache_invalidate_info has a + * @version field, which vfio needs to parse before copying the data + * from userspace. + * + * This ioctl only becomes available after VFIO_SET_IOMMU. + * + * returns: 0 on success, -errno on failure.
+ */ +struct vfio_iommu_type1_cache_invalidate { + __u32 argsz; + __u32 flags; + struct iommu_cache_invalidate_info cache_info; +}; +#define VFIO_IOMMU_CACHE_INVALIDATE _IO(VFIO_TYPE, VFIO_BASE + 24) + /* -------- Additional API for SPAPR TCE (Server POWERPC) IOMMU -------- */ /* From patchwork Sun Mar 22 12:32:05 2020 X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11451663 From: "Liu, Yi L" To: alex.williamson@redhat.com, eric.auger@redhat.com Cc: kevin.tian@intel.com, jacob.jun.pan@linux.intel.com, joro@8bytes.org, ashok.raj@intel.com,
yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, jean-philippe@linaro.org, peterx@redhat.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, hao.wu@intel.com Subject: [PATCH v1 8/8] vfio/type1: Add vSVA support for IOMMU-backed mdevs Date: Sun, 22 Mar 2020 05:32:05 -0700 Message-Id: <1584880325-10561-9-git-send-email-yi.l.liu@intel.com> In-Reply-To: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> References: <1584880325-10561-1-git-send-email-yi.l.liu@intel.com> From: Liu Yi L In recent years, mediated device pass-through frameworks (e.g. vfio-mdev) have been used to achieve flexible device sharing across domains (e.g. VMs). There are also hardware-assisted mediated pass-through solutions from platform vendors, e.g. Intel VT-d scalable mode, which supports the Intel Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed mdevs, as there is IOMMU-enforced DMA isolation for them; here, IOMMU-backed equals IOMMU-capable. In the kernel, IOMMU-backed mdevs are exposed to the IOMMU layer via the aux-domain concept, which means each mdev is protected by an iommu domain that is an aux-domain of its physical device. Details can be found in the presentation from Kevin Tian: https://events19.linuxfoundation.org/wp-content/uploads/2017/12/\ Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf This patch supports the NESTING IOMMU type for IOMMU-backed mdevs by figuring out the physical device of an IOMMU-backed mdev and then issuing the IOMMU requests to the IOMMU layer with that physical device and the mdev's aux-domain info. With this patch, vSVA (Virtual Shared Virtual Addressing) can be used on IOMMU-backed mdevs.
Cc: Kevin Tian CC: Jacob Pan CC: Jun Tian Cc: Alex Williamson Cc: Eric Auger Cc: Jean-Philippe Brucker Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 937ec3f..d473665 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -132,6 +132,7 @@ struct vfio_regions { struct domain_capsule { struct iommu_domain *domain; + struct vfio_group *group; void *data; }; @@ -148,6 +149,7 @@ static int vfio_iommu_for_each_dev(struct vfio_iommu *iommu, list_for_each_entry(d, &iommu->domain_list, next) { dc.domain = d->domain; list_for_each_entry(g, &d->group_list, next) { + dc.group = g; ret = iommu_group_for_each_dev(g->iommu_group, &dc, fn); if (ret) @@ -2347,7 +2349,12 @@ static int vfio_bind_gpasid_fn(struct device *dev, void *data) struct iommu_gpasid_bind_data *gbind_data = (struct iommu_gpasid_bind_data *) dc->data; - return iommu_sva_bind_gpasid(dc->domain, dev, gbind_data); + if (dc->group->mdev_group) + return iommu_sva_bind_gpasid(dc->domain, + vfio_mdev_get_iommu_device(dev), gbind_data); + else + return iommu_sva_bind_gpasid(dc->domain, + dev, gbind_data); } static int vfio_unbind_gpasid_fn(struct device *dev, void *data) @@ -2356,8 +2363,13 @@ static int vfio_unbind_gpasid_fn(struct device *dev, void *data) struct iommu_gpasid_bind_data *gbind_data = (struct iommu_gpasid_bind_data *) dc->data; - return iommu_sva_unbind_gpasid(dc->domain, dev, + if (dc->group->mdev_group) + return iommu_sva_unbind_gpasid(dc->domain, + vfio_mdev_get_iommu_device(dev), gbind_data->hpasid); + else + return iommu_sva_unbind_gpasid(dc->domain, dev, + gbind_data->hpasid); } /** @@ -2429,7 +2441,12 @@ static int vfio_cache_inv_fn(struct device *dev, void *data) struct iommu_cache_invalidate_info *cache_inv_info = (struct iommu_cache_invalidate_info *) dc->data; - return 
iommu_cache_invalidate(dc->domain, dev, cache_inv_info); + if (dc->group->mdev_group) + return iommu_cache_invalidate(dc->domain, + vfio_mdev_get_iommu_device(dev), cache_inv_info); + else + return iommu_cache_invalidate(dc->domain, + dev, cache_inv_info); } static long vfio_iommu_type1_ioctl(void *iommu_data,