From patchwork Mon Feb 27 19:54:33 2017
X-Patchwork-Submitter: Jean-Philippe Brucker
X-Patchwork-Id: 9594025
From: Jean-Philippe Brucker
Cc: Harv Abdulhamid, Will Deacon, Shanker Donthineni, Bjorn Helgaas,
    Sinan Kaya, Lorenzo Pieralisi, Catalin Marinas, Robin Murphy,
    Joerg Roedel, Nate Watterson, Alex Williamson, David Woodhouse,
    linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
    iommu@lists.linux-foundation.org, kvm@vger.kernel.org
Subject: [RFC PATCH 22/30] iommu: Bind/unbind tasks to/from devices
Date: Mon, 27 Feb 2017 19:54:33 +0000
Message-Id: <20170227195441.5170-23-jean-philippe.brucker@arm.com>
In-Reply-To: <20170227195441.5170-1-jean-philippe.brucker@arm.com>
References: <20170227195441.5170-1-jean-philippe.brucker@arm.com>
List-ID: kvm.vger.kernel.org

Add three functions to the IOMMU API. iommu_bind_task takes a device and a
task as arguments. If the IOMMU, the device and the bus support it, it
attaches the task to the device and creates a Process Address Space ID
(PASID) unique to the device. DMA from the device can then use the PASID to
read from or write into the task's address space. iommu_unbind_task removes
a bond created with iommu_bind_task. iommu_set_svm_ops allows a device
driver to set some callbacks for specific SVM-related operations.

Try to accommodate the current implementations (AMD, Intel and ARM) by
letting the IOMMU driver do all the work, but also attempt to find
intersections between the implementations.

* amd_iommu_v2 expects the device driver to allocate a PASID and pass it to
  the IOMMU. The driver also provides separate functions to register
  callbacks that handle failed PRI requests and invalidate PASIDs.
	int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
				 struct task_struct *task)
	void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
	int amd_iommu_set_invalid_ppr_cb(struct pci_dev *pdev,
					 amd_iommu_invalid_ppr_cb cb)
	int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
					    amd_iommu_invalidate_ctx cb)

* intel-svm allocates a PASID, and requires the device driver to pass
  "svm_dev_ops", which currently contains a fault callback. It also doesn't
  take a task as argument, but uses 'current'.

	int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
			      struct svm_dev_ops *ops)
	int intel_svm_unbind_mm(struct device *dev, int pasid)

* For arm-smmu-v3, the PASID must be allocated by the SMMU driver, since it
  indexes contexts in an array handled by the SMMU device.


Bind and unbind
===============

The following could suit existing implementations:

	int iommu_bind_task(struct device *dev, struct task_struct *task,
			    int *pasid, int flags, void *priv);

	int iommu_unbind_task(struct device *dev, int pasid, int flags);

This is similar to the existing functions.

* @dev is an SVM-capable device. If it is not, bind fails.

* @task is a userspace task. It doesn't have to be current, but
  implementations can reject the call if they only support current.

* @pasid is a handle for the bond. It would be nice to have the IOMMU
  driver handle PASID allocation, for consistency. Otherwise, the
  requirement for device drivers to allocate PASIDs themselves could be
  advertised in a capability.

* @flags represents parameters of bind/unbind. We might want to reserve a
  few bits, maybe the bottom half, for the API, and give the rest to the
  IOMMU driver.

* @priv will be passed to SVM callbacks targeting this bond.


SVM device callbacks
====================

Making svm_dev_ops (here iommu_svm_ops) a first-class citizen of struct
device would be a useful next step. Device drivers could set this structure
when they want to participate in SVM. For the moment, iommu_set_svm_ops
must be called.
I'm not sure what to do when assigning a device via VFIO. Should we remove
the SVM ops when detaching from a domain, or have the device driver remove
them when detaching itself from the device?

Fault handling
--------------

The first callback allows a driver to be notified when the IOMMU driver
cannot handle a fault.

amd_iommu_v2 has:

	int (*amd_iommu_invalid_ppr_cb)(struct pci_dev *pdev, int pasid,
					unsigned long address, u16 prot)

intel-svm has (called for all faults):

	void (*fault_cb)(struct device *dev, int pasid, u64 address,
			 u32 private, int rwxp, int response)

We put the following in iommu_svm_ops:

	int (*handle_fault)(struct device *dev, int pasid, u64 address,
			    int prot, int status, void *priv);

The IOMMU driver calls handle_mm_fault and sends the result back to the
device. If the fault cannot be handled, it gives the device driver a chance
to record the fault and maybe even fix it up. @pasid, @address and @prot
are copied from the page request; @status is the return value of
handle_mm_fault.

@prot could use the format defined in iommu.h (IOMMU_READ, IOMMU_WRITE,
etc.). @status could be a combination of the VM_FAULT_* values returned by
handle_mm_fault, but this leaves out the case where we don't even reach the
fault handling part. We could instead define new status flags: one for
failure to locate the context associated with the PASID, one for failure of
mm to handle the fault. We cannot piggy-back on the existing
IOMMU_FAULT_READ and IOMMU_FAULT_WRITE in their current state, because
devices might request both read and write permissions at the same time;
they would need to be redefined as flags.

All callbacks have a @priv field. This is an opaque pointer set by the
device driver when binding. This way the device driver gets both a PASID
and its metadata in the callback, and we avoid duplicating PASID state
lookups in both the IOMMU driver and the device driver.

Another question is the location of the callback.
The IOMMU driver could notify the device driver either:

* before handle_mm_fault, to do some custom fault handling and perhaps
  bypass the IOMMU handler entirely,
* after handle_mm_fault, to notify the driver of an error (AMD),
* after handle_mm_fault, to notify the driver of any page request (Intel).

We might want to let the driver decide when binding a PASID, or offer two
callbacks: handle_fault and report_fault. I don't have a proposal for this
yet.

handle_fault returns the response that the IOMMU driver should send to the
device: either success, meaning that the page has been mapped (or the
access is likely to succeed later), or failure, meaning that the device
shouldn't bother retrying.

It would be nice to reconcile this with the iommu_fault_handler API, which
isn't widely used yet but is being considered for handling domain faults
from platform devices on SMMUv2, using the stall model instead of ATS/PRI.
Yet another concern for ARM is that platform devices may issue traffic over
multiple stream IDs, for instance one stream ID per channel in a DMA
engine. handle_fault doesn't provide a way to pass those stream IDs back to
the driver.

PASID invalidation
------------------

Next, we need to let the IOMMU driver notify the device driver before it
attempts to unbind a PASID. Subsequent patches discuss PASID invalidation
in more detail, so we'll simply propose the following interface for now.

AMD has:

	void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, int pasid);

We put the following in iommu_svm_ops:

	int (*invalidate_pasid)(struct device *dev, int pasid, void *priv);


Capability detection
====================

I didn't add any public function for detecting SVM capability yet. In my
opinion, a nice way to do it is to have users query the state of the device
to know whether they can call bind/unbind. If the IOMMU supports SVM, and
the IOMMU driver was able to enable it successfully in the device, then
users can call bind/unbind on the device.
In the VFIO patch later in this series, I implemented the PCIe detection
like this: if ATS, PRI and PASID are enabled (by the IOMMU driver), then
the device can do SVM. If for some reason the IOMMU is incompatible with
the device's SVM properties, or is incompatible with the MMU page tables,
then it shouldn't enable PRI or PASID. For platform devices, the
requirements are very blurry at the moment. We'll probably add a
device-tree property saying that a device and its bus are SVM-capable.

The following interface could be added to the API:

	int iommu_svm_capable(struct device *dev, int flags);

This tells the device driver whether the IOMMU driver is capable of binding
a task to the device. @flags may contain specific SVM capabilities
(paging/pinned, executable, etc.) and the function could return a subset of
these flags. For PCI devices, everything is enabled when this call is
successful. For platform devices, the device driver would have to enable
SVM itself.


API naming
==========

I realize that "SVM" as a name isn't great, because the svm namespace is
already taken by AMD-V (Secure Virtual Machine) in arch/x86, and the name
itself doesn't say much. I personally prefer "Unified Virtual Addressing"
(UVA), as adopted by CUDA, or rather Unified Virtual Address Space (UVAS).
Another possibility is Unified Virtual Memory (UVM). The acronym UAS, for
Unified Address Space, is already used by USB. The same goes for Shared
Address Space (SAS), already in use in the kernel, but SVAS would work
(although it doesn't look good).
Signed-off-by: Jean-Philippe Brucker
---
 drivers/iommu/iommu.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/iommu.h |  41 +++++++++++++++++++
 2 files changed, 149 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 8ea14f41a979..26c5f6528c69 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1438,6 +1438,114 @@ void iommu_detach_group(struct iommu_domain *domain, struct iommu_group *group)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_group);
 
+int iommu_set_svm_ops(struct device *dev, const struct iommu_svm_ops *svm_ops)
+{
+	const struct iommu_ops *ops;
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get_for_dev(dev);
+	if (IS_ERR(group))
+		return PTR_ERR(group);
+
+	ops = dev->bus->iommu_ops;
+	if (!ops->set_svm_ops) {
+		iommu_group_put(group);
+		return -ENODEV;
+	}
+
+	mutex_lock(&group->mutex);
+	ret = ops->set_svm_ops(dev, svm_ops);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_set_svm_ops);
+
+/*
+ * iommu_bind_task - Share task address space with device
+ *
+ * @dev: device to bind
+ * @task: task to bind
+ * @pasid: valid address where the PASID is stored
+ * @flags: driver-specific flags
+ * @priv: private data to associate with the bond
+ *
+ * Create a bond between device and task, allowing the device to access the task
+ * address space using @pasid. Intel and ARM SMMU drivers allocate and return
+ * the PASID, while AMD requires the caller to allocate a PASID beforehand.
+ *
+ * iommu_unbind_task must be called with this PASID before the task exits.
+ */
+int iommu_bind_task(struct device *dev, struct task_struct *task, int *pasid,
+		    int flags, void *priv)
+{
+	const struct iommu_ops *ops;
+	struct iommu_group *group;
+	int ret;
+
+	if (!pasid)
+		return -EINVAL;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	ops = dev->bus->iommu_ops;
+	if (!ops->bind_task) {
+		iommu_group_put(group);
+		return -ENODEV;
+	}
+
+	mutex_lock(&group->mutex);
+	if (!group->domain)
+		ret = -EINVAL;
+	else
+		ret = ops->bind_task(dev, task, pasid, flags, priv);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_bind_task);
+
+/*
+ * iommu_unbind_task - Remove a bond created with iommu_bind_task
+ *
+ * @dev: device bound to the task
+ * @pasid: identifier of the bond
+ * @flags: state of the PASID and driver-specific flags
+ */
+int iommu_unbind_task(struct device *dev, int pasid, int flags)
+{
+	const struct iommu_ops *ops;
+	struct iommu_group *group;
+	int ret;
+
+	group = iommu_group_get(dev);
+	if (!group)
+		return -ENODEV;
+
+	ops = dev->bus->iommu_ops;
+	if (!ops->unbind_task) {
+		iommu_group_put(group);
+		return -ENODEV;
+	}
+
+	mutex_lock(&group->mutex);
+	if (!group->domain)
+		ret = -EINVAL;
+	else
+		ret = ops->unbind_task(dev, pasid, flags);
+	mutex_unlock(&group->mutex);
+
+	iommu_group_put(group);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unbind_task);
+
 phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 {
 	if (unlikely(domain->ops->iova_to_phys == NULL))
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 6a6de187ddc0..9554f45d4305 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -145,6 +145,16 @@ struct iommu_resv_region {
 	int type;
 };
 
+/*
+ * @handle_fault: report or handle a fault from the device (FIXME: imprecise)
+ * @invalidate_pasid: stop using a PASID.
+ */
+struct iommu_svm_ops {
+	int (*handle_fault)(struct device *dev, int pasid, u64 address,
+			    int prot, int status, void *priv);
+	int (*invalidate_pasid)(struct device *dev, int pasid, void *priv);
+};
+
 #ifdef CONFIG_IOMMU_API
 
 /**
@@ -154,6 +164,9 @@ struct iommu_resv_region {
  * @domain_free: free iommu domain
  * @attach_dev: attach device to an iommu domain
  * @detach_dev: detach device from an iommu domain
+ * @set_svm_ops: set SVM callbacks for device
+ * @bind_task: attach a task address space to a device
+ * @unbind_task: detach a task address space from a device
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
@@ -183,6 +196,10 @@ struct iommu_ops {
 	int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
 	void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+	int (*set_svm_ops)(struct device *dev, const struct iommu_svm_ops *ops);
+	int (*bind_task)(struct device *dev, struct task_struct *task,
+			 int *pasid, int flags, void *priv);
+	int (*unbind_task)(struct device *dev, int pasid, int flags);
 	int (*map)(struct iommu_domain *domain, unsigned long iova,
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
@@ -403,6 +420,13 @@ void iommu_fwspec_free(struct device *dev);
 int iommu_fwspec_add_ids(struct device *dev, u32 *ids, int num_ids);
 const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode);
 
+extern int iommu_set_svm_ops(struct device *dev,
+			     const struct iommu_svm_ops *svm_ops);
+extern int iommu_bind_task(struct device *dev, struct task_struct *task,
+			   int *pasid, int flags, void *priv);
+
+extern int iommu_unbind_task(struct device *dev, int pasid, int flags);
+
 #else /* CONFIG_IOMMU_API */
 
 struct iommu_ops {};
@@ -663,6 +687,23 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode)
 	return NULL;
 }
 
+static inline int iommu_set_svm_ops(struct device *dev,
+				    const struct iommu_svm_ops *svm_ops)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_bind_task(struct device *dev, struct task_struct *task,
+				  int *pasid, int flags, void *priv)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unbind_task(struct device *dev, int pasid, int flags)
+{
+	return -ENODEV;
+}
+
 #endif /* CONFIG_IOMMU_API */
 
 #endif /* __LINUX_IOMMU_H */