From patchwork Fri Feb 5 20:52:59 2021
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 12070965
Subject: [PATCH v5 01/14] vfio/mdev: idxd: add theory of operation documentation for idxd mdev
From: Dave Jiang
To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org
Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Date: Fri, 05 Feb 2021 13:52:59 -0700
Message-ID: <161255837927.339900.13569654778245565488.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com>

Add idxd VFIO mediated device theory of operation documentation.
Provide a description of the mdev design and usage, and why VFIO mdev
was chosen.
Reviewed-by: Ashok Raj
Reviewed-by: Kevin Tian
Signed-off-by: Dave Jiang
---
 Documentation/driver-api/vfio/mdev-idxd.rst |  397 +++++++++++++++++++++++++++
 MAINTAINERS                                 |    1 
 2 files changed, 398 insertions(+)
 create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst

diff --git a/Documentation/driver-api/vfio/mdev-idxd.rst b/Documentation/driver-api/vfio/mdev-idxd.rst
new file mode 100644
index 000000000000..9bf93eafc7c8
--- /dev/null
+++ b/Documentation/driver-api/vfio/mdev-idxd.rst
@@ -0,0 +1,397 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+IDXD Overview
+=============
+IDXD (Intel Data Accelerator Driver) is the driver for the Intel Data
+Streaming Accelerator (DSA). Intel DSA is a high performance data copy
+and transformation accelerator. In addition to data move operations,
+the device also supports data fill, CRC generation, Data Integrity
+Field (DIF) computation, and memory compare and delta generation.
+Intel DSA supports a variety of PCI-SIG defined capabilities such as
+Address Translation Services (ATS), Process Address Space ID (PASID),
+Page Request Interface (PRI), Message Signalled Interrupts Extended
+(MSI-X), and Advanced Error Reporting (AER). Some of these
+capabilities enable the device to support Shared Virtual Memory (SVM),
+also known as Shared Virtual Addressing (SVA). Intel DSA also supports
+Intel Scalable I/O Virtualization (SIOV) to improve the scalability of
+device assignment.
+
+
+The Intel DSA device contains the following basic components:
+* Work queue (WQ)
+
+  A WQ is on-device storage used to queue descriptors to the
+  device. Requests are added to a WQ by using new CPU instructions
+  (MOVDIR64B and ENQCMD(S)) to write to the memory mapped “portal”
+  associated with each WQ.
+
+* Engine
+
+  Operation unit that pulls descriptors from WQs and processes them.
+
+* Group
+
+  Abstract container that associates one or more engines with one or
+  more WQs.
+
+
+Two types of WQs are supported:
+* Dedicated WQ (DWQ)
+
+  A single client owns this WQ exclusively and can submit work to
+  it. The MOVDIR64B instruction is used to submit descriptors to this
+  type of WQ. The instruction is a posted write, therefore the
+  submitter must ensure it does not exceed the WQ length when
+  submitting. The use of PASID is optional with a DWQ. Multiple
+  clients can submit to a DWQ, but synchronization is required because
+  when the WQ is full, the submission is silently dropped.
+
+* Shared WQ (SWQ)
+
+  Multiple clients can submit work to this WQ. The submitter must use
+  ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These
+  instructions indicate via the EFLAGS.ZF bit whether a submission
+  succeeded. The use of PASID is mandatory to identify the address
+  space of each client.
+
+
+For more information about the new instructions, see [1][2].
+
+The IDXD driver supports the following usages:
+* In-kernel interface through the dmaengine subsystem API.
+* Userspace DMA support through a character device. mmap(2) is
+  utilized to map directly to the MMIO address (or portals) for
+  descriptor submission.
+* VFIO mediated device (mdev) supporting device passthrough usages.
+  This document covers only the mdev usage.
+
+
+=================================
+Assignable Device Interface (ADI)
+=================================
+The term ADI is used to represent the minimal unit of assignment for
+an Intel Scalable IOV device. Each ADI instance refers to the set of
+device backend resources that are allocated, configured and organized
+as an isolated unit.
+
+Intel DSA defines each WQ as an ADI.
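+
+Since data-path submission to a WQ (and therefore to an ADI) is just
+these CPU instructions executed against a mapped portal, a minimal
+user-level sketch of the ENQCMD flow described above may help. The
+enqcmd() wrapper below is illustrative only (not a kernel or library
+API) and assumes the portal was already mapped into the submitter's
+address space::
+
+	/* Returns 0 on success, 1 if the shared WQ rejected the
+	 * submission (EFLAGS.ZF was set by the instruction).
+	 */
+	static inline unsigned char enqcmd(void *portal, const void *desc)
+	{
+		unsigned char retry;
+
+		/* ENQCMD, portal address in RAX, 64-byte descriptor at (RDX) */
+		asm volatile(".byte 0xf2, 0x0f, 0x38, 0xf8, 0x02\n"
+			     "setz %0\n"
+			     : "=r" (retry) : "a" (portal), "d" (desc) : "memory");
+		return retry;
+	}
+
+	while (enqcmd(portal, desc))
+		;	/* WQ full: a real client should bound its retries */
+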
+The MMIO registers of each work queue are partitioned into two
+categories:
+* MMIO registers accessed for data-path operations.
+* MMIO registers accessed for control-path operations.
+
+Data-path MMIO registers of each WQ are contained within
+one or more system page size aligned regions and can be mapped in the
+CPU page table for direct access from the guest. Control-path MMIO
+registers of all WQs are located together but segregated from
+data-path MMIO regions. Therefore, guest updates to control-path
+registers must be intercepted and then go through the host driver to
+be reflected in the device.
+
+Data-path MMIO registers of a DSA WQ are portals for submitting
+descriptors to the device. There are four portals per WQ, each being
+64 bytes in size and located on a separate 4KB page in BAR2. Each
+portal has different implications regarding interrupt message type
+(MSI vs. IMS) and occupancy control (limited vs. unlimited). It is not
+necessary to map all portals to the guest.
+
+Control-path MMIO registers of a DSA WQ include global configurations
+(shared by all WQs) and WQ-specific configurations. The owner
+(e.g. the guest) of the WQ is expected to change only WQ-specific
+configurations. The Intel DSA spec introduces a “Configuration
+Support” capability which, if cleared, indicates that some fields of
+the WQ configuration registers are read-only and the WQ configuration
+is pre-configured by the host.
+
+
+Interrupt Message Store (IMS)
+-----------------------------
+The ADI utilizes Interrupt Message Store (IMS), a device-specific MSI
+implementation, instead of MSI-X for guest interrupts. This preserves
+MSI-X for host usages and also allows a significantly larger number of
+interrupt vectors when serving a large number of guests.
+
+The Intel DSA device implements IMS as on-device memory mapped unified
+storage. Each interrupt message is stored as a DWORD size data payload
+and a 64-bit address (same as MSI-X). Access to the IMS is through the
+host idxd driver.
+
+The idxd driver makes use of the generic IMS irq chip and domain which
+stores the interrupt messages in an array in device memory. Allocation
+and freeing of interrupts happens via the generic
+msi_domain_alloc/free_irqs() interface. The driver only needs to
+ensure the interrupt domain is stored in the underlying device struct.
+
+
+ADI Isolation
+-------------
+Operations or functioning of one ADI must not affect the functioning
+of another ADI or the physical device. Upstream memory requests from
+different ADIs are distinguished using a Process Address Space
+Identifier (PASID). With the support of PASID-granular address
+translation in Intel VT-d, the address space targeted by a request
+from an ADI can be a Host Virtual Address (HVA), Host I/O Virtual
+Address (HIOVA), Guest Physical Address (GPA), Guest Virtual Address
+(GVA), Guest I/O Virtual Address (GIOVA), etc. The PASID identity for
+an ADI is expected to be accessed or modified by privileged software
+through the host driver.
+
+=========================
+Virtual DSA (vDSA) Device
+=========================
+The DSA WQ itself is not a PCI device and thus must be composed into a
+virtual DSA device presented to the guest.
+
+The composition logic needs to handle four main requirements:
+* Emulate PCI config space.
+* Map data-path portals for direct access from the guest (see the
+  sketch after this list).
+* Emulate control-path MMIO registers and selectively forward WQ
+  configuration requests through the host driver to the device.
+* Forward and emulate WQ interrupts to the guest.
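+
+As a minimal sketch of the portal mapping requirement, given the
+layout described earlier (four 64-byte portals per WQ, each on its own
+4KB page of BAR2), the offset of a portal can be derived from the WQ
+id, the occupancy control, and the interrupt message type. This
+mirrors the idxd_get_wq_portal_full_offset() helper extended later in
+this series; the enum names here are illustrative::
+
+	enum portal_prot { PORTAL_UNLIMITED = 0, PORTAL_LIMITED = 1 };
+	enum portal_irq { PORTAL_MSIX = 0, PORTAL_IMS = 1 };
+
+	/* each WQ owns four 4KB pages in BAR2; prot and irq select the page */
+	static unsigned long wq_portal_offset(int wq_id, enum portal_prot prot,
+					      enum portal_irq irq_type)
+	{
+		return wq_id * 4 * 0x1000 + prot * 0x1000 + irq_type * 0x2000;
+	}
+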
+
+The composition logic tells the guest which aspects of the WQ are
+configurable through a combination of capability fields, e.g.:
+* Configuration Support (if cleared, most aspects are not modifiable).
+* WQ Mode Support (if cleared, cannot change between dedicated and
+  shared mode).
+* Dedicated Mode Support.
+* Shared Mode Support.
+* ...
+
+The virtual capability fields are set according to the vDSA type. The
+following is an example vDSA type and its related WQ configurability:
+* Type ‘1dwq-v1’
+
+  * One DSA gen1 dedicated WQ assigned to this guest
+  * Guest cannot share the WQ between its clients (no guest SVA)
+  * Guest cannot change any WQ configuration
+
+In addition, the composition logic also needs to serve administrative
+commands (through the virtual CMD register) via the host driver,
+including:
+* Drain/abort all descriptors submitted by this guest.
+* Drain/abort descriptors associated with a PASID.
+* Enable/disable/reset the WQ (when it’s not shared by multiple VMs).
+* Request interrupt handle.
+
+With this design, vDSA emulation is **greatly simplified**. Most
+registers are emulated as simply READ-ONLY, and handling limited
+configurability is required only for a few registers.
+
+===========================
+VFIO mdev vs. userspace DMA
+===========================
+There are two avenues to support vDSA composition:
+1. VFIO mediated device (mdev)
+2. Userspace DMA through a char device
+
+VFIO mdev provides a generic subdevice passthrough framework. Unified
+uAPIs are used for both device and subdevice passthrough, thus any
+userspace VMM which already supports VFIO device passthrough would
+naturally support mdev/subdevice passthrough. The implication of VFIO
+mdev is putting the emulation of the device interface in the kernel
+(as part of the host driver), which must be carefully scrutinized.
+Fortunately, vDSA composition includes only a small portion of
+emulation code, due to the fact that most registers are simply
+READ-ONLY to the guest. The majority of the logic for handling limited
+configurability and administrative commands is required to sit in the
+kernel anyway, regardless of which kernel uAPI is pursued. In this
+regard, VFIO mdev is a nice fit for vDSA composition.
+
+The IDXD driver provides a char device interface for applications to
+map the WQ portal and directly submit descriptors to do DMA. This
+interface provides only data-path access to userspace and relies on
+the host driver to handle control-path configurations. Expanding such
+an interface to support subdevice passthrough would allow moving the
+emulation code to userspace. However, quite some work is required to
+grow it from an application-oriented interface into a
+passthrough-oriented interface: new uAPIs to handle guest WQ
+configurability and administrative commands, and new uAPIs to handle
+passthrough specific requirements (e.g. DMA map, guest SVA, live
+migration, posted interrupts, etc.). And once that is done, every
+userspace VMM would have to explicitly bind to the IDXD specific uAPI,
+even though the real user is in the guest (instead of the VMM itself)
+in the passthrough scenario.
+
+Although some generalization might be possible to reduce the work of
+handling passthrough, we feel the difference between userspace DMA and
+subdevice passthrough is distinct in IDXD. Therefore, after discussion
+at LPC 2020, we chose to build vDSA composition on top of the VFIO
+mdev framework and leave userspace DMA intact.
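+
+The claim above that most registers are emulated as simply READ-ONLY
+can be made concrete with a hypothetical sketch (the names here are
+illustrative, not the driver's actual API): guest reads are served
+from a shadow copy of the virtual register file, with no device access
+at all::
+
+	/* Shadow of the virtual device's BAR0 register file. */
+	struct vdev_bar0 {
+		u8 regs[0x1000];	/* size chosen for illustration */
+	};
+
+	static int vdev_mmio_read(struct vdev_bar0 *bar0, u64 pos,
+				  void *buf, unsigned int size)
+	{
+		/* no read side effects to emulate: a plain copy suffices */
+		memcpy(buf, bar0->regs + pos, size);
+		return 0;
+	}
+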
+
+=============================
+Host Registration and Release
+=============================
+
+Intel DSA reports support for Intel Scalable IOV via a PCI Express
+Designated Vendor Specific Extended Capability (DVSEC). In addition,
+PASID-granular address translation capability is required in the
+IOMMU. During host initialization, the IDXD driver checks for the
+presence of both capabilities before calling mdev_register_device()
+to register with the VFIO mdev framework and provide a set of ops
+(struct mdev_parent_ops). The IOMMU capability is indicated by the
+IOMMU_DEV_FEAT_AUX feature flag with iommu_dev_has_feature() and
+enabled with iommu_dev_enable_feature().
+
+On release, iommu_dev_disable_feature() is called after
+mdev_unregister_device() to disable the IOMMU_DEV_FEAT_AUX flag that
+the driver enabled during host initialization.
+
+The mdev_parent_ops data structure is filled out by the driver to
+provide a number of ops called by the VFIO mdev framework::
+
+	struct mdev_parent_ops {
+		.supported_type_groups
+		.create
+		.remove
+		.open
+		.release
+		.read
+		.write
+		.mmap
+		.ioctl
+	};
+
+supported_type_groups
+---------------------
+At the moment only one vDSA type is supported.
+
+“1dwq-v1”:
+  Single dedicated WQ (DSA 1.0) with read-only configuration exposed
+  to the guest. On the guest kernel, a vDSA device shows up with a
+  single WQ that is pre-configured by the host. The configuration for
+  the WQ is entirely read-only and cannot be changed. There is no
+  support for guest SVA on this WQ.
+
+  Interrupt vector 0 is emulated by the host driver to support admin
+  command completion and error reporting. A second interrupt vector is
+  bound to the IMS and used for I/O operation. In this implementation,
+  only two vectors are supported.
+
+create
+------
+API function to create the mdev. mdev_set_iommu_device() is called to
+associate the mdev device with the parent PCI device. This function is
+where the driver sets up and initializes the resources to support a
+single mdev device. Creation is initiated through sysfs.
+
+remove
+------
+API function that mirrors the create() function and releases all the
+resources backing the mdev. This is also triggered through sysfs.
+
+open
+----
+API function that is called down from VFIO userspace to indicate to
+the driver that the upper layers are ready to claim and utilize the
+mdev. IMS entries are allocated and set up here.
+
+release
+-------
+The mirror function to open(); called when VFIO userspace releases the
+mdev.
+
+read / write
+------------
+This is where the Intel IDXD driver provides read/write emulation of
+PCI config space and MMIO registers. These paths are the “slow” path
+of the mediated device, and emulation is used rather than direct
+access to the hardware resources. Typically configuration and
+administrative commands go through this path. This allows the mdev to
+show up as a virtual PCI device on the guest kernel.
+
+The emulation of PCI config space is nothing special; it is simply
+copied from kvmgt. In the future this part might be consolidated to
+reduce duplication.
+
+Emulating MMIO reads is a simple memory copy. There are no side
+effects to be emulated upon a guest read.
+
+Emulating MMIO writes is required only for a few registers, due to the
+read-only configuration of the ‘1dwq-v1’ type. The majority of the
+composition logic is hooked into the CMD register for performing
+administrative commands such as WQ drain, abort, enable, disable and
+reset operations (see the sketch below).
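+
+As a hypothetical sketch of the CMD register emulation just described
+(the handler names are illustrative, not the driver's actual
+functions; the layout follows the DSA spec's CMD register, with the
+command code in bits 24:20 and the operand in bits 19:0)::
+
+	static void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
+	{
+		u8 cmd = (val >> 20) & 0x1f;
+		u32 operand = val & GENMASK(19, 0);
+
+		switch (cmd) {
+		case IDXD_CMD_ENABLE_WQ:
+			vidxd_wq_enable(vidxd);	/* forwarded to the host driver */
+			break;
+		case IDXD_CMD_DRAIN_WQ:
+			vidxd_wq_drain(vidxd, operand);
+			break;
+		case IDXD_CMD_ABORT_WQ:
+			vidxd_wq_abort(vidxd);
+			break;
+		default:
+			/* unsupported commands complete with an error status */
+			vidxd_report_cmd_error(vidxd);
+			break;
+		}
+	}
+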
+The rest of the emulation is about handling errors (GENCTRL/SWERROR)
+and interrupts (INTCAUSE/MSIXPERM) on the vDSA device. Future mdev
+types might allow limited WQ configurability, which would then require
+additional emulation of the WQCFG register.
+
+mmap
+----
+This is the function that sets up exposure of a portion of the
+hardware, the portals, for direct access for “fast” path operations
+through the mmap() syscall. A limited region of the hardware is mapped
+to the guest for direct I/O submission.
+
+There are four portals per WQ: unlimited MSI-X, limited MSI-X,
+unlimited IMS, limited IMS. Descriptors submitted to limited portals
+are subject to threshold configuration limitations for shared WQs. The
+MSI-X portals are used for host submissions, and the IMS portals are
+mapped to the VM for guest submission.
+
+ioctl
+-----
+This API function does several things:
+* Provides general device information to VFIO userspace.
+* Provides device region information (PCI config, MMIO, etc.).
+* Gets interrupt information.
+* Sets up interrupts for the mediated device.
+* Resets the mdev device.
+
+For the Intel idxd driver, Interrupt Message Store (IMS) vectors are
+used for mdev interrupts rather than MSI-X vectors. IMS provides
+additional interrupt vectors outside of the PCI MSI-X specification in
+order to support significantly more vectors. The emulated interrupt
+(0) is connected through a kernel eventfd. When interrupt 0 needs to
+be asserted, the driver signals the eventfd to trigger the vector 0
+interrupt on the guest. The IMS interrupts are set up via eventfd as
+well; however, they utilize the irq bypass manager to inject the
+interrupt directly into the guest.
+
+To allocate IMS, we utilize the IMS array APIs. On host init, we need
+to create the MSI domain::
+
+	struct ims_array_info ims_info;
+	struct device *dev = &pci_dev->dev;
+
+	/* assign the device IMS size */
+	ims_info.max_slots = max_ims_size;
+	/* assign the MMIO base address for the IMS table */
+	ims_info.slots = mmio_base + ims_offset;
+	/* assign the MSI domain to the device */
+	dev->msi_domain = pci_ims_array_create_msi_irq_domain(pci_dev, &ims_info);
+
+When we are ready to allocate the interrupts::
+
+	struct device *dev = mdev_dev(mdev);
+
+	irq_domain = pci_dev->dev.msi_domain;
+	/* the irqs are allocated against the device of the mdev */
+	rc = msi_domain_alloc_irqs(irq_domain, dev, num_vecs);
+
+	/* we can retrieve the slot index from the msi_entry */
+	for_each_msi_entry(entry, dev) {
+		slot_index = entry->device_msi.hwirq;
+		irq = entry->irq;
+	}
+
+	request_irq(irq, interrupt_handler_function, 0, "ims", context);
+
+The DSA device is structured such that MSI-X table entry 0 is used for
+admin command completion, error reporting, and other misc commands.
+The remaining MSI-X table entries are used for WQ completion. For VM
+support, the virtual device presents a similar layout. Therefore,
+vector 0 is emulated by software. Additional vector(s) are associated
+with IMS.
+
+The index (slot) for the per device IMS entry is managed by the MSI
+core. The index is the “interrupt handle” that the guest kernel
+needs to program into a DMA descriptor. That interrupt handle tells
+the hardware which IMS vector to trigger the interrupt on for the
+host.
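+
+As a short sketch of how a guest consumes this interrupt handle (the
+field names follow struct dsa_hw_desc in include/uapi/linux/idxd.h;
+the handle value is whatever the virtual “request interrupt handle”
+command described below returned)::
+
+	struct dsa_hw_desc desc = {};
+
+	desc.opcode = DSA_OPCODE_MEMMOVE;
+	/* request a completion record and a completion interrupt */
+	desc.flags = IDXD_OP_FLAG_CRAV | IDXD_OP_FLAG_RCR | IDXD_OP_FLAG_RCI;
+	desc.src_addr = src;
+	desc.dst_addr = dst;
+	desc.xfer_size = len;
+	desc.completion_addr = compl_addr;
+	/* selects which IMS vector the device fires on completion */
+	desc.int_handle = int_handle;
+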
+The virtual device presents an admin command called “request
+interrupt handle” that is not supported by the physical device. On
+probe of the DSA device on the guest kernel, the guest driver will
+issue the “request interrupt handle” command in order to get the
+interrupt handle for descriptor programming. The host driver will
+return the assigned slot in the IMS entry table to the guest.
+
+==========
+References
+==========
+[1] https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
+[2] https://software.intel.com/en-us/articles/intel-sdm
+[3] https://software.intel.com/sites/default/files/managed/cc/0e/intel-scalable-io-virtualization-technical-specification.pdf
+[4] https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

diff --git a/MAINTAINERS b/MAINTAINERS
index c2114daa6bc7..ae34b0331eb4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,6 +8970,7 @@ INTEL IADX DRIVER
 M: Dave Jiang
 L: dmaengine@vger.kernel.org
 S: Supported
+F: Documentation/driver-api/vfio/mdev-idxd.rst
 F: drivers/dma/idxd/*
 F: include/uapi/linux/idxd.h

From patchwork Fri Feb 5 20:53:05 2021
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 12070967
Subject: [PATCH v5 02/14] dmaengine: idxd: add IMS detection in base driver
From: Dave Jiang
Date: Fri, 05 Feb 2021 13:53:05 -0700
Message-ID: <161255838583.339900.1371114106574605023.stgit@djiang5-desk3.ch.intel.com>

In preparation for VFIO mediated device support in the idxd driver,
enabling for Interrupt Message Store (IMS) interrupts is added. With
IMS support the idxd driver can dynamically allocate interrupts on a
per mdev basis, based on how many IMS vectors are mapped to the mdev
device. This commit only provides the detection functions in the base
driver, not their use by the VFIO mdev code.

The commit has some portal related changes. A "portal" is a special
location within the MMIO BAR2 of the DSA device where descriptors are
submitted via the CPU command MOVDIR64B or ENQCMD(S). The offset of
the portal address determines whether the submitted descriptor is for
MSI-X or IMS notification. See the Intel SIOV spec for more details:
https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification

Signed-off-by: Dave Jiang --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 6 ++++++ drivers/dma/idxd/cdev.c | 4 ++-- drivers/dma/idxd/device.c | 2 +- drivers/dma/idxd/idxd.h | 13 +++++++++---- drivers/dma/idxd/init.c | 19 +++++++++++++++++++ drivers/dma/idxd/registers.h | 7 +++++++ drivers/dma/idxd/sysfs.c | 9 +++++++++ 7 files changed, 53 insertions(+), 7 deletions(-)
diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index 55285c136cf0..95cd7975f488 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -129,6 +129,12 @@ KernelVersion: 5.10.0 Contact: dmaengine@vger.kernel.org Description: The last executed device administrative command's status/error. +What: /sys/bus/dsa/devices/dsa/ims_size +Date: Oct 15, 2020 +KernelVersion: 5.11.0 +Contact: dmaengine@vger.kernel.org +Description: The total number of vectors available for Interrupt Message Store.
+ What: /sys/bus/dsa/devices/wq./block_on_fault Date: Oct 27, 2020 KernelVersion: 5.11.0 diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index 0db9b82ed8cf..b1518106434f 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -205,8 +205,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma) return rc; vma->vm_flags |= VM_DONTCOPY; - pfn = (base + idxd_get_wq_portal_full_offset(wq->id, - IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT; + pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, + IDXD_IRQ_MSIX)) >> PAGE_SHIFT; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_private_data = ctx; diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 205156afeb54..d6c447d09a6f 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -290,7 +290,7 @@ int idxd_wq_map_portal(struct idxd_wq *wq) resource_size_t start; start = pci_resource_start(pdev, IDXD_WQ_BAR); - start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED); + start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX); wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE); if (!wq->portal) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index a9386a66ab72..90c9458903e1 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -163,6 +163,7 @@ enum idxd_device_flag { IDXD_FLAG_CONFIGURABLE = 0, IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, + IDXD_FLAG_IMS_SUPPORTED, }; struct idxd_device { @@ -190,6 +191,7 @@ struct idxd_device { int num_groups; + u32 ims_offset; u32 msix_perm_offset; u32 wqcfg_offset; u32 grpcfg_offset; @@ -197,6 +199,7 @@ struct idxd_device { u64 max_xfer_bytes; u32 max_batch_size; + int ims_size; int max_groups; int max_engines; int max_tokens; @@ -279,15 +282,17 @@ enum idxd_interrupt_type { IDXD_IRQ_IMS, }; -static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) +static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return prot * 0x1000; + return prot * 0x1000 + irq_type * 0x2000; } static inline int idxd_get_wq_portal_full_offset(int wq_id, - enum idxd_portal_prot prot) + enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot); + return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type); } static inline void idxd_set_type(struct idxd_device *idxd) diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 0c982337ef84..ee56b92108d8 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -254,10 +254,28 @@ static void idxd_read_table_offsets(struct idxd_device *idxd) dev_dbg(dev, "IDXD Work Queue Config Offset: %#x\n", idxd->wqcfg_offset); idxd->msix_perm_offset = offsets.msix_perm * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset); + idxd->ims_offset = offsets.ims * IDXD_TABLE_MULT; + dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset); idxd->perfmon_offset = offsets.perfmon * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset); } +static void idxd_check_ims(struct idxd_device *idxd) +{ + struct pci_dev *pdev = idxd->pdev; + + /* verify that we have IMS vectors supported by device */ + if (idxd->hw.gen_cap.max_ims_mult) { + idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL; + dev_dbg(&pdev->dev, "IMS size: %u\n", idxd->ims_size); + 
set_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags); + dev_dbg(&pdev->dev, "IMS supported for device\n"); + return; + } + + dev_dbg(&pdev->dev, "IMS unsupported for device\n"); +} + static void idxd_read_caps(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -276,6 +294,7 @@ static void idxd_read_caps(struct idxd_device *idxd) dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size); + idxd_check_ims(idxd); if (idxd->hw.gen_cap.config_en) set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index 5cbf368c7367..c97f700bcf34 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -385,4 +385,11 @@ union wqcfg { #define GRPENGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 32) #define GRPFLGCFG_OFFSET(idxd_dev, n) ((idxd_dev)->grpcfg_offset + (n) * GRPCFG_SIZE + 40) +#define PCI_EXT_CAP_ID_DVSEC 0x23 /* Designated Vendor-Specific */ +#define PCI_DVSEC_HEADER1 0x4 /* Designated Vendor-Specific Header1 */ +#define PCI_DVSEC_HEADER2 0x8 /* Designated Vendor-Specific Header2 */ +#define PCI_DVSEC_ID_INTEL_SIOV 0x0005 +#define PCI_DVSEC_INTEL_SIOV_CAP 0x0014 +#define PCI_DVSEC_INTEL_SIOV_CAP_IMS 0x00000001 + #endif
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 21c1e23cdf23..ab5c76e1226b 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -1444,6 +1444,14 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t ims_size_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = container_of(dev, struct idxd_device, conf_dev); + + return sprintf(buf, "%u\n", idxd->ims_size); +} +static DEVICE_ATTR_RO(ims_size); + static ssize_t max_batch_size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1639,6 +1647,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_max_work_queues_size.attr, &dev_attr_max_engines.attr, &dev_attr_numa_node.attr, + &dev_attr_ims_size.attr, &dev_attr_max_batch_size.attr, &dev_attr_max_transfer_size.attr, &dev_attr_op_cap.attr,

From patchwork Fri Feb 5 20:53:11 2021
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 12070905
Subject: [PATCH v5 03/14] dmaengine: idxd: add device support functions in prep for mdev
From: Dave Jiang
Date: Fri, 05 Feb 2021 13:53:11 -0700
Message-ID: <161255839186.339900.4487176195680305351.stgit@djiang5-desk3.ch.intel.com>

Add device support helper functions in preparation for adding VFIO
mdev support.
Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 61 ++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 4 +++ drivers/dma/idxd/registers.h | 3 +- 3 files changed, 67 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index d6c447d09a6f..2491b27c8125 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -306,6 +306,30 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } +int idxd_wq_abort(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + dev_dbg(dev, "Abort WQ %d\n", wq->id); + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", status); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d aborted\n", wq->id); + return 0; +} + int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) { struct idxd_device *idxd = wq->idxd; @@ -412,6 +436,32 @@ void idxd_wq_quiesce(struct idxd_wq *wq) percpu_ref_exit(&wq->wq_active); } +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* PASID fields are 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX); + wq->wqcfg->pasid = pasid; + iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset); +} + +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* priv field is 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIV_IDX); + wq->wqcfg->priv = !!priv; + iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset); +} + /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) { @@ -599,6 +649,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) dev_dbg(dev, "pasid %d drained\n", pasid); } +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand; + + operand = pasid; + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, NULL); + dev_dbg(dev, "pasid %d aborted\n", pasid); +} + int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 90c9458903e1..a2438b3166db 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -350,6 +350,7 @@ void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type); @@ -369,6 +370,9 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); void idxd_wq_quiesce(struct 
idxd_wq *wq); int idxd_wq_init_percpu_ref(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq); +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc);
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index c97f700bcf34..d9a732decdd5 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -347,7 +347,8 @@ union wqcfg { u32 bits[8]; } __packed; -#define WQCFG_PASID_IDX 2 +#define WQCFG_PASID_IDX 2 +#define WQCFG_PRIV_IDX 2 /* * This macro calculates the offset into the WQCFG register

From patchwork Fri Feb 5 20:53:18 2021
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 12070907
Subject: [PATCH v5 04/14] vfio/mdev: idxd: Add auxiliary device plumbing for idxd mdev support
From: Dave Jiang
Date: Fri, 05 Feb 2021 13:53:18 -0700
Message-ID: <161255839829.339900.16438737078488315104.stgit@djiang5-desk3.ch.intel.com>
User-Agent:
StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add the VFIO mediated device driver as an auxiliary device to the main idxd driver. This allows the mdev code to be under VFIO mdev subsystem. Signed-off-by: Dave Jiang --- MAINTAINERS | 8 ++++ drivers/dma/idxd/Makefile | 2 + drivers/dma/idxd/idxd.h | 7 ++++ drivers/dma/idxd/init.c | 77 +++++++++++++++++++++++++++++++++++++++ drivers/vfio/mdev/Kconfig | 9 +++++ drivers/vfio/mdev/Makefile | 1 + drivers/vfio/mdev/idxd/Makefile | 4 ++ drivers/vfio/mdev/idxd/mdev.c | 75 ++++++++++++++++++++++++++++++++++++++ 8 files changed, 182 insertions(+), 1 deletion(-) create mode 100644 drivers/vfio/mdev/idxd/Makefile create mode 100644 drivers/vfio/mdev/idxd/mdev.c diff --git a/MAINTAINERS b/MAINTAINERS index ae34b0331eb4..71862e759075 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -8970,7 +8970,6 @@ INTEL IADX DRIVER M: Dave Jiang L: dmaengine@vger.kernel.org S: Supported -F: Documentation/driver-api/vfio/mdev-idxd.rst F: drivers/dma/idxd/* F: include/uapi/linux/idxd.h @@ -18720,6 +18719,13 @@ F: drivers/vfio/mdev/ F: include/linux/mdev.h F: samples/vfio-mdev/ +VFIO MEDIATED DEVICE IDXD DRIVER +M: Dave Jiang +L: kvm@vger.kernel.org +S: Maintained +F: Documentation/driver-api/vfio/mdev-idxd.rst +F: drivers/vfio/mdev/idxd/ + VFIO PLATFORM DRIVER M: Eric Auger L: kvm@vger.kernel.org diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 8978b898d777..d91d1718efac 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,4 @@ +ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=IDXD + obj-$(CONFIG_INTEL_IDXD) += idxd.o idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index a2438b3166db..f02c96164515 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -8,6 +8,7 @@ #include #include #include +#include #include "registers.h" #define IDXD_DRIVER_VERSION "1.00" @@ -221,6 +222,8 @@ struct idxd_device { struct work_struct work; int *int_handles; + + struct auxiliary_device *mdev_auxdev; }; /* IDXD software descriptor */ @@ -282,6 +285,10 @@ enum idxd_interrupt_type { IDXD_IRQ_IMS, }; +struct idxd_mdev_aux_drv { + struct auxiliary_driver auxiliary_drv; +}; + static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot, enum idxd_interrupt_type irq_type) { diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index ee56b92108d8..fd57f39e4b7d 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -382,6 +382,74 @@ static void idxd_disable_system_pasid(struct idxd_device *idxd) idxd->sva = NULL; } +static void idxd_remove_mdev_auxdev(struct idxd_device *idxd) +{ + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD)) + return; + + auxiliary_device_delete(idxd->mdev_auxdev); + auxiliary_device_uninit(idxd->mdev_auxdev); +} + +static void idxd_auxdev_release(struct device *dev) +{ + struct auxiliary_device *auxdev = to_auxiliary_dev(dev); + struct idxd_device *idxd = dev_get_drvdata(dev); + + kfree(auxdev->name); + kfree(auxdev); + idxd->mdev_auxdev = NULL; +} + +static int idxd_setup_mdev_auxdev(struct idxd_device *idxd) +{ + struct auxiliary_device *auxdev; + struct device *dev = &idxd->pdev->dev; + int rc; + + if (!IS_ENABLED(CONFIG_VFIO_MDEV_IDXD)) + return 0; + + auxdev = kzalloc(sizeof(*auxdev), GFP_KERNEL); + if (!auxdev) + return -ENOMEM; + + auxdev->name = kasprintf(GFP_KERNEL, "mdev-%s", idxd_name[idxd->type]); + if (!auxdev->name) { + rc = -ENOMEM; + goto err_name; + 
} + + dev_dbg(&idxd->pdev->dev, "aux dev mdev: %s\n", auxdev->name); + + auxdev->dev.parent = dev; + auxdev->dev.release = idxd_auxdev_release; + auxdev->id = idxd->id; + + rc = auxiliary_device_init(auxdev); + if (rc < 0) { + dev_err(dev, "Failed to init aux dev: %d\n", rc); + goto err_auxdev; + } + + rc = auxiliary_device_add(auxdev); + if (rc < 0) { + dev_err(dev, "Failed to add aux dev: %d\n", rc); + goto err_auxdev; + } + + idxd->mdev_auxdev = auxdev; + dev_set_drvdata(&auxdev->dev, idxd); + + return 0; + + err_auxdev: + kfree(auxdev->name); + err_name: + kfree(auxdev); + return rc; +} + static int idxd_probe(struct idxd_device *idxd) { struct pci_dev *pdev = idxd->pdev; @@ -434,11 +502,19 @@ static int idxd_probe(struct idxd_device *idxd) goto err_idr_fail; } + rc = idxd_setup_mdev_auxdev(idxd); + if (rc < 0) + goto err_auxdev_fail; + idxd->major = idxd_cdev_get_major(idxd); dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id); return 0; + err_auxdev_fail: + mutex_lock(&idxd_idr_lock); + idr_remove(&idxd_idrs[idxd->type], idxd->id); + mutex_unlock(&idxd_idr_lock); err_idr_fail: idxd_mask_error_interrupts(idxd); idxd_mask_msix_vectors(idxd); @@ -610,6 +686,7 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); idxd_cleanup_sysfs(idxd); idxd_shutdown(pdev); + idxd_remove_mdev_auxdev(idxd); if (device_pasid_enabled(idxd)) idxd_disable_system_pasid(idxd); mutex_lock(&idxd_idr_lock); diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig index 5da27f2100f9..e9540e43d1f1 100644 --- a/drivers/vfio/mdev/Kconfig +++ b/drivers/vfio/mdev/Kconfig @@ -16,3 +16,12 @@ config VFIO_MDEV_DEVICE default n help VFIO based driver for Mediated devices. + +config VFIO_MDEV_IDXD + tristate "VFIO Mediated device driver for Intel IDXD" + depends on VFIO && VFIO_MDEV && X86_64 + select AUXILIARY_BUS + select IMS_MSI_ARRAY + default n + help + VFIO based mediated device driver for Intel Accelerator Devices driver. diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile index 101516fdf375..338843fa6110 100644 --- a/drivers/vfio/mdev/Makefile +++ b/drivers/vfio/mdev/Makefile @@ -4,3 +4,4 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o obj-$(CONFIG_VFIO_MDEV) += mdev.o obj-$(CONFIG_VFIO_MDEV_DEVICE) += vfio_mdev.o +obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd/ diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile new file mode 100644 index 000000000000..e8f45cb96117 --- /dev/null +++ b/drivers/vfio/mdev/idxd/Makefile @@ -0,0 +1,4 @@ +ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD + +obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o +idxd_mdev-y := mdev.o diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c new file mode 100644 index 000000000000..8b9a6adeb606 --- /dev/null +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" + +static int idxd_mdev_host_init(struct idxd_device *idxd) +{ + /* FIXME: Fill in later */ + return 0; +} + +static int idxd_mdev_host_release(struct idxd_device *idxd) +{ + /* FIXME: Fill in later */ + return 0; +} + +static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev, + const struct auxiliary_device_id *id) +{ + struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev); + int rc; + + rc = idxd_mdev_host_init(idxd); + if (rc < 0) { + dev_warn(&auxdev->dev, "mdev host init failed: %d\n", rc); + return rc; + } + + return 0; +} + +static void idxd_mdev_aux_remove(struct auxiliary_device *auxdev) +{ + struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev); + + idxd_mdev_host_release(idxd); +} + +static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = { + { .name = "idxd.mdev-dsa" }, + { .name = "idxd.mdev-iax" }, + {}, +}; +MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table); + +static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = { + .auxiliary_drv = { + .id_table = idxd_mdev_auxbus_id_table, + .probe = idxd_mdev_aux_probe, + .remove = idxd_mdev_aux_remove, + }, +}; + +static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv) +{ + return auxiliary_driver_register(&drv->auxiliary_drv); +} + +static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv) +{ + auxiliary_driver_unregister(&drv->auxiliary_drv); +} + +module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister); + +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Intel Corporation");

From patchwork Fri Feb 5 20:53:24 2021
X-Patchwork-Submitter: Dave Jiang
X-Patchwork-Id: 12070909
Subject: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions
From: Dave Jiang
Date: Fri, 05 Feb 2021 13:53:24 -0700
Message-ID: <161255840486.339900.5478922203128287192.stgit@djiang5-desk3.ch.intel.com>

Create a mediated device through the VFIO mediated device framework.
The mdev framework allows creation of a mediated device by the driver
with a portion of the device's resources. The driver will emulate the
slow path, such as the PCI config space, MMIO bar, and the command
registers. The descriptor submission portal(s) will be mmapped to the
guest in order to submit descriptors directly by the guest kernel or
apps. The mediated device support code in idxd will be referred to as
the Virtual Device Composition Module (vdcm). Add basic plumbing to
fill out the mdev_parent_ops struct that VFIO mdev requires to support
a mediated device.

Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 1 drivers/dma/idxd/idxd.h | 7 drivers/dma/idxd/init.c | 2 drivers/vfio/mdev/idxd/Makefile | 2 drivers/vfio/mdev/idxd/mdev.c | 1006 +++++++++++++++++++++++++++++++++++++++ drivers/vfio/mdev/idxd/mdev.h | 115 ++++ drivers/vfio/mdev/idxd/vdev.c | 75 +++ drivers/vfio/mdev/idxd/vdev.h | 19 + 8 files changed, 1218 insertions(+), 9 deletions(-) create mode 100644 drivers/vfio/mdev/idxd/mdev.h create mode 100644 drivers/vfio/mdev/idxd/vdev.c create mode 100644 drivers/vfio/mdev/idxd/vdev.h
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 2491b27c8125..89fa2bbe6ebf 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -265,6 +265,7 @@ int idxd_wq_disable(struct idxd_wq *wq) dev_dbg(dev, "WQ %d disabled\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_disable); void idxd_wq_drain(struct idxd_wq *wq) {
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index f02c96164515..a271942df2be 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -133,6 +133,7 @@ struct idxd_wq { u64 max_xfer_bytes; u32 max_batch_size; bool ats_dis; + struct list_head vdcm_list; }; struct idxd_engine { @@ -165,6 +166,7 @@ enum idxd_device_flag { IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, IDXD_FLAG_IMS_SUPPORTED, + IDXD_FLAG_MDEV_ENABLED, }; struct idxd_device { @@ -275,6 +277,11 @@ static inline bool device_swq_supported(struct idxd_device *idxd) return (support_enqcmd && device_pasid_enabled(idxd)); } +static inline bool device_mdev_enabled(struct idxd_device *idxd) +{ + return test_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); +} + enum idxd_portal_prot { IDXD_PORTAL_UNLIMITED = 0, IDXD_PORTAL_LIMITED,
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index
fd57f39e4b7d..cc3b757d300f 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -215,7 +215,6 @@ static int idxd_setup_internals(struct idxd_device *idxd) for (i = 0; i < idxd->max_wqs; i++) { struct idxd_wq *wq = &idxd->wqs[i]; - int rc; wq->id = i; wq->idxd = idxd; @@ -227,6 +226,7 @@ static int idxd_setup_internals(struct idxd_device *idxd) if (!wq->wqcfg) return -ENOMEM; init_completion(&wq->wq_dead); + INIT_LIST_HEAD(&wq->vdcm_list); } for (i = 0; i < idxd->max_engines; i++) { diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile index e8f45cb96117..27a08621d120 100644 --- a/drivers/vfio/mdev/idxd/Makefile +++ b/drivers/vfio/mdev/idxd/Makefile @@ -1,4 +1,4 @@ ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o -idxd_mdev-y := mdev.o +idxd_mdev-y := mdev.o vdev.o diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 8b9a6adeb606..384ba5d6bc2b 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -1,27 +1,1017 @@ // SPDX-License-Identifier: GPL-2.0 -/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ #include #include #include #include #include -#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include #include #include "registers.h" #include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" -static int idxd_mdev_host_init(struct idxd_device *idxd) +static u64 idxd_pci_config[] = { + 0x0010000000008086ULL, + 0x0080000008800000ULL, + 0x000000000000000cULL, + 0x000000000000000cULL, + 0x0000000000000000ULL, + 0x2010808600000000ULL, + 0x0000004000000000ULL, + 0x000000ff00000000ULL, + 0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */ + 0x0000070000000000ULL, + 0x0000000000920010ULL, /* PCIe capability */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, +}; + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index, + unsigned int start, unsigned int count, void *data); + +static int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid) +{ + struct vfio_group *vfio_group; + struct iommu_domain *iommu_domain; + struct device *dev = mdev_dev(mdev); + struct device *iommu_device = mdev_get_iommu_device(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + int mdev_pasid; + + if (!vidxd->vdev.vfio_group) { + dev_warn(dev, "Missing vfio_group.\n"); + return -EINVAL; + } + + vfio_group = vidxd->vdev.vfio_group; + + iommu_domain = vfio_group_iommu_domain(vfio_group); + if (IS_ERR_OR_NULL(iommu_domain)) + goto err; + + mdev_pasid = iommu_aux_get_pasid(iommu_domain, iommu_device); + if (mdev_pasid < 0) + goto err; + + *pasid = (u32)mdev_pasid; + return 0; + + err: + vfio_group_put_external_user(vfio_group); + vidxd->vdev.vfio_group = NULL; + return -EFAULT; +} + +static inline void reset_vconfig(struct vdcm_idxd *vidxd) +{ + u16 *devid = (u16 *)(vidxd->cfg + PCI_DEVICE_ID); + struct idxd_device *idxd = vidxd->idxd; + + memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ); + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); + + if (idxd->type == IDXD_TYPE_DSA) + *devid = PCI_DEVICE_ID_INTEL_DSA_SPR0; + else if (idxd->type == IDXD_TYPE_IAX) + 
*devid = PCI_DEVICE_ID_INTEL_IAX_SPR0; +} + +static inline void reset_vmmio(struct vdcm_idxd *vidxd) +{ + memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ); +} + +static void idxd_vdcm_init(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + + reset_vconfig(vidxd); + reset_vmmio(vidxd); + + vidxd->bar_size[0] = VIDXD_BAR0_SIZE; + vidxd->bar_size[1] = VIDXD_BAR2_SIZE; + + vidxd_mmio_init(vidxd); + + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + idxd_wq_disable(wq); +} + +static void idxd_vdcm_release(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type); + mutex_lock(&vidxd->dev_lock); + if (!vidxd->refcount) + goto out; + + idxd_vdcm_set_irqs(vidxd, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, + VFIO_PCI_MSIX_IRQ_INDEX, 0, 0, NULL); + + vidxd_free_ims_entries(vidxd); + if (vidxd->vdev.vfio_group) { + vfio_group_put_external_user(vidxd->vdev.vfio_group); + vidxd->vdev.vfio_group = NULL; + } + + /* Re-initialize the VIDXD to a pristine state for re-use */ + idxd_vdcm_init(vidxd); + vidxd->refcount--; + + out: + mutex_unlock(&vidxd->dev_lock); +} + +static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, + struct vdcm_idxd_type *type) +{ + struct vdcm_idxd *vidxd; + struct idxd_wq *wq = NULL; + int i; + + /* PLACEHOLDER, wq matching comes later */ + + if (!wq) + return ERR_PTR(-ENODEV); + + vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) + return ERR_PTR(-ENOMEM); + + mutex_init(&vidxd->dev_lock); + vidxd->idxd = idxd; + vidxd->vdev.mdev = mdev; + vidxd->wq = wq; + mdev_set_drvdata(mdev, vidxd); + vidxd->type = type; + vidxd->num_wqs = VIDXD_MAX_WQS; + + idxd_vdcm_init(vidxd); + mutex_lock(&wq->wq_lock); + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + vidxd->irq_entries[i].vidxd = vidxd; + vidxd->irq_entries[i].id = i; + } + + return vidxd; +} + +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; + +static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, + const char *name) +{ + int i; + char dev_name[IDXD_MDEV_NAME_LEN]; + + for (i = 0; i < IDXD_MDEV_TYPES; i++) { + snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s", + idxd_mdev_types[i].name); + + if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN)) + return &idxd_mdev_types[i]; + } + + return NULL; +} + +static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd; + struct vdcm_idxd_type *type; + struct device *dev, *parent; + struct idxd_device *idxd; + struct idxd_wq *wq; + + parent = mdev_parent_dev(mdev); + idxd = dev_get_drvdata(parent); + dev = mdev_dev(mdev); + mdev_set_iommu_device(dev, parent); + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) { + dev_err(dev, "failed to find type %s to create\n", + kobject_name(kobj)); + return -EINVAL; + } + + vidxd = vdcm_vidxd_create(idxd, mdev, type); + if (IS_ERR(vidxd)) { + dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd)); + return PTR_ERR(vidxd); + } + + wq = vidxd->wq; + mutex_lock(&wq->wq_lock); + list_add(&vidxd->list, &wq->vdcm_list); + mutex_unlock(&wq->wq_lock); + dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev))); + + return 0; +} + +static int idxd_vdcm_remove(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + struct device 
*dev = &idxd->pdev->dev; + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id); + + mutex_lock(&wq->wq_lock); + list_del(&vidxd->list); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + + kfree(vidxd); + return 0; +} + +static int idxd_vdcm_open(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + int rc = -EINVAL; + struct vdcm_idxd_type *type = vidxd->type; + struct device *dev = mdev_dev(mdev); + struct vfio_group *vfio_group; + + dev_dbg(dev, "%s: type: %d\n", __func__, type->type); + + mutex_lock(&vidxd->dev_lock); + if (vidxd->refcount) + goto out; + + vfio_group = vfio_group_get_external_user_from_dev(dev); + if (IS_ERR_OR_NULL(vfio_group)) { + /* must drop dev_lock on every exit path */ + rc = -EFAULT; + goto out; + } + vidxd->vdev.vfio_group = vfio_group; + + /* allocate and setup IMS entries */ + rc = vidxd_setup_ims_entries(vidxd); + if (rc < 0) + goto ims_fail; + + vidxd->refcount++; + mutex_unlock(&vidxd->dev_lock); + + return rc; + + ims_fail: + vfio_group_put_external_user(vfio_group); + vidxd->vdev.vfio_group = NULL; + out: + mutex_unlock(&vidxd->dev_lock); + return rc; +} + +/* + * *ppos carries the vfio-pci style offset encoding: the upper bits select + * the VFIO region index, the lower bits the offset within that region. + */ +static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, size_t count, loff_t *ppos, + enum idxd_vdcm_rw mode) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + u64 pos = *ppos & VFIO_PCI_OFFSET_MASK; + struct device *dev = mdev_dev(mdev); + int rc = -EINVAL; + + if (index >= VFIO_PCI_NUM_REGIONS) { + dev_err(dev, "invalid index: %u\n", index); + return -EINVAL; + } + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_cfg_write(vidxd, pos, buf, count); + else + rc = vidxd_cfg_read(vidxd, pos, buf, count); + break; + case VFIO_PCI_BAR0_REGION_INDEX: + case VFIO_PCI_BAR1_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count); + else + rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count); + break; + case VFIO_PCI_BAR2_REGION_INDEX: + case VFIO_PCI_BAR3_REGION_INDEX: + case VFIO_PCI_BAR4_REGION_INDEX: + case VFIO_PCI_BAR5_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + case VFIO_PCI_ROM_REGION_INDEX: + default: + dev_err(dev, "unsupported region: %u\n", index); + } + + return rc == 0 ?
count : rc; +} + +static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 2; + } else { + u8 val; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos, + IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + + read_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static ssize_t idxd_vdcm_write(struct mdev_device *mdev, const char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, + sizeof(val), ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 2; + } else { + u8 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + +write_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma) { - /* FIXME: Fill in later */ + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + return 0; } -static int idxd_mdev_host_release(struct idxd_device *idxd) +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma) +{ + unsigned int wq_idx, rc; + unsigned long req_size, pgoff = 0, offset; + pgprot_t pg_prot; + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_wq *wq = vidxd->wq; + struct idxd_device *idxd = vidxd->idxd; + enum idxd_portal_prot virt_portal, phys_portal; + phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR); + struct device *dev = mdev_dev(mdev); + + rc = check_vma(wq, vma); + if (rc) + return rc; + + pg_prot = vma->vm_page_prot; + req_size = vma->vm_end - vma->vm_start; + vma->vm_flags |= VM_DONTCOPY; + + offset = (vma->vm_pgoff << PAGE_SHIFT) & + ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1); + + wq_idx = offset >> (PAGE_SHIFT + 2); + if (wq_idx >= 1) { + dev_err(dev, "mapping invalid wq %d off %lx\n", + wq_idx, offset); + return -EINVAL; + } + + /* + * Check and see if the guest wants to map to the 
limited or unlimited portal. + * The driver will allow mapping to the unlimited portal only if the wq is a + * dedicated wq. Otherwise, it goes to the limited portal. + */ + virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1; + phys_portal = IDXD_PORTAL_LIMITED; + if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq)) + phys_portal = IDXD_PORTAL_UNLIMITED; + + /* We always map IMS portals to the guest */ + pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal, + IDXD_IRQ_IMS)) >> PAGE_SHIFT; + + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size, + pgprot_val(pg_prot)); + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_private_data = mdev; + vma->vm_pgoff = pgoff; + + return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot); +} + +static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type) { + /* + * Even though the number of MSIX vectors supported is not tied to the number of + * wqs being exported, the current design is to allow 1 vector per WQ for the guest. + * So here we end up with the number of wqs plus 1 that handles the misc interrupts. + */ + if (type == VFIO_PCI_MSI_IRQ_INDEX || type == VFIO_PCI_MSIX_IRQ_INDEX) + return VIDXD_MAX_MSIX_VECS; + return 0; } +static irqreturn_t idxd_guest_wq_completion(int irq, void *data) +{ + struct ims_irq_entry *irq_entry = data; + + vidxd_send_interrupt(irq_entry); + return IRQ_HANDLED; +} + +static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + int rc; + + if (!vidxd->vdev.msix_trigger[index]) + return 0; + + dev_dbg(dev, "disable MSIX trigger %d\n", index); + if (index) { + u32 auxval; + + irq_entry = &vidxd->irq_entries[index]; + if (irq_entry->irq_set) { + free_irq(irq_entry->irq, irq_entry); + irq_entry->irq_set = false; + } + + auxval = ims_ctrl_pasid_aux(0, false); + rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + if (rc) + return rc; + } + eventfd_ctx_put(vidxd->vdev.msix_trigger[index]); + vidxd->vdev.msix_trigger[index] = NULL; + + return 0; +} + +static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + struct eventfd_ctx *trigger; + int rc; + + if (vidxd->vdev.msix_trigger[index]) + return 0; + + dev_dbg(dev, "enable MSIX trigger %d\n", index); + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index); + return PTR_ERR(trigger); + } + + if (index) { + u32 pasid; + u32 auxval; + + irq_entry = &vidxd->irq_entries[index]; + rc = idxd_mdev_get_pasid(mdev, &pasid); + if (rc < 0) { + /* drop the eventfd reference taken above */ + eventfd_ctx_put(trigger); + return rc; + } + + /* + * Program and enable the pasid field in the IMS entry. The programmed pasid and + * enable fields are checked against the pasid and enable fields of the work queue + * configuration and the pasid of the descriptor. A mismatch will result in a blocked + * IMS interrupt.
+ */ + auxval = ims_ctrl_pasid_aux(pasid, true); + rc = irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + if (rc < 0) { + eventfd_ctx_put(trigger); + return rc; + } + + rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims", + irq_entry); + if (rc) { + dev_warn(dev, "failed to request ims irq\n"); + eventfd_ctx_put(trigger); + auxval = ims_ctrl_pasid_aux(0, false); + irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); + return rc; + } + irq_entry->irq_set = true; + } + + vidxd->vdev.msix_trigger[index] = trigger; + return 0; +} + +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd, + unsigned int index, unsigned int start, + unsigned int count, uint32_t flags, + void *data) +{ + int i, rc = 0; + + if (count > VIDXD_MAX_MSIX_ENTRIES - 1) + count = VIDXD_MAX_MSIX_ENTRIES - 1; + + if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) { + /* Disable all MSIX entries */ + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + return 0; + } + + for (i = 0; i < count; i++) { + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + u32 fd = *(u32 *)(data + i * sizeof(u32)); + + rc = msix_trigger_register(vidxd, fd, i); + if (rc < 0) + return rc; + } else if (flags & VFIO_IRQ_SET_DATA_NONE) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + } + return rc; +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + int (*func)(struct vdcm_idxd *vidxd, unsigned int index, + unsigned int start, unsigned int count, uint32_t flags, + void *data) = NULL; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + dev_warn(dev, "intx interrupts not supported.\n"); + break; + case VFIO_PCI_MSI_IRQ_INDEX: + dev_dbg(dev, "msi interrupt.\n"); + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + case VFIO_PCI_MSIX_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + default: + return -ENOTTY; + } + + if (!func) + return -ENOTTY; + + return func(vidxd, index, start, count, flags, data); +} + +static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd) +{ + vidxd_reset(vidxd); +} + +static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long minsz; + int rc = -EINVAL; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd); + + mutex_lock(&vidxd->dev_lock); + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + info.flags = VFIO_DEVICE_FLAGS_PCI; + info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) {
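+ /*
+ * BAR2 (the WQ portal region) is reported below with a sparse mmap
+ * capability: page 0 of the region is the unlimited portal, page 1
+ * the limited portal. A VFIO user mmaps a portal page at the offset
+ * returned in vfio_region_info.offset plus the sparse area offset;
+ * accesses anywhere else in the BARs stay trapped and emulated.
+ */
+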
struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; + size_t size; + int nr_areas = 1; + int cap_type_id = 0; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = VIDXD_MAX_CFG_SPACE_SZ; + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->bar_size[info.index]; + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR1_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + break; + case VFIO_PCI_BAR2_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP | + VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + info.size = vidxd->bar_size[1]; + + /* + * Every WQ has two areas for unlimited and limited + * MSI-X portals. IMS portals are not reported + */ + nr_areas = 2; + + size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas)); + sparse = kzalloc(size, GFP_KERNEL); + if (!sparse) { + rc = -ENOMEM; + goto out; + } + + sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + sparse->header.version = 1; + sparse->nr_areas = nr_areas; + cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + + /* Unlimited portal */ + sparse->areas[0].offset = 0; + sparse->areas[0].size = PAGE_SIZE; + + /* Limited portal */ + sparse->areas[1].offset = PAGE_SIZE; + sparse->areas[1].size = PAGE_SIZE; + break; + + case VFIO_PCI_BAR3_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + dev_dbg(dev, "get region info bar:%d\n", info.index); + break; + + case VFIO_PCI_ROM_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + dev_dbg(dev, "get region info index:%d\n", info.index); + break; + default: { + if (info.index >= VFIO_PCI_NUM_REGIONS) + rc = -EINVAL; + else + rc = 0; + goto out; + } /* default */ + } /* info.index switch */ + + if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { + if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) { + rc = vfio_info_add_capability(&caps, &sparse->header, + sizeof(*sparse) + (sparse->nr_areas * + sizeof(*sparse->areas))); + kfree(sparse); + if (rc) + goto out; + } + } + + if (caps.size) { + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + sizeof(info), + caps.buf, caps.size)) { + kfree(caps.buf); + rc = -EFAULT; + goto out; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) { + rc = -EINVAL; + goto out; + } + + info.flags = VFIO_IRQ_INFO_EVENTFD; + + switch (info.index) { + case VFIO_PCI_INTX_IRQ_INDEX: + info.flags |= (VFIO_IRQ_INFO_MASKABLE | VFIO_IRQ_INFO_AUTOMASKED); + break; + case VFIO_PCI_MSI_IRQ_INDEX ... VFIO_PCI_MSIX_IRQ_INDEX: + case VFIO_PCI_REQ_IRQ_INDEX: + info.flags |= VFIO_IRQ_INFO_NORESIZE; + break; + case VFIO_PCI_ERR_IRQ_INDEX: + info.flags |= VFIO_IRQ_INFO_NORESIZE; + if (pci_is_pcie(vidxd->idxd->pdev)) + break; + fallthrough; + default: + rc = -EINVAL; + goto out; + } /* switch(info.index) */ + + info.count = idxd_vdcm_get_irq_count(vidxd, info.index); + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) { + int max = idxd_vdcm_get_irq_count(vidxd, hdr.index); + + rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS, + &data_size); + if (rc) { + dev_err(dev, "vfio_set_irqs_validate_and_prepare failed\n"); + rc = -EINVAL; + goto out; + } + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), data_size); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out; + } + } + } + + /* data stays NULL for VFIO_IRQ_SET_DATA_NONE; the handler accepts that */ + rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data); + kfree(data); + goto out; + } else if (cmd == VFIO_DEVICE_RESET) { + vidxd_vdcm_reset(vidxd); + rc = 0; + } + + out: + mutex_unlock(&vidxd->dev_lock); + return rc; +} +
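For reference, the shape of the VFIO_DEVICE_SET_IRQS request the handler above consumes, as a userspace VMM would issue it to wire eventfds to the emulated MSI-X vectors. A sketch only; eventfd creation and error handling are elided, and the helper name is illustrative:

#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int set_msix_triggers(int device_fd, const int *eventfds, unsigned int n)
{
	size_t sz = sizeof(struct vfio_irq_set) + n * sizeof(int);
	struct vfio_irq_set *set = calloc(1, sz);
	int rc;

	if (!set)
		return -1;
	set->argsz = sz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	set->start = 0;
	set->count = n;
	memcpy(set->data, eventfds, n * sizeof(int));	/* one eventfd per vector */
	rc = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);
	return rc;
}

+static const struct mdev_parent_ops idxd_vdcm_ops = { + .create = idxd_vdcm_create, + .remove = idxd_vdcm_remove, + .open = idxd_vdcm_open, + .release = idxd_vdcm_release, + .read = idxd_vdcm_read, + .write = idxd_vdcm_write, + .mmap =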
idxd_vdcm_mmap, + .ioctl = idxd_vdcm_ioctl, +}; + +int idxd_mdev_host_init(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags)) + return -EOPNOTSUPP; + + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) { + dev_warn(dev, "Failed to enable aux-domain: %d\n", rc); + return rc; + } + } else { + dev_warn(dev, "No aux-domain feature.\n"); + return -EOPNOTSUPP; + } + + return mdev_register_device(dev, &idxd_vdcm_ops); +} + +void idxd_mdev_host_release(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + mdev_unregister_device(dev); + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) + dev_warn(dev, "Failed to disable aux-domain: %d\n", + rc); + } +} + static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev, const struct auxiliary_device_id *id) { @@ -34,6 +1024,7 @@ static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev, return rc; } + set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); return 0; } @@ -41,6 +1032,7 @@ static void idxd_mdev_aux_remove(struct auxiliary_device *auxdev) { struct idxd_device *idxd = dev_get_drvdata(&auxdev->dev); + clear_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); idxd_mdev_host_release(idxd); } @@ -70,6 +1062,6 @@ static void idxd_mdev_auxdev_drv_unregister(struct idxd_mdev_aux_drv *drv) } module_driver(idxd_mdev_aux_drv, idxd_mdev_auxdev_drv_register, idxd_mdev_auxdev_drv_unregister); - +MODULE_IMPORT_NS(IDXD); MODULE_LICENSE("GPL v2"); MODULE_AUTHOR("Intel Corporation"); diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h new file mode 100644 index 000000000000..7ca50f054714 --- /dev/null +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -0,0 +1,115 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2020 Intel Corporation. All rights rsvd. 
*/ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MAX_MMIO_SPACE_SZ 8192 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x2000 +#define VIDXD_MAX_MSIX_ENTRIES (VIDXD_MSIX_TBL_SZ / 0x10) +#define VIDXD_MAX_WQS 1 +#define VIDXD_MAX_MSIX_VECS 2 + +#define VIDXD_ATS_OFFSET 0x100 +#define VIDXD_PRS_OFFSET 0x110 +#define VIDXD_PASID_OFFSET 0x120 +#define VIDXD_MSIX_PBA_OFFSET 0x700 + +struct ims_irq_entry { + struct vdcm_idxd *vidxd; + bool irq_set; + int id; + int irq; +}; + +struct idxd_vdev { + struct mdev_device *mdev; + struct vfio_group *vfio_group; + struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; +}; + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct idxd_vdev vdev; + struct vdcm_idxd_type *type; + int num_wqs; + struct ims_irq_entry irq_entries[VIDXD_MAX_MSIX_ENTRIES]; + + /* For VM use case */ + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + u8 bar0[VIDXD_MAX_MMIO_SPACE_SZ]; + struct list_head list; + struct mutex dev_lock; /* lock for vidxd resources */ + + int refcount; +}; + +static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); +} + +#define IDXD_MDEV_NAME_LEN 64 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_DSA_1_DWQ = 0, + IDXD_MDEV_TYPE_IAX_1_DWQ, +}; + +#define IDXD_MDEV_TYPES 2 + +struct vdcm_idxd_type { + char *name; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(u64 *)buf; + break; + case 4: + val = *(u32 *)buf; + break; + case 2: + val = *(u16 *)buf; + break; + case 1: + val = *(u8 *)buf; + break; + } + + return val; +} + +#endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c new file mode 100644 index 000000000000..766753a2ec53 --- /dev/null +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -0,0 +1,75 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" + +int vidxd_send_interrupt(struct ims_irq_entry *iie) +{ + /* PLACE HOLDER */ + return 0; +} + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} + +void vidxd_reset(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ + return 0; +} + +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h new file mode 100644 index 000000000000..cc2ba6ccff7b --- /dev/null +++ b/drivers/vfio/mdev/idxd/vdev.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_VDEV_H_ +#define _IDXD_VDEV_H_ + +#include "mdev.h" + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); +void vidxd_mmio_init(struct vdcm_idxd *vidxd); +void vidxd_reset(struct vdcm_idxd *vidxd); +int vidxd_send_interrupt(struct ims_irq_entry *iie); +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd); + +#endif From patchwork Fri Feb 5 20:53:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070955 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 65D67C433DB for ; Fri, 5 Feb 2021 21:09:06 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2024564FB7 for ; Fri, 5 Feb 2021 21:09:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233778AbhBEVIs (ORCPT ); Fri, 5 Feb 2021 16:08:48 -0500 Received: from mga03.intel.com ([134.134.136.65]:37593 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233233AbhBETLt (ORCPT ); Fri, 5 Feb 2021 14:11:49 -0500 IronPort-SDR: t7IHgm0xHcwlvD/vEMn1cV70rm0mXzTnQunjifcQG1GdTgqds5iK4sAdmOCvBMytcaUhr40M4y JH9gNMrheMjw== 
X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="181551539" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="181551539" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:32 -0800 IronPort-SDR: NObB7WXcy1BzHBA+4p4EBEI0yd1hrZ5n6VRbRp/2P0KQQVsx/X7elqYn50dg5Z0vCaDFybutip mD0ksTTqT0Vg== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="434596933" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:31 -0800 Subject: [PATCH v5 06/14] vfio/mdev: idxd: add mdev type as a new wq type From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:53:31 -0700 Message-ID: <161255841136.339900.9537818102228577552.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add "mdev" wq type and support helpers. The mdev wq type marks the wq to be utilized as a VFIO mediated device. Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 2 ++ drivers/dma/idxd/sysfs.c | 13 +++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index a271942df2be..67428c8d476d 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -73,6 +73,7 @@ enum idxd_wq_type { IDXD_WQT_NONE = 0, IDXD_WQT_KERNEL, IDXD_WQT_USER, + IDXD_WQT_MDEV, }; struct idxd_cdev { @@ -344,6 +345,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd); int idxd_register_driver(void); void idxd_unregister_driver(void); struct bus_type *idxd_get_bus_type(struct idxd_device *idxd); +bool is_idxd_wq_mdev(struct idxd_wq *wq); /* device interrupt control */ irqreturn_t idxd_irq_handler(int vec, void *data); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index ab5c76e1226b..13d20cbd4cf6 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = { [IDXD_WQT_NONE] = "none", [IDXD_WQT_KERNEL] = "kernel", [IDXD_WQT_USER] = "user", + [IDXD_WQT_MDEV] = "mdev", }; static void idxd_conf_device_release(struct device *dev) @@ -79,6 +80,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq) return wq->type == IDXD_WQT_USER; } +inline bool is_idxd_wq_mdev(struct idxd_wq *wq) +{ + return wq->type == IDXD_WQT_MDEV ? 
true : false; +} + static int idxd_config_bus_match(struct device *dev, struct device_driver *drv) { @@ -1151,8 +1157,9 @@ static ssize_t wq_type_show(struct device *dev, return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_KERNEL]); case IDXD_WQT_USER: - return sprintf(buf, "%s\n", - idxd_wq_type_names[IDXD_WQT_USER]); + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]); + case IDXD_WQT_MDEV: + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]); case IDXD_WQT_NONE: default: return sprintf(buf, "%s\n", @@ -1179,6 +1186,8 @@ static ssize_t wq_type_store(struct device *dev, wq->type = IDXD_WQT_KERNEL; else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER])) wq->type = IDXD_WQT_USER; + else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV])) + wq->type = IDXD_WQT_MDEV; else return -EINVAL; From patchwork Fri Feb 5 20:53:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070953 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2785CC433E9 for ; Fri, 5 Feb 2021 21:08:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C70CF64FB7 for ; Fri, 5 Feb 2021 21:08:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233767AbhBEVIl (ORCPT ); Fri, 5 Feb 2021 16:08:41 -0500 Received: from mga03.intel.com ([134.134.136.65]:37605 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231258AbhBETLy (ORCPT ); Fri, 5 Feb 2021 14:11:54 -0500 IronPort-SDR: ZEkPSrGNuHFQMhZI7fKRbJg/+Jsqkz+dbvthuR3XXkx9Yjc2JIsKa1Td6+ADDOtlOt178zmBBf PyHN40eTo0Tg== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="181551551" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="181551551" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:39 -0800 IronPort-SDR: ciJu/jukbZ2x70DZn4RggD5Af3iy+htFDTKKOgykpck1Gs5e0d3UI8KVZSQH3ER/BHZXZKX+5J t0IcmRCVOPKw== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="373680486" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:38 -0800 Subject: [PATCH v5 07/14] vfio/mdev: idxd: add 1dwq-v1 mdev type From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:53:37 -0700 Message-ID: <161255841792.339900.13314425685185083794.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: 
<161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
Add mdev device type "1dwq-v1" support code. 1dwq-v1 is defined as a single DSA gen1 dedicated WQ. This WQ cannot be shared between guests. The guest also cannot change any WQ configuration. Signed-off-by: Dave Jiang --- drivers/dma/idxd/sysfs.c | 1 drivers/vfio/mdev/idxd/mdev.c | 216 +++++++++++++++++++++++++++++++++++++++-- 2 files changed, 207 insertions(+), 10 deletions(-)
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 13d20cbd4cf6..d985a0ac23d9 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -84,6 +84,7 @@ inline bool is_idxd_wq_mdev(struct idxd_wq *wq) { return wq->type == IDXD_WQT_MDEV ? true : false; } +EXPORT_SYMBOL_GPL(is_idxd_wq_mdev); static int idxd_config_bus_match(struct device *dev, struct device_driver *drv) diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 384ba5d6bc2b..7529396f3812 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -46,6 +46,9 @@ static u64 idxd_pci_config[] = { 0x0000000000000000ULL, }; +static char idxd_dsa_1dwq_name[IDXD_MDEV_NAME_LEN]; +static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN]; + static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, unsigned int index, unsigned int start, unsigned int count, void *data); @@ -144,21 +147,70 @@ static void idxd_vdcm_release(struct mdev_device *mdev) mutex_unlock(&vidxd->dev_lock); } +static struct idxd_wq *find_any_dwq(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int i; + struct idxd_wq *wq; + unsigned long flags; + + switch (type->type) { + case IDXD_MDEV_TYPE_DSA_1_DWQ: + if (idxd->type != IDXD_TYPE_DSA) + return NULL; + break; + case IDXD_MDEV_TYPE_IAX_1_DWQ: + if (idxd->type != IDXD_TYPE_IAX) + return NULL; + break; + default: + return NULL; + } + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + wq = &idxd->wqs[i]; + + if (wq->state != IDXD_WQ_ENABLED) + continue; + + if (!wq_dedicated(wq)) + continue; + + if (idxd_wq_refcount(wq) != 0) + continue; + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + mutex_lock(&wq->wq_lock); + if (idxd_wq_refcount(wq)) { + /* lost the race; drop the mutex before resuming the scan */ + mutex_unlock(&wq->wq_lock); + spin_lock_irqsave(&idxd->dev_lock, flags); + continue; + } + + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + return wq; + } + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + return NULL; +} + static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, struct vdcm_idxd_type *type) { struct vdcm_idxd *vidxd; struct idxd_wq *wq = NULL; - int i; - - /* PLACEHOLDER, wq matching comes later */ + int i, rc; + wq = find_any_dwq(idxd, type); if (!wq) return ERR_PTR(-ENODEV); vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); - if (!vidxd) - return ERR_PTR(-ENOMEM); + if (!vidxd) { + rc = -ENOMEM; + goto err; + } mutex_init(&vidxd->dev_lock); vidxd->idxd = idxd; @@ -169,9 +221,6 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev vidxd->num_wqs = VIDXD_MAX_WQS; idxd_vdcm_init(vidxd); - mutex_lock(&wq->wq_lock); - idxd_wq_get(wq); - mutex_unlock(&wq->wq_lock);
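For context, instantiating this type from the host side: the WQ is first given type "mdev" and dedicated mode through sysfs (see the previous patch) and enabled, then an mdev is created by writing a UUID to the type's create attribute. A sketch in C; the sysfs path and UUID below are illustrative, not from this series:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int create_idxd_mdev(void)
{
	/* hypothetical parent device path and instance UUID */
	const char *path = "/sys/bus/pci/devices/0000:6b:01.0/"
			   "mdev_supported_types/idxd-dsa-1dwq-v1/create";
	const char *uuid = "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001";
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, uuid, strlen(uuid));	/* kicks idxd_vdcm_create() */
	close(fd);
	return n < 0 ? -1 : 0;
}

@@ -179,9 +228,24 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev }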
return vidxd; + + err: + mutex_lock(&wq->wq_lock); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + return ERR_PTR(rc); } -static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = { + { + .name = idxd_dsa_1dwq_name, + .type = IDXD_MDEV_TYPE_DSA_1_DWQ, + }, + { + .name = idxd_iax_1dwq_name, + .type = IDXD_MDEV_TYPE_IAX_1_DWQ, + }, +}; static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, const char *name) @@ -965,7 +1029,94 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, return rc; } -static const struct mdev_parent_ops idxd_vdcm_ops = { +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + + if (type) + return sprintf(buf, "%s\n", type->name); + + return -EINVAL; +} +static MDEV_TYPE_ATTR_RO(name); + +static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int count = 0, i; + unsigned long flags; + + switch (type->type) { + case IDXD_MDEV_TYPE_DSA_1_DWQ: + if (idxd->type != IDXD_TYPE_DSA) + return 0; + break; + case IDXD_MDEV_TYPE_IAX_1_DWQ: + if (idxd->type != IDXD_TYPE_IAX) + return 0; + break; + default: + return 0; + } + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq; + + wq = &idxd->wqs[i]; + if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq)) + continue; + + count++; + } + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + return count; +} + +static ssize_t available_instances_show(struct kobject *kobj, + struct device *dev, char *buf) +{ + int count; + struct idxd_device *idxd = dev_get_drvdata(dev); + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) + return -EINVAL; + + count = find_available_mdev_instances(idxd, type); + + return sprintf(buf, "%d\n", count); +} +static MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} +static MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *idxd_mdev_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group idxd_mdev_type_dsa_group0 = { + .name = idxd_dsa_1dwq_name, + .attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group idxd_mdev_type_iax_group0 = { + .name = idxd_iax_1dwq_name, + .attrs = idxd_mdev_types_attrs, +}; + +static struct mdev_parent_ops idxd_vdcm_ops = { .create = idxd_vdcm_create, .remove = idxd_vdcm_remove, .open = idxd_vdcm_open, @@ -976,6 +1127,43 @@ static const struct mdev_parent_ops idxd_vdcm_ops = { .ioctl = idxd_vdcm_ioctl, }; +/* Set the mdev type version to the hardware version supported */ +static void init_mdev_1dwq_name(struct idxd_device *idxd) +{ + unsigned int version; + + version = (idxd->hw.version & GENMASK(15, 8)) >> 8; + if (idxd->type == IDXD_TYPE_DSA && strlen(idxd_dsa_1dwq_name) == 0) + sprintf(idxd_dsa_1dwq_name, "dsa-1dwq-v%u", version); + else if (idxd->type == IDXD_TYPE_IAX && strlen(idxd_iax_1dwq_name) == 0) + sprintf(idxd_iax_1dwq_name, "iax-1dwq-v%u", version); +} + +static int alloc_supported_types(struct idxd_device *idxd) +{ + struct attribute_group **idxd_mdev_type_groups; + + idxd_mdev_type_groups = 
kcalloc(2, sizeof(struct attribute_group *), GFP_KERNEL); + if (!idxd_mdev_type_groups) + return -ENOMEM; + + switch (idxd->type) { + case IDXD_TYPE_DSA: + idxd_mdev_type_groups[0] = &idxd_mdev_type_dsa_group0; + break; + case IDXD_TYPE_IAX: + idxd_mdev_type_groups[0] = &idxd_mdev_type_iax_group0; + break; + case IDXD_TYPE_UNKNOWN: + default: + return -ENODEV; + } + + idxd_vdcm_ops.supported_type_groups = idxd_mdev_type_groups; + + return 0; +} + int idxd_mdev_host_init(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -984,6 +1172,11 @@ int idxd_mdev_host_init(struct idxd_device *idxd) if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags)) return -EOPNOTSUPP; + init_mdev_1dwq_name(idxd); + rc = alloc_supported_types(idxd); + if (rc < 0) + return rc; + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); if (rc < 0) { @@ -1010,6 +1203,9 @@ void idxd_mdev_host_release(struct idxd_device *idxd) dev_warn(dev, "Failed to disable aux-domain: %d\n", rc); } + + kfree(idxd_vdcm_ops.supported_type_groups); + idxd_vdcm_ops.supported_type_groups = NULL; } static int idxd_mdev_aux_probe(struct auxiliary_device *auxdev, From patchwork Fri Feb 5 20:53:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070963 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 845EDC433DB for ; Fri, 5 Feb 2021 21:12:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4052164E51 for ; Fri, 5 Feb 2021 21:12:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233381AbhBEVHc (ORCPT ); Fri, 5 Feb 2021 16:07:32 -0500 Received: from mga11.intel.com ([192.55.52.93]:22064 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233385AbhBETMB (ORCPT ); Fri, 5 Feb 2021 14:12:01 -0500 IronPort-SDR: tNJdH/V9pm/ch5GT+J8LWVBoG4nfZgXxZW6NUppZ9OVzkCX/4mBfs5FcVeKeXyf2fkhZPD6Mvs DL/XE//F7cag== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="177982049" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="177982049" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:45 -0800 IronPort-SDR: A9H5lm1Sd7i/M1gNhBM3L79kwkWoUYKCEc7Q63pqu+getCM1rq23nqINQo/VKV9hbKa2J3d4dq vtBnQGY+FsCQ== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="484459065" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:44 -0800 Subject: [PATCH v5 08/14] vfio/mdev: idxd: add emulation rw routines From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, 
eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:53:44 -0700 Message-ID: <161255842420.339900.2588415689318940385.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org
Add emulation routines for PCI config read/write, MMIO read/write, and an interrupt handling routine for the emulated device. The rw routines are called when PCI config or BAR0 MMIO reads/writes are issued by the guest kernel through KVM/QEMU. Because we are supporting a read-only configuration, most of the MMIO emulations are simple memory copies, except for cases such as handling device commands and interrupts. As part of the emulation code, add the support code for the "1dwq" mdev type. This mdev type follows the standard VFIO mdev flow. The "1dwq" type will export a single dedicated wq to the mdev. The dwq will have a read-only configuration that is set by the host. The mdev type does not support PASID and SVA and will match the stage 1 driver in functional support. For backward compatibility, the mdev will maintain the DSA spec definition of this mdev type once the commit goes upstream. Signed-off-by: Dave Jiang --- drivers/dma/idxd/registers.h | 10 + drivers/vfio/mdev/idxd/vdev.c | 456 ++++++++++++++++++++++++++++++++++++++++- drivers/vfio/mdev/idxd/vdev.h | 8 + include/uapi/linux/idxd.h | 2 4 files changed, 468 insertions(+), 8 deletions(-)
diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index d9a732decdd5..50ea94259c99 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -195,7 +195,8 @@ union cmdsts_reg { }; u32 bits; } __packed; -#define IDXD_CMDSTS_ACTIVE 0x80000000 +#define IDXD_CMDS_ACTIVE_BIT 31 +#define IDXD_CMDSTS_ACTIVE BIT(IDXD_CMDS_ACTIVE_BIT) #define IDXD_CMDSTS_ERR_MASK 0xff #define IDXD_CMDSTS_RES_SHIFT 8 @@ -278,6 +279,11 @@ union msix_perm { u32 bits; } __packed; +#define IDXD_MSIX_PERM_MASK 0xfffff00c +#define IDXD_MSIX_PERM_IGNORE 0x3 +#define MSIX_ENTRY_MASK_INT 0x1 +#define MSIX_ENTRY_CTRL_BYTE 12 + union group_flags { struct { u32 tc_a:3; @@ -349,6 +355,8 @@ union wqcfg { #define WQCFG_PASID_IDX 2 #define WQCFG_PRIV_IDX 2 +#define WQCFG_MODE_DEDICATED 1 +#define WQCFG_MODE_SHARED 0 /* * This macro calculates the offset into the WQCFG register diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 766753a2ec53..958b09987e5c 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -25,35 +25,472 @@ int vidxd_send_interrupt(struct ims_irq_entry *iie) { - /* PLACE HOLDER */ + struct vdcm_idxd *vidxd = iie->vidxd; + struct device *dev = &vidxd->idxd->pdev->dev; + int rc; + + dev_dbg(dev, "%s interrupt %d\n", __func__, iie->id); + + if (!vidxd->vdev.msix_trigger[iie->id]) { + dev_warn(dev, "%s: intr eventfd not found %d\n", __func__, iie->id); + return -EINVAL; + } + + rc = eventfd_signal(vidxd->vdev.msix_trigger[iie->id], 1); + if (rc != 1) + dev_err(dev, "eventfd signal failed: %d on wq(%d) vector(%d)\n", + rc, vidxd->wq->id, iie->id); + else + dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", vidxd->wq->id, iie->id); + + return rc; +} +
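vidxd_send_interrupt() above only signals the eventfd registered through VFIO_DEVICE_SET_IRQS. For completeness, a sketch of the typical consumer side: a VMM commonly hands the same eventfd to KVM as an irqfd so the signal is injected as the guest MSI-X vector without a userspace exit. This usage is assumed rather than prescribed by this series, and GSI routing setup is omitted:

#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int attach_irqfd(int vm_fd, int eventfd, unsigned int gsi)
{
	struct kvm_irqfd irqfd;

	memset(&irqfd, 0, sizeof(irqfd));
	irqfd.fd = eventfd;	/* same fd wired into msix_trigger[] */
	irqfd.gsi = gsi;	/* routed to the guest MSI-X vector */
	return ioctl(vm_fd, KVM_IRQFD, &irqfd);
}

+static void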
vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error) +{ + u8 *bar0 = vidxd->bar0; + union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET); + union genctrl_reg *genctrl; + bool send = false; + + if (!swerr->valid) { + memset(swerr, 0, sizeof(*swerr)); + swerr->valid = 1; + swerr->error = error; + send = true; + } else if (swerr->valid && !swerr->overflow) { + swerr->overflow = 1; + } + + genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET); + if (send && genctrl->softerr_int_en) { + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + + *intcause |= IDXD_INTC_ERR; + vidxd_send_interrupt(&vidxd->irq_entries[0]); + } +} + +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + u8 *bar0 = vidxd->bar0; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size, + offset, get_reg_val(buf, size)); + + if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0) + return -EINVAL; + + /* If we don't limit this, we potentially can write out of bound */ + if (size > sizeof(u32)) + return -EINVAL; + + switch (offset) { + case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3: + /* Write only when device is disabled. */ + if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED) + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_GENCTRL_OFFSET: + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_INTCAUSE_OFFSET: + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(4, 0)); + break; + + case IDXD_CMD_OFFSET: { + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 val = get_reg_val(buf, size); + + if (size != sizeof(u32)) + return -EINVAL; + + /* Check and set command in progress */ + if (test_and_set_bit(IDXD_CMDS_ACTIVE_BIT, (unsigned long *)cmdsts) == 0) + vidxd_do_command(vidxd, val); + else + vidxd_report_error(vidxd, DSA_ERR_CMD_REG); + break; + } + + case IDXD_SWERR_OFFSET: + /* W1C */ + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(1, 0)); + break; + + case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1: + case VIDXD_GRPCFG_OFFSET ... VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1: + /* Nothing is written. Should be all RO */ + break; + + case VIDXD_MSIX_TABLE_OFFSET ... VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: { + int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10; + u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10]; + u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET); + u8 ctrl; + + ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE]; + memcpy(bar0 + offset, buf, size); + /* Handle clearing of UNMASK bit */ + if (!(msix_entry[MSIX_ENTRY_CTRL_BYTE] & MSIX_ENTRY_MASK_INT) && + ctrl & MSIX_ENTRY_MASK_INT) + if (test_and_clear_bit(index, (unsigned long *)pba)) + vidxd_send_interrupt(&vidxd->irq_entries[index]); + break; + } + + case VIDXD_MSIX_PERM_OFFSET ... 
VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + memcpy(bar0 + offset, buf, size); + break; + } /* offset */ + return 0; } int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { - u32 offset = pos & (vidxd->bar_size[0] - 1); + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, vidxd->bar0 + offset, size); + + dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n", + vidxd->wq->id, size, offset, get_reg_val(buf, size)); return 0; } -int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) { - /* PLACEHOLDER */ + u32 offset = pos & 0xfff; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, &vidxd->cfg[offset], count); + + dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n", + vidxd->wq->id, count, offset, get_reg_val(buf, count)); + return 0; } -int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +/* + * Much of the emulation code has been borrowed from Intel i915 cfg space + * emulation code. + * drivers/gpu/drm/i915/gvt/cfg_space.c: + */ + +/* + * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one + * byte) byte by byte in standard pci configuration space. (not the full + * 256 bytes.) + */ +static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = { + [PCI_COMMAND] = 0xff, 0x07, + [PCI_STATUS] = 0x00, 0xf9, /* the only one RW1C byte */ + [PCI_CACHE_LINE_SIZE] = 0xff, + [PCI_BASE_ADDRESS_0 ... PCI_CARDBUS_CIS - 1] = 0xff, + [PCI_ROM_ADDRESS] = 0x01, 0xf8, 0xff, 0xff, + [PCI_INTERRUPT_LINE] = 0xff, +}; + +static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src, + unsigned int bytes) { - /* PLACEHOLDER */ + u8 *cfg_base = vidxd->cfg; + u8 mask, new, old; + int i = 0; + + for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) { + mask = pci_cfg_space_rw_bmp[off + i]; + old = cfg_base[off + i]; + new = src[i] & mask; + + /* + * The PCI_STATUS high byte has RW1C bits; here we + * emulate clearing by writing 1 to these bits. + * Writing 0 to RW1C bits has no effect. + */ + if (off + i == PCI_STATUS + 1) + new = (~new & old) & mask; + + cfg_base[off + i] = (old & ~mask) | new; + } + + /* For other configuration space directly copy as it is. */ + if (i < bytes) + memcpy(cfg_base + off + i, src + i, bytes - i); +} + +static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low) +{ + u32 *pval; + + /* BAR offset should be 32-bit aligned */ + offset = rounddown(offset, 4); + pval = (u32 *)(vidxd->cfg + offset); + + if (low) { + /* + * only update bit 31 - bit 4, + * leave the bit 3 - bit 0 unchanged. + */ + *pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0)); + } else { + *pval = val; + } +} +
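A worked example of the BAR sizing handshake emulated below may help; arithmetic only, assuming BAR0 with VIDXD_BAR0_SIZE = 0x2000:

/*
 * Guest writes 0xffffffff to PCI_BASE_ADDRESS_0.
 * The emulation stores the size mask ~(0x2000 - 1) = 0xffffe000 in the
 * address bits, so the guest reads back 0xffffe000 | <flag bits 3:0>
 * and computes size = ~(0xffffe000 & ~0xf) + 1 = 0x2000.
 */

+static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data, + unsigned int bytes) +{ + u32 new = *(u32 *)(p_data); + bool lo = IS_ALIGNED(offset, 8); + u64 size; + unsigned int bar_id; + + /* + * Power-up software can determine how much address + * space the device requires by writing a value of + * all 1's to the register and then reading the value + * back. The device will return 0's in all don't-care + * address bits.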
+ */ + if (new == 0xffffffff) { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + bar_id = (offset - PCI_BASE_ADDRESS_0) / 8; + size = vidxd->bar_size[bar_id]; + _write_pci_bar(vidxd, offset, size >> (lo ? 0 : 32), lo); + break; + default: + /* Unimplemented BARs */ + _write_pci_bar(vidxd, offset, 0x0, false); + } + } else { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + _write_pci_bar(vidxd, offset, new, lo); + break; + default: + break; + } + } return 0; } int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) { - /* PLACEHOLDER */ + struct device *dev = &vidxd->idxd->pdev->dev; + + if (size > 4) + return -EINVAL; + + if (pos + size > VIDXD_MAX_CFG_SPACE_SZ) + return -EINVAL; + + dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos, + get_reg_val(buf, size)); + + /* First check if it's PCI_COMMAND */ + if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) { + bool new_bme; + bool bme; + + if (size > 2) + return -EINVAL; + + new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER); + bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER); + _pci_cfg_mem_write(vidxd, pos, buf, size); + + /* Flag error if turning off BME while device is enabled */ + if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED) + vidxd_report_error(vidxd, DSA_ERR_PCI_CFG); + return 0; + } + + switch (pos) { + case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5: + if (!IS_ALIGNED(pos, 4)) + return -EINVAL; + return _pci_cfg_bar_write(vidxd, pos, buf, size); + + default: + _pci_cfg_mem_write(vidxd, pos, buf, size); + } return 0; } +static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET); + + /* single group for current implementation */ + grp_cap->token_en = 0; + grp_cap->token_limit = 0; + grp_cap->total_tokens = 0; + grp_cap->num_groups = 1; +} + +static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + int i; + + /* + * At this point we only export a single workqueue for each + * mdev, so present it as the first workqueue and mark the + * engines available in this group.
+ */ + + /* Expose a single workqueue, as the first (and only) one */ + grpcfg->wqs[0] = BIT(0); + grpcfg->engines = 0; + for (i = 0; i < group->num_engines; i++) + grpcfg->engines |= BIT(i); + grpcfg->flags.bits = group->grpcfg.flags.bits; +} + +static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct idxd_wq *wq = vidxd->wq; + union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + + wq_cap->occupancy_int = 0; + wq_cap->occupancy = 0; + wq_cap->priority = 0; + wq_cap->total_wq_size = wq->size; + wq_cap->num_wqs = VIDXD_MAX_WQS; + wq_cap->wq_ats_support = 0; + wq_cap->dedicated_mode = 1; + wq_cap->shared_mode = 0; +} + +static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct idxd_wq *wq = vidxd->wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + + wqcfg->wq_size = wq->size; + wqcfg->wq_thresh = wq->threshold; + + wqcfg->mode = WQCFG_MODE_DEDICATED; + + wqcfg->bof = 0; + + wqcfg->priority = wq->priority; + wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; + /* make mode change read-only */ + wqcfg->mode_support = 0; +} + +static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + + engcap->num_engines = group->num_engines; +} + +static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET); + + gencap->bits = idxd->hw.gen_cap.bits; + gencap->config_en = 0; + gencap->max_ims_mult = 0; + gencap->cmd_cap = 1; + gencap->block_on_fault = 0; +} + +static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET); + + if (idxd->hw.cmd_cap) + *cmdcap = idxd->hw.cmd_cap; + else + *cmdcap = 0x1ffe; + + *cmdcap |= BIT(IDXD_CMD_REQUEST_INT_HANDLE) | BIT(IDXD_CMD_RELEASE_INT_HANDLE); +} + +static void vidxd_mmio_init_opcap(struct vdcm_idxd *vidxd) +{ + u64 opcode; + u8 *bar0 = vidxd->bar0; + u64 *opcap = (u64 *)(bar0 + IDXD_OPCAP_OFFSET); + + opcode = BIT_ULL(DSA_OPCODE_NOOP) | BIT_ULL(DSA_OPCODE_BATCH) | + BIT_ULL(DSA_OPCODE_DRAIN) | BIT_ULL(DSA_OPCODE_MEMMOVE) | + BIT_ULL(DSA_OPCODE_MEMFILL) | BIT_ULL(DSA_OPCODE_COMPARE) | + BIT_ULL(DSA_OPCODE_COMPVAL) | BIT_ULL(DSA_OPCODE_CR_DELTA) | + BIT_ULL(DSA_OPCODE_AP_DELTA) | BIT_ULL(DSA_OPCODE_DUALCAST) | + BIT_ULL(DSA_OPCODE_CRCGEN) | BIT_ULL(DSA_OPCODE_COPY_CRC) | + BIT_ULL(DSA_OPCODE_DIF_CHECK) | BIT_ULL(DSA_OPCODE_DIF_INS) | + BIT_ULL(DSA_OPCODE_DIF_STRP) | BIT_ULL(DSA_OPCODE_DIF_UPDT) | + BIT_ULL(DSA_OPCODE_CFLUSH); + *opcap = opcode; +} + +static void vidxd_mmio_init_version(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u32 *version; + + version = (u32 *)vidxd->bar0; + *version = idxd->hw.version; +} + void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union offsets_reg *offsets; + + memset(vidxd->bar0, 0, VIDXD_BAR0_SIZE); + + vidxd_mmio_init_version(vidxd); + vidxd_mmio_init_gencap(vidxd); + vidxd_mmio_init_wqcap(vidxd); + vidxd_mmio_init_grpcap(vidxd); + vidxd_mmio_init_engcap(vidxd); + vidxd_mmio_init_opcap(vidxd); + + offsets = (union
offsets_reg *)(bar0 + IDXD_TABLE_OFFSET); + offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100; + offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100; + offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100; + + vidxd_mmio_init_cmdcap(vidxd); + memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ); + vidxd_mmio_init_grpcfg(vidxd); + vidxd_mmio_init_wqcfg(vidxd); +} + +static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { /* PLACEHOLDER */ } @@ -63,6 +500,11 @@ void vidxd_reset(struct vdcm_idxd *vidxd) /* PLACEHOLDER */ } +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) +{ + /* PLACEHOLDER */ +} + int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) { /* PLACEHOLDER */ diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h index cc2ba6ccff7b..fc0f405baa40 100644 --- a/drivers/vfio/mdev/idxd/vdev.h +++ b/drivers/vfio/mdev/idxd/vdev.h @@ -6,6 +6,13 @@ #include "mdev.h" +static inline u8 vidxd_state(struct vdcm_idxd *vidxd) +{ + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + + return gensts->state; +} + int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); @@ -15,5 +22,6 @@ void vidxd_reset(struct vdcm_idxd *vidxd); int vidxd_send_interrupt(struct ims_irq_entry *iie); int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); void vidxd_free_ims_entries(struct vdcm_idxd *vidxd); +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); #endif diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h index 236d437947bc..22d1b229a912 100644 --- a/include/uapi/linux/idxd.h +++ b/include/uapi/linux/idxd.h @@ -89,6 +89,8 @@ enum dsa_completion_status { DSA_COMP_HW_ERR1, DSA_COMP_HW_ERR_DRB, DSA_COMP_TRANSLATION_FAIL, + DSA_ERR_PCI_CFG = 0x51, + DSA_ERR_CMD_REG, }; enum iax_completion_status { From patchwork Fri Feb 5 20:53:50 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070949 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 171D6C433DB for ; Fri, 5 Feb 2021 21:08:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C4F8B64DA3 for ; Fri, 5 Feb 2021 21:08:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233728AbhBEVHp (ORCPT ); Fri, 5 Feb 2021 16:07:45 -0500 Received: from mga03.intel.com ([134.134.136.65]:37623 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233571AbhBETMJ (ORCPT ); Fri, 5 Feb 2021 14:12:09 -0500 IronPort-SDR: zWX+fy1q9AkGWedxHdQ8t4i3PjbTkmX5a7a/CpIBSnSgwNGQU6nIBUGlrmmpdqieXjIX3g/dnE rnSQ7gVEnUOA== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="181551568" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="181551568" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga103.jf.intel.com 
with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:52 -0800 IronPort-SDR: 5m9HgDclL4za8ezeMzEDad1ZEJ9kYRjw0ozQzEwzoEWELybEC3CkqFi3e2/qB/g8VUpM5qcPNV PVEoG8Yusi9Q== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="373513429" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:51 -0800 Subject: [PATCH v5 09/14] vfio/mdev: idxd: prep for virtual device commands From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:53:50 -0700 Message-ID: <161255843037.339900.11011951029875473128.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Update some of the device commands in order to support usage by the virtual device commands emulated by the vdcm. Expose some of the commands' raw status so the virtual commands can utilize them accordingly. Signed-off-by: Dave Jiang --- drivers/dma/idxd/cdev.c | 2 + drivers/dma/idxd/device.c | 69 +++++++++++++++++++++++++++-------------- drivers/dma/idxd/idxd.h | 8 ++--- drivers/dma/idxd/irq.c | 2 + drivers/dma/idxd/sysfs.c | 8 ++--- drivers/vfio/mdev/idxd/mdev.c | 2 + 6 files changed, 56 insertions(+), 35 deletions(-) diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index b1518106434f..f46328ba8493 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -160,7 +160,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep) if (rc < 0) dev_err(dev, "wq disable pasid failed.\n"); } else { - idxd_wq_drain(wq); + idxd_wq_drain(wq, NULL); } } diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 89fa2bbe6ebf..245d576ddc43 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -216,22 +216,25 @@ void idxd_wq_free_resources(struct idxd_wq *wq) sbitmap_queue_free(&wq->sbq); } -int idxd_wq_enable(struct idxd_wq *wq) +int idxd_wq_enable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status; + u32 stat; if (wq->state == IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d already enabled\n", wq->id); return -ENXIO; } - idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status); + idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &stat); - if (status != IDXD_CMDSTS_SUCCESS && - status != IDXD_CMDSTS_ERR_WQ_ENABLED) { - dev_dbg(dev, "WQ enable failed: %#x\n", status); + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS && + stat != IDXD_CMDSTS_ERR_WQ_ENABLED) { + dev_dbg(dev, "WQ enable failed: %#x\n", stat); return -ENXIO; } @@ -240,11 +243,11 @@ int idxd_wq_enable(struct idxd_wq *wq) return 0; } -int idxd_wq_disable(struct idxd_wq *wq) +int idxd_wq_disable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = 
&idxd->pdev->dev; - u32 status, operand; + u32 stat, operand; dev_dbg(dev, "Disabling WQ %d\n", wq->id); @@ -254,10 +257,13 @@ int idxd_wq_disable(struct idxd_wq *wq) } operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &status); + idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &stat); + + if (status) + *status = stat; - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ disable failed: %#x\n", status); + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ disable failed: %#x\n", stat); return -ENXIO; } @@ -267,20 +273,31 @@ int idxd_wq_disable(struct idxd_wq *wq) } EXPORT_SYMBOL_GPL(idxd_wq_disable); -void idxd_wq_drain(struct idxd_wq *wq) +int idxd_wq_drain(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand; + u32 operand, stat; if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); - return; + return 0; } dev_dbg(dev, "Draining WQ %d\n", wq->id); operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL); + idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ drain failed: %#x\n", stat); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d drained\n", wq->id); + return 0; } int idxd_wq_map_portal(struct idxd_wq *wq) @@ -307,11 +324,11 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } -int idxd_wq_abort(struct idxd_wq *wq) +int idxd_wq_abort(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand, status; + u32 operand, stat; dev_dbg(dev, "Abort WQ %d\n", wq->id); if (wq->state != IDXD_WQ_ENABLED) { @@ -321,9 +338,13 @@ int idxd_wq_abort(struct idxd_wq *wq) operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); - idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ abort failed: %#x\n", status); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", stat); return -ENXIO; } @@ -339,7 +360,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -351,7 +372,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; @@ -366,7 +387,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -378,7 +399,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) iowrite32(wqcfg.bits[WQCFG_PASID_IDX], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 67428c8d476d..41eee987c9b7 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -376,9 +376,9 @@ int idxd_device_release_int_handle(struct idxd_device *idxd, int handle, /* work 
queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); void idxd_wq_free_resources(struct idxd_wq *wq); -int idxd_wq_enable(struct idxd_wq *wq); -int idxd_wq_disable(struct idxd_wq *wq); -void idxd_wq_drain(struct idxd_wq *wq); +int idxd_wq_enable(struct idxd_wq *wq, u32 *status); +int idxd_wq_disable(struct idxd_wq *wq, u32 *status); +int idxd_wq_drain(struct idxd_wq *wq, u32 *status); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); @@ -386,7 +386,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); void idxd_wq_quiesce(struct idxd_wq *wq); int idxd_wq_init_percpu_ref(struct idxd_wq *wq); -int idxd_wq_abort(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq, u32 *status); void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index a60ca11a5784..090926856df3 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -48,7 +48,7 @@ static void idxd_device_reinit(struct work_struct *work) struct idxd_wq *wq = &idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { dev_warn(dev, "Unable to re-enable wq %s\n", dev_name(&wq->conf_dev)); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index d985a0ac23d9..913ff019fe36 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -189,7 +189,7 @@ static int enable_wq(struct idxd_wq *wq) return rc; } - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { mutex_unlock(&wq->wq_lock); dev_warn(dev, "WQ %d enabling failed: %d\n", wq->id, rc); @@ -199,7 +199,7 @@ static int enable_wq(struct idxd_wq *wq) rc = idxd_wq_map_portal(wq); if (rc < 0) { dev_warn(dev, "wq portal mapping failed: %d\n", rc); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) dev_warn(dev, "IDXD wq disable failed\n"); mutex_unlock(&wq->wq_lock); @@ -321,8 +321,8 @@ static void disable_wq(struct idxd_wq *wq) idxd_wq_unmap_portal(wq); - idxd_wq_drain(wq); - rc = idxd_wq_disable(wq); + idxd_wq_drain(wq, NULL); + rc = idxd_wq_disable(wq, NULL); idxd_wq_free_resources(wq); wq->client_count = 0; diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 7529396f3812..67e6b33468cd 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -117,7 +117,7 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) - idxd_wq_disable(wq); + idxd_wq_disable(wq, NULL); } static void idxd_vdcm_release(struct mdev_device *mdev) From patchwork Fri Feb 5 20:53:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070933 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52C7CC433E0 for ; Fri, 5 Feb 2021 21:02:57 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 02B7264DA3 for ; Fri, 5 Feb 2021 21:02:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233181AbhBETUV (ORCPT ); Fri, 5 Feb 2021 14:20:21 -0500 Received: from mga06.intel.com ([134.134.136.31]:64259 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233625AbhBETMY (ORCPT ); Fri, 5 Feb 2021 14:12:24 -0500 IronPort-SDR: aHS7M7udBphhDi/cE+F6g9cop7+EqTlEVof6s/vqSscWis+RVl5z9sRvePT2IvLcWaSpXUhOzm kw7oLeBs6aQg== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="242990275" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="242990275" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:59 -0800 IronPort-SDR: RqEkyteHsXTYspbPmte1gpuQohms5CGMkStJvpWer1/ZnLC69pZ/0t444vic6+sm/BZWXOAcsl jN+Ct6VQW+eA== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="416374787" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:53:58 -0800 Subject: [PATCH v5 10/14] vfio/mdev: idxd: virtual device commands emulation From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:53:57 -0700 Message-ID: <161255843743.339900.10371190228025336137.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add all the helper functions that support the emulation of the commands that are submitted to the device command register.
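For orientation before the diff: the interaction being emulated looks roughly like the guest-side sketch below. The helper is illustrative and not part of this patch; the bit positions follow union idxd_command_reg (operand in bits 19:0, command code in bits 24:20), and the CMDSTS "active" bit is assumed here to be bit 31.

#include <linux/bits.h>
#include <linux/io.h>
#include <linux/processor.h>

/*
 * Hedged sketch: a guest driver writes the emulated command register.
 * The write is trapped and routed through vidxd_mmio_write() ->
 * vidxd_do_command(); the emulation clears the CMDSTS active bit when
 * the virtual command completes (an interrupt can be requested instead
 * of polling).
 */
static void guest_submit_cmd(void __iomem *bar0, u32 cmd_code, u32 operand)
{
	/* command code in bits 24:20, operand in bits 19:0 */
	u32 val = (cmd_code << 20) | (operand & GENMASK(19, 0));

	iowrite32(val, bar0 + IDXD_CMD_OFFSET);

	/* poll until the emulated command completes */
	while (ioread32(bar0 + IDXD_CMDSTS_OFFSET) & BIT(31))
		cpu_relax();
}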
Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 5 drivers/dma/idxd/registers.h | 16 + drivers/vfio/mdev/idxd/mdev.c | 2 drivers/vfio/mdev/idxd/mdev.h | 3 drivers/vfio/mdev/idxd/vdev.c | 440 +++++++++++++++++++++++++++++++++++++++++ 5 files changed, 460 insertions(+), 6 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 245d576ddc43..c5faa23bd8ce 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -242,6 +242,7 @@ int idxd_wq_enable(struct idxd_wq *wq, u32 *status) dev_dbg(dev, "WQ %d enabled\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_enable); int idxd_wq_disable(struct idxd_wq *wq, u32 *status) { @@ -299,6 +300,7 @@ int idxd_wq_drain(struct idxd_wq *wq, u32 *status) dev_dbg(dev, "WQ %d drained\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_drain); int idxd_wq_map_portal(struct idxd_wq *wq) { @@ -351,6 +353,7 @@ int idxd_wq_abort(struct idxd_wq *wq, u32 *status) dev_dbg(dev, "WQ %d aborted\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_abort); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) { @@ -470,6 +473,7 @@ void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid) wq->wqcfg->pasid = pasid; iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset); } +EXPORT_SYMBOL_GPL(idxd_wq_setup_pasid); void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) { @@ -483,6 +487,7 @@ void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) wq->wqcfg->priv = !!priv; iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset); } +EXPORT_SYMBOL_GPL(idxd_wq_setup_priv); /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index 50ea94259c99..0f985787417c 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -120,7 +120,8 @@ union gencfg_reg { union genctrl_reg { struct { u32 softerr_int_en:1; - u32 rsvd:31; + u32 halt_state_int_en:1; + u32 rsvd:30; }; u32 bits; } __packed; @@ -142,6 +143,8 @@ enum idxd_device_status_state { IDXD_DEVICE_STATE_HALT, }; +#define IDXD_GENSTATS_MASK 0x03 + enum idxd_device_reset_type { IDXD_DEVICE_RESET_SOFTWARE = 0, IDXD_DEVICE_RESET_FLR, @@ -154,6 +157,7 @@ enum idxd_device_reset_type { #define IDXD_INTC_CMD 0x02 #define IDXD_INTC_OCCUPY 0x04 #define IDXD_INTC_PERFMON_OVFL 0x08 +#define IDXD_INTC_HALT_STATE 0x10 #define IDXD_CMD_OFFSET 0xa0 union idxd_command_reg { @@ -165,6 +169,7 @@ union idxd_command_reg { }; u32 bits; } __packed; +#define IDXD_CMD_INT_MASK 0x80000000 enum idxd_cmd { IDXD_CMD_ENABLE_DEVICE = 1, @@ -228,10 +233,11 @@ enum idxd_cmdsts_err { /* disable device errors */ IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31, /* disable WQ, drain WQ, abort WQ, reset WQ */ - IDXD_CMDSTS_ERR_DEV_NOT_EN, + IDXD_CMDSTS_ERR_WQ_NOT_EN, /* request interrupt handle */ IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41, IDXD_CMDSTS_ERR_NO_HANDLE, + IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE, }; #define IDXD_CMDCAP_OFFSET 0xb0 @@ -353,6 +359,12 @@ union wqcfg { u32 bits[8]; } __packed; +enum idxd_wq_hw_state { + IDXD_WQ_DEV_DISABLED = 0, + IDXD_WQ_DEV_ENABLED, + IDXD_WQ_DEV_BUSY, +}; + #define WQCFG_PASID_IDX 2 #define WQCFG_PRIV_IDX 2 #define WQCFG_MODE_DEDICATED 1 diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 67e6b33468cd..7cde707021db 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -52,7 +52,7 @@ static char idxd_iax_1dwq_name[IDXD_MDEV_NAME_LEN]; static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, 
uint32_t flags, unsigned int index, unsigned int start, unsigned int count, void *data); -static int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid) +int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid) { struct vfio_group *vfio_group; struct iommu_domain *iommu_domain; diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index 7ca50f054714..8421b4962ac7 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -38,6 +38,7 @@ struct ims_irq_entry { bool irq_set; int id; int irq; + int ims_idx; }; struct idxd_vdev { @@ -112,4 +113,6 @@ static inline u64 get_reg_val(void *buf, int size) return val; } +int idxd_mdev_get_pasid(struct mdev_device *mdev, u32 *pasid); + #endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 958b09987e5c..766fd98e9eea 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -492,17 +492,451 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd) static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { - /* PLACEHOLDER */ + u8 *bar0 = vidxd->bar0; + u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET); + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + *cmdsts = val; + dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val); + + if (*cmd & IDXD_CMD_INT_MASK) { + *intcause |= IDXD_INTC_CMD; + vidxd_send_interrupt(&vidxd->irq_entries[0]); + } +} + +static void vidxd_enable(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_ENABLED) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN); + + gensts->state = IDXD_DEVICE_STATE_ENABLED; + + return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + wq = vidxd->wq; + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) + dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status); + } + + wqcfg->wq_state = 0; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + + 
idxd_wq_drain(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_drain(wq, &status); + if (status) { + dev_dbg(dev, "wq drain failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + idxd_wq_abort(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "wq abort failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_reset(struct vdcm_idxd *vidxd) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct idxd_wq *wq; + + dev_dbg(dev, "%s\n", __func__); + gensts->state = IDXD_DEVICE_STATE_DRAIN; + wq = vidxd->wq; + + if (wq->state == IDXD_WQ_ENABLED) { + idxd_wq_abort(wq, NULL); + idxd_wq_disable(wq, NULL); + } + + vidxd_mmio_init(vidxd); + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id); + + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_wq_disable(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int operand) +{ + bool ims = !!(operand & CMD_INT_HANDLE_IMS); + u32 cmdsts; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int ims_idx, vidx; + + vidx = operand & GENMASK(15, 0); + + dev_dbg(dev, "allocating int handle 
for %d\n", vidx); + + /* vidx cannot be 0 since that's emulated and does not require IMS handle */ + if (vidx <= 0 || vidx >= VIDXD_MAX_MSIX_ENTRIES) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX); + return; + } + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE); + return; + } + + ims_idx = vidxd->irq_entries[vidx].ims_idx; + cmdsts = ims_idx << IDXD_CMDSTS_RES_SHIFT; + dev_dbg(dev, "requested index %d handle %d\n", vidx, ims_idx); + idxd_complete_command(vidxd, cmdsts); +} + +static void vidxd_release_int_handle(struct vdcm_idxd *vidxd, int operand) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + bool ims = !!(operand & CMD_INT_HANDLE_IMS); + int handle, i; + bool found = false; + + handle = operand & GENMASK(15, 0); + dev_dbg(dev, "allocating int handle %d\n", handle); + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE); + return; + } + + /* IMS backed entry start at 1, 0 is emulated vector */ + for (i = 1; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + if (vidxd->irq_entries[i].ims_idx == handle) { + found = true; + break; + } + } + + if (!found) { + dev_warn(dev, "Freeing unallocated int handle.\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE); + } + + dev_dbg(dev, "int handle %d released.\n", handle); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wq_cap_reg *wqcap; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_device *idxd; + union wqcfg *vwqcfg, *wqcfg; + unsigned long flags; + u32 status, wq_pasid; + int priv, rc; + + if (wq_id >= VIDXD_MAX_WQS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id); + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32); + wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + wqcfg = wq->wqcfg; + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE); + return; + } + + priv = 1; + rc = idxd_mdev_get_pasid(mdev, &wq_pasid); + if (rc < 0) { + dev_err(dev, "idxd pasid setup failed wq %d: %d\n", wq->id, rc); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN); + return; + } + + /* Clear pasid_en, pasid, and priv values */ + wqcfg->bits[WQCFG_PASID_IDX] &= ~GENMASK(29, 8); + wqcfg->priv = priv; + wqcfg->pasid_en = 1; + wqcfg->pasid = wq_pasid; + dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_setup_pasid(wq, wq_pasid); + idxd_wq_setup_priv(wq, priv); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + idxd_wq_enable(wq, &status); + if (status) { + dev_err(dev, "vidxd enable wq %d failed\n", wq->id); + idxd_complete_command(vidxd, status); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_disable(struct vdcm_idxd 
*vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + + dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id); + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd) +{ + struct idxd_device *idxd = vidxd->idxd; + + if (cmd == IDXD_CMD_REQUEST_INT_HANDLE || cmd == IDXD_CMD_RELEASE_INT_HANDLE) + return true; + + return !!(idxd->hw.opcap.bits[0] & BIT_ULL(cmd)); } void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) { - /* PLACEHOLDER */ + union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + reg->bits = val; + + dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits); + + if (!command_supported(vidxd, reg->cmd)) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + return; + } + + switch (reg->cmd) { + case IDXD_CMD_ENABLE_DEVICE: + vidxd_enable(vidxd); + break; + case IDXD_CMD_DISABLE_DEVICE: + vidxd_disable(vidxd); + break; + case IDXD_CMD_DRAIN_ALL: + vidxd_drain_all(vidxd); + break; + case IDXD_CMD_ABORT_ALL: + vidxd_abort_all(vidxd); + break; + case IDXD_CMD_RESET_DEVICE: + vidxd_reset(vidxd); + break; + case IDXD_CMD_ENABLE_WQ: + vidxd_wq_enable(vidxd, reg->operand); + break; + case IDXD_CMD_DISABLE_WQ: + vidxd_wq_disable(vidxd, reg->operand); + break; + case IDXD_CMD_DRAIN_WQ: + vidxd_wq_drain(vidxd, reg->operand); + break; + case IDXD_CMD_ABORT_WQ: + vidxd_wq_abort(vidxd, reg->operand); + break; + case IDXD_CMD_RESET_WQ: + vidxd_wq_reset(vidxd, reg->operand); + break; + case IDXD_CMD_REQUEST_INT_HANDLE: + vidxd_alloc_int_handle(vidxd, reg->operand); + break; + case IDXD_CMD_RELEASE_INT_HANDLE: + vidxd_release_int_handle(vidxd, reg->operand); + break; + default: + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + break; + } } int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) From patchwork Fri Feb 5 20:54:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070937 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CAF6C433E0 for ; Fri, 5 Feb 2021 21:03:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org 
(Postfix) with ESMTP id 2771B64FB4 for ; Fri, 5 Feb 2021 21:03:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233489AbhBETUg (ORCPT ); Fri, 5 Feb 2021 14:20:36 -0500 Received: from mga06.intel.com ([134.134.136.31]:64265 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233624AbhBETMX (ORCPT ); Fri, 5 Feb 2021 14:12:23 -0500 IronPort-SDR: nSM3jjjnIHSyg+twsxOtU+94OWxV7dMkmlGzwNpS7tY2f5uYUaGb4RFMTheMcoD63hmyP/3cvP E28yeMro4nSw== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="242990281" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="242990281" Received: from orsmga003.jf.intel.com ([10.7.209.27]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:05 -0800 IronPort-SDR: PajqOwiDjXskbWiMWCMobc+aQEjlgmErAdbdO33UjwDDoBectEaet4CLZreXZshgOBDvDjUI6J VFq2av9VA1zA== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="357820056" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:04 -0800 Subject: [PATCH v5 11/14] vfio/mdev: idxd: ims setup for the vdcm From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:54:04 -0700 Message-ID: <161255844433.339900.3136365210231233047.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add setup for IMS enabling for the mediated device. On the actual hardware, MSIX vector 0 is the misc interrupt and handles events such as administrative command completion, error reporting, and performance monitor overflow. The MSIX vectors 1...N are used for descriptor completion interrupts. On the guest kernel, the MSIX interrupts are backed by the mediated device through emulation or IMS vectors. Vector 0 is handled through emulation by the host vdcm. Vector 1 (and possibly more in the future) is backed by IMS. IMS vectors can be set up with interrupt handlers via request_irq() just like MSIX interrupts once the relevant IRQ domain is set. The msi_domain_alloc_irqs()/msi_domain_free_irqs() APIs can then be used to allocate and free interrupts from that domain.
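As a hedged sketch of that flow (the handler name and irq label below are made up, and real code would unwind partially-requested vectors on failure):

#include <linux/interrupt.h>
#include <linux/msi.h>

static irqreturn_t example_ims_handler(int irq, void *data)
{
	/* descriptor completion handling would go here */
	return IRQ_HANDLED;
}

static int example_ims_setup(struct device *dev, struct irq_domain *ims_domain, int nvec)
{
	struct msi_desc *desc;
	int rc;

	/* point the device at the IMS domain, then allocate vectors from it */
	dev_set_msi_domain(dev, ims_domain);
	rc = msi_domain_alloc_irqs(ims_domain, dev, nvec);
	if (rc < 0)
		return rc;

	/* wire handlers exactly as one would for MSI-X vectors */
	for_each_msi_entry(desc, dev) {
		rc = request_irq(desc->irq, example_ims_handler, 0, "example-ims", dev);
		if (rc)
			break;	/* real code must free the irqs already requested */
	}
	return rc;
}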
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 1 + drivers/vfio/mdev/idxd/mdev.c | 12 +++++++++ drivers/vfio/mdev/idxd/vdev.c | 53 ++++++++++++++++++++++++++++++++--------- kernel/irq/msi.c | 2 ++ 4 files changed, 57 insertions(+), 11 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 41eee987c9b7..c5ef6ccc9ba6 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -224,6 +224,7 @@ struct idxd_device { struct workqueue_struct *wq; struct work_struct work; + struct irq_domain *ims_domain; int *int_handles; struct auxiliary_device *mdev_auxdev; diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 7cde707021db..8a4af882a47f 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -1167,6 +1167,7 @@ static int alloc_supported_types(struct idxd_device *idxd) int idxd_mdev_host_init(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; + struct ims_array_info ims_info; int rc; if (!test_bit(IDXD_FLAG_IMS_SUPPORTED, &idxd->flags)) @@ -1188,6 +1189,15 @@ int idxd_mdev_host_init(struct idxd_device *idxd) return -EOPNOTSUPP; } + ims_info.max_slots = idxd->ims_size; + ims_info.slots = idxd->reg_base + idxd->ims_offset; + idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info); + if (!idxd->ims_domain) { + dev_warn(dev, "Failed to acquire IMS domain\n"); + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + return -ENODEV; + } + return mdev_register_device(dev, &idxd_vdcm_ops); } @@ -1196,6 +1206,8 @@ void idxd_mdev_host_release(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; int rc; + irq_domain_remove(idxd->ims_domain); + mdev_unregister_device(dev); if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 766fd98e9eea..8626438a9e54 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include "registers.h" #include "idxd.h" @@ -871,6 +872,47 @@ static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } +void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + msi_domain_free_irqs(dev_get_msi_domain(dev), dev); +} + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + struct irq_domain *irq_domain; + struct idxd_device *idxd = vidxd->idxd; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct msi_desc *entry; + struct ims_irq_entry *irq_entry; + int rc, i; + + irq_domain = idxd->ims_domain; + dev_set_msi_domain(dev, irq_domain); + + /* Allocate VIDXD_MAX_MSIX_VECS - 1 vectors, because vector 0 is emulated and not IMS backed. */ + rc = msi_domain_alloc_irqs(irq_domain, dev, VIDXD_MAX_MSIX_VECS - 1); + if (rc < 0) + return rc; + /* + * The first MSIX vector on the guest is emulated and not backed by IMS. To keep + * things simple, the irq_entries array includes the emulated vector; the code + * here starts at index 1 to set up all the IMS-backed vectors.
+ */ + i = 1; + for_each_msi_entry(entry, dev) { + irq_entry = &vidxd->irq_entries[i]; + irq_entry->ims_idx = entry->device_msi.hwirq; + irq_entry->irq = entry->irq; + i++; + } + + return 0; +} + static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd) { struct idxd_device *idxd = vidxd->idxd; @@ -938,14 +980,3 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) break; } } - -int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ - return 0; -} - -void vidxd_free_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ -} diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c index d70d92eac322..d95299b4ae79 100644 --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -536,6 +536,7 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev, return ops->domain_alloc_irqs(domain, dev, nvec); } +EXPORT_SYMBOL_GPL(msi_domain_alloc_irqs); void __msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) { @@ -572,6 +573,7 @@ void msi_domain_free_irqs(struct irq_domain *domain, struct device *dev) return ops->domain_free_irqs(domain, dev); } +EXPORT_SYMBOL_GPL(msi_domain_free_irqs); /** * msi_get_domain_info - Get the MSI interrupt domain info for @domain From patchwork Fri Feb 5 20:54:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070951 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C4954C433DB for ; Fri, 5 Feb 2021 21:08:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 780B364FBB for ; Fri, 5 Feb 2021 21:08:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233744AbhBEVIF (ORCPT ); Fri, 5 Feb 2021 16:08:05 -0500 Received: from mga07.intel.com ([134.134.136.100]:56062 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233481AbhBETMJ (ORCPT ); Fri, 5 Feb 2021 14:12:09 -0500 IronPort-SDR: ZvdZ+pGO4vRzHu/NU1/oNMRyH9qTnoSNOoSXZKS4jWV2R6e02+tsAUpMN3HCMY6QuTQTh0l22c VmgScOyRQuUw== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="245557951" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="245557951" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:12 -0800 IronPort-SDR: I49WTfDurRmURM/DTk/92r5bCOpZaF9j0yeGrW5dXKsWGtCIlkkYgSySBFhQAdk+CRAoJ9K80w 43uPmeMeL6tQ== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="508666130" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:11 -0800 Subject: [PATCH v5 12/14] vfio/mdev: idxd: add irq bypass for IMS vectors From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, 
dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:54:10 -0700 Message-ID: <161255845083.339900.12339149209312159722.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add support to bypass host for IMS interrupts configured for the guest. Signed-off-by: Dave Jiang --- drivers/vfio/mdev/Kconfig | 1 + drivers/vfio/mdev/idxd/mdev.c | 17 +++++++++++++++-- drivers/vfio/mdev/idxd/mdev.h | 1 + 3 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig index e9540e43d1f1..ab0a6f0930bc 100644 --- a/drivers/vfio/mdev/Kconfig +++ b/drivers/vfio/mdev/Kconfig @@ -22,6 +22,7 @@ config VFIO_MDEV_IDXD depends on VFIO && VFIO_MDEV && X86_64 select AUXILIARY_BUS select IMS_MSI_ARRAY + select IRQ_BYPASS_MANAGER default n help VFIO based mediated device driver for Intel Accelerator Devices driver. diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 8a4af882a47f..d59920f78109 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -616,9 +616,13 @@ static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index) dev_dbg(dev, "disable MSIX trigger %d\n", index); if (index) { + struct irq_bypass_producer *producer; u32 auxval; + producer = &vidxd->vdev.producer[index]; + irq_bypass_unregister_producer(producer); irq_entry = &vidxd->irq_entries[index]; + if (irq_entry->irq_set) { free_irq(irq_entry->irq, irq_entry); irq_entry->irq_set = false; @@ -654,9 +658,10 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) } if (index) { - u32 pasid; - u32 auxval; + struct irq_bypass_producer *producer; + u32 pasid, auxval; + producer = &vidxd->vdev.producer[index]; irq_entry = &vidxd->irq_entries[index]; rc = idxd_mdev_get_pasid(mdev, &pasid); if (rc < 0) @@ -682,6 +687,14 @@ static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) irq_set_auxdata(irq_entry->irq, IMS_AUXDATA_CONTROL_WORD, auxval); return rc; } + + producer->token = trigger; + producer->irq = irq_entry->irq; + rc = irq_bypass_register_producer(producer); + if (unlikely(rc)) + dev_info(dev, "irq bypass producer (token %p) registration failed: %d\n", + producer->token, rc); + irq_entry->irq_set = true; } diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index 8421b4962ac7..1f867de416e7 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -45,6 +45,7 @@ struct idxd_vdev { struct mdev_device *mdev; struct vfio_group *vfio_group; struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; + struct irq_bypass_producer producer[VIDXD_MAX_MSIX_ENTRIES]; }; struct vdcm_idxd { From patchwork Fri Feb 5 20:54:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070915 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, 
HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C18B0C433E6 for ; Fri, 5 Feb 2021 20:57:09 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7684D64FB1 for ; Fri, 5 Feb 2021 20:57:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233755AbhBETON (ORCPT ); Fri, 5 Feb 2021 14:14:13 -0500 Received: from mga03.intel.com ([134.134.136.65]:37593 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233723AbhBETMu (ORCPT ); Fri, 5 Feb 2021 14:12:50 -0500 IronPort-SDR: fJdkk9BGeDch80PC9/g9WV6BGfHotRP2h5ZzrJm6kJgWVfGGrAabTnpJ2iHQbCg3iQHG6VyTFO SVnuFrIdjT+g== X-IronPort-AV: E=McAfee;i="6000,8403,9886"; a="181551601" X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="181551601" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:18 -0800 IronPort-SDR: Cge2ehvW9insyeKyV9XHY3vQmr6/rhDWwyFyeR62HO7mYG20bPdATVecw7vrDZMBAW2P8H5PQu 4LzpDMECI5sw== X-IronPort-AV: E=Sophos;i="5.81,156,1610438400"; d="scan'208";a="416146008" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga003-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Feb 2021 12:54:17 -0800 Subject: [PATCH v5 13/14] vfio/mdev: idxd: add new wq state for mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com, netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 05 Feb 2021 13:54:17 -0700 Message-ID: <161255845740.339900.10615654311326039119.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When a dedicated wq is enabled as an mdev, we must disable the wq on the device in order to program the pasid to the wq. Introduce a software-only wq state, IDXD_WQ_LOCKED, to prevent the user from modifying the configuration while the mdev wq is in this state. A LOCKED wq is not in the DISABLED state, so configuration changes are rejected; it is also not in the ENABLED state, so actions allowed only when enabled are rejected as well. For mdev, the dwq is disabled and set to the LOCKED state upon mdev creation. When ->open() is called on the mdev and a pasid is programmed into the WQCFG, the dwq is enabled again and returns to the ENABLED state.
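As a hedged illustration of the state's intent: configuration store paths only proceed from DISABLED and operational paths only proceed from ENABLED, so LOCKED blocks both. The helper below is hypothetical; the driver's sysfs store functions make the equivalent check.

/* Hypothetical sketch: LOCKED rejects configuration and operation alike */
static int example_wq_config_store(struct idxd_wq *wq)
{
	/* only a DISABLED wq may be reconfigured; ENABLED and LOCKED are rejected */
	if (wq->state != IDXD_WQ_DISABLED)
		return -EPERM;

	/* ... apply the configuration change ... */
	return 0;
}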
Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 9 +++++++++ drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/sysfs.c | 2 ++ drivers/vfio/mdev/idxd/mdev.c | 4 +++- 4 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index c5faa23bd8ce..1cd64a6a60de 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -252,6 +252,14 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status) dev_dbg(dev, "Disabling WQ %d\n", wq->id); + /* + * When the wq is in LOCKED state, it means it is disabled but + * also at the same time is "enabled" as far as the user is + * concerned. So a call to disable the hardware can be skipped. + */ + if (wq->state == IDXD_WQ_LOCKED) + goto out; + if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); return 0; @@ -268,6 +276,7 @@ int idxd_wq_disable(struct idxd_wq *wq, u32 *status) return -ENXIO; } + out: wq->state = IDXD_WQ_DISABLED; dev_dbg(dev, "WQ %d disabled\n", wq->id); return 0; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index c5ef6ccc9ba6..4afe35385f85 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -62,6 +62,7 @@ struct idxd_group { enum idxd_wq_state { IDXD_WQ_DISABLED = 0, IDXD_WQ_ENABLED, + IDXD_WQ_LOCKED, }; enum idxd_wq_flag { diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 913ff019fe36..1bce55ac24b9 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -879,6 +879,8 @@ static ssize_t wq_state_show(struct device *dev, return sprintf(buf, "disabled\n"); case IDXD_WQ_ENABLED: return sprintf(buf, "enabled\n"); + case IDXD_WQ_LOCKED: + return sprintf(buf, "locked\n"); } return sprintf(buf, "unknown\n"); diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index d59920f78109..60913950a4f5 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -116,8 +116,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); - if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) { idxd_wq_disable(wq, NULL); + wq->state = IDXD_WQ_LOCKED; + } } static void idxd_vdcm_release(struct mdev_device *mdev) From patchwork Fri Feb 5 20:54:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12070935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C95F9C433E0 for ; Fri, 5 Feb 2021 21:03:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 90DA164FB4 for ; Fri, 5 Feb 2021 21:03:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233404AbhBETUc (ORCPT ); Fri, 5 Feb 2021 14:20:32 -0500 Received: from mga12.intel.com ([192.55.52.136]:30881 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233630AbhBETMX (ORCPT ); Fri, 5 Feb 2021 14:12:23 -0500 IronPort-SDR: 
Subject: [PATCH v5 14/14] vfio/mdev: idxd: add error notification from
 host driver to mediated device
From: Dave Jiang
To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de,
 vkoul@kernel.org
Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com,
 jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com,
 kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com,
 dan.j.williams@intel.com, eric.auger@redhat.com, parav@mellanox.com,
 netanelg@mellanox.com, shahafs@mellanox.com, pbonzini@redhat.com,
 dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org,
 kvm@vger.kernel.org
Date: Fri, 05 Feb 2021 13:54:23 -0700
Message-ID: <161255846348.339900.12785712530534526428.stgit@djiang5-desk3.ch.intel.com>
In-Reply-To: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com>
References: <161255810396.339900.7646244556839438765.stgit@djiang5-desk3.ch.intel.com>

When a device error occurs, the mediated device needs to be notified so
that the error can be propagated to the guest. Add support to notify the
specific mdev when an error is wq specific, and to broadcast errors to
all mdevs when it is a generic device error.
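The dispatch from the host interrupt handler to the mdev driver relies
on a standard kernel pattern: struct idxd_mdev_aux_drv embeds a
struct auxiliary_driver, and container_of() climbs from the embedded
member back to the wrapper to reach the ops table. A minimal,
self-contained userspace sketch of that pattern, with stand-in types,
might look like this:

#include <stddef.h>
#include <stdio.h>

/* Stand-ins for the kernel types; illustrative only. */
struct auxiliary_driver { const char *name; };
struct idxd_wq { int id; };

struct aux_mdev_ops {
	void (*notify_error)(struct idxd_wq *wq);
};

struct idxd_mdev_aux_drv {
	struct auxiliary_driver auxiliary_drv;
	struct aux_mdev_ops ops;
};

/* Same idea as the kernel's container_of(): member pointer -> wrapper. */
#define to_mdev_aux_drv(drv) \
	((struct idxd_mdev_aux_drv *)((char *)(drv) - \
		offsetof(struct idxd_mdev_aux_drv, auxiliary_drv)))

static void report(struct idxd_wq *wq)
{
	printf("error on wq %d\n", wq->id);
}

int main(void)
{
	struct idxd_mdev_aux_drv drv = {
		.auxiliary_drv = { .name = "idxd_mdev" },
		.ops = { .notify_error = report },
	};
	struct auxiliary_driver *aux = &drv.auxiliary_drv;
	struct idxd_wq wq = { .id = 0 };

	/* The IRQ path only holds 'aux'; recover the wrapper and call. */
	to_mdev_aux_drv(aux)->ops.notify_error(&wq);
	return 0;
}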
Signed-off-by: Dave Jiang
---
 drivers/dma/idxd/idxd.h       |  7 +++++++
 drivers/dma/idxd/irq.c        |  6 ++++++
 drivers/vfio/mdev/idxd/mdev.c |  5 +++++
 drivers/vfio/mdev/idxd/vdev.c | 32 ++++++++++++++++++++++++++++++++
 drivers/vfio/mdev/idxd/vdev.h |  1 +
 5 files changed, 51 insertions(+)

diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 4afe35385f85..6016df029ed4 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -295,10 +295,17 @@ enum idxd_interrupt_type {
 	IDXD_IRQ_IMS,
 };

+struct aux_mdev_ops {
+	void (*notify_error)(struct idxd_wq *wq);
+};
+
 struct idxd_mdev_aux_drv {
 	struct auxiliary_driver auxiliary_drv;
+	const struct aux_mdev_ops ops;
 };

+#define to_mdev_aux_drv(_aux_drv) container_of(_aux_drv, struct idxd_mdev_aux_drv, auxiliary_drv)
+
 static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot,
 					    enum idxd_interrupt_type irq_type)
 {
diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c
index 090926856df3..9cdd3e789799 100644
--- a/drivers/dma/idxd/irq.c
+++ b/drivers/dma/idxd/irq.c
@@ -118,6 +118,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)
 	u32 val = 0;
 	int i;
 	bool err = false;
+	struct auxiliary_driver *auxdrv = to_auxiliary_drv(idxd->mdev_auxdev->dev.driver);
+	struct idxd_mdev_aux_drv *mdevdrv = to_mdev_aux_drv(auxdrv);

 	if (cause & IDXD_INTC_ERR) {
 		spin_lock_bh(&idxd->dev_lock);
@@ -132,6 +134,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)

 			if (wq->type == IDXD_WQT_USER)
 				wake_up_interruptible(&wq->idxd_cdev.err_queue);
+			else if (wq->type == IDXD_WQT_MDEV)
+				mdevdrv->ops.notify_error(wq);
 		} else {
 			int i;

@@ -140,6 +144,8 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause)

 				if (wq->type == IDXD_WQT_USER)
 					wake_up_interruptible(&wq->idxd_cdev.err_queue);
+				else if (wq->type == IDXD_WQT_MDEV)
+					mdevdrv->ops.notify_error(wq);
 			}
 		}
diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c
index 60913950a4f5..edccaad66c8c 100644
--- a/drivers/vfio/mdev/idxd/mdev.c
+++ b/drivers/vfio/mdev/idxd/mdev.c
@@ -1266,12 +1266,17 @@ static const struct auxiliary_device_id idxd_mdev_auxbus_id_table[] = {
 };
 MODULE_DEVICE_TABLE(auxiliary, idxd_mdev_auxbus_id_table);

+static const struct aux_mdev_ops aux_mdev_ops = {
+	.notify_error = idxd_wq_vidxd_send_errors,
+};
+
 static struct idxd_mdev_aux_drv idxd_mdev_aux_drv = {
 	.auxiliary_drv = {
 		.id_table = idxd_mdev_auxbus_id_table,
 		.probe = idxd_mdev_aux_probe,
 		.remove = idxd_mdev_aux_remove,
 	},
+	.ops = aux_mdev_ops,
 };

 static int idxd_mdev_auxdev_drv_register(struct idxd_mdev_aux_drv *drv)
diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c
index 8626438a9e54..3aa9d5b870e8 100644
--- a/drivers/vfio/mdev/idxd/vdev.c
+++ b/drivers/vfio/mdev/idxd/vdev.c
@@ -980,3 +980,35 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val)
 		break;
 	}
 }
+
+static void vidxd_send_errors(struct vdcm_idxd *vidxd)
+{
+	struct idxd_device *idxd = vidxd->idxd;
+	u8 *bar0 = vidxd->bar0;
+	union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET);
+	union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET);
+	u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET);
+	int i;
+
+	if (swerr->valid) {
+		if (!swerr->overflow)
+			swerr->overflow = 1;
+		return;
+	}
+
+	lockdep_assert_held(&idxd->dev_lock);
+	for (i = 0; i < 4; i++)
+		swerr->bits[i] = idxd->sw_err.bits[i];
+
+	*intcause |= IDXD_INTC_ERR;
+	if (genctrl->softerr_int_en)
+		vidxd_send_interrupt(&vidxd->irq_entries[0]);
+}
+
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq)
+{
+	struct vdcm_idxd *vidxd;
+
+	list_for_each_entry(vidxd, &wq->vdcm_list, list)
+		vidxd_send_errors(vidxd);
+}
diff --git a/drivers/vfio/mdev/idxd/vdev.h b/drivers/vfio/mdev/idxd/vdev.h
index fc0f405baa40..00df08f9a963 100644
--- a/drivers/vfio/mdev/idxd/vdev.h
+++ b/drivers/vfio/mdev/idxd/vdev.h
@@ -23,5 +23,6 @@ int vidxd_send_interrupt(struct ims_irq_entry *iie);
 int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_free_ims_entries(struct vdcm_idxd *vidxd);
 void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val);
+void idxd_wq_vidxd_send_errors(struct idxd_wq *wq);

 #endif