From patchwork Sat May 22 00:19:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274203 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 486DBC4707F for ; Sat, 22 May 2021 00:19:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 28AB1613E1 for ; Sat, 22 May 2021 00:19:17 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230333AbhEVAUi (ORCPT ); Fri, 21 May 2021 20:20:38 -0400 Received: from mga18.intel.com ([134.134.136.126]:45223 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230184AbhEVAUh (ORCPT ); Fri, 21 May 2021 20:20:37 -0400 IronPort-SDR: CDvn45KaIoVZ0cTkJr7XBoTX/rMHsn8aO5iU9iA/p/dUptrvyFxGsde0K9ii3n7fSaALnPmZU6 +wUQ6/SKp1mQ== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="188993167" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="188993167" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:12 -0700 IronPort-SDR: zZ+iUC64phyt/rw0hFaqJh+6G3ZXR3oR0KbwjR+LvOXv/oVm4IuMs1jjWaW5eKgqwrqqZEEqhB J/KZhLW1Tq5g== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="474752627" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:11 -0700 Subject: [PATCH v6 01/20] vfio/mdev: idxd: add theory of operation 
documentation for idxd mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: Ashok Raj , Kevin Tian , megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:11 -0700 Message-ID: <162164275148.261970.30424337261509487.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add idxd vfio mediated device theory of operation documentation. Provide description on mdev design, usage, and why vfio mdev was chosen. Reviewed-by: Ashok Raj Reviewed-by: Kevin Tian Signed-off-by: Dave Jiang --- Documentation/driver-api/vfio/mdev-idxd.rst | 379 +++++++++++++++++++++++++++ MAINTAINERS | 7 2 files changed, 386 insertions(+) create mode 100644 Documentation/driver-api/vfio/mdev-idxd.rst diff --git a/Documentation/driver-api/vfio/mdev-idxd.rst b/Documentation/driver-api/vfio/mdev-idxd.rst new file mode 100644 index 000000000000..5c793638e176 --- /dev/null +++ b/Documentation/driver-api/vfio/mdev-idxd.rst @@ -0,0 +1,379 @@ +.. SPDX-License-Identifier: GPL-2.0 + +============= +IDXD Overview +============= +IDXD (Intel Data Accelerator Driver) is the driver for the Intel Data +Streaming Accelerator (DSA). Intel DSA is a high performance data copy +and transformation accelerator. In addition to data move operations, +the device also supports data fill, CRC generation, Data Integrity Field +(DIF), and memory compare and delta generation. 
Intel DSA supports +a variety of PCI-SIG defined capabilities such as Address Translation +Services (ATS), Process Address Space ID (PASID), Page Request Interface +(PRI), Message Signalled Interrupts Extended (MSI-X), and Advanced Error +Reporting (AER). Some of those capabilities enable the device to support +Shared Virtual Memory (SVM), also known as Shared Virtual Addressing +(SVA). Intel DSA also supports Intel Scalable I/O Virtualization (SIOV) +to improve scalability of device assignment. + + +The Intel DSA device contains the following basic components: +* Work queue (WQ) + + A WQ is on-device storage that queues descriptors for the + device. Requests are added to a WQ by using new CPU instructions + (MOVDIR64B and ENQCMD(S)) to write the memory-mapped “portal” + associated with each WQ. + +* Engine + + Operation unit that pulls descriptors from WQs and processes them. + +* Group + + Abstract container to associate one or more engines with one or more WQs. + + +Two types of WQs are supported: +* Dedicated WQ (DWQ) + + Usually a single client owns this exclusively and can submit work + to it. The MOVDIR64B instruction is used to submit descriptors to + this type of WQ. The instruction is a posted write, therefore the + submitter must ensure that submissions do not exceed the WQ length. The + use of PASID is optional with DWQ. Multiple clients can submit to + a DWQ, but synchronization is required because a submission to a full + WQ is silently dropped. + +* Shared WQ (SWQ) + + Multiple clients can submit work to this WQ. The submitter must use + ENQCMDS (from supervisor mode) or ENQCMD (from user mode). These + instructions are non-posted writes. That means a response is + expected from the issued instruction. The EFLAGS.ZF bit is set + when the command was not accepted (busy or failure). + The use of PASID is mandatory to identify the address space + of each client. + + +For more information about the new instructions, see [1][2]. 
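The difference between posted and non-posted submission described above can be illustrated with a small software model. This is only a sketch in plain C: the struct, enum, and function names are hypothetical, nothing here touches hardware or the idxd driver, and the boolean return value merely stands in for EFLAGS.ZF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software model of a work queue; not a driver structure. */
enum wq_mode { WQ_DEDICATED, WQ_SHARED };

struct wq_model {
	enum wq_mode mode;
	size_t size;     /* WQ length in descriptors */
	size_t occupied; /* descriptors currently queued */
	size_t dropped;  /* posted writes lost because the WQ was full */
};

/*
 * Models a posted write (MOVDIR64B) to a dedicated WQ: no status comes
 * back, so a submission to a full queue is lost without the submitter
 * being told. The submitter has to track occupancy itself.
 */
static void model_submit_posted(struct wq_model *wq)
{
	assert(wq->mode == WQ_DEDICATED);
	if (wq->occupied == wq->size) {
		wq->dropped++;
		return;
	}
	wq->occupied++;
}

/*
 * Models a non-posted write (ENQCMD/ENQCMDS) to a shared WQ: the return
 * value plays the role of EFLAGS.ZF, where true means "not accepted,
 * retry later".
 */
static bool model_submit_nonposted(struct wq_model *wq)
{
	assert(wq->mode == WQ_SHARED);
	if (wq->occupied == wq->size)
		return true;
	wq->occupied++;
	return false;
}
```

In this model a third posted submission to a two-entry DWQ only increments the drop counter, while a second non-posted submission to a one-entry SWQ returns true so the caller knows to retry.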
+ +The IDXD driver is broken down into following usages: +* In kernel interface through dmaengine subsystem API. +* Userspace DMA support through character device. mmap(2) is utilized + to map directly to mmio address (or portals) for descriptor submission. +* VFIO Mediated device (mdev) supporting device passthrough usages. + +This document is only for the mdev usage. + + +================================= +Assignable Device Interface (ADI) +================================= +The term ADI is used to represent the minimal unit of assignment for +Intel Scalable IOV device. Each ADI instance refers to the set of device +backend resources that are allocated, configured and organized as an +isolated unit. + +Intel DSA defines each WQ as an ADI. The MMIO registers of each work queue +are partitioned into two categories: +* MMIO registers accessed for data-path operations. +* MMIO registers accessed for control-path operations. + +Data-path MMIO registers of each WQ are contained within +one or more system page size aligned regions and can be mapped in the +CPU page table for direct access from the guest. Control-path MMIO +registers of all WQs are located together but segregated from data-path +MMIO regions. Therefore, guest updates to control-path registers must +be intercepted and then go through the host driver to be reflected in +the device. + +Data-path MMIO registers of DSA WQ are portals for submitting descriptors +to the device. There are four portals per WQ, each being 64 bytes +in size and located on a separate 4KB page in BAR2. Each portal has +different implications regarding interrupt message type (MSI vs. IMS) +and occupancy control (limited vs. unlimited). It is not necessary to +map all portals to the guest. + +Control-path MMIO registers of DSA WQ include global configurations +(shared by all WQs) and WQ-specific configurations. The owner +(e.g. the guest) of the WQ is expected to only change WQ-specific +configurations. 
Intel DSA spec introduces a “Configuration Support” +capability which, if cleared, indicates that some fields of WQ +configuration registers are read-only thus pre-configured by the host. + + +Interrupt Message Store (IMS) +----------------------------- +The ADI utilizes Interrupt Message Store (IMS), a device-specific MSI +implementation, instead of MSIX for interrupts for the guest. This +preserves MSIX for host usages and also allows a significantly larger +number of interrupt vectors for large number of guests usage. + +Intel DSA device implements IMS as on-device memory mapped unified +storage. Each interrupt message is stored as a DWORD size data payload +and a 64-bit address (same as MSI-X). Access to the IMS is through the +host idxd driver. + + +ADI Isolation +------------- +Operations or functioning of one ADI must not affect the functioning +of another ADI or the physical device. Upstream memory requests from +different ADIs are distinguished using a Process Address Space Identifier +(PASID). With the support of PASID-granular address translation in Intel +VT-d, the address space targeted by a request from ADI can be a Host +Virtual Address (HVA), Host I/O Virtual Address (HIOVA), Guest Physical +Address (GPA), Guest Virtual Address (GVA), Guest I/O Virtual Address +(GIOVA), etc. The PASID identity for an ADI is expected to be accessed +or modified by privileged software through the host driver. + +========================= +Virtual DSA (vDSA) Device +========================= +The DSA WQ itself is not a PCI device thus must be composed into a +virtual DSA device to the guest. + +The composition logic needs to handle four main requirements: +* Emulate PCI config space. +* Map data-path portals for direct access from the guest. +* Emulate control-path MMIO registers and selectively forward WQ + configuration requests through host driver to the device. +* Forward and emulate WQ interrupts to the guest. 
+ +The composition logic tells the guest which aspects of WQ are configurable +through a combination of capability fields, e.g.: +* Configuration Support (if cleared, most aspects are not modifiable). +* WQ Mode Support (if cleared, cannot change between dedicated and + shared mode). +* Dedicated Mode Support. +* Shared Mode Support. +* ... + +The virtual capability fields are set according to the vDSA +type. Following is an example of vDSA types and related WQ configurability: +* Type ‘1dwq-v1’ + * One DSA gen1 dedicated WQ + * Guest cannot share the WQ between its clients (no guest SVA) + * Guest cannot change any WQ configuration + +Besides, the composition logic also needs to serve administrative commands +(thru virtual CMD register) through host driver, including: +* Drain/abort all descriptors submitted by this guest. +* Drain/abort descriptors associated with a PASID. +* Enable/disable/reset the WQ (when it’s not shared by multiple VMs). +* Request interrupt handle. + +With this design, vDSA emulation is **greatly simplified**. Only limited +configurability is handled with most registers emulated in simple +READ-ONLY flavor. + +======================================= +Mdev Framework Registration and Release +======================================= + +Intel DSA reports support for Intel Scalable IOV via a PCI Express +Designated Vendor Specific Extended Capability (DVSEC). In addition, +PASID-granular address translation capability is required in the +IOMMU. During host initialization, the IDXD driver should check the +presence of both capabilities before calling mdev_register_device() +to register with the VFIO mdev framework and provide a set of ops +(struct vfio_device_ops). The IOMMU capability is indicated by the +IOMMU_DEV_FEAT_AUX feature flag with iommu_dev_has_feature() and enabled +with iommu_dev_enable_feature(). 
+ +On release, iommu_dev_disable_feature() is called after +mdev_unregister_device() to disable the IOMMU_DEV_FEAT_AUX flag that +the driver enabled during host initialization. + +The vfio_device_ops data structure is filled out by the driver to provide +a number of ops called by VFIO core:: + + struct vfio_device_ops { + .open + .release + .read + .write + .mmap + .ioctl + }; + +The mdev driver provides supported type group attributes. It also +registers the mdev driver with probe and remove calls:: + + struct mdev_driver { + .probe + .remove + .supported_type_groups + }; + + +Supported_type_groups +--------------------- +At the moment only one vDSA type is supported. + +“1dwq-v1”: + Single dedicated WQ (DSA 1.0) with read-only configuration exposed to + the guest. On the guest kernel, a vDSA device shows up with a single + WQ that is pre-configured by the host. The configuration for the WQ + is entirely read-only and cannot be reconfigured. Guest SVA is not + supported on this WQ. + + PCI MSI-X vectors are surfaced from the mdev device to the guest kernel. + In the current implementation two vectors are supported. Vector 0 is used for + device misc operations (admin command completion, error report, etc.) just + like on the host. Vector 1 is used for descriptor completion. Vector 0 + is emulated by the host driver. Vector 1 is backed by + an IMS vector on the host. + +probe +------ +API function to create the mdev. mdev_set_iommu_device() is called to +associate the mdev device with the parent PCI device. This function is +where the driver sets up and initializes the resources to support a single +mdev device. vfio_init_group_dev() and vfio_register_group_dev() are called +in order to associate the 'struct vfio_device' with the 'struct device' from +the mdev and the vfio_device_ops. + +remove +------ +API function that mirrors the probe function and releases all the +resources backing the mdev. vfio_unregister_group_dev() is called. 
+ +open +---- +API function that is called down from VFIO userspace when it is ready to claim +and utilize the mdev. + +release +------- +The mirror function to open that releases the mdev by VFIO userspace. + +read / write +------------ +This is where the Intel IDXD driver provides read/write emulation of +the "slow" path of the mdev, including PCI config space and control-path +MMIO registers. Typically configuration and administrative commands go +through this path. This allows the mdev to show up as a virtual PCI +device in the guest kernel. + +The emulation of PCI config space is nothing special, which is simply +copied from kvmgt. In the future this part might be consolidated to +reduce duplication. + +Emulating MMIO reads are simply memory copies. There is no side-effect +to be emulated upon guest read. + +Emulating MMIO writes are required only for a few registers, due to +read-only configuration on the ‘1dwq-v1’ type. Majority of composition +logic is hooked in the CMD register for performing administrative commands +such as WQ drain, abort, enable, disable and reset operations. The rest of +the emulation is about handling errors (GENCTRL/SWERROR) and interrupts +(INTCAUSE/MSIXPERM) on the vDSA device. Future mdev types might allow +limited WQ configurability, which then requires additional emulation of +the WQCFG register. + +mmap +---- +This is the function that provides the setup to expose a portion of the +hardware, also known as portals, for direct access for “fast” path +operations through the mmap() syscall. A limited region of the hardware +is mapped to the guest for direct I/O submission. + +There are four portals per WQ: unlimited MSI-X, limited MSI-X, unlimited +IMS, limited IMS. Descriptors submitted to limited portals are subject +to threshold configuration limitations for shared WQs. The MSI-X portals +are used for host submissions, and the IMS portals are mapped to vm for +guest submission. 
The host driver provides the IMS portal through the mmap +function to be mapped to the user space in order to expose it directly +to the guest kernel. + +ioctl +----- +This API function does several things: +* Provides general device information to VFIO userspace. +* Provides device region information (PCI, mmio, etc). +* Provides interrupt information. +* Sets up interrupts for the mediated device. +* Resets the mdev device. + +The PCI device presented by VFIO to the guest kernel will show that it +supports MSI-X vectors. The Intel idxd driver will support two vectors +per mdev to back those MSI-X vectors. The first vector is emulated by +the host driver via eventfd in order to support various non-I/O operations just +like the actual device. The second vector is backed by IMS. IMS provides +additional interrupt vectors on the device outside of the PCI MSI-X specification +in order to support significantly more vectors. Eventfd is also used by +the second vector to notify the guest kernel. However, the irq bypass manager is +used to inject the interrupt directly into the guest. When the guest submits +a descriptor through the IMS portal directly to the device, an IMS interrupt +is triggered on completion and routed to the guest as an MSI-X interrupt. + +The idxd driver makes use of the generic IMS irq chip and domain, which +store the interrupt messages in an array in device memory. Allocation and +freeing of interrupts happens via the generic msi_domain_alloc/free_irqs() +interface. The driver only needs to ensure the interrupt domain is stored in +the underlying device struct. + +To allocate IMS, we utilize the IMS array APIs. 
On host init, we need +to create the MSI domain:: + + struct ims_array_info ims_info; + struct device *dev = &pci_dev->dev; + + /* assign the device IMS size */ + ims_info.max_slots = max_ims_size; + /* assign the MMIO base address for the IMS table */ + ims_info.slots = mmio_base + ims_offset; + /* assign the MSI domain to the device */ + dev->msi_domain = pci_ims_array_create_msi_irq_domain(pci_dev, &ims_info); + +When we are ready to allocate the interrupts, we do so via the mdev IMS common lib code:: + + struct device *dev = &mdev->dev; + + irq_domain = dev_get_msi_domain(dev); + /* the irqs are allocated against device of mdev */ + rc = msi_domain_alloc_irqs(irq_domain, dev, num_vecs); + + + /* we can retrieve the slot index from msi_entry */ + irq = dev_msi_irq_vector(dev, vector); + + request_irq(irq, interrupt_handler_function, 0, "ims", context); + + +The DSA device is structured such that MSI-X table entry 0 is used for +admin command completion, error reporting, and other misc commands. The +remaining MSI-X table entries are used for WQ completion. For VM support, +the virtual device also presents a similar layout. Therefore, vector 0 +is emulated by the software. Additional vector(s) are associated with IMS. + +The index (slot) for the per-device IMS entry is managed by the MSI +core. The index is the “interrupt handle” that the guest kernel +needs to program into a DMA descriptor. That interrupt handle tells the +hardware which IMS vector to trigger the interrupt on for the host. + +The virtual device presents an admin command called “request interrupt +handle” that is not supported by the physical device. On probe of +the DSA device on the guest kernel, the guest driver will issue the +“request interrupt handle” command in order to get the interrupt +handle for descriptor programming. The host driver will return the +assigned slot for the IMS entry table to the guest. + +reset +----- + +Device reset is emulated through the mdev. 
With mdev being a wq rather +than the whole device, we would not reset the entire device on a reset +request. The host driver will simulate a reset of the device by +aborting all the outstanding descriptors on the wq and then disabling +the wq. All MMIO registers are reset to pre-programmed values. + +========== +References +========== +[1] https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html +[2] https://software.intel.com/en-us/articles/intel-sdm +[3] https://software.intel.com/sites/default/files/managed/cc/0e/intel-scalable-io-virtualization-technical-specification.pdf +[4] https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification diff --git a/MAINTAINERS b/MAINTAINERS index 9450e052f1b1..20f91064a4d1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -18878,6 +18878,13 @@ F: drivers/vfio/mdev/ F: include/linux/mdev.h F: samples/vfio-mdev/ +VFIO MEDIATED DEVICE IDXD DRIVER +M: Dave Jiang +L: kvm@vger.kernel.org +S: Maintained +F: Documentation/driver-api/vfio/mdev-idxd.rst +F: drivers/vfio/mdev/idxd/ + VFIO PLATFORM DRIVER M: Eric Auger L: kvm@vger.kernel.org From patchwork Sat May 22 00:19:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274205 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A4B4FC4707A for ; Sat, 22 May 2021 00:19:28 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by 
mail.kernel.org (Postfix) with ESMTP id 87880613E1 for ; Sat, 22 May 2021 00:19:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230407AbhEVAUu (ORCPT ); Fri, 21 May 2021 20:20:50 -0400 Received: from mga12.intel.com ([192.55.52.136]:24108 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230147AbhEVAUo (ORCPT ); Fri, 21 May 2021 20:20:44 -0400 IronPort-SDR: hm9HCTDqIBeDvNqHD+FUXrRIZka0ro3BfpXV1f7quv/jgFNhR80wLeZHw+A22guLY4OL4hNwRC 4Lo/cQ13kDPw== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="181210488" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="181210488" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:19 -0700 IronPort-SDR: DnWxC3uEvh++a+mt4AeZTukx2h2qeH/QP/9ZlhvjTjrCsx7cpYdnIKVux4DyC3oIWDINQuMiCv UOfmjP6GTZ7g== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="434499957" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:18 -0700 Subject: [PATCH v6 02/20] dmaengine: idxd: add external module driver support for dsa_bus_type From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:17 -0700 Message-ID: <162164275795.261970.6600777460965468381.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: 
<162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add support to allow an external driver to be registered to the dsa_bus_type and also auto-loaded. Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 6 ++++++ drivers/dma/idxd/init.c | 2 ++ drivers/dma/idxd/sysfs.c | 6 ++++++ 3 files changed, 14 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 0970d0e67976..22afaf7ee637 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -483,11 +483,17 @@ static inline int idxd_wq_refcount(struct idxd_wq *wq) return wq->client_count; }; +#define MODULE_ALIAS_IDXD_DEVICE(type) MODULE_ALIAS("idxd:t" __stringify(type) "*") +#define IDXD_DEVICES_MODALIAS_FMT "idxd:t%d" + int __must_check __idxd_driver_register(struct idxd_device_driver *idxd_drv, struct module *module, const char *mod_name); #define idxd_driver_register(driver) \ __idxd_driver_register(driver, THIS_MODULE, KBUILD_MODNAME) +#define module_idxd_driver(driver) \ + module_driver(driver, idxd_driver_register, idxd_driver_unregister) + void idxd_driver_unregister(struct idxd_device_driver *idxd_drv); int idxd_register_bus_type(void); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 30d3ab0c4051..bed9169152f9 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -843,8 +843,10 @@ int __idxd_driver_register(struct idxd_device_driver *idxd_drv, struct module *o return driver_register(drv); } +EXPORT_SYMBOL_GPL(__idxd_driver_register); void idxd_driver_unregister(struct idxd_device_driver *idxd_drv) { driver_unregister(&idxd_drv->drv); } +EXPORT_SYMBOL_GPL(idxd_driver_unregister); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index ff2f1c97ed74..4fcb8833a4df 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -53,11 +53,17 @@ static int 
idxd_config_bus_remove(struct device *dev) return 0; } +static int idxd_bus_uevent(struct device *dev, struct kobj_uevent_env *env) +{ + return add_uevent_var(env, "MODALIAS=" IDXD_DEVICES_MODALIAS_FMT, 0); +} + struct bus_type dsa_bus_type = { .name = "dsa", .match = idxd_config_bus_match, .probe = idxd_config_bus_probe, .remove = idxd_config_bus_remove, + .uevent = idxd_bus_uevent, }; #define DRIVER_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \ From patchwork Sat May 22 00:19:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274207 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A289FC4707F for ; Sat, 22 May 2021 00:19:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7B602613E1 for ; Sat, 22 May 2021 00:19:34 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230410AbhEVAUz (ORCPT ); Fri, 21 May 2021 20:20:55 -0400 Received: from mga14.intel.com ([192.55.52.115]:39645 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230382AbhEVAUu (ORCPT ); Fri, 21 May 2021 20:20:50 -0400 IronPort-SDR: Q1morS0lcA7UBlWOuN2o8qZpvWByhYgxJCbrrGh6hfa748M++Na94aM99NxKDjK4cUzXiBXoDQ GGLSLAGxEEmQ== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201312114" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="201312114" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga103.fm.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:25 -0700 IronPort-SDR: SS8nw179FZK2ypju3HOo8yDQMZGa4TvwCjl2XtqtqE0MprseuOZP3YLsEH01gn5zILEtRFUmyg 54HFIk+JupCg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="406873451" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:24 -0700 Subject: [PATCH v6 03/20] dmaengine: idxd: add IMS offset and size retrieval code From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:24 -0700 Message-ID: <162164276439.261970.331517339815823049.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add retrieval code for Interrupt Message Store (IMS) related info (table offset and size). IMS is used to back the MSI-X vectors that support the descriptor completion interrupt for the mediated device. The SIOV spec [1] specifies that IMS support is detected via DVSEC. The upstream discussion on having the device driver do the detection vs. a platform detection feature is at [2]. The latest agreement is that IMS detection should be done from the platform perspective. Given that DSA 1.0 and any foreseeable future devices are expected to support IMS, the driver will just check the IMS size field to determine if IMS is supported. 
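The size check described above can be sketched as follows. This is a minimal standalone illustration, not the driver code: the helper and macro names are hypothetical, and the factor of 256 is taken from the idxd_read_caps() change in this patch, which multiplies the max_ims_mult capability field by 256.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; the multiplier itself comes from this patch. */
#define IMS_SLOTS_PER_UNIT 256u

/* Convert the raw capability field into a number of IMS entries. */
static inline uint32_t ims_size_from_cap(uint32_t max_ims_mult)
{
	/* Guard against overflow of the 32-bit result. */
	assert(max_ims_mult <= UINT32_MAX / IMS_SLOTS_PER_UNIT);
	return max_ims_mult * IMS_SLOTS_PER_UNIT;
}

/* Per the commit message, a non-zero IMS size is treated as "IMS supported". */
static inline bool ims_supported(uint32_t ims_size)
{
	return ims_size != 0;
}
```

With these helpers, a capability value of 0 yields a size of 0 and is reported as unsupported, while a value of 8 yields 2048 entries.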
[1]: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification [2]: https://lore.kernel.org/dmaengine/20201030224534.GN2620339@nvidia.com/ Signed-off-by: Dave Jiang --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 6 ++++++ drivers/dma/idxd/idxd.h | 2 ++ drivers/dma/idxd/init.c | 4 ++++ drivers/dma/idxd/sysfs.c | 9 +++++++++ 4 files changed, 21 insertions(+) diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index 55285c136cf0..884065b2e85c 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -129,6 +129,12 @@ KernelVersion: 5.10.0 Contact: dmaengine@vger.kernel.org Description: The last executed device administrative command's status/error. +What: /sys/bus/dsa/devices/dsa/ims_size +Date: May 3, 2021 +KernelVersion: 5.14.0 +Contact: dmaengine@vger.kernel.org +Description: The total number of vectors available for Interrupt Message Store. 
+ What: /sys/bus/dsa/devices/wq./block_on_fault Date: Oct 27, 2020 KernelVersion: 5.11.0 diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 22afaf7ee637..288e3fe15b3e 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -266,6 +266,7 @@ struct idxd_device { int num_groups; + u32 ims_offset; u32 msix_perm_offset; u32 wqcfg_offset; u32 grpcfg_offset; @@ -273,6 +274,7 @@ struct idxd_device { u64 max_xfer_bytes; u32 max_batch_size; + int ims_size; int max_groups; int max_engines; int max_tokens; diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index bed9169152f9..16ff37be2d26 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -381,6 +381,8 @@ static void idxd_read_table_offsets(struct idxd_device *idxd) dev_dbg(dev, "IDXD Work Queue Config Offset: %#x\n", idxd->wqcfg_offset); idxd->msix_perm_offset = offsets.msix_perm * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset); + idxd->ims_offset = offsets.ims * IDXD_TABLE_MULT; + dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset); idxd->perfmon_offset = offsets.perfmon * IDXD_TABLE_MULT; dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset); } @@ -403,6 +405,8 @@ static void idxd_read_caps(struct idxd_device *idxd) dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size); + idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL; + dev_dbg(dev, "IMS size: %u\n", idxd->ims_size); if (idxd->hw.gen_cap.config_en) set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 4fcb8833a4df..6583c9c2e992 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -1166,6 +1166,14 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t ims_size_show(struct device 
*dev, struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = confdev_to_idxd(dev); + + return sysfs_emit(buf, "%u\n", idxd->ims_size); +} +static DEVICE_ATTR_RO(ims_size); + static ssize_t max_batch_size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1352,6 +1360,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_max_work_queues_size.attr, &dev_attr_max_engines.attr, &dev_attr_numa_node.attr, + &dev_attr_ims_size.attr, &dev_attr_max_batch_size.attr, &dev_attr_max_transfer_size.attr, &dev_attr_op_cap.attr, From patchwork Sat May 22 00:19:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274209 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6EFAC04FF3 for ; Sat, 22 May 2021 00:19:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B854C613D6 for ; Sat, 22 May 2021 00:19:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230472AbhEVAVC (ORCPT ); Fri, 21 May 2021 20:21:02 -0400 Received: from mga05.intel.com ([192.55.52.43]:23956 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230429AbhEVAU4 (ORCPT ); Fri, 21 May 2021 20:20:56 -0400 IronPort-SDR: GMHoDOlXvgbWADaMrHO0t6GA7zxQMh2NfyyRV5p6TVmBOdui6iXJVzE10GYwLNikAXdPGhqvyo eHtIDpGYPOBQ== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="287141588" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; 
d="scan'208";a="287141588" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:32 -0700 IronPort-SDR: esG7x3geBjjEJwpaWiErkd74HLovIDMST2CdV6mmZKEub9WlCvtfgIUhggS+y3oTR+f5hRdHj4 Ygc5vhS9k3Iw== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="406873535" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:30 -0700 Subject: [PATCH v6 04/20] dmaengine: idxd: add portal offset for IMS portals From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:30 -0700 Message-ID: <162164277035.261970.14322823489384216890.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Device portal offsets are 4k apart laid out in the order of unlimited MSIX portal, limited MSIX portal, unlimited IMS portal, limited IMS portal. Add an additional parameter to calculate the IMS portal offsets. 
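As a rough illustration of that layout (a standalone sketch, not driver code): each WQ owns four consecutive 4k portals, so the offset can be built from the WQ id, the portal protection type and the interrupt type. The enum names and values below are assumptions chosen to mirror enum idxd_portal_prot and enum idxd_interrupt_type, and a 4k page size (shift of 12) is assumed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver enums; the values are assumed. */
enum sketch_portal_prot { SKETCH_PORTAL_UNLIMITED = 0, SKETCH_PORTAL_LIMITED = 1 };
enum sketch_irq_type { SKETCH_IRQ_MSIX = 0, SKETCH_IRQ_IMS = 1 };

#define SKETCH_PAGE_SHIFT 12 /* assumes 4k pages */

/*
 * Per-WQ layout, one 4k portal per slot:
 *   +0x0000 unlimited MSIX, +0x1000 limited MSIX,
 *   +0x2000 unlimited IMS,  +0x3000 limited IMS
 */
static inline int sketch_wq_portal_offset(int wq_id, enum sketch_portal_prot prot,
					  enum sketch_irq_type irq_type)
{
	assert(wq_id >= 0);
	/* Four portals per WQ, then select the slot inside that WQ. */
	return ((wq_id * 4) << SKETCH_PAGE_SHIFT) + prot * 0x1000 + irq_type * 0x2000;
}
```

With these assumed values, the limited MSIX portal of WQ 1 lands at 0x5000 and its limited IMS portal at 0x7000.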
Signed-off-by: Dave Jiang --- drivers/dma/idxd/cdev.c | 4 ++-- drivers/dma/idxd/device.c | 2 +- drivers/dma/idxd/idxd.h | 11 +++-------- 3 files changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index e3d29244f752..62a53123fd58 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -202,8 +202,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma) return rc; vma->vm_flags |= VM_DONTCOPY; - pfn = (base + idxd_get_wq_portal_full_offset(wq->id, - IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT; + pfn = (base + idxd_get_wq_portal_offset(wq->id, IDXD_PORTAL_LIMITED, + IDXD_IRQ_MSIX)) >> PAGE_SHIFT; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_private_data = ctx; diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 3549a73fc7db..02e9a050b5bb 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -300,7 +300,7 @@ int idxd_wq_map_portal(struct idxd_wq *wq) resource_size_t start; start = pci_resource_start(pdev, IDXD_WQ_BAR); - start += idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED); + start += idxd_get_wq_portal_offset(wq->id, IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX); wq->portal = devm_ioremap(dev, start, IDXD_PORTAL_SIZE); if (!wq->portal) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 288e3fe15b3e..e5b90e6970aa 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -459,15 +459,10 @@ enum idxd_interrupt_type { IDXD_IRQ_IMS, }; -static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) +static inline int idxd_get_wq_portal_offset(int wq_id, enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return prot * 0x1000; -} - -static inline int idxd_get_wq_portal_full_offset(int wq_id, - enum idxd_portal_prot prot) -{ - return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot); + return ((wq_id * 4) << PAGE_SHIFT) + prot * 0x1000 + irq_type * 0x2000; } 
static inline void idxd_wq_get(struct idxd_wq *wq) From patchwork Sat May 22 00:19:36 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274211 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E8D4C4707A for ; Sat, 22 May 2021 00:19:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 5EAB0613D6 for ; Sat, 22 May 2021 00:19:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230460AbhEVAVP (ORCPT ); Fri, 21 May 2021 20:21:15 -0400 Received: from mga02.intel.com ([134.134.136.20]:2497 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230453AbhEVAVB (ORCPT ); Fri, 21 May 2021 20:21:01 -0400 IronPort-SDR: ZST9NzzDWewFL+dJuRf36MAnlAW0nqIUSg7wQWLXbafANl37D84p4wTHAQs7Be9ALK4HnpPXek 7AY68KTau92g== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="188728094" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="188728094" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:37 -0700 IronPort-SDR: cI0DdXQZFyMKhXSMOe9LhovWqziGOmqOHS/1A1y191XgQEJdy3YUJKlmyG/H5syc1GqUKAdHRr Ou2XKxGfL1Jg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="628864430" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:36 -0700 Subject: 
[PATCH v6 05/20] vfio: mdev: common lib code for setting up Interrupt Message Store From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: Jason Gunthorpe , megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:36 -0700 Message-ID: <162164277624.261970.7989190254803052804.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add common helper code to set up IMS once the MSI domain has been set up by the device driver. The main helper function is mdev_set_msix_trigger(), which is called for the VFIO ioctl VFIO_DEVICE_SET_IRQS. The function handles the setup and teardown of the emulated and IMS-backed eventfds that are exported to the guest kernel via VFIO as MSIX vectors. Suggested-by: Jason Gunthorpe Signed-off-by: Dave Jiang --- drivers/vfio/mdev/Kconfig | 12 ++ drivers/vfio/mdev/Makefile | 3 drivers/vfio/mdev/mdev_irqs.c | 318 +++++++++++++++++++++++++++++++++++++++++ include/linux/mdev.h | 51 +++++++ 4 files changed, 384 insertions(+) create mode 100644 drivers/vfio/mdev/mdev_irqs.c diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig index 763c877a1318..82f79d99a7db 100644 --- a/drivers/vfio/mdev/Kconfig +++ b/drivers/vfio/mdev/Kconfig @@ -9,3 +9,15 @@ config VFIO_MDEV See Documentation/driver-api/vfio-mediated-device.rst for more details. If you don't know what do here, say N. 
+ +config VFIO_MDEV_IRQS + bool "Mediated device driver common lib code for interrupts" + depends on VFIO_MDEV + select IMS_MSI_ARRAY + select IRQ_BYPASS_MANAGER + default n + help + Provide common library code to deal with IMS interrupts for mediated + devices. + + If you don't know what to do here, say N. diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile index 7c236ba1b90e..c3f160cae192 100644 --- a/drivers/vfio/mdev/Makefile +++ b/drivers/vfio/mdev/Makefile @@ -2,4 +2,7 @@ mdev-y := mdev_core.o mdev_sysfs.o mdev_driver.o +mdev-$(CONFIG_VFIO_MDEV_IRQS) += mdev_irqs.o + obj-$(CONFIG_VFIO_MDEV) += mdev.o + diff --git a/drivers/vfio/mdev/mdev_irqs.c b/drivers/vfio/mdev/mdev_irqs.c new file mode 100644 index 000000000000..ed2d11a7c729 --- /dev/null +++ b/drivers/vfio/mdev/mdev_irqs.c @@ -0,0 +1,318 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Mediated device IMS library code + * + * Copyright (c) 2021 Intel Corp. All rights reserved. + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static irqreturn_t mdev_irq_handler(int irq, void *arg) +{ + struct eventfd_ctx *trigger = arg; + + eventfd_signal(trigger, 1); + return IRQ_HANDLED; +} + +/* + * Common helper routine to send a signal to the eventfd that has been set up. + * + * @mdev [in] : mdev device + * @vector [in] : vector index for eventfd + * + * No return value. 
+ */ +void mdev_msix_send_signal(struct mdev_device *mdev, int vector) +{ + struct mdev_irq *mdev_irq = &mdev->mdev_irq; + struct eventfd_ctx *trigger = mdev_irq->irq_entries[vector].trigger; + + if (!mdev_irq->irq_entries || !trigger) { + dev_warn(&mdev->dev, "EventFD %d trigger not setup, can't send!\n", vector); + return; + } + mdev_irq_handler(0, (void *)trigger); +} +EXPORT_SYMBOL_GPL(mdev_msix_send_signal); + +static int mdev_msix_set_vector_signal(struct mdev_irq *mdev_irq, int vector, int fd) +{ + int rc, irq; + struct mdev_device *mdev = irq_to_mdev(mdev_irq); + struct mdev_irq_entry *entry; + struct device *dev = &mdev->dev; + struct eventfd_ctx *trigger; + char *name; + bool pasid_en; + u32 auxval; + + if (vector < 0 || vector >= mdev_irq->num) + return -EINVAL; + + entry = &mdev_irq->irq_entries[vector]; + + if (entry->ims) + irq = dev_msi_irq_vector(dev, entry->ims_id); + else + irq = 0; + + pasid_en = mdev_irq->pasid != INVALID_IOASID ? true : false; + + /* IMS and invalid pasid is not a valid configuration */ + if (entry->ims && !pasid_en) + return -EINVAL; + + if (entry->trigger) { + if (irq) { + irq_bypass_unregister_producer(&entry->producer); + free_irq(irq, entry->trigger); + if (pasid_en) { + auxval = ims_ctrl_pasid_aux(0, false); + irq_set_auxdata(irq, IMS_AUXDATA_CONTROL_WORD, auxval); + } + } + kfree(entry->name); + eventfd_ctx_put(entry->trigger); + entry->trigger = NULL; + } + + if (fd < 0) + return 0; + + name = kasprintf(GFP_KERNEL, "vfio-mdev-irq[%d](%s)", vector, dev_name(dev)); + if (!name) + return -ENOMEM; + + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + kfree(name); + return PTR_ERR(trigger); + } + + entry->name = name; + entry->trigger = trigger; + + if (!irq) + return 0; + + if (pasid_en) { + auxval = ims_ctrl_pasid_aux(mdev_irq->pasid, true); + rc = irq_set_auxdata(irq, IMS_AUXDATA_CONTROL_WORD, auxval); + if (rc < 0) + goto err; + } + + rc = request_irq(irq, mdev_irq_handler, 0, name, trigger); + if (rc < 0) + 
+ goto irq_err; + + entry->producer.token = trigger; + entry->producer.irq = irq; + rc = irq_bypass_register_producer(&entry->producer); + if (unlikely(rc)) { + dev_warn(dev, "irq bypass producer (token %p) registration fails: %d\n", + entry->producer.token, rc); + entry->producer.token = NULL; + } + + return 0; + + irq_err: + if (pasid_en) { + auxval = ims_ctrl_pasid_aux(0, false); + irq_set_auxdata(irq, IMS_AUXDATA_CONTROL_WORD, auxval); + } + err: + kfree(name); + eventfd_ctx_put(trigger); + entry->trigger = NULL; + return rc; +} + +static int mdev_msix_set_vector_signals(struct mdev_irq *mdev_irq, unsigned int start, + unsigned int count, int *fds) +{ + int i, j, rc = 0; + + if (start >= mdev_irq->num || start + count > mdev_irq->num) + return -EINVAL; + + for (i = 0, j = start; i < count && !rc; i++, j++) { + int fd = fds ? fds[i] : -1; + + rc = mdev_msix_set_vector_signal(mdev_irq, j, fd); + } + + if (rc) { + for (--j; j >= (int)start; j--) + mdev_msix_set_vector_signal(mdev_irq, j, -1); + } + + return rc; +} + +static int mdev_msix_enable(struct mdev_irq *mdev_irq, int nvec) +{ + struct mdev_device *mdev = irq_to_mdev(mdev_irq); + struct device *dev; + int rc; + + if (nvec != mdev_irq->num) + return -EINVAL; + + if (mdev_irq->ims_num) { + dev = &mdev->dev; + rc = msi_domain_alloc_irqs(dev_get_msi_domain(dev), dev, mdev_irq->ims_num); + if (rc < 0) + return rc; + } + + mdev_irq->irq_type = VFIO_PCI_MSIX_IRQ_INDEX; + return 0; +} + +static int mdev_msix_disable(struct mdev_irq *mdev_irq) +{ + struct mdev_device *mdev = irq_to_mdev(mdev_irq); + struct device *dev = &mdev->dev; + struct irq_domain *irq_domain; + + mdev_msix_set_vector_signals(mdev_irq, 0, mdev_irq->num, NULL); + irq_domain = dev_get_msi_domain(&mdev->dev); + if (irq_domain) + msi_domain_free_irqs(irq_domain, dev); + mdev_irq->irq_type = VFIO_PCI_NUM_IRQS; + return 0; +} + +/* + * Common helper function that sets up the MSIX vectors for the mdev device that are + * Interrupt Message Store (IMS) 
backed. Certain mdev devices can have the first + * vector emulated rather than backed by IMS. + * + * @mdev [in] : mdev device + * @index [in] : type of VFIO vectors to setup + * @start [in] : start position of the vector index + * @count [in] : number of vectors + * @flags [in] : VFIO_IRQ action to be taken + * @data [in] : data accompanied for the call + * Return error code on failure or 0 on success. + */ + +int mdev_set_msix_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data) +{ + struct mdev_irq *mdev_irq = &mdev->mdev_irq; + int i, rc = 0; + + if (count > mdev_irq->num) + count = mdev_irq->num; + + if (!count && (flags & VFIO_IRQ_SET_DATA_NONE)) { + mdev_msix_disable(mdev_irq); + return 0; + } + + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + int *fds = data; + + if (mdev_irq->irq_type == index) + return mdev_msix_set_vector_signals(mdev_irq, start, count, fds); + + rc = mdev_msix_enable(mdev_irq, start + count); + if (rc < 0) + return rc; + + rc = mdev_msix_set_vector_signals(mdev_irq, start, count, fds); + if (rc < 0) + mdev_msix_disable(mdev_irq); + + return rc; + } + + if (start + count > mdev_irq->num) + return -EINVAL; + + for (i = start; i < start + count; i++) { + if (!mdev_irq->irq_entries[i].trigger) + continue; + if (flags & VFIO_IRQ_SET_DATA_NONE) { + eventfd_signal(mdev_irq->irq_entries[i].trigger, 1); + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { + u8 *bools = data; + + if (bools[i - start]) + eventfd_signal(mdev_irq->irq_entries[i].trigger, 1); + } + } + return 0; +} +EXPORT_SYMBOL_GPL(mdev_set_msix_trigger); + +void mdev_irqs_set_pasid(struct mdev_device *mdev, u32 pasid) +{ + mdev->mdev_irq.pasid = pasid; +} +EXPORT_SYMBOL_GPL(mdev_irqs_set_pasid); + +/* + * Initialize and setup the mdev_irq context under mdev. 
+ * + * @mdev [in] : mdev device + * @num [in] : number of vectors + * @ims_map [in] : bool array that indicates whether a guest MSIX vector is + * backed by an IMS vector or emulated + * Return error code on failure or 0 on success. + */ +int mdev_irqs_init(struct mdev_device *mdev, int num, bool *ims_map) +{ + struct mdev_irq *mdev_irq = &mdev->mdev_irq; + int i; + + if (num < 1) + return -EINVAL; + + mdev_irq->irq_type = VFIO_PCI_NUM_IRQS; + mdev_irq->num = num; + mdev_irq->pasid = INVALID_IOASID; + + mdev_irq->irq_entries = kcalloc(num, sizeof(*mdev_irq->irq_entries), GFP_KERNEL); + if (!mdev_irq->irq_entries) + return -ENOMEM; + + for (i = 0; i < num; i++) { + mdev_irq->irq_entries[i].ims = ims_map[i]; + if (ims_map[i]) { + mdev_irq->irq_entries[i].ims_id = mdev_irq->ims_num; + mdev_irq->ims_num++; + } + } + + return 0; +} +EXPORT_SYMBOL_GPL(mdev_irqs_init); + +/* + * Free allocated memory in mdev_irq + * + * @mdev [in] : mdev device + */ +void mdev_irqs_free(struct mdev_device *mdev) +{ + kfree(mdev->mdev_irq.irq_entries); + memset(&mdev->mdev_irq, 0, sizeof(mdev->mdev_irq)); +} +EXPORT_SYMBOL_GPL(mdev_irqs_free); diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 0cd8db2d3422..035c021e8068 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -10,8 +10,26 @@ #ifndef MDEV_H #define MDEV_H +#include + struct mdev_type; +struct mdev_irq_entry { + struct eventfd_ctx *trigger; + struct irq_bypass_producer producer; + char *name; + bool ims; + int ims_id; +}; + +struct mdev_irq { + struct mdev_irq_entry *irq_entries; + int num; + int ims_num; + int irq_type; + int pasid; +}; + struct mdev_device { struct device dev; guid_t uuid; @@ -19,8 +37,14 @@ struct mdev_device { struct mdev_type *type; struct device *iommu_device; struct mutex creation_lock; + struct mdev_irq mdev_irq; }; +static inline struct mdev_device *irq_to_mdev(struct mdev_irq *mdev_irq) +{ + return container_of(mdev_irq, struct mdev_device, mdev_irq); +} + static inline struct 
mdev_device *to_mdev_device(struct device *dev) { return container_of(dev, struct mdev_device, dev); @@ -99,4 +123,31 @@ static inline struct mdev_device *mdev_from_dev(struct device *dev) return dev->bus == &mdev_bus_type ? to_mdev_device(dev) : NULL; } +#if IS_ENABLED(CONFIG_VFIO_MDEV_IRQS) +int mdev_set_msix_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data); +void mdev_msix_send_signal(struct mdev_device *mdev, int vector); +int mdev_irqs_init(struct mdev_device *mdev, int num, bool *ims_map); +void mdev_irqs_free(struct mdev_device *mdev); +void mdev_irqs_set_pasid(struct mdev_device *mdev, u32 pasid); +#else +static inline int mdev_set_msix_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data) +{ + return -EOPNOTSUPP; +} + +static inline void mdev_msix_send_signal(struct mdev_device *mdev, int vector) {} + +static inline int mdev_irqs_init(struct mdev_device *mdev, int num, bool *ims_map) +{ + return -EOPNOTSUPP; +} + +static inline void mdev_irqs_free(struct mdev_device *mdev) {} +static inline void mdev_irqs_set_pasid(struct mdev_device *mdev, u32 pasid) {} +#endif /* CONFIG_VFIO_MDEV_IRQS */ + #endif /* MDEV_H */ From patchwork Sat May 22 00:19:42 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274221 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE450C47080 for ; Sat, 22 May 2021 00:20:29 +0000 (UTC) Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B1138613E4 for ; Sat, 22 May 2021 00:20:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231186AbhEVAVv (ORCPT ); Fri, 21 May 2021 20:21:51 -0400 Received: from mga18.intel.com ([134.134.136.126]:45248 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230478AbhEVAVe (ORCPT ); Fri, 21 May 2021 20:21:34 -0400 IronPort-SDR: u9yCGp7K0GzHsDPNXF0DqAYkhZccbhmVSiE9Dx8NMSvcLNMOcG+38ntPjw8/cmMeBoD4VrHqXs bMW7TYxCfQHw== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="188993218" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="188993218" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:43 -0700 IronPort-SDR: Eb8/oGmG+cGzbFFUPLMwo9QQ3/Q57LR7j5KjEN5sYmxbVE4jhbN/OHWHLurAgK0xSig/VuNAGN 1rAhe+UzdWjQ== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="631991805" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:19:42 -0700 Subject: [PATCH v6 06/20] vfio/mdev: idxd: add PCI config for read/write for mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:42 -0700 Message-ID: <162164278223.261970.13253077604552916351.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: 
<162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The mediated device will emulate the PCI config access (read/write) from the guest. Add PCI config read/write functions to support the config read/write accesses from the guest. Signed-off-by: Dave Jiang --- drivers/dma/idxd/registers.h | 4 + drivers/vfio/mdev/Kconfig | 9 + drivers/vfio/mdev/Makefile | 1 drivers/vfio/mdev/idxd/Makefile | 4 + drivers/vfio/mdev/idxd/mdev.h | 76 ++++++++++++ drivers/vfio/mdev/idxd/vdev.c | 255 +++++++++++++++++++++++++++++++++++++++ include/uapi/linux/idxd.h | 1 7 files changed, 350 insertions(+) create mode 100644 drivers/vfio/mdev/idxd/Makefile create mode 100644 drivers/vfio/mdev/idxd/mdev.h create mode 100644 drivers/vfio/mdev/idxd/vdev.c diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index c970c3f025f0..c699c72bd8d2 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -155,6 +155,7 @@ enum idxd_device_reset_type { #define IDXD_INTC_CMD 0x02 #define IDXD_INTC_OCCUPY 0x04 #define IDXD_INTC_PERFMON_OVFL 0x08 +#define IDXD_INTC_HALT 0x10 #define IDXD_CMD_OFFSET 0xa0 union idxd_command_reg { @@ -279,6 +280,9 @@ union msix_perm { u32 bits; } __packed; +#define MSIX_ENTRY_MASK_INT 0x1 +#define MSIX_ENTRY_CTRL_BYTE 12 + union group_flags { struct { u32 tc_a:3; diff --git a/drivers/vfio/mdev/Kconfig b/drivers/vfio/mdev/Kconfig index 82f79d99a7db..57f89457e9a7 100644 --- a/drivers/vfio/mdev/Kconfig +++ b/drivers/vfio/mdev/Kconfig @@ -21,3 +21,12 @@ config VFIO_MDEV_IRQS devices. If you don't know what to do here, say N. + +config VFIO_MDEV_IDXD + tristate "VFIO Mediated device driver for Intel IDXD" + depends on VFIO && VFIO_MDEV && X86_64 + select VFIO_MDEV_IRQS + select IMS_MSI_ARRAY + default n + help + VFIO-based mediated device driver for Intel IDXD accelerator devices. 
diff --git a/drivers/vfio/mdev/Makefile b/drivers/vfio/mdev/Makefile index c3f160cae192..7e3f5fae4bf1 100644 --- a/drivers/vfio/mdev/Makefile +++ b/drivers/vfio/mdev/Makefile @@ -6,3 +6,4 @@ mdev-$(CONFIG_VFIO_MDEV_IRQS) += mdev_irqs.o obj-$(CONFIG_VFIO_MDEV) += mdev.o +obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd/ diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile new file mode 100644 index 000000000000..ccd3bc1c7ab6 --- /dev/null +++ b/drivers/vfio/mdev/idxd/Makefile @@ -0,0 +1,4 @@ +ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD + +obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o +idxd_mdev-y := vdev.o diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h new file mode 100644 index 000000000000..120c2dc29ba7 --- /dev/null +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MAX_MMIO_SPACE_SZ 8192 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PBA_OFFSET 0x700 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x2000 +#define VIDXD_MAX_MSIX_VECS 2 +#define VIDXD_MAX_MSIX_ENTRIES VIDXD_MAX_MSIX_VECS +#define VIDXD_MAX_WQS 1 + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct mdev_device *mdev; + int num_wqs; + + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + u8 
bar0[VIDXD_MAX_MMIO_SPACE_SZ]; + struct mutex dev_lock; /* lock for vidxd resources */ +}; + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(u64 *)buf; + break; + case 4: + val = *(u32 *)buf; + break; + case 2: + val = *(u16 *)buf; + break; + case 1: + val = *(u8 *)buf; + break; + } + + return val; +} + +static inline u8 vidxd_state(struct vdcm_idxd *vidxd) +{ + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + + return gensts->state; +} + +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); +#endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c new file mode 100644 index 000000000000..aca4a1228a97 --- /dev/null +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -0,0 +1,255 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "mdev.h" + +void vidxd_send_interrupt(struct vdcm_idxd *vidxd, int vector) +{ + struct mdev_device *mdev = vidxd->mdev; + u8 *bar0 = vidxd->bar0; + u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + vector * 0x10]; + u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET); + u8 ctrl; + + ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE]; + if (ctrl & MSIX_ENTRY_MASK_INT) + set_bit(vector, (unsigned long *)pba); + else + mdev_msix_send_signal(mdev, vector); +} + +static void vidxd_set_swerr(struct vdcm_idxd *vidxd, unsigned int error) +{ + union sw_err_reg *swerr = (union sw_err_reg *)(vidxd->bar0 + IDXD_SWERR_OFFSET); + + if (!swerr->valid) { + memset(swerr, 0, sizeof(*swerr)); + swerr->valid = 1; + swerr->error = error; + } else if (!swerr->overflow) { + swerr->overflow = 1; + } +} + +static inline void send_swerr_interrupt(struct vdcm_idxd *vidxd) +{ + union genctrl_reg *genctrl = (union genctrl_reg *)(vidxd->bar0 + IDXD_GENCTRL_OFFSET); + u32 *intcause = (u32 *)(vidxd->bar0 + IDXD_INTCAUSE_OFFSET); + + if (!genctrl->softerr_int_en) + return; + + *intcause |= IDXD_INTC_ERR; + vidxd_send_interrupt(vidxd, 0); +} + +static inline void send_halt_interrupt(struct vdcm_idxd *vidxd) +{ + union genctrl_reg *genctrl = (union genctrl_reg *)(vidxd->bar0 + IDXD_GENCTRL_OFFSET); + u32 *intcause = (u32 *)(vidxd->bar0 + IDXD_INTCAUSE_OFFSET); + + if (!genctrl->halt_int_en) + return; + + *intcause |= IDXD_INTC_HALT; + vidxd_send_interrupt(vidxd, 0); +} + +static void vidxd_report_pci_error(struct vdcm_idxd *vidxd) +{ + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + + vidxd_set_swerr(vidxd, DSA_ERR_PCI_CFG); + /* set device to halt */ + gensts->reset_type = IDXD_DEVICE_RESET_FLR; + gensts->state = IDXD_DEVICE_STATE_HALT; + + 
send_halt_interrupt(vidxd); +} + +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +{ + u32 offset = pos & 0xfff; + struct device *dev = &vidxd->mdev->dev; + + memcpy(buf, &vidxd->cfg[offset], count); + + dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n", + vidxd->wq->id, count, offset, get_reg_val(buf, count)); + + return 0; +} + +/* + * Much of the emulation code has been borrowed from Intel i915 cfg space + * emulation code. + * drivers/gpu/drm/i915/gvt/cfg_space.c: + */ + +/* + * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one + * byte) byte by byte in standard pci configuration space. (not the full + * 256 bytes.) + */ +static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = { + [PCI_COMMAND] = 0xff, 0x07, + [PCI_STATUS] = 0x00, 0xf9, /* the only one RW1C byte */ + [PCI_CACHE_LINE_SIZE] = 0xff, + [PCI_BASE_ADDRESS_0 ... PCI_CARDBUS_CIS - 1] = 0xff, + [PCI_ROM_ADDRESS] = 0x01, 0xf8, 0xff, 0xff, + [PCI_INTERRUPT_LINE] = 0xff, +}; + +static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src, + unsigned int bytes) +{ + u8 *cfg_base = vidxd->cfg; + u8 mask, new, old; + int i = 0; + + for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) { + mask = pci_cfg_space_rw_bmp[off + i]; + old = cfg_base[off + i]; + new = src[i] & mask; + + /** + * The PCI_STATUS high byte has RW1C bits, here + * emulates clear by writing 1 for these bits. + * Writing a 0b to RW1C bits has no effect. + */ + if (off + i == PCI_STATUS + 1) + new = (~new & old) & mask; + + cfg_base[off + i] = (old & ~mask) | new; + } + + /* For other configuration space directly copy as it is. 
*/ + if (i < bytes) + memcpy(cfg_base + off + i, src + i, bytes - i); +} + +static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low) +{ + u32 *pval; + + /* BAR offset should be 32 bits aligned */ + offset = rounddown(offset, 4); + pval = (u32 *)(vidxd->cfg + offset); + + if (low) { + /* + * only update bit 31 - bit 4, + * leave the bit 3 - bit 0 unchanged. + */ + *pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0)); + } else { + *pval = val; + } +} + +static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data, + unsigned int bytes) +{ + u32 new = *(u32 *)(p_data); + bool lo = IS_ALIGNED(offset, 8); + u64 size; + unsigned int bar_id; + + /* + * Power-up software can determine how much address + * space the device requires by writing a value of + * all 1's to the register and then reading the value + * back. The device will return 0's in all don't-care + * address bits. + */ + if (new == 0xffffffff) { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + bar_id = (offset - PCI_BASE_ADDRESS_0) / 8; + size = vidxd->bar_size[bar_id]; + _write_pci_bar(vidxd, offset, size >> (lo ? 
0 : 32), lo); + break; + default: + /* Unimplemented BARs */ + _write_pci_bar(vidxd, offset, 0x0, false); + } + } else { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + _write_pci_bar(vidxd, offset, new, lo); + break; + default: + break; + } + } + return 0; +} + +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) +{ + struct device *dev = &vidxd->idxd->pdev->dev; + + if (size > 4) + return -EINVAL; + + if (pos + size > VIDXD_MAX_CFG_SPACE_SZ) + return -EINVAL; + + dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos, + get_reg_val(buf, size)); + + /* First check if it's PCI_COMMAND */ + if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) { + bool new_bme; + bool bme; + + if (size > 2) + return -EINVAL; + + new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER); + bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER); + _pci_cfg_mem_write(vidxd, pos, buf, size); + + /* Flag error if turning off BME while device is enabled */ + if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED) + vidxd_report_pci_error(vidxd); + return 0; + } + + switch (pos) { + case PCI_BASE_ADDRESS_0 ... 
PCI_BASE_ADDRESS_5: + if (!IS_ALIGNED(pos, 4)) + return -EINVAL; + return _pci_cfg_bar_write(vidxd, pos, buf, size); + + default: + _pci_cfg_mem_write(vidxd, pos, buf, size); + } + return 0; +} + +MODULE_LICENSE("GPL v2"); diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h index e33997b4d750..751f6107217c 100644 --- a/include/uapi/linux/idxd.h +++ b/include/uapi/linux/idxd.h @@ -89,6 +89,7 @@ enum dsa_completion_status { DSA_COMP_HW_ERR1, DSA_COMP_HW_ERR_DRB, DSA_COMP_TRANSLATION_FAIL, + DSA_ERR_PCI_CFG = 0x51, }; enum iax_completion_status { From patchwork Sat May 22 00:19:48 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274215 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 343DCC04FF3 for ; Sat, 22 May 2021 00:20:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 13E3A613D0 for ; Sat, 22 May 2021 00:20:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230349AbhEVAVh (ORCPT ); Fri, 21 May 2021 20:21:37 -0400 Received: from mga09.intel.com ([134.134.136.24]:14563 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230307AbhEVAV0 (ORCPT ); Fri, 21 May 2021 20:21:26 -0400 IronPort-SDR: xuaaiCVCQHcECNj1GmntMbHu3HHpD3zNeR9LcfPTgL0fViJkAdCupGRuXlNHPvLlrmfb1SKiXq qvqJl68HspMw== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201634347" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; 
d="scan'208";a="201634347" Subject: [PATCH v6 07/20] vfio/mdev: idxd: Add administrative commands emulation for mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:48 -0700 Message-ID: <162164278827.261970.17985121038264068736.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Administrative commands are issued to the command register on the accelerator device. For the mediated device, the MMIO path is emulated. Add the command emulation support functions for the mdev.
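
As a rough illustration of the emulated command path described above, the
standalone C sketch below models a guest write to a command register being
decoded, checked against a capability bitmap, dispatched, and completed by
writing a status value. Every name prefixed with toy_/TOY_ and the bit
layout of the register are assumptions made for this sketch only; they are
not the idxd definitions and the code is not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical register layout, assumed for this sketch only:
 * operand in bits 19:0, command code in bits 24:20, and an
 * interrupt-request flag in bit 31.
 */
#define TOY_CMD_OPERAND(v)	((v) & 0xfffffu)
#define TOY_CMD_CODE(v)		(((v) >> 20) & 0x1fu)
#define TOY_CMD_INT_REQ		0x80000000u

enum toy_cmd { TOY_CMD_ENABLE = 1, TOY_CMD_DISABLE = 2 };
enum toy_status { TOY_STS_SUCCESS = 0, TOY_STS_INVAL_CMD = 1, TOY_STS_BAD_STATE = 2 };

struct toy_vdev {
	uint32_t cmd;		/* last value written to the emulated command register */
	uint32_t cmdsts;	/* emulated command status register */
	uint32_t cmdcap;	/* bitmap of supported command codes */
	int enabled;		/* emulated device state */
	int irq_raised;		/* set when a completion interrupt was requested */
};

/* Record the completion status and note whether an interrupt was requested. */
static void toy_complete(struct toy_vdev *dev, uint32_t status)
{
	dev->cmdsts = status;
	if (dev->cmd & TOY_CMD_INT_REQ)
		dev->irq_raised = 1;
}

/* Emulate one guest write to the command register. */
static void toy_do_command(struct toy_vdev *dev, uint32_t val)
{
	uint32_t code = TOY_CMD_CODE(val);

	assert(dev);
	dev->cmd = val;

	/* Reject any command that is not advertised in the capability bitmap. */
	if (!(dev->cmdcap & (1u << code))) {
		toy_complete(dev, TOY_STS_INVAL_CMD);
		return;
	}

	switch (code) {
	case TOY_CMD_ENABLE:
		if (dev->enabled) {
			toy_complete(dev, TOY_STS_BAD_STATE);
			return;
		}
		dev->enabled = 1;
		break;
	case TOY_CMD_DISABLE:
		if (!dev->enabled) {
			toy_complete(dev, TOY_STS_BAD_STATE);
			return;
		}
		dev->enabled = 0;
		break;
	default:
		toy_complete(dev, TOY_STS_INVAL_CMD);
		return;
	}
	toy_complete(dev, TOY_STS_SUCCESS);
}
```

The sketch keeps all state in plain memory, which mirrors the idea that the
guest never reaches the physical command register: each write is
intercepted and its effect is computed in software.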
Signed-off-by: Dave Jiang --- drivers/dma/idxd/Makefile | 2 drivers/dma/idxd/device.c | 81 ++++++++ drivers/dma/idxd/idxd.h | 7 + drivers/dma/idxd/registers.h | 12 + drivers/vfio/mdev/idxd/Makefile | 2 drivers/vfio/mdev/idxd/mdev.c | 44 ++++ drivers/vfio/mdev/idxd/mdev.h | 5 drivers/vfio/mdev/idxd/vdev.c | 406 +++++++++++++++++++++++++++++++++++++++ 8 files changed, 555 insertions(+), 4 deletions(-) create mode 100644 drivers/vfio/mdev/idxd/mdev.c diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 6d11558756f8..4d5352b1b5ce 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,3 +1,5 @@ +ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=IDXD + obj-$(CONFIG_INTEL_IDXD) += idxd.o idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 02e9a050b5bb..99542c9cbc47 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -233,6 +233,7 @@ int idxd_wq_enable(struct idxd_wq *wq) dev_dbg(dev, "WQ %d enabled\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_enable); int idxd_wq_disable(struct idxd_wq *wq) { @@ -259,6 +260,7 @@ int idxd_wq_disable(struct idxd_wq *wq) dev_dbg(dev, "WQ %d disabled\n", wq->id); return 0; } +EXPORT_SYMBOL_GPL(idxd_wq_disable); void idxd_wq_drain(struct idxd_wq *wq) { @@ -329,7 +331,31 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd) } } -int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) +int idxd_wq_abort(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, stat; + + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u (wq abort) operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat); + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", 
stat); + return -ENXIO; + } + + return 0; +} +EXPORT_SYMBOL_GPL(idxd_wq_abort); + +int idxd_wq_set_pasid(struct idxd_wq *wq, u32 pasid) { struct idxd_device *idxd = wq->idxd; int rc; @@ -425,6 +451,48 @@ void idxd_wq_quiesce(struct idxd_wq *wq) percpu_ref_exit(&wq->wq_active); } +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* PASID fields are 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX); + wq->wqcfg->pasid_en = 1; + wq->wqcfg->pasid = pasid; + iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset); +} +EXPORT_SYMBOL_GPL(idxd_wq_setup_pasid); + +void idxd_wq_clear_pasid(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PASID_IDX); + wq->wqcfg->pasid = 0; + wq->wqcfg->pasid_en = 0; + iowrite32(wq->wqcfg->bits[WQCFG_PASID_IDX], idxd->reg_base + offset); +} +EXPORT_SYMBOL_GPL(idxd_wq_clear_pasid); + +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* priv field is 8 bytes into the WQCFG register */ + offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIV_IDX); + wq->wqcfg->priv = !!priv; + iowrite32(wq->wqcfg->bits[WQCFG_PRIV_IDX], idxd->reg_base + offset); +} +EXPORT_SYMBOL_GPL(idxd_wq_setup_priv); + /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) { @@ -613,6 +681,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) dev_dbg(dev, "pasid %d drained\n", pasid); } +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand; + + operand = pasid; + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, 
NULL); + dev_dbg(dev, "pasid %d aborted\n", pasid); +} + int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index e5b90e6970aa..34ffa6dad53a 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -529,6 +529,7 @@ void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type); @@ -546,10 +547,14 @@ void idxd_wq_reset(struct idxd_wq *wq); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); -int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); +int idxd_wq_set_pasid(struct idxd_wq *wq, u32 pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); void idxd_wq_quiesce(struct idxd_wq *wq); int idxd_wq_init_percpu_ref(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq); +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); +void idxd_wq_clear_pasid(struct idxd_wq *wq); +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc); diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index c699c72bd8d2..c2d558e37baf 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -167,6 +167,7 @@ union idxd_command_reg { }; u32 bits; } __packed; +#define IDXD_CMD_INT_MASK 0x80000000 enum idxd_cmd { IDXD_CMD_ENABLE_DEVICE = 1, @@ -233,6 +234,7 @@ enum idxd_cmdsts_err { /* request interrupt handle */ 
IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41, IDXD_CMDSTS_ERR_NO_HANDLE, + IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE, }; #define IDXD_CMDCAP_OFFSET 0xb0 @@ -352,8 +354,16 @@ union wqcfg { u32 bits[8]; } __packed; -#define WQCFG_PASID_IDX 2 +enum idxd_wq_hw_state { + IDXD_WQ_DEV_DISABLED = 0, + IDXD_WQ_DEV_ENABLED, + IDXD_WQ_DEV_BUSY, +}; +#define WQCFG_PASID_IDX 2 +#define WQCFG_PRIV_IDX 2 +#define WQCFG_MODE_DEDICATED 1 +#define WQCFG_MODE_SHARED 0 /* * This macro calculates the offset into the WQCFG register * idxd - struct idxd * diff --git a/drivers/vfio/mdev/idxd/Makefile b/drivers/vfio/mdev/idxd/Makefile index ccd3bc1c7ab6..27a08621d120 100644 --- a/drivers/vfio/mdev/idxd/Makefile +++ b/drivers/vfio/mdev/idxd/Makefile @@ -1,4 +1,4 @@ ccflags-y += -I$(srctree)/drivers/dma/idxd -DDEFAULT_SYMBOL_NAMESPACE=IDXD obj-$(CONFIG_VFIO_MDEV_IDXD) += idxd_mdev.o -idxd_mdev-y := vdev.o +idxd_mdev-y := mdev.o vdev.o diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c new file mode 100644 index 000000000000..90ff7cedb8b4 --- /dev/null +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -0,0 +1,44 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2021 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "mdev.h" + +int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 *pasid) +{ + struct vfio_group *vfio_group = vdev->group; + struct iommu_domain *iommu_domain; + struct device *iommu_device = mdev_get_iommu_device(mdev); + int rc; + + iommu_domain = vfio_group_iommu_domain(vfio_group); + if (IS_ERR_OR_NULL(iommu_domain)) + return -ENODEV; + + rc = iommu_aux_get_pasid(iommu_domain, iommu_device); + if (rc < 0) + return -ENODEV; + + *pasid = (u32)rc; + return 0; +} + +MODULE_IMPORT_NS(IDXD); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index 120c2dc29ba7..e52b50760ee7 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -30,6 +30,7 @@ #define VIDXD_MAX_WQS 1 struct vdcm_idxd { + struct vfio_device vdev; struct idxd_device *idxd; struct idxd_wq *wq; struct mdev_device *mdev; @@ -71,6 +72,10 @@ static inline u8 vidxd_state(struct vdcm_idxd *vidxd) return gensts->state; } +int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 *pasid); + +void vidxd_reset(struct vdcm_idxd *vidxd); + int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); #endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index aca4a1228a97..4ead50947047 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -252,4 +252,410 @@ int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsign return 0; } +static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) +{ + u8 *bar0 = vidxd->bar0; + u32 *cmd = (u32 *)(bar0 + 
IDXD_CMD_OFFSET); + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + struct device *dev = &vidxd->mdev->dev; + + *cmdsts = val; + dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val); + + if (*cmd & IDXD_CMD_INT_MASK) { + *intcause |= IDXD_INTC_CMD; + vidxd_send_interrupt(vidxd, 0); + } +} + +static void vidxd_enable(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + + if (gensts->state == IDXD_DEVICE_STATE_ENABLED) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN); + + gensts->state = IDXD_DEVICE_STATE_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + union wqcfg *vwqcfg; + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->mdev; + struct device *dev = &mdev->dev; + int rc; + + if (gensts->state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + wq = vidxd->wq; + + rc = idxd_wq_disable(wq); + if (rc < 0) { + dev_warn(dev, "vidxd disable (wq disable) failed.\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DISABLED; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain_all(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + + idxd_wq_drain(wq); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + u8 *bar0 = vidxd->bar0; + union wqcfg 
*vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + + if (vwqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + idxd_wq_drain(wq); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort_all(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + int rc; + + rc = idxd_wq_abort(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + u8 *bar0 = vidxd->bar0; + union wqcfg *vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + int rc; + + if (vwqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + rc = idxd_wq_abort(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +void vidxd_reset(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + union wqcfg *vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq; + int rc; + + gensts->state = IDXD_DEVICE_STATE_DRAIN; + wq = vidxd->wq; + + if (wq->state == IDXD_WQ_ENABLED) { + rc = idxd_wq_abort(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + rc = idxd_wq_disable(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + } + + vwqcfg->wq_state = IDXD_WQ_DISABLED; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + int rc; + + wq = 
vidxd->wq; + if (vwqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + rc = idxd_wq_abort(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + rc = idxd_wq_disable(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int operand) +{ + bool ims = !!(operand & CMD_INT_HANDLE_IMS); + u32 cmdsts; + struct mdev_device *mdev = vidxd->mdev; + struct device *dev = &mdev->dev; + int ims_idx, vidx; + + vidx = operand & GENMASK(15, 0); + + /* vidx cannot be 0 since that's emulated and does not require IMS handle */ + if (vidx <= 0 || vidx >= VIDXD_MAX_MSIX_VECS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX); + return; + } + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE); + return; + } + + /* + * The index coming from the guest driver will start at 1. Vector 0 is + * the command interrupt and is emulated by the vdcm. Here we are asking + * for the IMS index that's backing the I/O vectors from the relative + * index to the mdev device. This index would start at 0. So for a + * passed in vidx that is 1, we pass 0 to dev_msi_hwirq() and so forth. 
+ */ + ims_idx = dev_msi_hwirq(dev, vidx - 1); + cmdsts = ims_idx << IDXD_CMDSTS_RES_SHIFT; + dev_dbg(dev, "requested index %d handle %d\n", vidx, ims_idx); + idxd_complete_command(vidxd, cmdsts); +} + +static void vidxd_release_int_handle(struct vdcm_idxd *vidxd, int operand) +{ + struct mdev_device *mdev = vidxd->mdev; + struct device *dev = &mdev->dev; + bool ims = !!(operand & CMD_INT_HANDLE_IMS); + int handle, i; + bool found = false; + + handle = operand & GENMASK(15, 0); + if (ims) { + dev_dbg(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE); + return; + } + + /* IMS-backed entries start at 1; 0 is the emulated vector */ + for (i = 1; i < VIDXD_MAX_MSIX_VECS; i++) { + if (dev_msi_hwirq(dev, i) == handle) { + found = true; + break; + } + } + + if (!found) { + dev_dbg(dev, "Freeing unallocated int handle.\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX_RELEASE); + return; + } + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wq_cap_reg *wqcap; + struct mdev_device *mdev = vidxd->mdev; + struct device *dev = &mdev->dev; + struct idxd_device *idxd; + union wqcfg *vwqcfg; + unsigned long flags; + u32 wq_pasid; + int priv, rc; + + if (wq_id >= VIDXD_MAX_WQS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32); + wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE);
+ return; + } + + priv = 1; + rc = idxd_mdev_get_pasid(mdev, &vidxd->vdev, &wq_pasid); + if (rc < 0) { + dev_warn(dev, "idxd pasid setup failed wq %d: %d\n", wq->id, rc); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN); + return; + } + + dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_setup_pasid(wq, wq_pasid); + idxd_wq_setup_priv(wq, priv); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + rc = idxd_wq_enable(wq); + if (rc < 0) { + dev_dbg(dev, "vidxd enable wq %d failed\n", wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_clear_pasid(wq); + idxd_wq_setup_priv(wq, 0); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + int rc; + + wq = vidxd->wq; + if (vwqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOT_EN); + return; + } + + rc = idxd_wq_disable(wq); + if (rc < 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static bool command_supported(struct vdcm_idxd *vidxd, u32 cmd) +{ + u8 *bar0 = vidxd->bar0; + u32 *cmd_cap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET); + + return !!(*cmd_cap & BIT(cmd)); +} + +static void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) +{ + union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET); + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->mdev; + struct device *dev = &mdev->dev; + + reg->bits = val; + + dev_dbg(dev, 
"%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits); + if (!command_supported(vidxd, reg->cmd)) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + return; + } + + if (gensts->state == IDXD_DEVICE_STATE_HALT) { + idxd_complete_command(vidxd, IDXD_CMDSTS_HW_ERR); + return; + } + + switch (reg->cmd) { + case IDXD_CMD_ENABLE_DEVICE: + vidxd_enable(vidxd); + break; + case IDXD_CMD_DISABLE_DEVICE: + vidxd_disable(vidxd); + break; + case IDXD_CMD_DRAIN_ALL: + vidxd_drain_all(vidxd); + break; + case IDXD_CMD_ABORT_ALL: + vidxd_abort_all(vidxd); + break; + case IDXD_CMD_RESET_DEVICE: + vidxd_reset(vidxd); + break; + case IDXD_CMD_ENABLE_WQ: + vidxd_wq_enable(vidxd, reg->operand); + break; + case IDXD_CMD_DISABLE_WQ: + vidxd_wq_disable(vidxd, reg->operand); + break; + case IDXD_CMD_DRAIN_WQ: + vidxd_wq_drain(vidxd, reg->operand); + break; + case IDXD_CMD_ABORT_WQ: + vidxd_wq_abort(vidxd, reg->operand); + break; + case IDXD_CMD_RESET_WQ: + vidxd_wq_reset(vidxd, reg->operand); + break; + case IDXD_CMD_REQUEST_INT_HANDLE: + vidxd_alloc_int_handle(vidxd, reg->operand); + break; + case IDXD_CMD_RELEASE_INT_HANDLE: + vidxd_release_int_handle(vidxd, reg->operand); + break; + default: + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + break; + } +} + +MODULE_IMPORT_NS(IDXD); MODULE_LICENSE("GPL v2"); From patchwork Sat May 22 00:19:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274213 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 
7608BC4707A for ; Sat, 22 May 2021 00:20:10 +0000 (UTC) Subject: [PATCH v6 08/20] vfio/mdev: idxd: Add mdev device context initialization From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:19:54 -0700 Message-ID: <162164279478.261970.8966553743790451233.stgit@djiang5-desk3.ch.intel.com> In-Reply-To:
<162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add support functions to initialize the vdcm context and the PCI config space region and the MMIO region. These regions are to support the emulation paths for the mdev. Signed-off-by: Dave Jiang --- drivers/dma/idxd/registers.h | 3 + drivers/vfio/mdev/idxd/mdev.h | 4 + drivers/vfio/mdev/idxd/vdev.c | 214 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 220 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index c2d558e37baf..8ac2be4e174b 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -88,6 +88,9 @@ struct opcap { u64 bits[4]; }; +#define OPCAP_OFS(op) (op - (0x40 * (op >> 6))) +#define OPCAP_BIT(op) (BIT_ULL(OPCAP_OFS(op))) + #define IDXD_OPCAP_OFFSET 0x40 #define IDXD_TABLE_OFFSET 0x60 diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index e52b50760ee7..91cb2662abd6 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -16,6 +16,7 @@ #define VIDXD_MSIX_TBL_SZ 0x90 #define VIDXD_MSIX_PERM_TBL_SZ 0x48 +#define VIDXD_VERSION_OFFSET 0 #define VIDXD_MSIX_PERM_OFFSET 0x300 #define VIDXD_GRPCFG_OFFSET 0x400 #define VIDXD_WQCFG_OFFSET 0x500 @@ -74,8 +75,9 @@ static inline u8 vidxd_state(struct vdcm_idxd *vidxd) int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 *pasid); +void vidxd_init(struct vdcm_idxd *vidxd); void vidxd_reset(struct vdcm_idxd *vidxd); - +void vidxd_mmio_init(struct vdcm_idxd *vidxd); int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); #endif diff --git 
a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 4ead50947047..78cc2377e637 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -21,6 +21,62 @@ #include "idxd.h" #include "mdev.h" +static u64 idxd_pci_config[] = { + 0x0010000000008086ULL, + 0x0080000008800000ULL, + 0x000000000000000cULL, + 0x000000000000000cULL, + 0x0000000000000000ULL, + 0x2010808600000000ULL, + 0x0000004000000000ULL, + 0x000000ff00000000ULL, + 0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */ + 0x0000070000000000ULL, + 0x0000000000920010ULL, /* PCIe capability */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, +}; + +static void vidxd_reset_config(struct vdcm_idxd *vidxd) +{ + u16 *devid = (u16 *)(vidxd->cfg + PCI_DEVICE_ID); + struct idxd_device *idxd = vidxd->idxd; + + memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ); + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); + + if (idxd->data->type == IDXD_TYPE_DSA) + *devid = PCI_DEVICE_ID_INTEL_DSA_SPR0; + else if (idxd->data->type == IDXD_TYPE_IAX) + *devid = PCI_DEVICE_ID_INTEL_IAX_SPR0; +} + +static inline void vidxd_reset_mmio(struct vdcm_idxd *vidxd) +{ + memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ); +} + +void vidxd_init(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + + vidxd_reset_config(vidxd); + vidxd_reset_mmio(vidxd); + + vidxd->bar_size[0] = VIDXD_BAR0_SIZE; + vidxd->bar_size[1] = VIDXD_BAR2_SIZE; + + vidxd_mmio_init(vidxd); + + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + idxd_wq_disable(wq); +} + void vidxd_send_interrupt(struct vdcm_idxd *vidxd, int vector) { struct mdev_device *mdev = vidxd->mdev; @@ -252,6 +308,163 @@ int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsign return 0; } +static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = 
vidxd->bar0; + union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET); + + /* single group for current implementation */ + grp_cap->num_groups = 1; +} + +static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + int i; + + /* + * At this point, we are only exporting a single workqueue for + * each mdev. + */ + grpcfg->wqs[0] = BIT(0); + for (i = 0; i < group->num_engines; i++) + grpcfg->engines |= BIT(i); + grpcfg->flags.bits = group->grpcfg.flags.bits; +} + +static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct idxd_wq *wq = vidxd->wq; + union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + + wq_cap->total_wq_size = wq->size; + wq_cap->num_wqs = 1; + wq_cap->dedicated_mode = 1; +} + +static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct idxd_wq *wq = vidxd->wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + + wqcfg->wq_size = wq->size; + wqcfg->wq_thresh = wq->threshold; + wqcfg->mode = WQCFG_MODE_DEDICATED; + wqcfg->priority = wq->priority; + wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; +} + +static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + + engcap->num_engines = group->num_engines; +} + +static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET); + + gencap->overlap_copy 
= idxd->hw.gen_cap.overlap_copy; + gencap->cache_control_mem = idxd->hw.gen_cap.cache_control_mem; + gencap->cache_control_cache = idxd->hw.gen_cap.cache_control_cache; + gencap->cmd_cap = 1; + gencap->dest_readback = idxd->hw.gen_cap.dest_readback; + gencap->drain_readback = idxd->hw.gen_cap.drain_readback; + gencap->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + gencap->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; + gencap->max_descs_per_engine = idxd->hw.gen_cap.max_descs_per_engine; +} + +static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET); + + *cmdcap |= BIT(IDXD_CMD_ENABLE_DEVICE) | BIT(IDXD_CMD_DISABLE_DEVICE) | + BIT(IDXD_CMD_DRAIN_ALL) | BIT(IDXD_CMD_ABORT_ALL) | + BIT(IDXD_CMD_RESET_DEVICE) | BIT(IDXD_CMD_ENABLE_WQ) | + BIT(IDXD_CMD_DISABLE_WQ) | BIT(IDXD_CMD_DRAIN_WQ) | + BIT(IDXD_CMD_ABORT_WQ) | BIT(IDXD_CMD_RESET_WQ) | + BIT(IDXD_CMD_DRAIN_PASID) | BIT(IDXD_CMD_ABORT_PASID) | + BIT(IDXD_CMD_REQUEST_INT_HANDLE) | BIT(IDXD_CMD_RELEASE_INT_HANDLE); +} + +static void vidxd_mmio_init_opcap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u64 opcode; + u8 *bar0 = vidxd->bar0; + u64 *opcap = (u64 *)(bar0 + IDXD_OPCAP_OFFSET); + + if (idxd->data->type == IDXD_TYPE_DSA) { + opcode = BIT_ULL(DSA_OPCODE_NOOP) | BIT_ULL(DSA_OPCODE_BATCH) | + BIT_ULL(DSA_OPCODE_DRAIN) | BIT_ULL(DSA_OPCODE_MEMMOVE) | + BIT_ULL(DSA_OPCODE_MEMFILL) | BIT_ULL(DSA_OPCODE_COMPARE) | + BIT_ULL(DSA_OPCODE_COMPVAL) | BIT_ULL(DSA_OPCODE_CR_DELTA) | + BIT_ULL(DSA_OPCODE_AP_DELTA) | BIT_ULL(DSA_OPCODE_DUALCAST) | + BIT_ULL(DSA_OPCODE_CRCGEN) | BIT_ULL(DSA_OPCODE_COPY_CRC) | + BIT_ULL(DSA_OPCODE_DIF_CHECK) | BIT_ULL(DSA_OPCODE_DIF_INS) | + BIT_ULL(DSA_OPCODE_DIF_STRP) | BIT_ULL(DSA_OPCODE_DIF_UPDT) | + BIT_ULL(DSA_OPCODE_CFLUSH); + *opcap = opcode; + } else if (idxd->data->type == IDXD_TYPE_IAX) { + opcode = BIT_ULL(IAX_OPCODE_NOOP) | BIT_ULL(IAX_OPCODE_DRAIN) | + 
BIT_ULL(IAX_OPCODE_MEMMOVE); + *opcap = opcode; + opcap++; + opcode = OPCAP_BIT(IAX_OPCODE_DECOMPRESS) | + OPCAP_BIT(IAX_OPCODE_COMPRESS); + *opcap = opcode; + } +} + +static void vidxd_mmio_init_version(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u32 *version; + + version = (u32 *)(vidxd->bar0 + VIDXD_VERSION_OFFSET); + *version = idxd->hw.version; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union offsets_reg *offsets; + + memset(vidxd->bar0, 0, VIDXD_BAR0_SIZE); + + vidxd_mmio_init_version(vidxd); + vidxd_mmio_init_gencap(vidxd); + vidxd_mmio_init_wqcap(vidxd); + vidxd_mmio_init_grpcap(vidxd); + vidxd_mmio_init_engcap(vidxd); + vidxd_mmio_init_opcap(vidxd); + + offsets = (union offsets_reg *)(bar0 + IDXD_TABLE_OFFSET); + offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100; + offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100; + offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100; + + vidxd_mmio_init_cmdcap(vidxd); + memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ); + vidxd_mmio_init_grpcfg(vidxd); + vidxd_mmio_init_wqcfg(vidxd); +} + static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { u8 *bar0 = vidxd->bar0; @@ -396,6 +609,7 @@ void vidxd_reset(struct vdcm_idxd *vidxd) } } + vidxd_mmio_init(vidxd); vwqcfg->wq_state = IDXD_WQ_DISABLED; gensts->state = IDXD_DEVICE_STATE_DISABLED; idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); From patchwork Sat May 22 00:20:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274217 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no 
version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B98FDC4707E for ; Sat, 22 May 2021 00:20:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 929A9613D0 for ; Sat, 22 May 2021 00:20:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231135AbhEVAVm (ORCPT ); Fri, 21 May 2021 20:21:42 -0400 Received: from mga06.intel.com ([134.134.136.31]:50235 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230359AbhEVAV0 (ORCPT ); Fri, 21 May 2021 20:21:26 -0400 IronPort-SDR: z3Ivjb9gm0NXXq82lehPsnGJsoJ5JLZ27h2uLFcXWugFWBJyZb3/HvAxmZUKHn4NoXnd8E6u9u vEhswLWXmVGA== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="262818140" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="262818140" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:02 -0700 IronPort-SDR: T2uzjo7HAO8glCKV/sYwZjVQ5R2W8fIwzRVf1kRQbkcXbbNDzWtDAqPK3gTS8Tq5qgIFREuROQ 83cTQgKzOGAg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="441312120" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga008-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:01 -0700 Subject: [PATCH v6 09/20] vfio/mdev: Add mmio read/write support for mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:01 -0700 Message-ID: 
<162164280156.261970.13755411310924589194.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The BAR0 MMIO path for the mdev is emulated. Add read and write support functions that handle MMIO accesses when the guest reads from or writes to the BAR0 MMIO region of the mdev. Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 1 drivers/dma/idxd/registers.h | 6 ++ drivers/vfio/mdev/idxd/mdev.h | 3 + drivers/vfio/mdev/idxd/vdev.c | 119 +++++++++++++++++++++++++++++++++++++++++ include/uapi/linux/idxd.h | 1 5 files changed, 129 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 99542c9cbc47..2ea6015e0d53 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -277,6 +277,7 @@ void idxd_wq_drain(struct idxd_wq *wq) operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL); } +EXPORT_SYMBOL_GPL(idxd_wq_drain); void idxd_wq_reset(struct idxd_wq *wq) { diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index 8ac2be4e174b..cf3d513a18e0 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -201,7 +201,9 @@ union cmdsts_reg { }; u32 bits; } __packed; -#define IDXD_CMDSTS_ACTIVE 0x80000000 + +#define IDXD_CMDS_ACTIVE_BIT 31 +#define IDXD_CMDSTS_ACTIVE BIT(IDXD_CMDS_ACTIVE_BIT) #define IDXD_CMDSTS_ERR_MASK 0xff #define IDXD_CMDSTS_RES_SHIFT 8 @@ -285,6 +287,8 @@ union msix_perm { u32 bits; } __packed; +#define IDXD_MSIX_PERM_MASK 0xfffff00c +#define IDXD_MSIX_PERM_IGNORE 0x3 #define MSIX_ENTRY_MASK_INT 0x1 #define MSIX_ENTRY_CTRL_BYTE 12 diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h
index 91cb2662abd6..f696fe38e374 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -80,4 +80,7 @@ void vidxd_reset(struct vdcm_idxd *vidxd); void vidxd_mmio_init(struct vdcm_idxd *vidxd); int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); + #endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 78cc2377e637..d2416765ce7e 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -42,6 +42,8 @@ static u64 idxd_pci_config[] = { 0x0000000000000000ULL, }; +static void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); + static void vidxd_reset_config(struct vdcm_idxd *vidxd) { u16 *devid = (u16 *)(vidxd->cfg + PCI_DEVICE_ID); @@ -141,6 +143,123 @@ static void vidxd_report_pci_error(struct vdcm_idxd *vidxd) send_halt_interrupt(vidxd); } +static void vidxd_report_swerror(struct vdcm_idxd *vidxd, unsigned int error) +{ + vidxd_set_swerr(vidxd, error); + send_swerr_interrupt(vidxd); +} + +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + u8 *bar0 = vidxd->bar0; + struct device *dev = &vidxd->mdev->dev; + + dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size, + offset, get_reg_val(buf, size)); + + if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0) + return -EINVAL; + + /* If we don't limit this, we potentially can write out of bound */ + if (size > sizeof(u32)) + return -EINVAL; + + switch (offset) { + case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3: + /* Write only when device is disabled. 
*/ + if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED) { + dev_warn(dev, "Guest writes to unsupported GENCFG register\n"); + memcpy(bar0 + offset, buf, size); + } + break; + + case IDXD_GENCTRL_OFFSET: + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_INTCAUSE_OFFSET: + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(4, 0)); + break; + + case IDXD_CMD_OFFSET: { + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 val = get_reg_val(buf, size); + + if (size != sizeof(u32)) + return -EINVAL; + + /* Check and set command in progress */ + if (test_and_set_bit(IDXD_CMDS_ACTIVE_BIT, (unsigned long *)cmdsts) == 0) + vidxd_do_command(vidxd, val); + else + vidxd_report_swerror(vidxd, DSA_ERR_CMD_REG); + break; + } + + case IDXD_SWERR_OFFSET: + /* W1C */ + bar0[offset] &= ~(get_reg_val(buf, 1) & GENMASK(1, 0)); + break; + + case VIDXD_MSIX_TABLE_OFFSET ... VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: { + int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10; + u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10]; + u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET); + u8 ctrl, new_mask; + int ims_index, ims_off; + u32 ims_ctrl, ims_mask; + struct idxd_device *idxd = vidxd->idxd; + + memcpy(bar0 + offset, buf, size); + ctrl = msix_entry[MSIX_ENTRY_CTRL_BYTE]; + + new_mask = ctrl & MSIX_ENTRY_MASK_INT; + if (!new_mask && test_and_clear_bit(index, (unsigned long *)pba)) + vidxd_send_interrupt(vidxd, index); + + if (index == 0) + break; + + ims_index = dev_msi_hwirq(dev, index - 1); + ims_off = idxd->ims_offset + ims_index * 16 + sizeof(u64); + ims_ctrl = ioread32(idxd->reg_base + ims_off); + ims_mask = ims_ctrl & MSIX_ENTRY_MASK_INT; + + if (new_mask == ims_mask) + break; + + if (new_mask) + ims_ctrl |= MSIX_ENTRY_MASK_INT; + else + ims_ctrl &= ~MSIX_ENTRY_MASK_INT; + + iowrite32(ims_ctrl, idxd->reg_base + ims_off); + /* readback to flush */ + ims_ctrl = ioread32(idxd->reg_base + ims_off); + break; + } + + case VIDXD_MSIX_PERM_OFFSET ... 
VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + memcpy(bar0 + offset, buf, size); + break; + } /* offset */ + + return 0; +} + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + struct device *dev = &vidxd->mdev->dev; + + memcpy(buf, vidxd->bar0 + offset, size); + + dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n", + vidxd->wq->id, size, offset, get_reg_val(buf, size)); + return 0; +} + int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) { u32 offset = pos & 0xfff; diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h index 751f6107217c..e8c39849a526 100644 --- a/include/uapi/linux/idxd.h +++ b/include/uapi/linux/idxd.h @@ -90,6 +90,7 @@ enum dsa_completion_status { DSA_COMP_HW_ERR_DRB, DSA_COMP_TRANSLATION_FAIL, DSA_ERR_PCI_CFG = 0x51, + DSA_ERR_CMD_REG, }; enum iax_completion_status { From patchwork Sat May 22 00:20:07 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274219 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E1F1EC4707A for ; Sat, 22 May 2021 00:20:27 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C7D8A613F4 for ; Sat, 22 May 2021 00:20:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230466AbhEVAVu (ORCPT ); Fri, 21 May 2021 20:21:50 -0400 Received: from mga01.intel.com 
([192.55.52.88]:24594 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230448AbhEVAVc (ORCPT ); Fri, 21 May 2021 20:21:32 -0400 IronPort-SDR: LF2MPiCXllk4cEOIio/MgJsD2XjDb8r3gO3Eo/CB6EiZ5UQqfJDruwoicit5zF1DkuWj2bsL/k LJHaw9Rp9R0A== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="222715671" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="222715671" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:08 -0700 IronPort-SDR: hZOuSZ0CJpLCgrLBJ3aW1MyOVv4Ft8V37cTrk97PDJSE+4tfuBU2QM5z2+p5QWjpF1Gp1S4kO/ b3CaTPLj5mQA== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="628864578" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:07 -0700 Subject: [PATCH v6 10/20] vfio/mdev: idxd: add mdev type as a new wq type From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:07 -0700 Message-ID: <162164280754.261970.2354155734920115161.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add "mdev" wq type and support helpers. The mdev wq type marks the wq to be utilized as a VFIO mediated device. 
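For illustration only, the name-to-type mapping that the "mdev" wq type plugs into can be modelled in plain userspace C. This is a sketch, not the driver code: the enum and helper names below are hypothetical, strcmp() stands in for sysfs_streq() (so a trailing newline is not tolerated), and the lookup accepts every name in the table, which may differ from what the sysfs store handler accepts.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical userspace mirror of the wq type enum. */
enum wq_type {
	WQT_NONE = 0,
	WQT_KERNEL,
	WQT_USER,
	WQT_MDEV,
	WQT_COUNT	/* not a real type, only the table size */
};

/* Same designated-initializer table layout as idxd_wq_type_names[]. */
static const char *const wq_type_names[WQT_COUNT] = {
	[WQT_NONE]   = "none",
	[WQT_KERNEL] = "kernel",
	[WQT_USER]   = "user",
	[WQT_MDEV]   = "mdev",
};

/* Show side: map a type to its name. */
static const char *wq_type_name(int type)
{
	assert(type >= 0 && type < WQT_COUNT);
	return wq_type_names[type];
}

/* Store side: map a name to its type, or -1 when the name is unknown. */
static int wq_type_from_name(const char *name)
{
	int t;

	for (t = WQT_NONE; t < WQT_COUNT; t++) {
		if (strcmp(name, wq_type_names[t]) == 0)
			return t;
	}
	return -1;
}

/* Hypothetical counterpart of is_idxd_wq_mdev(). */
static int wq_type_is_mdev(int type)
{
	return type == WQT_MDEV;
}
```

Keeping the names in one table means a new type only needs one enum entry and one table entry for both directions of the lookup.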
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 6 ++++++ drivers/dma/idxd/sysfs.c | 5 +++++ 2 files changed, 11 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 34ffa6dad53a..cbb046c2921f 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -129,6 +129,7 @@ enum idxd_wq_type { IDXD_WQT_NONE = 0, IDXD_WQT_KERNEL, IDXD_WQT_USER, + IDXD_WQT_MDEV, }; struct idxd_cdev { @@ -424,6 +425,11 @@ static inline bool is_idxd_wq_user(struct idxd_wq *wq) return wq->type == IDXD_WQT_USER; } +static inline bool is_idxd_wq_mdev(struct idxd_wq *wq) +{ + return (wq->type == IDXD_WQT_MDEV); +} + static inline bool wq_dedicated(struct idxd_wq *wq) { return test_bit(WQ_FLAG_DEDICATED, &wq->flags); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 6583c9c2e992..3d3a84be2c9b 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -17,6 +17,7 @@ static char *idxd_wq_type_names[] = { [IDXD_WQT_NONE] = "none", [IDXD_WQT_KERNEL] = "kernel", [IDXD_WQT_USER] = "user", + [IDXD_WQT_MDEV] = "mdev", }; static bool is_idxd_dev_drv(struct device_driver *drv) @@ -860,6 +861,8 @@ static ssize_t wq_type_show(struct device *dev, return sysfs_emit(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_KERNEL]); case IDXD_WQT_USER: return sysfs_emit(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]); + case IDXD_WQT_MDEV: + return sysfs_emit(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]); case IDXD_WQT_NONE: default: return sysfs_emit(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_NONE]); @@ -885,6 +888,8 @@ static ssize_t wq_type_store(struct device *dev, wq->type = IDXD_WQT_KERNEL; else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER])) wq->type = IDXD_WQT_USER; + else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV])) + wq->type = IDXD_WQT_MDEV; else return -EINVAL; From patchwork Sat May 22 00:20:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: 
Dave Jiang X-Patchwork-Id: 12274223 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 56205C4707A for ; Sat, 22 May 2021 00:20:43 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3AAB7613F4 for ; Sat, 22 May 2021 00:20:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230522AbhEVAWF (ORCPT ); Fri, 21 May 2021 20:22:05 -0400 Received: from mga18.intel.com ([134.134.136.126]:45251 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231127AbhEVAVl (ORCPT ); Fri, 21 May 2021 20:21:41 -0400 IronPort-SDR: 88YNDiG8RHe89cArKMcHk0QwSUM03GFmGHvUcjMwvPWz+Ml/fwwv6EAQVUeMkKajKbdBkUH3lL xnKrGc+Kiawg== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="188993269" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="188993269" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:14 -0700 IronPort-SDR: to1My5qmEvSohnpclJdCrp3nNpyO6VwnPjebJPJ0HQALJmy5xS59L82d/RZr8QWrabQLKO1TUZ zmChe4XT0+FA== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="631991888" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:14 -0700 Subject: [PATCH v6 11/20] vfio/mdev: idxd: Add basic driver setup for idxd mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, 
jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:13 -0700 Message-ID: <162164281350.261970.10539208268885731071.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add the basic registration and initialization for the mdev idxd driver. To register with the mdev framework, the driver must register the pci_dev. Add the registration as part of the idxd mdev driver probe. The host init is set up to be called on the first wq device probe, and the unregistration happens when the last wq device releases the driver.
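The first-probe/last-release scheme described above can be modelled in userspace as a rough sketch. Everything here is an assumption made for illustration: a plain int replaces struct kref, the struct and function names are hypothetical, locking is omitted, and the model covers only wq probe and remove, not the device removal path or a re-probe after release.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device mdev host state. */
struct host_state {
	int refs;		/* models mdev_kref, starts at 1 like kref_init() */
	bool host_init;		/* models mdev_host_init */
	int registrations;	/* times the host registration ran */
	int releases;		/* times the host release ran */
};

static void host_state_init(struct host_state *h)
{
	h->refs = 1;
	h->host_init = false;
	h->registrations = 0;
	h->releases = 0;
}

/* wq probe: the first caller registers, later callers only take a ref. */
static void wq_probe(struct host_state *h)
{
	if (!h->host_init) {
		/* The initial reference from init is consumed here. */
		h->registrations++;
		h->host_init = true;
	} else {
		h->refs++;
	}
}

/* wq remove: dropping the last reference runs the release step. */
static void wq_remove(struct host_state *h)
{
	assert(h->refs > 0);
	if (--h->refs == 0)
		h->releases++;
}
```

With two wqs probed and then removed, the model registers once and releases once, on the second remove.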
Signed-off-by: Dave Jiang --- drivers/dma/Kconfig | 1 drivers/dma/idxd/device.c | 2 + drivers/dma/idxd/idxd.h | 10 ++++ drivers/dma/idxd/init.c | 39 +++++++++++++++++ drivers/vfio/mdev/idxd/mdev.c | 95 +++++++++++++++++++++++++++++++++++++++++ drivers/vfio/mdev/idxd/vdev.c | 3 - 6 files changed, 147 insertions(+), 3 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 84f996dd339d..390227027878 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -281,6 +281,7 @@ config INTEL_IDXD depends on PCI && X86_64 depends on PCI_MSI depends on SBITMAP + depends on IMS_MSI_ARRAY select DMA_ENGINE help Enable support for the Intel(R) data accelerators present diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 2ea6015e0d53..cef41a273cc1 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -1257,6 +1257,7 @@ int drv_enable_wq(struct idxd_wq *wq) mutex_unlock(&wq->wq_lock); return rc; } +EXPORT_SYMBOL_GPL(drv_enable_wq); void __drv_disable_wq(struct idxd_wq *wq) { @@ -1283,6 +1284,7 @@ void drv_disable_wq(struct idxd_wq *wq) __drv_disable_wq(wq); mutex_unlock(&wq->wq_lock); } +EXPORT_SYMBOL_GPL(drv_disable_wq); int idxd_device_drv_probe(struct device *dev) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index cbb046c2921f..4d2532175705 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -11,6 +11,7 @@ #include #include #include +#include #include "registers.h" #define IDXD_DRIVER_VERSION "1.00" @@ -60,6 +61,7 @@ static const char idxd_dsa_drv_name[] = "dsa"; static const char idxd_dev_drv_name[] = "idxd"; static const char idxd_dmaengine_drv_name[] = "dmaengine"; static const char idxd_user_drv_name[] = "user"; +static const char idxd_mdev_drv_name[] = "mdev"; struct idxd_irq_entry { struct idxd_device *idxd; @@ -297,6 +299,10 @@ struct idxd_device { int *int_handles; struct idxd_pmu *idxd_pmu; + + struct kref mdev_kref; + struct mutex kref_lock; /* lock for the mdev_kref 
*/ + bool mdev_host_init; }; /* IDXD software descriptor */ @@ -587,6 +593,10 @@ int idxd_cdev_get_major(struct idxd_device *idxd); int idxd_wq_add_cdev(struct idxd_wq *wq); void idxd_wq_del_cdev(struct idxd_wq *wq); +/* mdev host */ +int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv); +void idxd_mdev_host_release(struct kref *kref); + /* perfmon */ #if IS_ENABLED(CONFIG_INTEL_IDXD_PERFMON) int perfmon_pmu_init(struct idxd_device *idxd); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 16ff37be2d26..809ca1827772 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -63,6 +63,41 @@ static struct pci_device_id idxd_pci_tbl[] = { }; MODULE_DEVICE_TABLE(pci, idxd_pci_tbl); +int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (!idxd->ims_size) + return -EOPNOTSUPP; + + rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) { + dev_warn(dev, "Failed to enable aux-domain: %d\n", rc); + return rc; + } + + rc = mdev_register_device(dev, drv); + if (rc < 0) { + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + return rc; + } + + idxd->mdev_host_init = true; + return 0; +} +EXPORT_SYMBOL_GPL(idxd_mdev_host_init); + +void idxd_mdev_host_release(struct kref *kref) +{ + struct idxd_device *idxd = container_of(kref, struct idxd_device, mdev_kref); + struct device *dev = &idxd->pdev->dev; + + mdev_unregister_device(dev); + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); +} +EXPORT_SYMBOL_GPL(idxd_mdev_host_release); + static int idxd_setup_interrupts(struct idxd_device *idxd) { struct pci_dev *pdev = idxd->pdev; @@ -352,6 +387,9 @@ static int idxd_setup_internals(struct idxd_device *idxd) goto err_wkq_create; } + kref_init(&idxd->mdev_kref); + mutex_init(&idxd->kref_lock); + return 0; err_wkq_create: @@ -741,6 +779,7 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); 
idxd_shutdown(pdev); + kref_put_mutex(&idxd->mdev_kref, idxd_mdev_host_release, &idxd->kref_lock); if (device_pasid_enabled(idxd)) idxd_disable_system_pasid(idxd); idxd_unregister_devices(idxd); diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 90ff7cedb8b4..25cd62b803f8 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include "registers.h" #include "idxd.h" @@ -40,5 +41,99 @@ int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 return 0; } +static struct mdev_driver idxd_vdcm_driver = { + .driver = { + .name = "idxd-mdev", + .owner = THIS_MODULE, + .mod_name = KBUILD_MODNAME, + }, +}; + +static int idxd_mdev_drv_probe(struct device *dev) +{ + struct idxd_wq *wq = confdev_to_wq(dev); + struct idxd_device *idxd = wq->idxd; + int rc; + + if (!is_idxd_wq_mdev(wq)) + return -ENODEV; + + rc = drv_enable_wq(wq); + if (rc < 0) + return rc; + + /* + * The kref count starts at 1 on initialization. When the first device gets + * probed, set up the mdev and do the host initialization. On follow-on + * probes, the driver just takes a kref. On the remove side, once + * the kref hits 0, the driver does the host cleanup and unregisters from the + * mdev framework.
+ */ + mutex_lock(&idxd->kref_lock); + if (!idxd->mdev_host_init) { + rc = idxd_mdev_host_init(idxd, &idxd_vdcm_driver); + if (rc < 0) { + mutex_unlock(&idxd->kref_lock); + drv_disable_wq(wq); + dev_warn(dev, "mdev device init failed!\n"); + return -ENXIO; + } + idxd->mdev_host_init = true; + } else { + kref_get(&idxd->mdev_kref); + } + mutex_unlock(&idxd->kref_lock); + + get_device(dev); + dev_info(dev, "wq %s enabled\n", dev_name(dev)); + return 0; +} + +static void idxd_mdev_drv_remove(struct device *dev) +{ + struct idxd_wq *wq = confdev_to_wq(dev); + struct idxd_device *idxd = wq->idxd; + + drv_disable_wq(wq); + dev_info(dev, "wq %s disabled\n", dev_name(dev)); + kref_put_mutex(&idxd->mdev_kref, idxd_mdev_host_release, &idxd->kref_lock); + put_device(dev); +} + +static struct idxd_device_driver idxd_mdev_driver = { + .probe = idxd_mdev_drv_probe, + .remove = idxd_mdev_drv_remove, + .name = idxd_mdev_drv_name, +}; + +static int __init idxd_mdev_init(void) +{ + int rc; + + rc = idxd_driver_register(&idxd_mdev_driver); + if (rc < 0) + return rc; + + rc = mdev_register_driver(&idxd_vdcm_driver); + if (rc < 0) { + idxd_driver_unregister(&idxd_mdev_driver); + return rc; + } + + return 0; +} + +static void __exit idxd_mdev_exit(void) +{ + mdev_unregister_driver(&idxd_vdcm_driver); + idxd_driver_unregister(&idxd_mdev_driver); +} + +module_init(idxd_mdev_init); +module_exit(idxd_mdev_exit); + MODULE_IMPORT_NS(IDXD); +MODULE_SOFTDEP("pre: idxd"); MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Intel Corporation"); +MODULE_ALIAS_IDXD_DEVICE(0); diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index d2416765ce7e..67da4c122a96 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -989,6 +989,3 @@ static void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) break; } } - -MODULE_IMPORT_NS(IDXD); -MODULE_LICENSE("GPL v2"); From patchwork Sat May 22 00:20:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274225 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A15E3C4707E for ; Sat, 22 May 2021 00:20:48 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 83F42613E4 for ; Sat, 22 May 2021 00:20:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231161AbhEVAWK (ORCPT ); Fri, 21 May 2021 20:22:10 -0400 Received: from mga04.intel.com ([192.55.52.120]:56043 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230429AbhEVAVp (ORCPT ); Fri, 21 May 2021 20:21:45 -0400 IronPort-SDR: K35w6VC0hlQX9d2mettQWPSDGB902fE2ncvqx4YHCnIcuqaxB1vQaAMWrjbCQ8Y3bCWfvR8qLL Y2ASgy9rYUvg== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="199661232" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="199661232" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:21 -0700 IronPort-SDR: UkWvMMWLQ2DdpdHYUq2/Q0h8ON+Cj0jkwViQ5ds8S1zGJV6+shLzce+s7LHrKRkSnG3p9dtTBF 5ezXbcRifusw== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="434500908" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:20 -0700 Subject: [PATCH v6 12/20] vfio: move VFIO PCI macros to common header From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, 
vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:19 -0700 Message-ID: <162164281969.261970.17759783730654052269.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Move some VFIO_PCI macros to a common header as they will be shared between mdev and vfio_pci. Signed-off-by: Dave Jiang --- drivers/vfio/pci/vfio_pci_private.h | 6 ------ include/linux/vfio.h | 6 ++++++ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index a17943911fcb..e644f981509c 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -18,12 +18,6 @@ #ifndef VFIO_PCI_PRIVATE_H #define VFIO_PCI_PRIVATE_H -#define VFIO_PCI_OFFSET_SHIFT 40 - -#define VFIO_PCI_OFFSET_TO_INDEX(off) (off >> VFIO_PCI_OFFSET_SHIFT) -#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT) -#define VFIO_PCI_OFFSET_MASK (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1) - /* Special capability IDs predefined access */ #define PCI_CAP_ID_INVALID 0xFF /* default raw access */ #define PCI_CAP_ID_INVALID_VIRT 0xFE /* default virt access */ diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 3b372fa57ef4..ed5ca027eb49 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -15,6 +15,12 @@ #include #include +#define VFIO_PCI_OFFSET_SHIFT 40 + +#define 
VFIO_PCI_OFFSET_TO_INDEX(off) ((off) >> VFIO_PCI_OFFSET_SHIFT) +#define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT) +#define VFIO_PCI_OFFSET_MASK (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1) + struct vfio_device { struct device *dev; const struct vfio_device_ops *ops; From patchwork Sat May 22 00:20:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274227 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D6961C4707E for ; Sat, 22 May 2021 00:20:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BAF7C613E1 for ; Sat, 22 May 2021 00:20:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230479AbhEVAWU (ORCPT ); Fri, 21 May 2021 20:22:20 -0400 Received: from mga03.intel.com ([134.134.136.65]:39211 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231182AbhEVAVw (ORCPT ); Fri, 21 May 2021 20:21:52 -0400 IronPort-SDR: NmYsiFG81xNMSl8MsvGmvVwGbkdqNrCPwGLgffem87SWbYsLqxhXcEe2x/PnUaONWK3RoTxpYy H7aNd7OI6FYQ== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201652798" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="201652798" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:27 -0700 IronPort-SDR: uC8BZWOEfkuBojiC6NjuVOcuTKOcQVY9U9Q4dvrPV28Db5SA5VQXIa0a+F2ikPriOP7K9MVYaJ 
VkpwMH5evsJA== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="406873835" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:26 -0700 Subject: [PATCH v6 13/20] vfio/mdev: idxd: add mdev driver registration and helper functions From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:26 -0700 Message-ID: <162164282601.261970.10405911922092921185.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Create a mediated device through the VFIO mediated device framework. The mdev framework allows the driver to create a mediated device backed by a portion of the device's resources. The driver will emulate the slow path, such as the PCI config space, the MMIO BAR, and the command registers. The descriptor submission portal(s) will be mmapped to the guest so that the guest kernel or applications can submit descriptors directly. The mediated device support code in idxd will be referred to as the Virtual Device Composition Module (vdcm). Add basic plumbing to fill out the mdev_parent_ops struct that VFIO mdev requires to support a mediated device.
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 1 drivers/vfio/mdev/idxd/mdev.c | 638 +++++++++++++++++++++++++++++++++++++++++ drivers/vfio/mdev/idxd/mdev.h | 25 ++ 3 files changed, 664 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 4d2532175705..0d9e2710fc76 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -198,6 +198,7 @@ struct idxd_wq { u64 max_xfer_bytes; u32 max_batch_size; bool ats_dis; + struct vdcm_idxd *vidxd; }; struct idxd_engine { diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 25cd62b803f8..e484095baeea 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -41,12 +41,650 @@ int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 return 0; } +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data); + +static int idxd_vdcm_get_irq_count(struct vfio_device *vdev, int type) +{ + if (type == VFIO_PCI_MSIX_IRQ_INDEX) + return VIDXD_MAX_MSIX_VECS; + + return 0; +} + +static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, + struct vdcm_idxd_type *type) +{ + struct vdcm_idxd *vidxd; + struct idxd_wq *wq = NULL; + + if (!wq) + return ERR_PTR(-ENODEV); + + vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) + return ERR_PTR(-ENOMEM); + + mutex_init(&vidxd->dev_lock); + vidxd->idxd = idxd; + vidxd->mdev = mdev; + vidxd->type = type; + vidxd->num_wqs = VIDXD_MAX_WQS; + + mutex_lock(&wq->wq_lock); + idxd_wq_get(wq); + wq->vidxd = vidxd; + vidxd->wq = wq; + mutex_unlock(&wq->wq_lock); + vidxd_init(vidxd); + + return vidxd; +} + +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; + +static struct vdcm_idxd_type *idxd_vdcm_get_type(struct mdev_device *mdev) +{ + return &idxd_mdev_types[mdev_get_type_group_id(mdev)]; +} + +static const struct vfio_device_ops idxd_mdev_ops; 
+ +static int idxd_vdcm_probe(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd; + struct vdcm_idxd_type *type; + struct device *dev, *parent; + struct idxd_device *idxd; + bool ims_map[VIDXD_MAX_MSIX_VECS]; + int rc; + + parent = mdev_parent_dev(mdev); + idxd = dev_get_drvdata(parent); + dev = &mdev->dev; + mdev_set_iommu_device(mdev, parent); + type = idxd_vdcm_get_type(mdev); + + vidxd = vdcm_vidxd_create(idxd, mdev, type); + if (IS_ERR(vidxd)) { + dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd)); + return PTR_ERR(vidxd); + } + + vfio_init_group_dev(&vidxd->vdev, &mdev->dev, &idxd_mdev_ops); + + ims_map[0] = 0; + ims_map[1] = 1; + rc = mdev_irqs_init(mdev, VIDXD_MAX_MSIX_VECS, ims_map); + if (rc < 0) + goto err; + + rc = vfio_register_group_dev(&vidxd->vdev); + if (rc < 0) + goto err_group_register; + dev_set_drvdata(dev, vidxd); + + return 0; + +err_group_register: + mdev_irqs_free(mdev); +err: + kfree(vidxd); + return rc; +} + +static void idxd_vdcm_remove(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = dev_get_drvdata(&mdev->dev); + struct idxd_wq *wq = vidxd->wq; + + vfio_unregister_group_dev(&vidxd->vdev); + mdev_irqs_free(mdev); + mutex_lock(&wq->wq_lock); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + + kfree(vidxd); +} + +static int idxd_vdcm_open(struct vfio_device *vdev) +{ + return 0; +} + +static void idxd_vdcm_close(struct vfio_device *vdev) +{ + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + + mutex_lock(&vidxd->dev_lock); + idxd_vdcm_set_irqs(vidxd, VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_TRIGGER, + VFIO_PCI_MSIX_IRQ_INDEX, 0, 0, NULL); + + /* Re-initialize the VIDXD to a pristine state for re-use */ + vidxd_init(vidxd); + mutex_unlock(&vidxd->dev_lock); +} + +static ssize_t idxd_vdcm_rw(struct vfio_device *vdev, char *buf, size_t count, loff_t *ppos, + enum idxd_vdcm_rw mode) +{ + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + u64 pos = *ppos & 
VFIO_PCI_OFFSET_MASK; + struct device *dev = vdev->dev; + int rc = -EINVAL; + + if (index >= VFIO_PCI_NUM_REGIONS) { + dev_err(dev, "invalid index: %u\n", index); + return -EINVAL; + } + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_cfg_write(vidxd, pos, buf, count); + else + rc = vidxd_cfg_read(vidxd, pos, buf, count); + break; + case VFIO_PCI_BAR0_REGION_INDEX: + case VFIO_PCI_BAR1_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count); + else + rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count); + break; + case VFIO_PCI_BAR2_REGION_INDEX: + case VFIO_PCI_BAR3_REGION_INDEX: + case VFIO_PCI_BAR4_REGION_INDEX: + case VFIO_PCI_BAR5_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + case VFIO_PCI_ROM_REGION_INDEX: + default: + dev_err(dev, "unsupported region: %u\n", index); + } + + return rc == 0 ? count : rc; +} + +static ssize_t idxd_vdcm_read(struct vfio_device *vdev, char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + rc = idxd_vdcm_rw(vdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + rc = idxd_vdcm_rw(vdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 2; + } else { + u8 val; + + rc = idxd_vdcm_rw(vdev, &val, sizeof(val), ppos, + IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + 
mutex_unlock(&vidxd->dev_lock); + return done; + + read_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static ssize_t idxd_vdcm_write(struct vfio_device *vdev, const char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(vdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(vdev, (char *)&val, + sizeof(val), ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 2; + } else { + u8 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(vdev, &val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + +write_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static int idxd_vdcm_mmap(struct vfio_device *vdev, struct vm_area_struct *vma) +{ + unsigned int wq_idx; + unsigned long req_size, pgoff = 0, offset; + pgprot_t pg_prot; + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + struct idxd_wq *wq = vidxd->wq; + struct idxd_device *idxd = vidxd->idxd; + enum idxd_portal_prot virt_portal, phys_portal; + phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR); + struct device *dev = vdev->dev; + + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + pg_prot = vma->vm_page_prot; + req_size = vma->vm_end - vma->vm_start; + if (req_size > PAGE_SIZE) + return -EINVAL; + + vma->vm_flags |= VM_DONTCOPY; + + offset = (vma->vm_pgoff << PAGE_SHIFT) & + 
((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1); + + wq_idx = offset >> (PAGE_SHIFT + 2); + if (wq_idx >= 1) { + dev_err(dev, "mapping invalid wq %d off %lx\n", + wq_idx, offset); + return -EINVAL; + } + + /* + * Check and see if the guest wants to map to the limited or unlimited portal. + * The driver will allow mapping to unlimited portal only if the wq is a + * dedicated wq. Otherwise, it goes to limited. + */ + virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1; + phys_portal = IDXD_PORTAL_LIMITED; + if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq)) + phys_portal = IDXD_PORTAL_UNLIMITED; + + /* We always map IMS portals to the guest */ + pgoff = (base + idxd_get_wq_portal_offset(wq->id, phys_portal, + IDXD_IRQ_IMS)) >> PAGE_SHIFT; + + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size, + pgprot_val(pg_prot)); + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_pgoff = pgoff; + + return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot); +} + +static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd) +{ + vidxd_reset(vidxd); +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + struct mdev_device *mdev = vidxd->mdev; + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + case VFIO_PCI_MSI_IRQ_INDEX: + break; + case VFIO_PCI_MSIX_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + return mdev_set_msix_trigger(mdev, index, start, count, flags, data); + } + break; + } + + return -ENOTTY; +} + +static long idxd_vdcm_ioctl(struct vfio_device *vdev, unsigned int cmd, unsigned long arg) +{ + struct vdcm_idxd *vidxd = vdev_to_vidxd(vdev); + unsigned long minsz; + int rc = -EINVAL; + struct device *dev = vdev->dev; + + dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd); + + 
mutex_lock(&vidxd->dev_lock); + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + info.flags = VFIO_DEVICE_FLAGS_PCI; + info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; + size_t size; + int nr_areas = 1; + int cap_type_id = 0; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = VIDXD_MAX_CFG_SPACE_SZ; + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->bar_size[info.index]; + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR1_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + break; + case VFIO_PCI_BAR2_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP | + VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + info.size = vidxd->bar_size[1]; + + /* + * Every WQ has two areas for unlimited and limited + 
* MSI-X portals. IMS portals are not reported + */ + nr_areas = 2; + + size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas)); + sparse = kzalloc(size, GFP_KERNEL); + if (!sparse) { + rc = -ENOMEM; + goto out; + } + + sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + sparse->header.version = 1; + sparse->nr_areas = nr_areas; + cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + + /* Unlimited portal */ + sparse->areas[0].offset = 0; + sparse->areas[0].size = PAGE_SIZE; + + /* Limited portal */ + sparse->areas[1].offset = PAGE_SIZE; + sparse->areas[1].size = PAGE_SIZE; + break; + + case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + dev_dbg(dev, "get region info bar:%d\n", info.index); + break; + + case VFIO_PCI_ROM_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + dev_dbg(dev, "get region info index:%d\n", info.index); + break; + default: { + if (info.index >= VFIO_PCI_NUM_REGIONS) + rc = -EINVAL; + else + rc = 0; + goto out; + } /* default */ + } /* info.index switch */ + + if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { + if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) { + rc = vfio_info_add_capability(&caps, &sparse->header, + sizeof(*sparse) + (sparse->nr_areas * + sizeof(*sparse->areas))); + kfree(sparse); + if (rc) + goto out; + } + } + + if (caps.size) { + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + sizeof(info), + caps.buf, caps.size)) { + kfree(caps.buf); + rc = -EFAULT; + goto out; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + u32 pasid; + + rc = 
idxd_mdev_get_pasid(vidxd->mdev, vdev, &pasid); + if (rc < 0) + goto out; + mdev_irqs_set_pasid(vidxd->mdev, pasid); + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) { + rc = -EINVAL; + goto out; + } + + info.flags = VFIO_IRQ_INFO_EVENTFD; + + switch (info.index) { + case VFIO_PCI_MSIX_IRQ_INDEX: + info.flags |= VFIO_IRQ_INFO_NORESIZE; + break; + default: + rc = -EINVAL; + goto out; + } /* switch(info.index) */ + + info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE; + info.count = idxd_vdcm_get_irq_count(vdev, info.index); + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) { + int max = idxd_vdcm_get_irq_count(vdev, hdr.index); + + rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS, + &data_size); + if (rc) { + dev_err(dev, "intel:vfio_set_irqs_validate_and_prepare failed\n"); + rc = -EINVAL; + goto out; + } + + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), data_size); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out; + } + } + } + + if (!data) { + rc = -EINVAL; + goto out; + } + + rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data); + kfree(data); + goto out; + } else if (cmd == VFIO_DEVICE_RESET) { + vidxd_vdcm_reset(vidxd); + } + + out: + mutex_unlock(&vidxd->dev_lock); + return rc; +} + +static const struct vfio_device_ops idxd_mdev_ops = { + .name = "vfio-mdev", + .open = idxd_vdcm_open, + .release = idxd_vdcm_close, + .read = idxd_vdcm_read, + .write = idxd_vdcm_write, + 
.mmap = idxd_vdcm_mmap, + .ioctl = idxd_vdcm_ioctl, +}; + static struct mdev_driver idxd_vdcm_driver = { .driver = { .name = "idxd-mdev", .owner = THIS_MODULE, .mod_name = KBUILD_MODNAME, }, + .probe = idxd_vdcm_probe, + .remove = idxd_vdcm_remove, }; static int idxd_mdev_drv_probe(struct device *dev) diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index f696fe38e374..dd4290bce772 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -30,11 +30,26 @@ #define VIDXD_MAX_MSIX_ENTRIES VIDXD_MAX_MSIX_VECS #define VIDXD_MAX_WQS 1 +#define IDXD_MDEV_NAME_LEN 64 +#define IDXD_MDEV_TYPES 2 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_DSA_1_DWQ = 0, + IDXD_MDEV_TYPE_IAX_1_DWQ, +}; + +struct vdcm_idxd_type { + const char *name; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + struct vdcm_idxd { struct vfio_device vdev; struct idxd_device *idxd; struct idxd_wq *wq; struct mdev_device *mdev; + struct vdcm_idxd_type *type; int num_wqs; u64 bar_val[VIDXD_MAX_BARS]; @@ -44,6 +59,16 @@ struct vdcm_idxd { struct mutex dev_lock; /* lock for vidxd resources */ }; +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + +static inline struct vdcm_idxd *vdev_to_vidxd(struct vfio_device *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); +} + static inline u64 get_reg_val(void *buf, int size) { u64 val = 0; From patchwork Sat May 22 00:20:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274251 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org 
(mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB721C4707A for ; Sat, 22 May 2021 00:22:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BBB87613DB for ; Sat, 22 May 2021 00:22:51 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231283AbhEVAYO (ORCPT ); Fri, 21 May 2021 20:24:14 -0400 Received: from mga14.intel.com ([192.55.52.115]:39679 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230265AbhEVAXc (ORCPT ); Fri, 21 May 2021 20:23:32 -0400 IronPort-SDR: /n64bcs1jHRBhQk2SS0Dd3/pBiXb5xSscxWGETK7t0llWqXWKRcNLisjQkL/bA3nUgcw6iaCJQ l8c9257w/KaA== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201312184" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="201312184" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:33 -0700 IronPort-SDR: oX7Y5oO9ui9QqSKITyIQ1u4G0dXw/VfdMfhzvqETytAUDwK0GUMxZ0aAgecUKAFlsyhKR2HHuc epTbZWM793Mg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="544268896" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:32 -0700 Subject: [PATCH v6 14/20] vfio/mdev: idxd: add 1dwq-v1 mdev type From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:31 -0700 Message-ID: 
<162164283194.261970.12689960276759011457.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add mdev device type "1dwq-v1" support code. 1dwq-v1 is defined as a single DSA gen1 dedicated WQ. This WQ cannot be shared between guests. The guest also cannot change any WQ configuration. Signed-off-by: Dave Jiang --- drivers/vfio/mdev/idxd/mdev.c | 173 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 166 insertions(+), 7 deletions(-) diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index e484095baeea..9f6c4997ec24 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -22,6 +22,13 @@ #include "idxd.h" #include "mdev.h" +static const char idxd_dsa_1dwq_name[] = "dsa-1dwq-v1"; +static const char idxd_iax_1dwq_name[] = "iax-1dwq-v1"; + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data); + int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 *pasid) { struct vfio_group *vfio_group = vdev->group; @@ -41,10 +48,6 @@ int idxd_mdev_get_pasid(struct mdev_device *mdev, struct vfio_device *vdev, u32 return 0; } -static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, - unsigned int index, unsigned int start, - unsigned int count, void *data); - static int idxd_vdcm_get_irq_count(struct vfio_device *vdev, int type) { if (type == VFIO_PCI_MSIX_IRQ_INDEX) @@ -53,18 +56,73 @@ static int idxd_vdcm_get_irq_count(struct vfio_device *vdev, int type) return 0; } +static struct idxd_wq *find_any_dwq(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int i; + struct idxd_wq *wq; + unsigned long flags; + + switch 
(type->type) { + case IDXD_MDEV_TYPE_DSA_1_DWQ: + if (idxd->data->type != IDXD_TYPE_DSA) + return NULL; + break; + case IDXD_MDEV_TYPE_IAX_1_DWQ: + if (idxd->data->type != IDXD_TYPE_IAX) + return NULL; + break; + default: + return NULL; + } + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + wq = idxd->wqs[i]; + + if (wq->state != IDXD_WQ_ENABLED) + continue; + + if (!wq_dedicated(wq)) + continue; + + if (!is_idxd_wq_mdev(wq)) + continue; + + if (idxd_wq_refcount(wq) != 0) + continue; + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + mutex_lock(&wq->wq_lock); + if (idxd_wq_refcount(wq)) { + spin_lock_irqsave(&idxd->dev_lock, flags); + continue; + } + + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + return wq; + } + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + return NULL; +} + static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, struct vdcm_idxd_type *type) { struct vdcm_idxd *vidxd; struct idxd_wq *wq = NULL; + int rc; + wq = find_any_dwq(idxd, type); if (!wq) return ERR_PTR(-ENODEV); vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); - if (!vidxd) - return ERR_PTR(-ENOMEM); + if (!vidxd) { + rc = -ENOMEM; + goto err; + } mutex_init(&vidxd->dev_lock); vidxd->idxd = idxd; @@ -80,9 +138,24 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev vidxd_init(vidxd); return vidxd; + + err: + mutex_lock(&wq->wq_lock); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + return ERR_PTR(rc); } -static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = { + { + .name = idxd_dsa_1dwq_name, + .type = IDXD_MDEV_TYPE_DSA_1_DWQ, + }, + { + .name = idxd_iax_1dwq_name, + .type = IDXD_MDEV_TYPE_IAX_1_DWQ, + }, +}; static struct vdcm_idxd_type *idxd_vdcm_get_type(struct mdev_device *mdev) { @@ -677,6 +750,91 @@ static const struct vfio_device_ops idxd_mdev_ops = { .ioctl = idxd_vdcm_ioctl, }; +static 
ssize_t name_show(struct mdev_type *mtype, struct mdev_type_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%s\n", idxd_mdev_types[mtype_get_type_group_id(mtype)].name); +} +static MDEV_TYPE_ATTR_RO(name); + +static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int count = 0, i; + unsigned long flags; + + switch (type->type) { + case IDXD_MDEV_TYPE_DSA_1_DWQ: + if (idxd->data->type != IDXD_TYPE_DSA) + return 0; + break; + case IDXD_MDEV_TYPE_IAX_1_DWQ: + if (idxd->data->type != IDXD_TYPE_IAX) + return 0; + break; + default: + return 0; + } + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq; + + wq = idxd->wqs[i]; + if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq)) + continue; + + count++; + } + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + return count; +} + +static ssize_t available_instances_show(struct mdev_type *mtype, + struct mdev_type_attribute *attr, + char *buf) +{ + struct device *dev = mtype_get_parent_dev(mtype); + struct idxd_device *idxd = dev_get_drvdata(dev); + int count; + struct vdcm_idxd_type *type; + + type = &idxd_mdev_types[mtype_get_type_group_id(mtype)]; + count = find_available_mdev_instances(idxd, type); + + return sysfs_emit(buf, "%d\n", count); +} +static MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct mdev_type *mtype, struct mdev_type_attribute *attr, + char *buf) +{ + return sysfs_emit(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} +static MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *idxd_mdev_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group idxd_mdev_type_dsa_group0 = { + .name = idxd_dsa_1dwq_name, + .attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group idxd_mdev_type_iax_group0 = { + .name = idxd_iax_1dwq_name, + 
.attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group *idxd_mdev_type_groups[] = { + &idxd_mdev_type_dsa_group0, + &idxd_mdev_type_iax_group0, + NULL, +}; + static struct mdev_driver idxd_vdcm_driver = { .driver = { .name = "idxd-mdev", @@ -685,6 +843,7 @@ static struct mdev_driver idxd_vdcm_driver = { }, .probe = idxd_vdcm_probe, .remove = idxd_vdcm_remove, + .supported_type_groups = idxd_mdev_type_groups, }; static int idxd_mdev_drv_probe(struct device *dev) From patchwork Sat May 22 00:20:37 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274229 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A942C4707A for ; Sat, 22 May 2021 00:21:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1FF30613D0 for ; Sat, 22 May 2021 00:21:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231256AbhEVAWd (ORCPT ); Fri, 21 May 2021 20:22:33 -0400 Received: from mga02.intel.com ([134.134.136.20]:2546 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231255AbhEVAWD (ORCPT ); Fri, 21 May 2021 20:22:03 -0400 IronPort-SDR: fi+7dYaKZ4qTU+zU6svarIEbKrVvALqgplV+G+x3nJiOpBONuBrkXpm0LuLN1HD6S7zqXHl60Y 75peFwzruLlA== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="188728167" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="188728167" Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by 
orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:39 -0700 IronPort-SDR: R6Ld8Jeh1YpVrBnCKtCCt9ZDuxVeJi+rAaPlJpE++vFtUQSWxurWeNvBCu1bU2JZIEwuG6LBjf SME2dqgGOKIA== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="628864663" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga006-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:38 -0700 Subject: [PATCH v6 15/20] vfio/mdev: idxd: ims domain setup for the vdcm From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:37 -0700 Message-ID: <162164283796.261970.11020270418798826121.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add setup code for the IMS domain. This feeds the MSI subsystem the relevant information for device IMS. The allocation of the IMS vectors is done in common VFIO code if the correct domain is set for the mdev device.
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/init.c | 14 ++++++++++++++ drivers/vfio/mdev/idxd/mdev.c | 2 ++ 3 files changed, 17 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 0d9e2710fc76..81c78add74dd 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -297,6 +297,7 @@ struct idxd_device { struct workqueue_struct *wq; struct work_struct work; + struct irq_domain *ims_domain; int *int_handles; struct idxd_pmu *idxd_pmu; diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 809ca1827772..ead46761b23e 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -16,6 +16,8 @@ #include #include #include +#include +#include #include #include #include "../dmaengine.h" @@ -66,6 +68,7 @@ MODULE_DEVICE_TABLE(pci, idxd_pci_tbl); int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv) { struct device *dev = &idxd->pdev->dev; + struct ims_array_info ims_info; int rc; if (!idxd->ims_size) @@ -77,8 +80,18 @@ int idxd_mdev_host_init(struct idxd_device *idxd, struct mdev_driver *drv) return rc; } + ims_info.max_slots = idxd->ims_size; + ims_info.slots = idxd->reg_base + idxd->ims_offset; + idxd->ims_domain = pci_ims_array_create_msi_irq_domain(idxd->pdev, &ims_info); + if (!idxd->ims_domain) { + dev_warn(dev, "Fail to acquire IMS domain\n"); + iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + return -ENODEV; + } + rc = mdev_register_device(dev, drv); if (rc < 0) { + irq_domain_remove(idxd->ims_domain); iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); return rc; } @@ -93,6 +106,7 @@ void idxd_mdev_host_release(struct kref *kref) struct idxd_device *idxd = container_of(kref, struct idxd_device, mdev_kref); struct device *dev = &idxd->pdev->dev; + irq_domain_remove(idxd->ims_domain); mdev_unregister_device(dev); iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); } diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 
9f6c4997ec24..7dac024e2852 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -111,6 +111,7 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev struct vdcm_idxd_type *type) { struct vdcm_idxd *vidxd; + struct device *dev = &mdev->dev; struct idxd_wq *wq = NULL; int rc; @@ -129,6 +130,7 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev vidxd->mdev = mdev; vidxd->type = type; vidxd->num_wqs = VIDXD_MAX_WQS; + dev_set_msi_domain(dev, idxd->ims_domain); mutex_lock(&wq->wq_lock); idxd_wq_get(wq); From patchwork Sat May 22 00:20:43 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274233 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8DA9C4707A for ; Sat, 22 May 2021 00:21:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B6C4561152 for ; Sat, 22 May 2021 00:21:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230484AbhEVAXV (ORCPT ); Fri, 21 May 2021 20:23:21 -0400 Received: from mga07.intel.com ([134.134.136.100]:31480 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230334AbhEVAWd (ORCPT ); Fri, 21 May 2021 20:22:33 -0400 IronPort-SDR: vVSjZ/+4IG0PjRoMpVORoaa6TqStWIZwOlEu2zK0d5mcl3GNvI0jTtA9XvQ/i5gzBYpt5EzpHJ 1OIS6FX3mYQw== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="265504426" X-IronPort-AV: 
E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="265504426" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:44 -0700 IronPort-SDR: p9ylZudZE/ebgREN7McnQJns4oDNrFuq3y077r7ZMETpJ+o52SSvtCHzMlMyFN+uxVFhCFHsVP n2NM2XTO+NHg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="631991932" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:44 -0700 Subject: [PATCH v6 16/20] vfio/mdev: idxd: add new wq state for mdev From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:43 -0700 Message-ID: <162164284391.261970.15032815614261092191.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When a dedicated wq is enabled as mdev, we must disable the wq on the device in order to program the pasid to the wq. Introduce a wq state IDXD_WQ_LOCKED that is software state only in order to prevent the user from modifying the configuration while mdev wq is in this state. While in this state, the wq is not in DISABLED state and will prevent any modifications to the configuration. It is also not in the ENABLED state and therefore prevents any actions allowed in the ENABLED state. 
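The state rules described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver code: struct wq_model, wq_config_allowed(), wq_lock_for_mdev() and the hw_disable_calls counter are hypothetical names, the wq is assumed to be dedicated, and the model assumes that a successful hardware disable leaves the wq in the DISABLED state.

```c
#include <stdbool.h>

/* Userspace model only; names echo the patch but none of this is kernel code. */
enum wq_state { WQ_DISABLED = 0, WQ_ENABLED, WQ_LOCKED };

struct wq_model {
	enum wq_state state;
	int hw_disable_calls;	/* counts simulated hardware disable commands */
};

/* Assumption: configuration may change only while the wq is DISABLED. */
static bool wq_config_allowed(const struct wq_model *wq)
{
	return wq->state == WQ_DISABLED;
}

/* Follows the shape of the idxd_wq_disable() hunk: LOCKED skips the hardware. */
static int wq_disable(struct wq_model *wq)
{
	if (wq->state == WQ_LOCKED)
		return 0;	/* already disabled in hardware */
	if (wq->state != WQ_ENABLED)
		return 0;	/* nothing to disable */
	wq->hw_disable_calls++;
	wq->state = WQ_DISABLED;
	return 0;
}

/* Follows the vidxd_init() hunk: disable an enabled wq, then mark it LOCKED. */
static void wq_lock_for_mdev(struct wq_model *wq)
{
	if (wq->state == WQ_ENABLED) {
		wq_disable(wq);
		wq->state = WQ_LOCKED;
	}
}
```

In this model a LOCKED wq rejects configuration changes just like an ENABLED one, while a second disable request returns success without touching the (simulated) hardware.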
For mdev, the dwq is disabled and set to LOCKED state upon the mdev creation. When ->open() is called on the mdev and a pasid is programmed to the WQCFG, the dwq is enabled again and goes to the ENABLED state. Signed-off-by: Dave Jiang --- drivers/dma/idxd/device.c | 16 ++++++++++++++++ drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/sysfs.c | 2 ++ drivers/vfio/mdev/idxd/mdev.c | 5 +++++ drivers/vfio/mdev/idxd/vdev.c | 4 +++- 5 files changed, 27 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index cef41a273cc1..c46b6bc055bd 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -243,6 +243,14 @@ int idxd_wq_disable(struct idxd_wq *wq) dev_dbg(dev, "Disabling WQ %d\n", wq->id); + /* + * When the wq is in LOCKED state, it means it is disabled but + * also at the same time is "enabled" as far as the user is + * concerned. So a call to disable the hardware can be skipped. + */ + if (wq->state == IDXD_WQ_LOCKED) + return 0; + if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); return 0; @@ -285,6 +293,14 @@ void idxd_wq_reset(struct idxd_wq *wq) struct device *dev = &idxd->pdev->dev; u32 operand; + /* + * When the wq is in LOCKED state, it means it is disabled but + * also at the same time is "enabled" as far as the user is + * concerned. So a call to reset the hardware can be skipped. 
+ */ + if (wq->state == IDXD_WQ_LOCKED) + return; + if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); return; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 81c78add74dd..5e8da9019c46 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -120,6 +120,7 @@ struct idxd_pmu { enum idxd_wq_state { IDXD_WQ_DISABLED = 0, IDXD_WQ_ENABLED, + IDXD_WQ_LOCKED, }; enum idxd_wq_flag { diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 3d3a84be2c9b..435ad3c62ad6 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -584,6 +584,8 @@ static ssize_t wq_state_show(struct device *dev, return sysfs_emit(buf, "disabled\n"); case IDXD_WQ_ENABLED: return sysfs_emit(buf, "enabled\n"); + case IDXD_WQ_LOCKED: + return sysfs_emit(buf, "locked\n"); } return sysfs_emit(buf, "unknown\n"); diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 7dac024e2852..3257505fb7c7 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -894,6 +894,11 @@ static void idxd_mdev_drv_remove(struct device *dev) struct idxd_device *idxd = wq->idxd; drv_disable_wq(wq); + mutex_lock(&wq->wq_lock); + if (wq->state == IDXD_WQ_LOCKED) + wq->state = IDXD_WQ_DISABLED; + mutex_unlock(&wq->wq_lock); + dev_info(dev, "wq %s disabled\n", dev_name(dev)); kref_put_mutex(&idxd->mdev_kref, idxd_mdev_host_release, &idxd->kref_lock); put_device(dev); diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index 67da4c122a96..a444a0af8b5f 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -75,8 +75,10 @@ void vidxd_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); - if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) { idxd_wq_disable(wq); + wq->state = IDXD_WQ_LOCKED; + } } void vidxd_send_interrupt(struct vdcm_idxd *vidxd, int vector) From 
patchwork Sat May 22 00:20:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274245 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2698C4707A for ; Sat, 22 May 2021 00:22:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 9AB74613D0 for ; Sat, 22 May 2021 00:22:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231150AbhEVAXj (ORCPT ); Fri, 21 May 2021 20:23:39 -0400 Received: from mga12.intel.com ([192.55.52.136]:24212 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230429AbhEVAW5 (ORCPT ); Fri, 21 May 2021 20:22:57 -0400 IronPort-SDR: 5FW0h8FVWdJ4bgabPRBKyeIbVEJK0GW27bLlcaHowfpgVXrE1eKQnXiVa6jZ1AwubBrRIOIjVV QxIx3PCgWj/A== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="181210685" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="181210685" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:51 -0700 IronPort-SDR: MPcVlthEflmHx/IsUHghDBX+s9yREoSIh+uaaWREq5h4+vYWuwki8NFImjdQ6FQssNd/zhVHsJ KJR2qgwPq5vA== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="412801885" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:50 -0700 Subject: [PATCH v6 17/20] vfio/mdev: idxd: add error 
notification from host driver to mediated device From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:49 -0700 Message-ID: <162164284985.261970.10964903081717121200.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When a device error occurs, the mediated device needs to be notified so that it can report the error to the guest. Add support for notifying the specific mdev when an error is wq specific, and for broadcasting the error to all mdevs when it is a generic device error. 
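The two notification paths can be illustrated with a minimal userspace sketch. It is an assumption-laden model rather than the driver code: struct wq_model, dispatch_error(), count_notification() and the convention that a negative index means a device-wide error are all hypothetical. The model also checks the ops pointer before calling through it, which is a choice of this sketch.

```c
#include <stddef.h>

/* Userspace model only; not kernel code. */
enum wq_type { WQT_NONE = 0, WQT_USER, WQT_MDEV };

struct wq_model;

struct wq_ops {
	void (*notify_error)(struct wq_model *wq);
};

struct wq_model {
	enum wq_type type;
	const struct wq_ops *ops;	/* NULL when no driver is bound */
	int notified;			/* how many errors this wq was told about */
};

/* Example callback: just count the notifications. */
static void count_notification(struct wq_model *wq)
{
	wq->notified++;
}

/* Notify a single wq, but only if it is an mdev wq with a bound callback. */
static void notify_one(struct wq_model *wq)
{
	if (wq->type == WQT_MDEV && wq->ops && wq->ops->notify_error)
		wq->ops->notify_error(wq);
}

/*
 * Hypothetical convention: wq_idx >= 0 means the error belongs to that wq,
 * wq_idx < 0 means a generic device error that is broadcast to every wq.
 */
static void dispatch_error(struct wq_model *wqs, int nr_wqs, int wq_idx)
{
	int i;

	if (wq_idx >= 0) {
		if (wq_idx < nr_wqs)
			notify_one(&wqs[wq_idx]);
		return;
	}

	for (i = 0; i < nr_wqs; i++)
		notify_one(&wqs[i]);
}
```

A wq-specific error reaches only the mdev that owns that wq, while a generic error reaches every mdev-backed wq and leaves wqs of other types untouched.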
Signed-off-by: Dave Jiang --- drivers/dma/idxd/idxd.h | 7 +++++++ drivers/dma/idxd/irq.c | 21 +++++++++++++++++++-- drivers/vfio/mdev/idxd/mdev.c | 5 +++++ drivers/vfio/mdev/idxd/mdev.h | 1 + drivers/vfio/mdev/idxd/vdev.c | 23 +++++++++++++++++++++++ 5 files changed, 55 insertions(+), 2 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 5e8da9019c46..b1d94463fce5 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -50,10 +50,17 @@ enum idxd_type { #define IDXD_NAME_SIZE 128 #define IDXD_PMU_EVENT_MAX 64 +struct idxd_wq; + +struct idxd_device_ops { + void (*notify_error)(struct idxd_wq *wq); +}; + struct idxd_device_driver { const char *name; int (*probe)(struct device *dev); void (*remove)(struct device *dev); + struct idxd_device_ops *ops; struct device_driver drv; }; diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index d1a635ecc7f3..b8b6c93f4480 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -104,6 +104,7 @@ static int idxd_device_schedule_fault_process(struct idxd_device *idxd, static int process_misc_interrupts(struct idxd_device *idxd, u32 cause) { + struct idxd_device_driver *wq_drv; struct device *dev = &idxd->pdev->dev; union gensts_reg gensts; u32 val = 0; @@ -123,16 +124,32 @@ static int process_misc_interrupts(struct idxd_device *idxd, u32 cause) int id = idxd->sw_err.wq_idx; struct idxd_wq *wq = idxd->wqs[id]; - if (wq->type == IDXD_WQT_USER) + if (is_idxd_wq_user(wq)) { wake_up_interruptible(&wq->err_queue); + } else if (is_idxd_wq_mdev(wq)) { + struct device *conf_dev = wq_confdev(wq); + struct device_driver *drv = conf_dev->driver; + + wq_drv = container_of(drv, struct idxd_device_driver, drv); + if (wq_drv) + wq_drv->ops->notify_error(wq); + } } else { int i; for (i = 0; i < idxd->max_wqs; i++) { struct idxd_wq *wq = idxd->wqs[i]; - if (wq->type == IDXD_WQT_USER) + if (is_idxd_wq_user(wq)) { wake_up_interruptible(&wq->err_queue); + } else if (is_idxd_wq_mdev(wq)) { + 
struct device *conf_dev = wq_confdev(wq); + struct device_driver *drv = conf_dev->driver; + + wq_drv = container_of(drv, struct idxd_device_driver, drv); + if (wq_drv) + wq_drv->ops->notify_error(wq); + } } } diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 3257505fb7c7..25d1ac67b0c9 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -904,10 +904,15 @@ static void idxd_mdev_drv_remove(struct device *dev) put_device(dev); } +static struct idxd_device_ops mdev_wq_ops = { + .notify_error = idxd_wq_vidxd_send_errors, +}; + static struct idxd_device_driver idxd_mdev_driver = { .probe = idxd_mdev_drv_probe, .remove = idxd_mdev_drv_remove, .name = idxd_mdev_drv_name, + .ops = &mdev_wq_ops, }; static int __init idxd_mdev_init(void) diff --git a/drivers/vfio/mdev/idxd/mdev.h b/drivers/vfio/mdev/idxd/mdev.h index dd4290bce772..10358831da6a 100644 --- a/drivers/vfio/mdev/idxd/mdev.h +++ b/drivers/vfio/mdev/idxd/mdev.h @@ -107,5 +107,6 @@ int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigne int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); #endif diff --git a/drivers/vfio/mdev/idxd/vdev.c b/drivers/vfio/mdev/idxd/vdev.c index a444a0af8b5f..2a066e483dd8 100644 --- a/drivers/vfio/mdev/idxd/vdev.c +++ b/drivers/vfio/mdev/idxd/vdev.c @@ -151,6 +151,29 @@ static void vidxd_report_swerror(struct vdcm_idxd *vidxd, unsigned int error) send_swerr_interrupt(vidxd); } +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ + struct vdcm_idxd *vidxd = wq->vidxd; + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET); + int i; + + if (swerr->valid) { + 
if (!swerr->overflow) { + swerr->overflow = 1; + send_swerr_interrupt(vidxd); + } + return; + } + + lockdep_assert_held(&idxd->dev_lock); + for (i = 0; i < 4; i++) + swerr->bits[i] = idxd->sw_err.bits[i]; + + send_swerr_interrupt(vidxd); +} + int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { u32 offset = pos & (vidxd->bar_size[0] - 1); From patchwork Sat May 22 00:20:56 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274231 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 71203C4707E for ; Sat, 22 May 2021 00:21:41 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4A27B61152 for ; Sat, 22 May 2021 00:21:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231269AbhEVAXD (ORCPT ); Fri, 21 May 2021 20:23:03 -0400 Received: from mga04.intel.com ([192.55.52.120]:56072 "EHLO mga04.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231286AbhEVAWW (ORCPT ); Fri, 21 May 2021 20:22:22 -0400 IronPort-SDR: 3fbRic06r2VCumwW6SahWbJyjULxrDNM5h2XQk4lNxOzH9Bm9jvM7G8m4VOqLAL8+VHFrvqcAS NWn0+MbSb+pQ== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="199661288" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="199661288" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:58 -0700 IronPort-SDR: 
r66u16T10yzWEpgnFp0w9VS6i7FXAGZWvjKzU4FWhmUAM8XjWV9w3sHQKzn0pTZmCSLTCeP1RZ epSHqwRlOJGQ== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="474753058" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga001-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:20:57 -0700 Subject: [PATCH v6 18/20] vfio: move vfio_pci_set_ctx_trigger_single to common code From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:20:56 -0700 Message-ID: <162164285626.261970.18327549914978174180.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org mdev also needs this function, so move it to common code that both vfio pci and mdev can use. 
Signed-off-by: Dave Jiang --- drivers/vfio/Makefile | 2 + drivers/vfio/pci/vfio_pci_intrs.c | 63 ++------------------------------ drivers/vfio/vfio_common.c | 74 +++++++++++++++++++++++++++++++++++++ include/linux/vfio.h | 4 ++ 4 files changed, 83 insertions(+), 60 deletions(-) create mode 100644 drivers/vfio/vfio_common.c diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile index fee73f3d9480..fc7fd2412dee 100644 --- a/drivers/vfio/Makefile +++ b/drivers/vfio/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 vfio_virqfd-y := virqfd.o -obj-$(CONFIG_VFIO) += vfio.o +obj-$(CONFIG_VFIO) += vfio.o vfio_common.o obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 9c8efad3a859..926cff00146c 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -561,61 +561,6 @@ static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev, return 0; } -static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx, - unsigned int count, uint32_t flags, - void *data) -{ - /* DATA_NONE/DATA_BOOL enables loopback testing */ - if (flags & VFIO_IRQ_SET_DATA_NONE) { - if (*ctx) { - if (count) { - eventfd_signal(*ctx, 1); - } else { - eventfd_ctx_put(*ctx); - *ctx = NULL; - } - return 0; - } - } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { - uint8_t trigger; - - if (!count) - return -EINVAL; - - trigger = *(uint8_t *)data; - if (trigger && *ctx) - eventfd_signal(*ctx, 1); - - return 0; - } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { - int32_t fd; - - if (!count) - return -EINVAL; - - fd = *(int32_t *)data; - if (fd == -1) { - if (*ctx) - eventfd_ctx_put(*ctx); - *ctx = NULL; - } else if (fd >= 0) { - struct eventfd_ctx *efdctx; - - efdctx = eventfd_ctx_fdget(fd); - if (IS_ERR(efdctx)) - return PTR_ERR(efdctx); - - if (*ctx) - 
eventfd_ctx_put(*ctx); - - *ctx = efdctx; - } - return 0; - } - - return -EINVAL; -} - static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, unsigned index, unsigned start, unsigned count, uint32_t flags, void *data) @@ -623,8 +568,8 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev, if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1) return -EINVAL; - return vfio_pci_set_ctx_trigger_single(&vdev->err_trigger, - count, flags, data); + return vfio_set_ctx_trigger_single(&vdev->err_trigger, + count, flags, data); } static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev, @@ -634,8 +579,8 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev, if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1) return -EINVAL; - return vfio_pci_set_ctx_trigger_single(&vdev->req_trigger, - count, flags, data); + return vfio_set_ctx_trigger_single(&vdev->req_trigger, + count, flags, data); } int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags, diff --git a/drivers/vfio/vfio_common.c b/drivers/vfio/vfio_common.c new file mode 100644 index 000000000000..b209d57c7312 --- /dev/null +++ b/drivers/vfio/vfio_common.c @@ -0,0 +1,74 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2021 Intel, Corp. All rights reserved. + * Copyright (C) 2012 Red Hat, Inc. All rights reserved. 
+ * Author: Alex Williamson + * VFIO common helper functions + */ + +#include +#include + +/* + * Common helper to set single eventfd trigger + * + * @ctx [out] : address of eventfd ctx to be written to + * @count [in] : number of vectors (should be 1) + * @flags [in] : VFIO IRQ flags + * @data [in] : data from ioctl + */ +int vfio_set_ctx_trigger_single(struct eventfd_ctx **ctx, + unsigned int count, u32 flags, + void *data) +{ + /* DATA_NONE/DATA_BOOL enables loopback testing */ + if (flags & VFIO_IRQ_SET_DATA_NONE) { + if (*ctx) { + if (count) { + eventfd_signal(*ctx, 1); + } else { + eventfd_ctx_put(*ctx); + *ctx = NULL; + } + return 0; + } + } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { + u8 trigger; + + if (!count) + return -EINVAL; + + trigger = *(uint8_t *)data; + if (trigger && *ctx) + eventfd_signal(*ctx, 1); + + return 0; + } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + s32 fd; + + if (!count) + return -EINVAL; + + fd = *(s32 *)data; + if (fd == -1) { + if (*ctx) + eventfd_ctx_put(*ctx); + *ctx = NULL; + } else if (fd >= 0) { + struct eventfd_ctx *efdctx; + + efdctx = eventfd_ctx_fdget(fd); + if (IS_ERR(efdctx)) + return PTR_ERR(efdctx); + + if (*ctx) + eventfd_ctx_put(*ctx); + + *ctx = efdctx; + } + return 0; + } + + return -EINVAL; +} +EXPORT_SYMBOL(vfio_set_ctx_trigger_single); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ed5ca027eb49..aa7cb0e1b8b2 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -235,4 +235,8 @@ extern int vfio_virqfd_enable(void *opaque, void *data, struct virqfd **pvirqfd, int fd); extern void vfio_virqfd_disable(struct virqfd **pvirqfd); +/* common lib functions */ +extern int vfio_set_ctx_trigger_single(struct eventfd_ctx **ctx, + unsigned int count, u32 flags, + void *data); #endif /* VFIO_H */ From patchwork Sat May 22 00:21:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274247 Return-Path: 
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3456C04FF3 for ; Sat, 22 May 2021 00:22:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A5D32613D0 for ; Sat, 22 May 2021 00:22:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231182AbhEVAX5 (ORCPT ); Fri, 21 May 2021 20:23:57 -0400 Received: from mga09.intel.com ([134.134.136.24]:27110 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230478AbhEVAXM (ORCPT ); Fri, 21 May 2021 20:23:12 -0400 IronPort-SDR: 4lA4DYl3TTO9RPf0v5L5QiaWGOTxZBixkJKYwdJWrIm4osS0K8HpxPda6iBNKVjPWR0Yu+9H7h Z4laU5ucfnlA== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201634425" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="201634425" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:21:05 -0700 IronPort-SDR: C9TyUlnAuHhIlKxxo2Tnq56G+67p2ETA+MiqYq+SL80Q7Eryb088zy2NlZJpimmgJ33hd5cwmU U7cU4wJNypxg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="441149772" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:21:03 -0700 Subject: [PATCH v6 19/20] vfio: mdev: Add device request interface From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, 
yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:21:03 -0700 Message-ID: <162164286322.261970.2647654897518928545.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Similar to commit 6140a8f56238 ("vfio-pci: Add device request interface"). Add request interface for mdev to allow userspace to opt in to receive a device request notification, indicating that the device should be released. Signed-off-by: Dave Jiang --- drivers/vfio/mdev/mdev_irqs.c | 23 +++++++++++++++++++++++ include/linux/mdev.h | 15 +++++++++++++++ 2 files changed, 38 insertions(+) diff --git a/drivers/vfio/mdev/mdev_irqs.c b/drivers/vfio/mdev/mdev_irqs.c index ed2d11a7c729..11b1f8df020c 100644 --- a/drivers/vfio/mdev/mdev_irqs.c +++ b/drivers/vfio/mdev/mdev_irqs.c @@ -316,3 +316,26 @@ void mdev_irqs_free(struct mdev_device *mdev) memset(&mdev->mdev_irq, 0, sizeof(mdev->mdev_irq)); } EXPORT_SYMBOL_GPL(mdev_irqs_free); + +void vfio_mdev_request(struct vfio_device *vdev, unsigned int count) +{ + struct device *dev = vdev->dev; + struct mdev_device *mdev = to_mdev_device(dev); + + if (mdev->req_trigger) { + dev_dbg(dev, "Requesting device from user\n"); + eventfd_signal(mdev->req_trigger, 1); + } +} +EXPORT_SYMBOL_GPL(vfio_mdev_request); + +int vfio_mdev_set_req_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data) +{ + if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count != 1) + return -EINVAL; + + return 
vfio_set_ctx_trigger_single(&mdev->req_trigger, count, flags, data); +} +EXPORT_SYMBOL_GPL(vfio_mdev_set_req_trigger); diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 035c021e8068..db73d58f5e81 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -11,6 +11,8 @@ #define MDEV_H #include +#include +#include struct mdev_type; @@ -38,6 +40,7 @@ struct mdev_device { struct device *iommu_device; struct mutex creation_lock; struct mdev_irq mdev_irq; + struct eventfd_ctx *req_trigger; }; static inline struct mdev_device *irq_to_mdev(struct mdev_irq *mdev_irq) @@ -131,6 +134,10 @@ void mdev_msix_send_signal(struct mdev_device *mdev, int vector); int mdev_irqs_init(struct mdev_device *mdev, int num, bool *ims_map); void mdev_irqs_free(struct mdev_device *mdev); void mdev_irqs_set_pasid(struct mdev_device *mdev, u32 pasid); +void vfio_mdev_request(struct vfio_device *vdev, unsigned int count); +int vfio_mdev_set_req_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data); #else static inline int mdev_set_msix_trigger(struct mdev_device *mdev, unsigned int index, unsigned int start, unsigned int count, u32 flags, @@ -148,6 +155,14 @@ static inline int mdev_irqs_init(struct mdev_device *mdev, int num, bool *ims_ma void mdev_irqs_free(struct mdev_device *mdev) {} void mdev_irqs_set_pasid(struct mdev_device *mdev, u32 pasid) {} +void vfio_mdev_request(struct vfio_device *vdev, unsigned int count) {} + +int vfio_mdev_set_req_trigger(struct mdev_device *mdev, unsigned int index, + unsigned int start, unsigned int count, u32 flags, + void *data) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_VFIO_MDEV_IMS */ #endif /* MDEV_H */ From patchwork Sat May 22 00:21:10 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 12274249 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on 
aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFCAAC4707A for ; Sat, 22 May 2021 00:22:44 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A1980613D0 for ; Sat, 22 May 2021 00:22:44 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231221AbhEVAYH (ORCPT ); Fri, 21 May 2021 20:24:07 -0400 Received: from mga09.intel.com ([134.134.136.24]:27120 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231249AbhEVAXV (ORCPT ); Fri, 21 May 2021 20:23:21 -0400 IronPort-SDR: a7SWcpXHx8XgFp3XCAELEXspmXEjkKNZLzmiDELMc7EZjE7ILFDr1jR6muv+TsusbBxDkb57Xf xjGdKOV+7/Qw== X-IronPort-AV: E=McAfee;i="6200,9189,9991"; a="201634436" X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="201634436" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:21:11 -0700 IronPort-SDR: QF/V6SSZ5KjbpoMVOKNr3+Vz2atm7msoUr2SSTfKs6IoniD3wJ4wKkOVcO11Wpf6eoE5uEaDJ8 Lm6WE2bQNDBg== X-IronPort-AV: E=Sophos;i="5.82,319,1613462400"; d="scan'208";a="476324236" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 May 2021 17:21:10 -0700 Subject: [PATCH v6 20/20] vfio: mdev: idxd: setup request interrupt From: Dave Jiang To: alex.williamson@redhat.com, kwankhede@nvidia.com, tglx@linutronix.de, vkoul@kernel.org, jgg@mellanox.com Cc: megha.dey@intel.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, yi.l.liu@intel.com, baolu.lu@intel.com, 
kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, dan.j.williams@intel.com, eric.auger@redhat.com, pbonzini@redhat.com, dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org Date: Fri, 21 May 2021 17:21:10 -0700 Message-ID: <162164287046.261970.257439178688866229.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> References: <162164243591.261970.3439987543338120797.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/0.23-29-ga622f1 MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add request interrupt support for idxd-mdev driver to support requesting release of device. Signed-off-by: Dave Jiang --- drivers/vfio/mdev/idxd/mdev.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/vfio/mdev/idxd/mdev.c b/drivers/vfio/mdev/idxd/mdev.c index 25d1ac67b0c9..6bf2ec43656c 100644 --- a/drivers/vfio/mdev/idxd/mdev.c +++ b/drivers/vfio/mdev/idxd/mdev.c @@ -52,6 +52,8 @@ static int idxd_vdcm_get_irq_count(struct vfio_device *vdev, int type) { if (type == VFIO_PCI_MSIX_IRQ_INDEX) return VIDXD_MAX_MSIX_VECS; + else if (type == VFIO_PCI_REQ_IRQ_INDEX) + return 1; return 0; } @@ -486,6 +488,12 @@ static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, return mdev_set_msix_trigger(mdev, index, start, count, flags, data); } break; + case VFIO_PCI_REQ_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_TRIGGER: + return vfio_mdev_set_req_trigger(mdev, index, start, count, flags, data); + } + break; } return -ENOTTY; @@ -678,6 +686,7 @@ static long idxd_vdcm_ioctl(struct vfio_device *vdev, unsigned int cmd, unsigned switch (info.index) { case VFIO_PCI_MSIX_IRQ_INDEX: + case VFIO_PCI_REQ_IRQ_INDEX: info.flags |= VFIO_IRQ_INFO_NORESIZE; break; default: @@ -750,6 +759,7 @@ static const struct vfio_device_ops idxd_mdev_ops = { .write = idxd_vdcm_write, .mmap = idxd_vdcm_mmap, 
.ioctl = idxd_vdcm_ioctl, + .request = vfio_mdev_request, }; static ssize_t name_show(struct mdev_type *mtype, struct mdev_type_attribute *attr, char *buf)
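For reference, a user-space consumer would presumably arm the new request interrupt the same way it does for vfio-pci: by handing the kernel one eventfd through VFIO_DEVICE_SET_IRQS with index VFIO_PCI_REQ_IRQ_INDEX. The sketch below is not part of this series. The helper names req_irq_set_alloc() and req_irq_enable() are hypothetical, and it assumes vfio_mdev_set_req_trigger() accepts the same single-eventfd payload that vfio-pci accepts for its request vector.

```c
#include <linux/vfio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: build the VFIO_DEVICE_SET_IRQS payload that attaches
 * one eventfd to the request interrupt. The caller owns the returned buffer
 * and must free() it. Returns NULL on allocation failure.
 */
struct vfio_irq_set *req_irq_set_alloc(int32_t efd)
{
	size_t argsz = sizeof(struct vfio_irq_set) + sizeof(efd);
	struct vfio_irq_set *set = calloc(1, argsz);

	if (!set)
		return NULL;

	set->argsz = (uint32_t)argsz;
	set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	set->index = VFIO_PCI_REQ_IRQ_INDEX;
	set->start = 0;
	set->count = 1;	/* the patch reports exactly one request vector */
	memcpy(set->data, &efd, sizeof(efd));

	return set;
}

/*
 * Hypothetical helper: device_fd is assumed to come from
 * VFIO_GROUP_GET_DEVICE_FD, efd from eventfd(2). Returns the ioctl result,
 * or -1 if the payload could not be allocated.
 */
int req_irq_enable(int device_fd, int32_t efd)
{
	struct vfio_irq_set *set = req_irq_set_alloc(efd);
	int ret;

	if (!set)
		return -1;

	ret = ioctl(device_fd, VFIO_DEVICE_SET_IRQS, set);
	free(set);

	return ret;
}
```

Once armed, the consumer would poll or read the eventfd. A wakeup means the host is asking for the mdev to be released, which is the path the .request = vfio_mdev_request hook above wires up.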