From patchwork Tue Jul 21 16:02:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675953 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 060581392 for ; Tue, 21 Jul 2020 16:02:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E395A2077D for ; Tue, 21 Jul 2020 16:02:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730085AbgGUQCZ (ORCPT ); Tue, 21 Jul 2020 12:02:25 -0400 Received: from mga01.intel.com ([192.55.52.88]:47301 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726919AbgGUQCY (ORCPT ); Tue, 21 Jul 2020 12:02:24 -0400 IronPort-SDR: QWAXBytFfS23jUq3AZPabo7CCHI3HCBNcPGKgtVIwAfuSV8kp1OUuYgKJXSK9ghFrjd383Zq3l JAh59aoRU11Q== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="168306135" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="168306135" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:23 -0700 IronPort-SDR: oGDRW/XIcPpgjsIGuxrTgLnNDsN8b7K6pe/VoyFKNQGDe+1KJvKAGLUV3GkHwdmbNnj/KadSMi h1OWazEl1d7w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="488133813" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005.fm.intel.com with ESMTP; 21 Jul 2020 09:02:21 -0700 Subject: [PATCH RFC v2 01/18] platform-msi: Introduce platform_msi_ops From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:21 -0700 Message-ID: <159534734176.28840.13007887237870414700.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Megha Dey platform-msi.c provides a generic way to handle non-PCI message-signaled interrupts. However, it assumes that only the message needs to be customized. Given that an MSI is just a write transaction, some devices may need custom callbacks to mask/unmask their interrupts. Hence, introduce a new structure, platform_msi_ops, which provides a device-specific write function for now; device-specific callbacks (mask/unmask) are introduced in subsequent patches.
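To make the new calling convention concrete: after this change a driver supplies its write callback through a platform_msi_ops structure instead of passing the callback directly. A minimal illustrative sketch follows (not part of this patch; the foo_* names, type, and register offsets are hypothetical):

	struct foo_device {
		void __iomem *regs;		/* hypothetical MMIO region */
	};

	#define FOO_MSI_ADDR_LO	0x10		/* hypothetical register offsets */
	#define FOO_MSI_ADDR_HI	0x14
	#define FOO_MSI_DATA	0x18

	static void foo_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
	{
		struct foo_device *foo = dev_get_drvdata(desc->dev);

		/* Program the device-specific address/data registers. */
		writel(msg->address_lo, foo->regs + FOO_MSI_ADDR_LO);
		writel(msg->address_hi, foo->regs + FOO_MSI_ADDR_HI);
		writel(msg->data, foo->regs + FOO_MSI_DATA);
	}

	static const struct platform_msi_ops foo_msi_ops = {
		.write_msg = foo_write_msi_msg,
	};

	/* in probe: the ops struct replaces the bare callback argument */
	ret = platform_msi_domain_alloc_irqs(&pdev->dev, nvec, &foo_msi_ops);
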
Devices may find more efficient ways to store addr/data pairs than the layout recommended by the PCI SIG. (For example, the storage of the vector might not be resident on the device: consider a GPGPU, where the vector could be part of the execution context instead of being stored on the device.) Reviewed-by: Dan Williams Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- drivers/base/platform-msi.c | 29 +++++++++++++++-------------- drivers/dma/mv_xor_v2.c | 6 +++++- drivers/dma/qcom/hidma.c | 6 +++++- drivers/iommu/arm-smmu-v3.c | 6 +++++- drivers/irqchip/irq-mbigen.c | 8 ++++++-- drivers/irqchip/irq-mvebu-icu.c | 6 +++++- drivers/mailbox/bcm-flexrm-mailbox.c | 6 +++++- drivers/perf/arm_smmuv3_pmu.c | 6 +++++- include/linux/msi.h | 20 ++++++++++++++------ 9 files changed, 65 insertions(+), 28 deletions(-) diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index c4a17e5edf8b..9d94cd699468 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -18,14 +18,14 @@ /* * Internal data structure containing a (made up, but unique) devid - * and the callback to write the MSI message. + * and the platform-msi ops */ struct platform_msi_priv_data { - struct device *dev; - void *host_data; - msi_alloc_info_t arg; - irq_write_msi_msg_t write_msg; - int devid; + struct device *dev; + void *host_data; + msi_alloc_info_t arg; + const struct platform_msi_ops *ops; + int devid; }; /* The devid allocator */ @@ -83,7 +83,7 @@ static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) priv_data = desc->platform.msi_priv_data; - priv_data->write_msg(desc, msg); + priv_data->ops->write_msg(desc, msg); } static void platform_msi_update_chip_ops(struct msi_domain_info *info) @@ -194,16 +194,17 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, static struct platform_msi_priv_data * platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg) + const struct platform_msi_ops *platform_ops) { struct platform_msi_priv_data *datap; + /* * Limit the number of interrupts to 2048 per device. Should we * need to bump this up, DEV_ID_SHIFT should be adjusted * accordingly (which would impact the max number of MSI * capable devices. 
*/ - if (!dev->msi_domain || !write_msi_msg || !nvec || nvec > MAX_DEV_MSIS) + if (!dev->msi_domain || !platform_ops->write_msg || !nvec || nvec > MAX_DEV_MSIS) return ERR_PTR(-EINVAL); if (dev->msi_domain->bus_token != DOMAIN_BUS_PLATFORM_MSI) { @@ -227,7 +228,7 @@ platform_msi_alloc_priv_data(struct device *dev, unsigned int nvec, return ERR_PTR(err); } - datap->write_msg = write_msi_msg; + datap->ops = platform_ops; datap->dev = dev; return datap; @@ -249,12 +250,12 @@ static void platform_msi_free_priv_data(struct platform_msi_priv_data *data) * Zero for success, or an error code in case of failure */ int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg) + const struct platform_msi_ops *platform_ops) { struct platform_msi_priv_data *priv_data; int err; - priv_data = platform_msi_alloc_priv_data(dev, nvec, write_msi_msg); + priv_data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(priv_data)) return PTR_ERR(priv_data); @@ -324,7 +325,7 @@ struct irq_domain * __platform_msi_create_device_domain(struct device *dev, unsigned int nvec, bool is_tree, - irq_write_msi_msg_t write_msi_msg, + const struct platform_msi_ops *platform_ops, const struct irq_domain_ops *ops, void *host_data) { @@ -332,7 +333,7 @@ __platform_msi_create_device_domain(struct device *dev, struct irq_domain *domain; int err; - data = platform_msi_alloc_priv_data(dev, nvec, write_msi_msg); + data = platform_msi_alloc_priv_data(dev, nvec, platform_ops); if (IS_ERR(data)) return NULL; diff --git a/drivers/dma/mv_xor_v2.c b/drivers/dma/mv_xor_v2.c index 9225f08dfee9..c0033c4f8ee5 100644 --- a/drivers/dma/mv_xor_v2.c +++ b/drivers/dma/mv_xor_v2.c @@ -710,6 +710,10 @@ static int mv_xor_v2_resume(struct platform_device *dev) return 0; } +static const struct platform_msi_ops mv_xor_v2_msi_ops = { + .write_msg = mv_xor_v2_set_msi_msg, +}; + static int mv_xor_v2_probe(struct platform_device *pdev) { struct mv_xor_v2_device *xor_dev; @@ -765,7 +769,7 @@ static int mv_xor_v2_probe(struct platform_device *pdev) } ret = platform_msi_domain_alloc_irqs(&pdev->dev, 1, - mv_xor_v2_set_msi_msg); + &mv_xor_v2_msi_ops); if (ret) goto disable_clk; diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c index 0a6d3ea08c78..c3ee63159ff8 100644 --- a/drivers/dma/qcom/hidma.c +++ b/drivers/dma/qcom/hidma.c @@ -678,6 +678,10 @@ static void hidma_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) writel(msg->data, dmadev->dev_evca + 0x120); } } + +static const struct platform_msi_ops hidma_msi_ops = { + .write_msg = hidma_write_msi_msg, +}; #endif static void hidma_free_msis(struct hidma_dev *dmadev) @@ -703,7 +707,7 @@ static int hidma_request_msi(struct hidma_dev *dmadev, struct msi_desc *failed_desc = NULL; rc = platform_msi_domain_alloc_irqs(&pdev->dev, HIDMA_MSI_INTS, - hidma_write_msi_msg); + &hidma_msi_ops); if (rc) return rc; diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index f578677a5c41..655e7987d8af 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -3410,6 +3410,10 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]); } +static const struct platform_msi_ops arm_smmu_msi_ops = { + .write_msg = arm_smmu_write_msi_msg, +}; + static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) { struct msi_desc *desc; @@ -3434,7 +3438,7 @@ static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) } /* 
Allocate MSIs for evtq, gerror and priq. Ignore cmdq */ - ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg); + ret = platform_msi_domain_alloc_irqs(dev, nvec, &arm_smmu_msi_ops); if (ret) { dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n"); return; diff --git a/drivers/irqchip/irq-mbigen.c b/drivers/irqchip/irq-mbigen.c index ff7627b57772..6619e2eadbce 100644 --- a/drivers/irqchip/irq-mbigen.c +++ b/drivers/irqchip/irq-mbigen.c @@ -232,6 +232,10 @@ static const struct irq_domain_ops mbigen_domain_ops = { .free = mbigen_irq_domain_free, }; +static const struct platform_msi_ops mbigen_msi_ops = { + .write_msg = mbigen_write_msg, +}; + static int mbigen_of_create_domain(struct platform_device *pdev, struct mbigen_device *mgn_chip) { @@ -260,7 +264,7 @@ static int mbigen_of_create_domain(struct platform_device *pdev, } domain = platform_msi_create_device_domain(&child->dev, num_pins, - mbigen_write_msg, + &mbigen_msi_ops, &mbigen_domain_ops, mgn_chip); if (!domain) { @@ -308,7 +312,7 @@ static int mbigen_acpi_create_domain(struct platform_device *pdev, return -EINVAL; domain = platform_msi_create_device_domain(&pdev->dev, num_pins, - mbigen_write_msg, + &mbigen_msi_ops, &mbigen_domain_ops, mgn_chip); if (!domain) diff --git a/drivers/irqchip/irq-mvebu-icu.c b/drivers/irqchip/irq-mvebu-icu.c index 91adf771f185..927d8ebc68cb 100644 --- a/drivers/irqchip/irq-mvebu-icu.c +++ b/drivers/irqchip/irq-mvebu-icu.c @@ -295,6 +295,10 @@ static const struct of_device_id mvebu_icu_subset_of_match[] = { {}, }; +static const struct platform_msi_ops mvebu_icu_msi_ops = { + .write_msg = mvebu_icu_write_msg, +}; + static int mvebu_icu_subset_probe(struct platform_device *pdev) { struct mvebu_icu_msi_data *msi_data; @@ -324,7 +328,7 @@ static int mvebu_icu_subset_probe(struct platform_device *pdev) return -ENODEV; irq_domain = platform_msi_create_device_tree_domain(dev, ICU_MAX_IRQS, - mvebu_icu_write_msg, + &mvebu_icu_msi_ops, &mvebu_icu_domain_ops, msi_data); if (!irq_domain) { diff --git a/drivers/mailbox/bcm-flexrm-mailbox.c b/drivers/mailbox/bcm-flexrm-mailbox.c index bee33abb5308..0268337e08e3 100644 --- a/drivers/mailbox/bcm-flexrm-mailbox.c +++ b/drivers/mailbox/bcm-flexrm-mailbox.c @@ -1492,6 +1492,10 @@ static void flexrm_mbox_msi_write(struct msi_desc *desc, struct msi_msg *msg) writel_relaxed(msg->data, ring->regs + RING_MSI_DATA_VALUE); } +static const struct platform_msi_ops flexrm_mbox_msi_ops = { + .write_msg = flexrm_mbox_msi_write, +}; + static int flexrm_mbox_probe(struct platform_device *pdev) { int index, ret = 0; @@ -1604,7 +1608,7 @@ static int flexrm_mbox_probe(struct platform_device *pdev) /* Allocate platform MSIs for each ring */ ret = platform_msi_domain_alloc_irqs(dev, mbox->num_rings, - flexrm_mbox_msi_write); + &flexrm_mbox_msi_ops); if (ret) goto fail_destroy_cmpl_pool; diff --git a/drivers/perf/arm_smmuv3_pmu.c b/drivers/perf/arm_smmuv3_pmu.c index 48e28ef93a70..f1dec2bcd2a1 100644 --- a/drivers/perf/arm_smmuv3_pmu.c +++ b/drivers/perf/arm_smmuv3_pmu.c @@ -652,6 +652,10 @@ static void smmu_pmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) pmu->reg_base + SMMU_PMCG_IRQ_CFG2); } +static const struct platform_msi_ops smmu_pmu_msi_ops = { + .write_msg = smmu_pmu_write_msi_msg, +}; + static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) { struct msi_desc *desc; @@ -665,7 +669,7 @@ static void smmu_pmu_setup_msi(struct smmu_pmu *pmu) if (!(readl(pmu->reg_base + SMMU_PMCG_CFGR) & SMMU_PMCG_CFGR_MSI)) return; - ret = 
platform_msi_domain_alloc_irqs(dev, 1, smmu_pmu_write_msi_msg); + ret = platform_msi_domain_alloc_irqs(dev, 1, &smmu_pmu_msi_ops); if (ret) { dev_warn(dev, "failed to allocate MSIs\n"); return; diff --git a/include/linux/msi.h b/include/linux/msi.h index 8ad679e9d9c0..7f6a8eb51aca 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -321,6 +321,14 @@ enum { MSI_FLAG_LEVEL_CAPABLE = (1 << 6), }; +/* + * platform_msi_ops - Callbacks for platform MSI ops + * @write_msg: write message content + */ +struct platform_msi_ops { + irq_write_msi_msg_t write_msg; +}; + int msi_domain_set_affinity(struct irq_data *data, const struct cpumask *mask, bool force); @@ -336,7 +344,7 @@ struct irq_domain *platform_msi_create_irq_domain(struct fwnode_handle *fwnode, struct msi_domain_info *info, struct irq_domain *parent); int platform_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, - irq_write_msi_msg_t write_msi_msg); + const struct platform_msi_ops *platform_ops); void platform_msi_domain_free_irqs(struct device *dev); /* When an MSI domain is used as an intermediate domain */ @@ -348,14 +356,14 @@ struct irq_domain * __platform_msi_create_device_domain(struct device *dev, unsigned int nvec, bool is_tree, - irq_write_msi_msg_t write_msi_msg, + const struct platform_msi_ops *platform_ops, const struct irq_domain_ops *ops, void *host_data); -#define platform_msi_create_device_domain(dev, nvec, write, ops, data) \ - __platform_msi_create_device_domain(dev, nvec, false, write, ops, data) -#define platform_msi_create_device_tree_domain(dev, nvec, write, ops, data) \ - __platform_msi_create_device_domain(dev, nvec, true, write, ops, data) +#define platform_msi_create_device_domain(dev, nvec, p_ops, ops, data) \ + __platform_msi_create_device_domain(dev, nvec, false, p_ops, ops, data) +#define platform_msi_create_device_tree_domain(dev, nvec, p_ops, ops, data) \ + __platform_msi_create_device_domain(dev, nvec, true, p_ops, ops, data) int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, unsigned int nr_irqs); From patchwork Tue Jul 21 16:02:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675959 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D771C618 for ; Tue, 21 Jul 2020 16:02:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C14CB21702 for ; Tue, 21 Jul 2020 16:02:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730111AbgGUQCg (ORCPT ); Tue, 21 Jul 2020 12:02:36 -0400 Received: from mga09.intel.com ([134.134.136.24]:17457 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726919AbgGUQCg (ORCPT ); Tue, 21 Jul 2020 12:02:36 -0400 IronPort-SDR: V6IFSudYNmCLSXEcZ/jYk7RzdhsJYEJLenDRqrBO+J8Sbv/WYUzzNuATja3h10Rp1BrrOTksvz LBqNlx9jnrQg== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="151498642" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="151498642" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:30 -0700 IronPort-SDR: 4k8m5pBgp08yDsi4xMHaA6oG98O0aTlsV3haxLcdE7QZf9FErdMZXolXxI+MPte1CBcmU5Jo8P /Shn8Fgt7K5Q== X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="287967356" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga006.jf.intel.com with ESMTP; 21 Jul 2020 09:02:28 -0700 Subject: [PATCH RFC v2 02/18] irq/dev-msi: Add support for a new DEV_MSI irq domain From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:28 -0700 Message-ID: <159534734833.28840.10067945890695808535.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Megha Dey Add support for the creation of a new DEV_MSI irq domain. It creates a new irq chip associated with the DEV_MSI domain and adds the necessary domain operations to it. Add a new config option DEV_MSI which must be enabled by any driver that wants to support device-specific message-signaled-interrupts outside of PCI-MSI(-X). Lastly, add device specific mask/unmask callbacks in addition to a write function to the platform_msi_ops. Reviewed-by: Dan Williams Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- arch/x86/include/asm/hw_irq.h | 5 ++ drivers/base/Kconfig | 7 +++ drivers/base/Makefile | 1 drivers/base/dev-msi.c | 95 +++++++++++++++++++++++++++++++++++++++++ drivers/base/platform-msi.c | 45 +++++++++++++------ drivers/base/platform-msi.h | 23 ++++++++++ include/linux/msi.h | 8 +++ 7 files changed, 168 insertions(+), 16 deletions(-) create mode 100644 drivers/base/dev-msi.c create mode 100644 drivers/base/platform-msi.h diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 74c12437401e..8ecd7570589d 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -61,6 +61,11 @@ struct irq_alloc_info { irq_hw_number_t msi_hwirq; }; #endif +#ifdef CONFIG_DEV_MSI + struct { + irq_hw_number_t hwirq; + }; +#endif #ifdef CONFIG_X86_IO_APIC struct { int ioapic_id; diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 8d7001712062..f00901bac056 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -210,4 +210,11 @@ config GENERIC_ARCH_TOPOLOGY appropriate scaling, sysfs interface for reading capacity values at runtime. +config DEV_MSI + bool "Device Specific Interrupt Messages" + select IRQ_DOMAIN_HIERARCHY + select GENERIC_MSI_IRQ_DOMAIN + help + Allow device drivers to generate device-specific interrupt messages + for devices independent of PCI MSI/-X. 
endmenu diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 157452080f3d..ca1e4d39164e 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -21,6 +21,7 @@ obj-$(CONFIG_REGMAP) += regmap/ obj-$(CONFIG_SOC_BUS) += soc.o obj-$(CONFIG_PINCTRL) += pinctrl.o obj-$(CONFIG_DEV_COREDUMP) += devcoredump.o +obj-$(CONFIG_DEV_MSI) += dev-msi.o obj-$(CONFIG_GENERIC_MSI_IRQ_DOMAIN) += platform-msi.o obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o diff --git a/drivers/base/dev-msi.c b/drivers/base/dev-msi.c new file mode 100644 index 000000000000..240ccc353933 --- /dev/null +++ b/drivers/base/dev-msi.c @@ -0,0 +1,95 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright © 2020 Intel Corporation. + * + * Author: Megha Dey + */ + +#include +#include +#include +#include "platform-msi.h" + +struct irq_domain *dev_msi_default_domain; + +static irq_hw_number_t dev_msi_get_hwirq(struct msi_domain_info *info, + msi_alloc_info_t *arg) +{ + return arg->hwirq; +} + +static irq_hw_number_t dev_msi_calc_hwirq(struct msi_desc *desc) +{ + u32 devid; + + devid = desc->platform.msi_priv_data->devid; + + return (devid << (32 - DEV_ID_SHIFT)) | desc->platform.msi_index; +} + +static void dev_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) +{ + arg->hwirq = dev_msi_calc_hwirq(desc); +} + +static int dev_msi_prepare(struct irq_domain *domain, struct device *dev, + int nvec, msi_alloc_info_t *arg) +{ + memset(arg, 0, sizeof(*arg)); + + return 0; +} + +static struct msi_domain_ops dev_msi_domain_ops = { + .get_hwirq = dev_msi_get_hwirq, + .set_desc = dev_msi_set_desc, + .msi_prepare = dev_msi_prepare, +}; + +static struct irq_chip dev_msi_controller = { + .name = "DEV-MSI", + .irq_unmask = platform_msi_unmask_irq, + .irq_mask = platform_msi_mask_irq, + .irq_write_msi_msg = platform_msi_write_msg, + .irq_ack = irq_chip_ack_parent, + .irq_retrigger = irq_chip_retrigger_hierarchy, + .flags = IRQCHIP_SKIP_SET_WAKE, +}; + +static struct msi_domain_info dev_msi_domain_info = { + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS, + .ops = &dev_msi_domain_ops, + .chip = &dev_msi_controller, + .handler = handle_edge_irq, + .handler_name = "edge", +}; + +static int __init create_dev_msi_domain(void) +{ + struct irq_domain *parent = NULL; + struct fwnode_handle *fn; + + /* + * Modern code should never have to use irq_get_default_host. But since + * dev-msi is invisible to DT/ACPI, this is an exception case. 
+ */ + parent = irq_get_default_host(); + if (!parent) + return -ENXIO; + + fn = irq_domain_alloc_named_fwnode("DEV_MSI"); + if (!fn) + return -ENXIO; + + dev_msi_default_domain = msi_create_irq_domain(fn, &dev_msi_domain_info, parent); + if (!dev_msi_default_domain) { + pr_warn("failed to initialize irqdomain for DEV-MSI.\n"); + return -ENXIO; + } + + irq_domain_update_bus_token(dev_msi_default_domain, DOMAIN_BUS_PLATFORM_MSI); + irq_domain_free_fwnode(fn); + + return 0; +} +device_initcall(create_dev_msi_domain); diff --git a/drivers/base/platform-msi.c b/drivers/base/platform-msi.c index 9d94cd699468..5e1f210d65ee 100644 --- a/drivers/base/platform-msi.c +++ b/drivers/base/platform-msi.c @@ -12,21 +12,7 @@ #include #include #include - -#define DEV_ID_SHIFT 21 -#define MAX_DEV_MSIS (1 << (32 - DEV_ID_SHIFT)) - -/* - * Internal data structure containing a (made up, but unique) devid - * and the platform-msi ops - */ -struct platform_msi_priv_data { - struct device *dev; - void *host_data; - msi_alloc_info_t arg; - const struct platform_msi_ops *ops; - int devid; -}; +#include "platform-msi.h" /* The devid allocator */ static DEFINE_IDA(platform_msi_devid_ida); @@ -76,7 +62,7 @@ static void platform_msi_update_dom_ops(struct msi_domain_info *info) ops->set_desc = platform_msi_set_desc; } -static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) +void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) { struct msi_desc *desc = irq_data_get_msi_desc(data); struct platform_msi_priv_data *priv_data; @@ -86,6 +72,33 @@ static void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg) priv_data->ops->write_msg(desc, msg); } +static void __platform_msi_desc_mask_unmask_irq(struct msi_desc *desc, u32 mask) +{ + const struct platform_msi_ops *ops; + + ops = desc->platform.msi_priv_data->ops; + if (!ops) + return; + + if (mask) { + if (ops->irq_mask) + ops->irq_mask(desc); + } else { + if (ops->irq_unmask) + ops->irq_unmask(desc); + } +} + +void platform_msi_mask_irq(struct irq_data *data) +{ + __platform_msi_desc_mask_unmask_irq(irq_data_get_msi_desc(data), 1); +} + +void platform_msi_unmask_irq(struct irq_data *data) +{ + __platform_msi_desc_mask_unmask_irq(irq_data_get_msi_desc(data), 0); +} + static void platform_msi_update_chip_ops(struct msi_domain_info *info) { struct irq_chip *chip = info->chip; diff --git a/drivers/base/platform-msi.h b/drivers/base/platform-msi.h new file mode 100644 index 000000000000..1de8c2874218 --- /dev/null +++ b/drivers/base/platform-msi.h @@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright © 2020 Intel Corporation. + * + * Author: Megha Dey + */ + +#include + +#define DEV_ID_SHIFT 21 +#define MAX_DEV_MSIS (1 << (32 - DEV_ID_SHIFT)) + +/* + * Data structure containing a (made up, but unique) devid + * and the platform-msi ops. 
+ */ +struct platform_msi_priv_data { + struct device *dev; + void *host_data; + msi_alloc_info_t arg; + const struct platform_msi_ops *ops; + int devid; +}; diff --git a/include/linux/msi.h b/include/linux/msi.h index 7f6a8eb51aca..1da97f905720 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -323,9 +323,13 @@ enum { /* * platform_msi_ops - Callbacks for platform MSI ops + * @irq_mask: mask an interrupt source + * @irq_unmask: unmask an interrupt source * @write_msg: write message content */ struct platform_msi_ops { + unsigned int (*irq_mask)(struct msi_desc *desc); + unsigned int (*irq_unmask)(struct msi_desc *desc); irq_write_msi_msg_t write_msg; }; @@ -370,6 +374,10 @@ int platform_msi_domain_alloc(struct irq_domain *domain, unsigned int virq, void platform_msi_domain_free(struct irq_domain *domain, unsigned int virq, unsigned int nvec); void *platform_msi_get_host_data(struct irq_domain *domain); + +void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg); +void platform_msi_unmask_irq(struct irq_data *data); +void platform_msi_mask_irq(struct irq_data *data); #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN From patchwork Tue Jul 21 16:02:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675983 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0EFAC618 for ; Tue, 21 Jul 2020 16:03:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F09C42077D for ; Tue, 21 Jul 2020 16:03:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730143AbgGUQCt (ORCPT ); Tue, 21 Jul 2020 12:02:49 -0400 Received: from mga17.intel.com ([192.55.52.151]:14231 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730139AbgGUQCs (ORCPT ); Tue, 21 Jul 2020 12:02:48 -0400 IronPort-SDR: MaW+NAXVH5JP4iDk8R9v6n6yHU6eduCJvZfv/kC0DiJajCq81fhs8ofSFg1uApEJC9gEFOrr7s X2paKqT64Y7Q== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="130239369" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="130239369" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:37 -0700 IronPort-SDR: zAO5KWIqCygedZ3OIT58lFgYN7+LChU1kvZ5X+6TvCo3X48M/w5pIteqyX4F9vMRGP1xUTEvMc oFAqy0Qc5TOA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="319951866" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002.fm.intel.com with ESMTP; 21 Jul 2020 09:02:35 -0700 Subject: [PATCH RFC v2 03/18] irq/dev-msi: Create IR-DEV-MSI irq domain From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, 
yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:35 -0700 Message-ID: <159534735519.28840.10435935598386192252.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Megha Dey When DEV_MSI is enabled, the dev_msi_default_domain is updated to the base DEV-MSI irq domain. If interrupt remapping is enabled, we create a new IR-DEV-MSI irq domain and update the dev_msi_default domain to the same. For X86, introduce a new irq_alloc_type which will be used by the interrupt remapping driver. Reviewed-by: Dan Williams Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- arch/x86/include/asm/hw_irq.h | 1 + arch/x86/kernel/apic/msi.c | 12 ++++++ drivers/base/dev-msi.c | 66 +++++++++++++++++++++++++++++++---- drivers/iommu/intel/irq_remapping.c | 11 +++++- include/linux/intel-iommu.h | 1 + include/linux/irqdomain.h | 11 ++++++ include/linux/msi.h | 3 ++ 7 files changed, 96 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index 8ecd7570589d..bdddd63add41 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -40,6 +40,7 @@ enum irq_alloc_type { X86_IRQ_ALLOC_TYPE_MSIX, X86_IRQ_ALLOC_TYPE_DMAR, X86_IRQ_ALLOC_TYPE_UV, + X86_IRQ_ALLOC_TYPE_DEV_MSI, }; struct irq_alloc_info { diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c index 5cbaca58af95..8b25cadbae09 100644 --- a/arch/x86/kernel/apic/msi.c +++ b/arch/x86/kernel/apic/msi.c @@ -507,3 +507,15 @@ int hpet_assign_irq(struct irq_domain *domain, struct hpet_channel *hc, return irq_domain_alloc_irqs(domain, 1, NUMA_NO_NODE, &info); } #endif + +#ifdef CONFIG_DEV_MSI +int dev_msi_prepare(struct irq_domain *domain, struct device *dev, + int nvec, msi_alloc_info_t *arg) +{ + memset(arg, 0, sizeof(*arg)); + + arg->type = X86_IRQ_ALLOC_TYPE_DEV_MSI; + + return 0; +} +#endif diff --git a/drivers/base/dev-msi.c b/drivers/base/dev-msi.c index 240ccc353933..43d6ed3ba10f 100644 --- a/drivers/base/dev-msi.c +++ b/drivers/base/dev-msi.c @@ -5,6 +5,7 @@ * Author: Megha Dey */ +#include #include #include #include @@ -32,7 +33,7 @@ static void dev_msi_set_desc(msi_alloc_info_t *arg, struct msi_desc *desc) arg->hwirq = dev_msi_calc_hwirq(desc); } -static int dev_msi_prepare(struct irq_domain *domain, struct device *dev, +int __weak dev_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg) { memset(arg, 0, sizeof(*arg)); @@ -81,15 +82,66 @@ static int __init create_dev_msi_domain(void) if (!fn) return -ENXIO; - dev_msi_default_domain = msi_create_irq_domain(fn, &dev_msi_domain_info, parent); + /* + * This initcall may come after remap code is initialized. Ensure that + * dev_msi_default domain is updated correctly. 
+ */ if (!dev_msi_default_domain) { - pr_warn("failed to initialize irqdomain for DEV-MSI.\n"); - return -ENXIO; + dev_msi_default_domain = msi_create_irq_domain(fn, &dev_msi_domain_info, parent); + if (!dev_msi_default_domain) { + pr_warn("failed to initialize irqdomain for DEV-MSI.\n"); + return -ENXIO; + } + + irq_domain_update_bus_token(dev_msi_default_domain, DOMAIN_BUS_PLATFORM_MSI); + irq_domain_free_fwnode(fn); } - irq_domain_update_bus_token(dev_msi_default_domain, DOMAIN_BUS_PLATFORM_MSI); - irq_domain_free_fwnode(fn); - return 0; } device_initcall(create_dev_msi_domain); + +#ifdef CONFIG_IRQ_REMAP +static struct irq_chip dev_msi_ir_controller = { + .name = "IR-DEV-MSI", + .irq_unmask = platform_msi_unmask_irq, + .irq_mask = platform_msi_mask_irq, + .irq_write_msi_msg = platform_msi_write_msg, + .irq_ack = irq_chip_ack_parent, + .irq_retrigger = irq_chip_retrigger_hierarchy, + .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent, + .flags = IRQCHIP_SKIP_SET_WAKE, +}; + +static struct msi_domain_info dev_msi_ir_domain_info = { + .flags = MSI_FLAG_USE_DEF_DOM_OPS | MSI_FLAG_USE_DEF_CHIP_OPS, + .ops = &dev_msi_domain_ops, + .chip = &dev_msi_ir_controller, + .handler = handle_edge_irq, + .handler_name = "edge", +}; + +struct irq_domain *create_remap_dev_msi_irq_domain(struct irq_domain *parent, + const char *name) +{ + struct fwnode_handle *fn; + struct irq_domain *domain; + + fn = irq_domain_alloc_named_fwnode(name); + if (!fn) + return NULL; + + domain = msi_create_irq_domain(fn, &dev_msi_ir_domain_info, parent); + if (!domain) { + pr_warn("failed to initialize irqdomain for IR-DEV-MSI.\n"); + return ERR_PTR(-ENXIO); + } + + irq_domain_update_bus_token(domain, DOMAIN_BUS_PLATFORM_MSI); + + if (!dev_msi_default_domain) + dev_msi_default_domain = domain; + + return domain; +} +#endif diff --git a/drivers/iommu/intel/irq_remapping.c b/drivers/iommu/intel/irq_remapping.c index 7f8769800815..51872aabe5f8 100644 --- a/drivers/iommu/intel/irq_remapping.c +++ b/drivers/iommu/intel/irq_remapping.c @@ -573,6 +573,10 @@ static int intel_setup_irq_remapping(struct intel_iommu *iommu) "INTEL-IR-MSI", iommu->seq_id); + iommu->ir_dev_msi_domain = + create_remap_dev_msi_irq_domain(iommu->ir_domain, + "INTEL-IR-DEV-MSI"); + ir_table->base = page_address(pages); ir_table->bitmap = bitmap; iommu->ir_table = ir_table; @@ -1299,9 +1303,10 @@ static void intel_irq_remapping_prepare_irte(struct intel_ir_data *data, case X86_IRQ_ALLOC_TYPE_HPET: case X86_IRQ_ALLOC_TYPE_MSI: case X86_IRQ_ALLOC_TYPE_MSIX: + case X86_IRQ_ALLOC_TYPE_DEV_MSI: if (info->type == X86_IRQ_ALLOC_TYPE_HPET) set_hpet_sid(irte, info->hpet_id); - else + else if (info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI) set_msi_sid(irte, info->msi_dev); msg->address_hi = MSI_ADDR_BASE_HI; @@ -1353,8 +1358,10 @@ static int intel_irq_remapping_alloc(struct irq_domain *domain, if (!info || !iommu) return -EINVAL; + if (nr_irqs > 1 && info->type != X86_IRQ_ALLOC_TYPE_MSI && - info->type != X86_IRQ_ALLOC_TYPE_MSIX) + info->type != X86_IRQ_ALLOC_TYPE_MSIX && + info->type != X86_IRQ_ALLOC_TYPE_DEV_MSI) return -EINVAL; /* diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index d129baf7e0b8..3b868d1c43df 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -596,6 +596,7 @@ struct intel_iommu { struct ir_table *ir_table; /* Interrupt remapping info */ struct irq_domain *ir_domain; struct irq_domain *ir_msi_domain; + struct irq_domain *ir_dev_msi_domain; #endif struct iommu_device iommu; /* IOMMU core code 
handle */ int node; diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index b37350c4fe37..e537d7b50cee 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -589,6 +589,17 @@ irq_domain_hierarchical_is_msi_remap(struct irq_domain *domain) } #endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */ +#if defined(CONFIG_DEV_MSI) && defined(CONFIG_IRQ_REMAP) +extern struct irq_domain *create_remap_dev_msi_irq_domain(struct irq_domain *parent, + const char *name); +#else +static inline struct irq_domain *create_remap_dev_msi_irq_domain(struct irq_domain *parent, + const char *name) +{ + return NULL; +} +#endif + #else /* CONFIG_IRQ_DOMAIN */ static inline void irq_dispose_mapping(unsigned int virq) { } static inline struct irq_domain *irq_find_matching_fwnode( diff --git a/include/linux/msi.h b/include/linux/msi.h index 1da97f905720..7098ba566bcd 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -378,6 +378,9 @@ void *platform_msi_get_host_data(struct irq_domain *domain); void platform_msi_write_msg(struct irq_data *data, struct msi_msg *msg); void platform_msi_unmask_irq(struct irq_data *data); void platform_msi_mask_irq(struct irq_data *data); + +int dev_msi_prepare(struct irq_domain *domain, struct device *dev, + int nvec, msi_alloc_info_t *arg); #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN From patchwork Tue Jul 21 16:02:41 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675965 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D647C618 for ; Tue, 21 Jul 2020 16:02:47 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C8FA22077D for ; Tue, 21 Jul 2020 16:02:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730121AbgGUQCo (ORCPT ); Tue, 21 Jul 2020 12:02:44 -0400 Received: from mga03.intel.com ([134.134.136.65]:30545 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728342AbgGUQCo (ORCPT ); Tue, 21 Jul 2020 12:02:44 -0400 IronPort-SDR: YydCq5NOClKvzf8lJdKr0JulIEHL2oQ+gFO2v+2vO4zgxngM7MOINyn3D6gaWUUuNkiJOo++Qd 32Tx/ZurOVUg== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="150143040" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="150143040" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:43 -0700 IronPort-SDR: +j90kn5v9Tgl0Cb2KqBMPe25kTMzUBivffLJ8Vm8x7FzmO/N993eiKojud0fo0UNMorDHUVrez ZYQ2dl7ticKw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="326413312" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by FMSMGA003.fm.intel.com with ESMTP; 21 Jul 2020 09:02:41 -0700 Subject: [PATCH RFC v2 04/18] irq/dev-msi: Introduce APIs to allocate/free dev-msi interrupts From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, 
dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:41 -0700 Message-ID: <159534736176.28840.5547007059232964457.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Megha Dey The dev-msi interrupts are to be allocated/freed only for custom devices, not standard PCI MSI-X devices. These interrupts are device-defined and distinct from the already existing MSI interrupt types:

pci-msi: standard PCI MSI/MSI-X setup format
platform-msi: platform custom, but device-driver opaque MSI setup/control
arch-msi: fallback for devices not assigned to the generic PCI domain
dev-msi: device-defined IRQ domain for ancillary devices, e.g. DSA portal devices, which use device-specific IMS (Interrupt Message Store) interrupts

dev-msi interrupts are represented by their own device type. That means dev->msi_list is never contended for different interrupt types; it will either be all PCI-MSI or all device-defined. Reviewed-by: Dan Williams Signed-off-by: Megha Dey Signed-off-by: Dave Jiang --- drivers/base/dev-msi.c | 23 +++++++++++++++++++++++ include/linux/msi.h | 4 ++++ 2 files changed, 27 insertions(+) diff --git a/drivers/base/dev-msi.c b/drivers/base/dev-msi.c index 43d6ed3ba10f..4cc75bfd62da 100644 --- a/drivers/base/dev-msi.c +++ b/drivers/base/dev-msi.c @@ -145,3 +145,26 @@ struct irq_domain *create_remap_dev_msi_irq_domain(struct irq_domain *parent, return domain; } #endif + +int dev_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, + const struct platform_msi_ops *platform_ops) +{ + if (!dev->msi_domain) { + dev->msi_domain = dev_msi_default_domain; + } else if (dev->msi_domain != dev_msi_default_domain) { + dev_WARN_ONCE(dev, 1, "already registered to another irq domain?\n"); + return -ENXIO; + } + + return platform_msi_domain_alloc_irqs(dev, nvec, platform_ops); +} +EXPORT_SYMBOL_GPL(dev_msi_domain_alloc_irqs); + +void dev_msi_domain_free_irqs(struct device *dev) +{ + if (dev->msi_domain != dev_msi_default_domain) + dev_WARN_ONCE(dev, 1, "registered to incorrect irq domain?\n"); + + platform_msi_domain_free_irqs(dev); +} +EXPORT_SYMBOL_GPL(dev_msi_domain_free_irqs); diff --git a/include/linux/msi.h b/include/linux/msi.h index 7098ba566bcd..9dde8a43a0f7 100644 --- a/include/linux/msi.h +++ b/include/linux/msi.h @@ -381,6 +381,10 @@ void platform_msi_mask_irq(struct irq_data *data); int dev_msi_prepare(struct irq_domain *domain, struct device *dev, int nvec, msi_alloc_info_t *arg); + +int dev_msi_domain_alloc_irqs(struct device *dev, unsigned int nvec, + const struct platform_msi_ops *platform_ops); +void dev_msi_domain_free_irqs(struct device *dev); #endif /* CONFIG_GENERIC_MSI_IRQ_DOMAIN */ #ifdef CONFIG_PCI_MSI_IRQ_DOMAIN From patchwork Tue Jul 21 16:02:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang 
X-Patchwork-Id: 11675975 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6A879618 for ; Tue, 21 Jul 2020 16:02:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 587842077D for ; Tue, 21 Jul 2020 16:02:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730179AbgGUQCw (ORCPT ); Tue, 21 Jul 2020 12:02:52 -0400 Received: from mga05.intel.com ([192.55.52.43]:34926 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730139AbgGUQCu (ORCPT ); Tue, 21 Jul 2020 12:02:50 -0400 IronPort-SDR: VaHCkGAyRyTc/2ttMVfUPkySBzE4l2OWzJ1HTE3e7caqUGqlW9Gz63rLZw/UgO1LEr9BWAfIfQ suRxC3Ug+aLw== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="235019033" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="235019033" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:49 -0700 IronPort-SDR: Pr+0KiI06VsmzTFBtydA++sP59+6WbfdU2s9RymrQVjldSt2RIZk/UHi89wCH+k5jAVy67tmpf tkjTncYuWumg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="318379489" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga008.jf.intel.com with ESMTP; 21 Jul 2020 09:02:48 -0700 Subject: [PATCH RFC v2 05/18] dmaengine: idxd: add support for readonly config devices From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:47 -0700 Message-ID: <159534736793.28840.16449656707947316088.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org The VFIO mediated device for the idxd driver will provide a virtual DSA device backed by a workqueue. The virtual device is limited in that its wq configuration registers are read-only. Add support and helper functions for handling a DSA device whose configuration registers are marked read-only. 
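In condensed form, the probe-time behavior this patch adds is sketched below (a restatement of the init.c hunk in the diff that follows, with error handling abbreviated; it is not additional code):

	/* If the configs are read-only, populate driver state from the
	 * device registers instead of programming the device. */
	if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
		rc = idxd_device_load_config(idxd);
		if (rc < 0)
			goto err_setup;
	}
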
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/device.c | 116 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 1 drivers/dma/idxd/init.c | 8 +++ drivers/dma/idxd/sysfs.c | 21 +++++--- 4 files changed, 137 insertions(+), 9 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 9032b50b31af..7531ed9c1b81 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -750,3 +750,119 @@ int idxd_device_config(struct idxd_device *idxd) return 0; } + +static int idxd_wq_load_config(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + int wqcfg_offset; + int i; + + wqcfg_offset = idxd->wqcfg_offset + wq->id * 32; + memcpy_fromio(&wq->wqcfg, idxd->reg_base + wqcfg_offset, sizeof(union wqcfg)); + + wq->size = wq->wqcfg.wq_size; + wq->threshold = wq->wqcfg.wq_thresh; + if (wq->wqcfg.priv) + wq->type = IDXD_WQT_KERNEL; + + /* The driver does not support shared WQ mode in read-only config yet */ + if (wq->wqcfg.mode == 0 || wq->wqcfg.pasid_en) + return -EOPNOTSUPP; + + set_bit(WQ_FLAG_DEDICATED, &wq->flags); + + wq->priority = wq->wqcfg.priority; + + for (i = 0; i < 8; i++) { + wqcfg_offset = idxd->wqcfg_offset + wq->id * 32 + i * sizeof(u32); + dev_dbg(dev, "WQ[%d][%d][%#x]: %#x\n", + wq->id, i, wqcfg_offset, wq->wqcfg.bits[i]); + } + + return 0; +} + +static void idxd_group_load_config(struct idxd_group *group) +{ + struct idxd_device *idxd = group->idxd; + struct device *dev = &idxd->pdev->dev; + int i, j, grpcfg_offset; + + /* + * Load WQS bit fields + * Iterate through all 256 bits 64 bits at a time + */ + for (i = 0; i < 4; i++) { + struct idxd_wq *wq; + + grpcfg_offset = idxd->grpcfg_offset + group->id * 64 + i * sizeof(u64); + group->grpcfg.wqs[i] = ioread64(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPCFG wq[%d:%d: %#x]: %#llx\n", + group->id, i, grpcfg_offset, group->grpcfg.wqs[i]); + + if (i * 64 >= idxd->max_wqs) + break; + + /* Iterate through all 64 bits and check for wq set */ + for (j = 0; j < 64; j++) { + int id = i * 64 + j; + + /* No need to check beyond max wqs */ + if (id >= idxd->max_wqs) + break; + + /* Set group assignment for wq if wq bit is set */ + if (group->grpcfg.wqs[i] & BIT(j)) { + wq = &idxd->wqs[id]; + wq->group = group; + } + } + } + + grpcfg_offset = idxd->grpcfg_offset + group->id * 64 + 32; + group->grpcfg.engines = ioread64(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPCFG engs[%d: %#x]: %#llx\n", group->id, + grpcfg_offset, group->grpcfg.engines); + + for (i = 0; i < 64; i++) { + if (i >= idxd->max_engines) + break; + + if (group->grpcfg.engines & BIT(i)) { + struct idxd_engine *engine = &idxd->engines[i]; + + engine->group = group; + } + } + + grpcfg_offset = idxd->grpcfg_offset + group->id * 64 + 40; + group->grpcfg.flags.bits = ioread32(idxd->reg_base + grpcfg_offset); + dev_dbg(dev, "GRPFLAGS flags[%d: %#x]: %#x\n", + group->id, grpcfg_offset, group->grpcfg.flags.bits); +} + +int idxd_device_load_config(struct idxd_device *idxd) +{ + union gencfg_reg reg; + int i, rc; + + reg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET); + idxd->token_limit = reg.token_limit; + + for (i = 0; i < idxd->max_groups; i++) { + struct idxd_group *group = &idxd->groups[i]; + + idxd_group_load_config(group); + } + + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq = &idxd->wqs[i]; + + rc = idxd_wq_load_config(wq); + if (rc < 0) + return rc; + } + + return 0; +} diff --git a/drivers/dma/idxd/idxd.h 
b/drivers/dma/idxd/idxd.h index 518768885d7b..fcea8bc060f5 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -306,6 +306,7 @@ void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +int idxd_device_load_config(struct idxd_device *idxd); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index a7e1dbfcd173..50c68de6b4ab 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -344,6 +344,14 @@ static int idxd_probe(struct idxd_device *idxd) if (rc) goto err_setup; + /* If the configs are readonly, then load them from device */ + if (!test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { + dev_dbg(dev, "Loading RO device config\n"); + rc = idxd_device_load_config(idxd); + if (rc < 0) + goto err_setup; + } + rc = idxd_setup_interrupts(idxd); if (rc) goto err_setup; diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 57ea94a0dc51..fa1abdf503c2 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -102,7 +102,7 @@ static int idxd_config_bus_match(struct device *dev, static int idxd_config_bus_probe(struct device *dev) { - int rc; + int rc = 0; unsigned long flags; dev_dbg(dev, "%s called\n", __func__); @@ -120,7 +120,8 @@ static int idxd_config_bus_probe(struct device *dev) /* Perform IDXD configuration and enabling */ spin_lock_irqsave(&idxd->dev_lock, flags); - rc = idxd_device_config(idxd); + if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) + rc = idxd_device_config(idxd); spin_unlock_irqrestore(&idxd->dev_lock, flags); if (rc < 0) { module_put(THIS_MODULE); @@ -207,7 +208,8 @@ static int idxd_config_bus_probe(struct device *dev) } spin_lock_irqsave(&idxd->dev_lock, flags); - rc = idxd_device_config(idxd); + if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) + rc = idxd_device_config(idxd); spin_unlock_irqrestore(&idxd->dev_lock, flags); if (rc < 0) { mutex_unlock(&wq->wq_lock); @@ -328,13 +330,14 @@ static int idxd_config_bus_remove(struct device *dev) idxd_unregister_dma_device(idxd); rc = idxd_device_disable(idxd); + if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) { + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq = &idxd->wqs[i]; - for (i = 0; i < idxd->max_wqs; i++) { - struct idxd_wq *wq = &idxd->wqs[i]; - - mutex_lock(&wq->wq_lock); - idxd_wq_disable_cleanup(wq); - mutex_unlock(&wq->wq_lock); + mutex_lock(&wq->wq_lock); + idxd_wq_disable_cleanup(wq); + mutex_unlock(&wq->wq_lock); + } } module_put(THIS_MODULE); From patchwork Tue Jul 21 16:02:54 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675977 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 60AE8618 for ; Tue, 21 Jul 2020 16:03:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4791520720 for ; Tue, 21 Jul 2020 16:03:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730195AbgGUQC5 (ORCPT ); Tue, 21 Jul 2020 12:02:57 -0400 Received: from mga14.intel.com ([192.55.52.115]:23714 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S1730140AbgGUQC4 (ORCPT ); Tue, 21 Jul 2020 12:02:56 -0400 IronPort-SDR: kbCCVJltOLY8VGmPSxWdZPWScdMb0etBSijwkv2NRGVzzfIaaO5v4nRSvNCdFS4YwhK7fcxqQH M/fGnOvuV+9A== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="149317914" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="149317914" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:02:56 -0700 IronPort-SDR: TzPGTcZN9zlLB4fucHhhO1H9cnoXBW7xAE2Mbe+z2aDFCVbwFodYlgtqbeblCcXecZ2xQPB7uY yCxSAnwHxaug== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="310293518" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga004.fm.intel.com with ESMTP; 21 Jul 2020 09:02:54 -0700 Subject: [PATCH RFC v2 06/18] dmaengine: idxd: add interrupt handle request support From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:02:54 -0700 Message-ID: <159534737478.28840.15476839290532682395.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add support for requesting an interrupt handle from the device. The interrupt handle is put in the interrupt handle field of a descriptor so the device can determine which interrupt vector to use, be it MSI-X or IMS. On the host device, the interrupt handle indexes the MSI-X table. This allows a descriptor to program the interrupt handle 1:1 with the MSI-X index without getting it from the request interrupt handle device command. For a guest device, the index can be any index that the host assigned for the IMS table, and therefore it must be requested from the virtual device during MSI-X setup by the driver running on the guest. On the actual hardware, MSI-X vector 0 is the misc interrupt and handles events such as administrative command completion, error reporting, and performance monitor overflow. MSI-X vectors 1...N are used for descriptor completion interrupts. On the guest kernel, the MSI-X interrupts are backed by the mediated device through emulation or IMS vectors. Vector 0 is handled through emulation by the host VDCM; it only requires the host driver to send the signal to QEMU. Vector 1 (and more may be supported later) is backed by IMS. 
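In condensed form, the handle selection on the descriptor submission path works as sketched below (this restates the submit.c hunk in the diff that follows; it is not additional code):

	/* Round-robin through the N completion vectors (1...N). */
	wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1;
	if (!idxd->int_handles) {
		/* Host: the handle is the MSI-X index itself. */
		desc->hw->int_handle = wq->vec_ptr;
	} else {
		/* Guest: program the handle obtained from the virtual
		 * device's request-int-handle command, and keep the
		 * device-local index for completion routing. */
		desc->vec_ptr = wq->vec_ptr;
		desc->hw->int_handle = idxd->int_handles[desc->vec_ptr];
	}
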
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/device.c | 30 ++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 11 +++++++++++ drivers/dma/idxd/init.c | 29 +++++++++++++++++++++++++++++ drivers/dma/idxd/registers.h | 4 +++- drivers/dma/idxd/submit.c | 29 ++++++++++++++++++++++------- 5 files changed, 95 insertions(+), 8 deletions(-) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 7531ed9c1b81..2b4e8ab99ebd 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -502,6 +502,36 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) dev_dbg(dev, "pasid %d drained\n", pasid); } +#define INT_HANDLE_IMS_TABLE 0x10000 +int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, + int *handle, enum idxd_interrupt_type irq_type) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + if (!(idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE))) + return -EOPNOTSUPP; + + dev_dbg(dev, "get int handle, idx %d\n", idx); + + operand = idx & 0xffff; + if (irq_type == IDXD_IRQ_IMS) + operand |= INT_HANDLE_IMS_TABLE; + dev_dbg(dev, "cmd: %u operand: %#x\n", + IDXD_CMD_REQUEST_INT_HANDLE, operand); + idxd_cmd_exec(idxd, IDXD_CMD_REQUEST_INT_HANDLE, operand, &status); + + if ((status & 0xff) != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "request int handle failed: %#x\n", status); + return -ENXIO; + } + + *handle = (status >> 8) & 0xffff; + + dev_dbg(dev, "int handle acquired: %u\n", *handle); + return 0; +} + /* Device configuration bits */ static void idxd_group_config_write(struct idxd_group *group) { diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index fcea8bc060f5..2cd190a3da73 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -138,6 +138,7 @@ struct idxd_hw { union group_cap_reg group_cap; union engine_cap_reg engine_cap; struct opcap opcap; + u32 cmd_cap; }; enum idxd_device_state { @@ -201,6 +202,8 @@ struct idxd_device { struct dma_device dma_dev; struct workqueue_struct *wq; struct work_struct work; + + int *int_handles; }; /* IDXD software descriptor */ @@ -214,6 +217,7 @@ struct idxd_desc { struct list_head list; int id; int cpu; + unsigned int vec_ptr; struct idxd_wq *wq; }; @@ -242,6 +246,11 @@ enum idxd_portal_prot { IDXD_PORTAL_LIMITED, }; +enum idxd_interrupt_type { + IDXD_IRQ_MSIX = 0, + IDXD_IRQ_IMS, +}; + static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) { return prot * 0x1000; @@ -307,6 +316,8 @@ int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); +int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, + enum idxd_interrupt_type irq_type); /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 50c68de6b4ab..9fd505a03444 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -132,6 +132,22 @@ static int idxd_setup_interrupts(struct idxd_device *idxd) } dev_dbg(dev, "Allocated idxd-msix %d for vector %d\n", i, msix->vector); + + if (idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE)) { + /* + * The MSIX vector enumeration starts at 1 with vector 0 being the + * misc interrupt that handles non I/O completion events. The + * interrupt handles are for IMS enumeration on guest. 
The misc + * interrupt vector does not require a handle and therefore we start + * the int_handles at index 0. Since 'i' starts at 1, the first + * int_handles index will be 0. + */ + rc = idxd_device_request_int_handle(idxd, i, &idxd->int_handles[i - 1], + IDXD_IRQ_MSIX); + if (rc < 0) + goto err_no_irq; + dev_dbg(dev, "int handle requested: %u\n", idxd->int_handles[i - 1]); + } } idxd_unmask_error_interrupts(idxd); @@ -159,6 +175,13 @@ static int idxd_setup_internals(struct idxd_device *idxd) int i; init_waitqueue_head(&idxd->cmd_waitq); + + if (idxd->hw.cmd_cap & BIT(IDXD_CMD_REQUEST_INT_HANDLE)) { + idxd->int_handles = devm_kcalloc(dev, idxd->max_wqs, sizeof(int), GFP_KERNEL); + if (!idxd->int_handles) + return -ENOMEM; + } + idxd->groups = devm_kcalloc(dev, idxd->max_groups, sizeof(struct idxd_group), GFP_KERNEL); if (!idxd->groups) @@ -230,6 +253,12 @@ static void idxd_read_caps(struct idxd_device *idxd) /* reading generic capabilities */ idxd->hw.gen_cap.bits = ioread64(idxd->reg_base + IDXD_GENCAP_OFFSET); dev_dbg(dev, "gen_cap: %#llx\n", idxd->hw.gen_cap.bits); + + if (idxd->hw.gen_cap.cmd_cap) { + idxd->hw.cmd_cap = ioread32(idxd->reg_base + IDXD_CMDCAP_OFFSET); + dev_dbg(dev, "cmd_cap: %#x\n", idxd->hw.cmd_cap); + } + idxd->max_xfer_bytes = 1ULL << idxd->hw.gen_cap.max_xfer_shift; dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index a0df4f3fe1fb..ace7248ee195 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -23,8 +23,8 @@ union gen_cap_reg { u64 overlap_copy:1; u64 cache_control_mem:1; u64 cache_control_cache:1; + u64 cmd_cap:1; u64 rsvd:3; - u64 int_handle_req:1; u64 dest_readback:1; u64 drain_readback:1; u64 rsvd2:6; @@ -223,6 +223,8 @@ enum idxd_cmdsts_err { IDXD_CMDSTS_ERR_NO_HANDLE, }; +#define IDXD_CMDCAP_OFFSET 0xb0 + #define IDXD_SWERR_OFFSET 0xc0 #define IDXD_SWERR_VALID 0x00000001 #define IDXD_SWERR_OVERFLOW 0x00000002 diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index 3e63d820a98e..70c7703a4495 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -22,11 +22,17 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu) desc->hw->pasid = idxd->pasid; /* - * Descriptor completion vectors are 1-8 for MSIX. We will round - * robin through the 8 vectors. + * Descriptor completion vectors are 1...N for MSIX. We will round + * robin through the N vectors. */ wq->vec_ptr = (wq->vec_ptr % idxd->num_wq_irqs) + 1; - desc->hw->int_handle = wq->vec_ptr; + if (!idxd->int_handles) { + desc->hw->int_handle = wq->vec_ptr; + } else { + desc->vec_ptr = wq->vec_ptr; + desc->hw->int_handle = idxd->int_handles[desc->vec_ptr]; + } + return desc; } @@ -79,7 +85,6 @@ void idxd_free_desc(struct idxd_wq *wq, struct idxd_desc *desc) int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) { struct idxd_device *idxd = wq->idxd; - int vec = desc->hw->int_handle; void __iomem *portal; if (idxd->state != IDXD_DEV_ENABLED) @@ -110,9 +115,19 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) * Pending the descriptor to the lockless list for the irq_entry * that we designated the descriptor to. 
*/ - if (desc->hw->flags & IDXD_OP_FLAG_RCI) - llist_add(&desc->llnode, - &idxd->irq_entries[vec].pending_llist); + if (desc->hw->flags & IDXD_OP_FLAG_RCI) { + int vec; + + /* + * If the driver is on the host kernel, the interrupt handle is the + * MSIX vector index. If it's a guest, the int_handle can't be used + * since that is the index into the IMS table for the entire device; + * the guest device's local index is used instead. + */ + vec = !idxd->int_handles ? desc->hw->int_handle : desc->vec_ptr; + llist_add(&desc->llnode, &idxd->irq_entries[vec].pending_llist); + } return 0; } From patchwork Tue Jul 21 16:03:01 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675989 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 965CF618 for ; Tue, 21 Jul 2020 16:03:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8402E20720 for ; Tue, 21 Jul 2020 16:03:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730219AbgGUQDN (ORCPT ); Tue, 21 Jul 2020 12:03:13 -0400 Received: from mga18.intel.com ([134.134.136.126]:27543 "EHLO mga18.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728379AbgGUQDL (ORCPT ); Tue, 21 Jul 2020 12:03:11 -0400 IronPort-SDR: Dq0djEvBF8TiOJncBT7x7vNnMSKx7Knez7eG/DZBaDTepUN+hDvUmQ7oolelucTKxp7oRBcJSQ X665Cktj9Nnw== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="137657200" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="137657200" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:05 -0700 IronPort-SDR: 2jqYA5uRDWKKOM6kQb9ONkoepEQl6cA680AXMAE0esVC+NDaFW9xMtfN4ACASNt9oG3jkdaV9+ B3Dpwpz7LTog== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="488134027" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga005.fm.intel.com with ESMTP; 21 Jul 2020 09:03:01 -0700 Subject: [PATCH RFC v2 07/18] dmaengine: idxd: add DEV-MSI support in base driver From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:01 -0700 Message-ID: <159534738102.28840.3470861020183323671.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: 
kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In preparation for VFIO mediated device support in the idxd driver, add enabling of Interrupt Message Store (IMS) interrupts to the idxd base driver. DEV-MSI is the generic kernel support that mechanisms like IMS can use to get their interrupts enabled. With IMS support the idxd driver can dynamically allocate interrupts on a per-mdev basis, based on how many IMS vectors are mapped to the mdev device. This commit only provides the support functions in the base driver, not the VFIO mdev code that uses them. The commit also includes some portal-related changes. A "portal" is a special location within the MMIO BAR2 of the DSA device where descriptors are submitted via the CPU command MOVDIR64B or ENQCMD(S). The offset of the portal address determines whether the submitted descriptor generates an MSI-X or an IMS notification. See the Intel SIOV spec for more details: https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/Makefile | 2 +- drivers/dma/idxd/cdev.c | 4 ++- drivers/dma/idxd/idxd.h | 14 ++++++++---- drivers/dma/idxd/ims.c | 53 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/init.c | 52 ++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/submit.c | 10 +++++++- drivers/dma/idxd/sysfs.c | 11 +++++++++ 7 files changed, 137 insertions(+), 9 deletions(-) create mode 100644 drivers/dma/idxd/ims.c diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index 8978b898d777..d1519b9d1dd0 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,2 @@ obj-$(CONFIG_INTEL_IDXD) += idxd.o -idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o +idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o ims.o diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index d4841d8c0928..ffeae646f947 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -202,8 +202,8 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma) return rc; vma->vm_flags |= VM_DONTCOPY; - pfn = (base + idxd_get_wq_portal_full_offset(wq->id, - IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT; + pfn = (base + idxd_get_wq_portal_full_offset(wq->id, IDXD_PORTAL_LIMITED, + IDXD_IRQ_MSIX)) >> PAGE_SHIFT; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_private_data = ctx; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 2cd190a3da73..c65fb6dcb7e0 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -152,6 +152,7 @@ enum idxd_device_flag { IDXD_FLAG_CONFIGURABLE = 0, IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, + IDXD_FLAG_SIOV_SUPPORTED, }; struct idxd_device { @@ -178,6 +179,7 @@ struct idxd_device { int num_groups; + u32 ims_offset; u32 msix_perm_offset; u32 wqcfg_offset; u32 grpcfg_offset; @@ -185,6 +187,7 @@ struct idxd_device { u64 max_xfer_bytes; u32 max_batch_size; + int ims_size; int max_groups; int max_engines; int max_tokens; @@ -204,6 +207,7 @@ struct idxd_device { struct work_struct work; int *int_handles; + struct sbitmap ims_sbmap; }; /* IDXD software descriptor */ @@ -251,15 +255,17 @@ enum idxd_interrupt_type { IDXD_IRQ_IMS, }; -static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot) +static inline int idxd_get_wq_portal_offset(enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return prot * 0x1000; + return prot * 0x1000 + irq_type * 
0x2000; } static inline int idxd_get_wq_portal_full_offset(int wq_id, - enum idxd_portal_prot prot) + enum idxd_portal_prot prot, + enum idxd_interrupt_type irq_type) { - return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot); + return ((wq_id * 4) << PAGE_SHIFT) + idxd_get_wq_portal_offset(prot, irq_type); } static inline void idxd_set_type(struct idxd_device *idxd) diff --git a/drivers/dma/idxd/ims.c b/drivers/dma/idxd/ims.c new file mode 100644 index 000000000000..5fece66122a2 --- /dev/null +++ b/drivers/dma/idxd/ims.c @@ -0,0 +1,53 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" + +static void idxd_free_ims_index(struct idxd_device *idxd, + unsigned long ims_idx) +{ + sbitmap_clear_bit(&idxd->ims_sbmap, ims_idx); +} + +static int idxd_alloc_ims_index(struct idxd_device *idxd) +{ + int index; + + index = sbitmap_get(&idxd->ims_sbmap, 0, false); + if (index < 0) + return -ENOSPC; + return index; +} + +static unsigned int idxd_ims_irq_mask(struct msi_desc *desc) +{ + // Filled out later when VDCM is introduced. + + return 0; +} + +static unsigned int idxd_ims_irq_unmask(struct msi_desc *desc) +{ + // Filled out later when VDCM is introduced. + + return 0; +} + +static void idxd_ims_write_msg(struct msi_desc *desc, struct msi_msg *msg) +{ + // Filled out later when VDCM is introduced. +} + +static struct platform_msi_ops idxd_ims_ops = { + .irq_mask = idxd_ims_irq_mask, + .irq_unmask = idxd_ims_irq_unmask, + .write_msg = idxd_ims_write_msg, +}; diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 9fd505a03444..3e2c7ac83daf 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -241,10 +241,51 @@ static void idxd_read_table_offsets(struct idxd_device *idxd) idxd->msix_perm_offset = offsets.msix_perm * 0x100; dev_dbg(dev, "IDXD MSIX Permission Offset: %#x\n", idxd->msix_perm_offset); + idxd->ims_offset = offsets.ims * 0x100; + dev_dbg(dev, "IDXD IMS Offset: %#x\n", idxd->ims_offset); idxd->perfmon_offset = offsets.perfmon * 0x100; dev_dbg(dev, "IDXD Perfmon Offset: %#x\n", idxd->perfmon_offset); } +#define PCI_DEVSEC_CAP 0x23 +#define SIOVDVSEC1(offset) ((offset) + 0x4) +#define SIOVDVSEC2(offset) ((offset) + 0x8) +#define DVSECID 0x5 +#define SIOVCAP(offset) ((offset) + 0x14) + +static void idxd_check_siov(struct idxd_device *idxd) +{ + struct pci_dev *pdev = idxd->pdev; + struct device *dev = &pdev->dev; + int dvsec; + u16 val16; + u32 val32; + + dvsec = pci_find_ext_capability(pdev, PCI_DEVSEC_CAP); + pci_read_config_word(pdev, SIOVDVSEC1(dvsec), &val16); + if (val16 != PCI_VENDOR_ID_INTEL) { + dev_dbg(&pdev->dev, "DVSEC vendor id is not Intel\n"); + return; + } + + pci_read_config_word(pdev, SIOVDVSEC2(dvsec), &val16); + if (val16 != DVSECID) { + dev_dbg(&pdev->dev, "DVSEC ID is not SIOV\n"); + return; + } + + pci_read_config_dword(pdev, SIOVCAP(dvsec), &val32); + if ((val32 & 0x1) && idxd->hw.gen_cap.max_ims_mult) { + idxd->ims_size = idxd->hw.gen_cap.max_ims_mult * 256ULL; + dev_dbg(dev, "IMS size: %u\n", idxd->ims_size); + set_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags); + dev_dbg(&pdev->dev, "IMS supported for device\n"); + return; + } + + dev_dbg(&pdev->dev, "SIOV unsupported for device\n"); +} + static void idxd_read_caps(struct idxd_device *idxd) { struct device *dev = &idxd->pdev->dev; @@ -263,6 +304,7 @@ static void idxd_read_caps(struct idxd_device *idxd) 
dev_dbg(dev, "max xfer size: %llu bytes\n", idxd->max_xfer_bytes); idxd->max_batch_size = 1U << idxd->hw.gen_cap.max_batch_shift; dev_dbg(dev, "max batch size: %u\n", idxd->max_batch_size); + idxd_check_siov(idxd); if (idxd->hw.gen_cap.config_en) set_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags); @@ -397,9 +439,19 @@ static int idxd_probe(struct idxd_device *idxd) idxd->major = idxd_cdev_get_major(idxd); + if (idxd->ims_size) { + rc = sbitmap_init_node(&idxd->ims_sbmap, idxd->ims_size, -1, + GFP_KERNEL, dev_to_node(dev)); + if (rc < 0) + goto sbitmap_fail; + } dev_dbg(dev, "IDXD device %d probed successfully\n", idxd->id); return 0; + sbitmap_fail: + mutex_lock(&idxd_idr_lock); + idr_remove(&idxd_idrs[idxd->type], idxd->id); + mutex_unlock(&idxd_idr_lock); err_idr_fail: idxd_mask_error_interrupts(idxd); idxd_mask_msix_vectors(idxd); diff --git a/drivers/dma/idxd/submit.c b/drivers/dma/idxd/submit.c index 70c7703a4495..f0a0a0d21a7a 100644 --- a/drivers/dma/idxd/submit.c +++ b/drivers/dma/idxd/submit.c @@ -30,7 +30,13 @@ static struct idxd_desc *__get_desc(struct idxd_wq *wq, int idx, int cpu) desc->hw->int_handle = wq->vec_ptr; } else { desc->vec_ptr = wq->vec_ptr; - desc->hw->int_handle = idxd->int_handles[desc->vec_ptr]; + /* + * int_handles are only for descriptor completion. However, for device + * MSIX enumeration, vec 0 is used for misc interrupts. Therefore even + * though we are rotating through 1...N for descriptor interrupts, we + * need to acquire the int_handles from 0..N-1. + */ + desc->hw->int_handle = idxd->int_handles[desc->vec_ptr - 1]; } return desc; @@ -90,7 +96,7 @@ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc) if (idxd->state != IDXD_DEV_ENABLED) return -EIO; - portal = wq->portal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED); + portal = wq->portal + idxd_get_wq_portal_offset(IDXD_PORTAL_LIMITED, IDXD_IRQ_MSIX); /* * The wmb() flushes writes to coherent DMA data before diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index fa1abdf503c2..501a1d489ce3 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -1269,6 +1269,16 @@ static ssize_t numa_node_show(struct device *dev, } static DEVICE_ATTR_RO(numa_node); +static ssize_t ims_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct idxd_device *idxd = + container_of(dev, struct idxd_device, conf_dev); + + return sprintf(buf, "%u\n", idxd->ims_size); +} +static DEVICE_ATTR_RO(ims_size); + static ssize_t max_batch_size_show(struct device *dev, struct device_attribute *attr, char *buf) { @@ -1455,6 +1465,7 @@ static struct attribute *idxd_device_attributes[] = { &dev_attr_max_work_queues_size.attr, &dev_attr_max_engines.attr, &dev_attr_numa_node.attr, + &dev_attr_ims_size.attr, &dev_attr_max_batch_size.attr, &dev_attr_max_transfer_size.attr, &dev_attr_op_cap.attr, From patchwork Tue Jul 21 16:03:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676001 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A50B51392 for ; Tue, 21 Jul 2020 16:03:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 95A98207BB for ; Tue, 21 Jul 2020 16:03:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730136AbgGUQDL (ORCPT ); Tue, 21 Jul 2020
12:03:11 -0400 Received: from mga02.intel.com ([134.134.136.20]:5374 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727058AbgGUQDL (ORCPT ); Tue, 21 Jul 2020 12:03:11 -0400 IronPort-SDR: 1tC4jcURO/5EA5m326zKKKMVORHDAvpiyK2ky8j4dbDFSyqwpc1ZrkS2/C/StobG1JNZFHrdIX fuM7ldvObc4g== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="138252508" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="138252508" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:09 -0700 IronPort-SDR: /pjMsLIxKcWsy/wn5atmSOoFzSeBxOeyT/vCygbm1+GDCi5YJ0eKbF6XHsOu13smUjm15HgFNj DWG3BMtxOP3g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="462124823" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga005.jf.intel.com with ESMTP; 21 Jul 2020 09:03:07 -0700 Subject: [PATCH RFC v2 08/18] dmaengine: idxd: add device support functions in prep for mdev From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:07 -0700 Message-ID: <159534738747.28840.9497901333645332923.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add device support helper functions in preparation for adding VFIO mdev support.
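Two pieces of register arithmetic in the helpers below are easy to miss, so here is a compact sketch. This is an illustration only (the helper names are invented); the 32-byte WQCFG stride, the 8-byte field offset, and the abort operand packing are taken from the diff that follows.

#include <stdint.h>

/*
 * Per this patch, each WQCFG entry is 32 bytes, and the 32-bit word
 * holding the pasid and priv fields sits 8 bytes into the entry
 * (i.e. bits[2] of the wqcfg union).
 */
static inline uint32_t wqcfg_pasid_word_offset(uint32_t wqcfg_base, int wq_id)
{
	return wqcfg_base + wq_id * 32 + 8;
}

/*
 * The abort-WQ command operand packs a one-hot WQ bit in the low 16
 * bits and the 16-bit word index holding that bit in bits 31:16.
 */
static inline uint32_t abort_wq_operand(int wq_id)
{
	return (1u << (wq_id % 16)) | ((uint32_t)(wq_id / 16) << 16);
}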
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/device.c | 61 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/idxd.h | 4 +++ 2 files changed, 65 insertions(+) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 2b4e8ab99ebd..7443ffb5d3c3 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -287,6 +287,30 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } +int idxd_wq_abort(struct idxd_wq *wq) +{ + struct idxd_device *idxd = wq->idxd; + struct device *dev = &idxd->pdev->dev; + u32 operand, status; + + dev_dbg(dev, "Abort WQ %d\n", wq->id); + if (wq->state != IDXD_WQ_ENABLED) { + dev_dbg(dev, "WQ %d not active\n", wq->id); + return -ENXIO; + } + + operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); + if (status != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", status); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d aborted\n", wq->id); + return 0; +} + int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) { struct idxd_device *idxd = wq->idxd; @@ -366,6 +390,32 @@ void idxd_wq_disable_cleanup(struct idxd_wq *wq) } } +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* PASID fields are 8 bytes into the WQCFG register */ + offset = idxd->wqcfg_offset + wq->id * 32 + 8; + wq->wqcfg.pasid = pasid; + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + offset); +} + +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv) +{ + struct idxd_device *idxd = wq->idxd; + int offset; + + lockdep_assert_held(&idxd->dev_lock); + + /* priv field is 8 bytes into the WQCFG register */ + offset = idxd->wqcfg_offset + wq->id * 32 + 8; + wq->wqcfg.priv = !!priv; + iowrite32(wq->wqcfg.bits[2], idxd->reg_base + offset); +} + /* Device control bits */ static inline bool idxd_is_enabled(struct idxd_device *idxd) { @@ -502,6 +552,17 @@ void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid) dev_dbg(dev, "pasid %d drained\n", pasid); } +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid) +{ + struct device *dev = &idxd->pdev->dev; + u32 operand; + + operand = pasid; + dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_PASID, operand); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_PASID, operand, NULL); + dev_dbg(dev, "pasid %d aborted\n", pasid); +} + #define INT_HANDLE_IMS_TABLE 0x10000 int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index c65fb6dcb7e0..438d6478a3f8 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -321,6 +321,7 @@ void idxd_device_cleanup(struct idxd_device *idxd); int idxd_device_config(struct idxd_device *idxd); void idxd_device_wqs_clear_state(struct idxd_device *idxd); void idxd_device_drain_pasid(struct idxd_device *idxd, int pasid); +void idxd_device_abort_pasid(struct idxd_device *idxd, int pasid); int idxd_device_load_config(struct idxd_device *idxd); int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handle, enum idxd_interrupt_type irq_type); @@ -336,6 +337,9 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct 
idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq); +void idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); +void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); /* submission */ int idxd_submit_desc(struct idxd_wq *wq, struct idxd_desc *desc); From patchwork Tue Jul 21 16:03:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11675995 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F28F9159A for ; Tue, 21 Jul 2020 16:03:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D5F9F21702 for ; Tue, 21 Jul 2020 16:03:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728379AbgGUQDT (ORCPT ); Tue, 21 Jul 2020 12:03:19 -0400 Received: from mga02.intel.com ([134.134.136.20]:5400 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730231AbgGUQDS (ORCPT ); Tue, 21 Jul 2020 12:03:18 -0400 IronPort-SDR: Wc6WvrtmpIkOSEQGxccs/wBUF4miaxIeLrvFXi1mgRITQk7D7vNEqDli02Zpd7sgcorTj7tDUJ zqvNZE2sLCZQ== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="138252542" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="138252542" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:16 -0700 IronPort-SDR: Hbgsla0mX8zT2rK5MfR86ylyfD4Ju/rPWb69Nt9MQu6SwF/rW20dS0DGIQMxBh4tbkY4OweNLO 04fFwpM+jyuw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="327914017" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga007.jf.intel.com with ESMTP; 21 Jul 2020 09:03:14 -0700 Subject: [PATCH RFC v2 09/18] dmaengine: idxd: add basic mdev registration and helper functions From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:14 -0700 Message-ID: <159534739457.28840.11000033925088538164.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Create a mediated device through the VFIO mediated device framework. The mdev framework allows the driver to create a mediated device from a portion of the device's resources.
The driver will emulate the slow path such as the PCI config space, MMIO bar, and the command registers. The descriptor submission portal(s) will be mmapped into the guest so that the guest kernel or apps can submit descriptors directly. The mediated device support code in the idxd driver will be referred to as the Virtual Device Composition Module (vdcm). Add basic plumbing to fill out the mdev_parent_ops struct that VFIO mdev requires to support a mediated device. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/Kconfig | 6 drivers/dma/idxd/Makefile | 4 drivers/dma/idxd/idxd.h | 11 + drivers/dma/idxd/ims.c | 13 + drivers/dma/idxd/ims.h | 10 drivers/dma/idxd/init.c | 11 + drivers/dma/idxd/mdev.c | 980 +++++++++++++++++++++++++++++++++++++++++++++ drivers/dma/idxd/mdev.h | 118 +++++ drivers/dma/idxd/vdev.c | 76 +++ drivers/dma/idxd/vdev.h | 19 + 10 files changed, 1247 insertions(+), 1 deletion(-) create mode 100644 drivers/dma/idxd/ims.h create mode 100644 drivers/dma/idxd/mdev.c create mode 100644 drivers/dma/idxd/mdev.h create mode 100644 drivers/dma/idxd/vdev.c create mode 100644 drivers/dma/idxd/vdev.h diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 6a908785a5f7..69c1ae72df86 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -306,6 +306,12 @@ config INTEL_IDXD_SVM depends on PCI_PASID depends on PCI_IOV +config INTEL_IDXD_MDEV + bool "IDXD VFIO Mediated Device Support" + depends on INTEL_IDXD + depends on VFIO_MDEV + depends on VFIO_MDEV_DEVICE + config INTEL_IOATDMA tristate "Intel I/OAT DMA support" depends on PCI && X86_64 diff --git a/drivers/dma/idxd/Makefile b/drivers/dma/idxd/Makefile index d1519b9d1dd0..18622f81eb3f 100644 --- a/drivers/dma/idxd/Makefile +++ b/drivers/dma/idxd/Makefile @@ -1,2 +1,4 @@ obj-$(CONFIG_INTEL_IDXD) += idxd.o -idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o ims.o +idxd-y := init.o irq.o device.o sysfs.o submit.o dma.o cdev.o + +idxd-$(CONFIG_INTEL_IDXD_MDEV) += ims.o mdev.o vdev.o diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 438d6478a3f8..9588872cd273 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -121,6 +121,7 @@ struct idxd_wq { struct sbitmap_queue sbq; struct dma_chan dma_chan; char name[WQ_NAME_SIZE + 1]; + struct list_head vdcm_list; }; struct idxd_engine { @@ -153,6 +154,7 @@ enum idxd_device_flag { IDXD_FLAG_CMD_RUNNING, IDXD_FLAG_PASID_ENABLED, IDXD_FLAG_SIOV_SUPPORTED, + IDXD_FLAG_MDEV_ENABLED, }; struct idxd_device { @@ -245,6 +247,11 @@ static inline bool device_pasid_enabled(struct idxd_device *idxd) return test_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags); } +static inline bool device_mdev_enabled(struct idxd_device *idxd) +{ + return test_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); +} + enum idxd_portal_prot { IDXD_PORTAL_UNLIMITED = 0, IDXD_PORTAL_LIMITED, @@ -363,4 +370,8 @@ int idxd_cdev_get_major(struct idxd_device *idxd); int idxd_wq_add_cdev(struct idxd_wq *wq); void idxd_wq_del_cdev(struct idxd_wq *wq); +/* mdev */ +int idxd_mdev_host_init(struct idxd_device *idxd); +void idxd_mdev_host_release(struct idxd_device *idxd); + #endif diff --git a/drivers/dma/idxd/ims.c b/drivers/dma/idxd/ims.c index 5fece66122a2..bffc74c2b305 100644 --- a/drivers/dma/idxd/ims.c +++ b/drivers/dma/idxd/ims.c @@ -10,6 +10,19 @@ #include #include "registers.h" #include "idxd.h" +#include "mdev.h" + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER 
*/ + return 0; +} static void idxd_free_ims_index(struct idxd_device *idxd, unsigned long ims_idx) diff --git a/drivers/dma/idxd/ims.h b/drivers/dma/idxd/ims.h new file mode 100644 index 000000000000..3d823606e3a3 --- /dev/null +++ b/drivers/dma/idxd/ims.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_IMS_H_ +#define _IDXD_IMS_H_ + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); +int vidxd_free_ims_entries(struct vdcm_idxd *vidxd); + +#endif diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c index 3e2c7ac83daf..639ca74ae1f8 100644 --- a/drivers/dma/idxd/init.c +++ b/drivers/dma/idxd/init.c @@ -211,6 +211,7 @@ static int idxd_setup_internals(struct idxd_device *idxd) wq->idxd = idxd; mutex_init(&wq->wq_lock); wq->idxd_cdev.minor = -1; + INIT_LIST_HEAD(&wq->vdcm_list); } for (i = 0; i < idxd->max_engines; i++) { @@ -507,6 +508,14 @@ static int idxd_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) return -ENODEV; } + if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV)) { + rc = idxd_mdev_host_init(idxd); + if (rc < 0) + dev_warn(dev, "VFIO mdev not setup: %d\n", rc); + else + set_bit(IDXD_FLAG_MDEV_ENABLED, &idxd->flags); + } + rc = idxd_setup_sysfs(idxd); if (rc) { dev_err(dev, "IDXD sysfs setup failed\n"); @@ -581,6 +590,8 @@ static void idxd_remove(struct pci_dev *pdev) dev_dbg(&pdev->dev, "%s called\n", __func__); idxd_cleanup_sysfs(idxd); idxd_shutdown(pdev); + if (IS_ENABLED(CONFIG_INTEL_IDXD_MDEV) && device_mdev_enabled(idxd)) + idxd_mdev_host_release(idxd); if (device_pasid_enabled(idxd)) idxd_disable_system_pasid(idxd); mutex_lock(&idxd_idr_lock); diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c new file mode 100644 index 000000000000..f9cc2909b1cf --- /dev/null +++ b/drivers/dma/idxd/mdev.c @@ -0,0 +1,980 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. 
*/ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" +#include "ims.h" + +static u64 idxd_pci_config[] = { + 0x001000000b258086ULL, + 0x0080000008800000ULL, + 0x000000000000000cULL, + 0x000000000000000cULL, + 0x0000000000000000ULL, + 0x2010808600000000ULL, + 0x0000004000000000ULL, + 0x000000ff00000000ULL, + 0x0000060000015011ULL, /* MSI-X capability, hardcoded 2 entries, Encoded as N-1 */ + 0x0000070000000000ULL, + 0x0000000000920010ULL, /* PCIe capability */ + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, + 0x0000000000000000ULL, +}; + +static inline void reset_vconfig(struct vdcm_idxd *vidxd) +{ + memset(vidxd->cfg, 0, VIDXD_MAX_CFG_SPACE_SZ); + memcpy(vidxd->cfg, idxd_pci_config, sizeof(idxd_pci_config)); +} + +static inline void reset_vmmio(struct vdcm_idxd *vidxd) +{ + memset(&vidxd->bar0, 0, VIDXD_MAX_MMIO_SPACE_SZ); +} + +static void idxd_vdcm_init(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq = vidxd->wq; + + reset_vconfig(vidxd); + reset_vmmio(vidxd); + + vidxd->bar_size[0] = VIDXD_BAR0_SIZE; + vidxd->bar_size[1] = VIDXD_BAR2_SIZE; + + vidxd_mmio_init(vidxd); + + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + idxd_wq_disable(wq); +} + +static void __idxd_vdcm_release(struct vdcm_idxd *vidxd) +{ + int rc; + struct device *dev = &vidxd->idxd->pdev->dev; + + mutex_lock(&vidxd->dev_lock); + if (atomic_cmpxchg(&vidxd->vdev.released, 0, 1)) { + mutex_unlock(&vidxd->dev_lock); + return; + } + + rc = vfio_unregister_notifier(mdev_dev(vidxd->vdev.mdev), + VFIO_GROUP_NOTIFY, + &vidxd->vdev.group_notifier); + if (rc < 0) + dev_warn(dev, "vfio_unregister_notifier group failed: %d\n", rc); + + vidxd_free_ims_entries(vidxd); + /* Re-initialize the VIDXD to a pristine state for re-use */ + idxd_vdcm_init(vidxd); + mutex_unlock(&vidxd->dev_lock); +} + +static void idxd_vdcm_release(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vdcm_idxd_release %d\n", vidxd->type->type); + __idxd_vdcm_release(vidxd); +} + +static void idxd_vdcm_release_work(struct work_struct *work) +{ + struct vdcm_idxd *vidxd = container_of(work, struct vdcm_idxd, + vdev.release_work); + + __idxd_vdcm_release(vidxd); +} + +static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, + struct vdcm_idxd_type *type) +{ + struct vdcm_idxd *vidxd; + struct idxd_wq *wq = NULL; + int i; + + /* PLACEHOLDER, wq matching comes later */ + + if (!wq) + return ERR_PTR(-ENODEV); + + vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); + if (!vidxd) + return ERR_PTR(-ENOMEM); + + mutex_init(&vidxd->dev_lock); + vidxd->idxd = idxd; + vidxd->vdev.mdev = mdev; + vidxd->wq = wq; + mdev_set_drvdata(mdev, vidxd); + vidxd->type = type; + vidxd->num_wqs = VIDXD_MAX_WQS; + + for (i = 0; i < VIDXD_MAX_MSIX_VECS - 1; i++) + vidxd->ims_index[i] = -1; + + INIT_WORK(&vidxd->vdev.release_work, idxd_vdcm_release_work); + idxd_vdcm_init(vidxd); + mutex_lock(&wq->wq_lock); + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + + return vidxd; +} + +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; + +static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device 
*dev, + const char *name) +{ + int i; + char dev_name[IDXD_MDEV_NAME_LEN]; + + for (i = 0; i < IDXD_MDEV_TYPES; i++) { + snprintf(dev_name, IDXD_MDEV_NAME_LEN, "idxd-%s", + idxd_mdev_types[i].name); + + if (!strncmp(name, dev_name, IDXD_MDEV_NAME_LEN)) + return &idxd_mdev_types[i]; + } + + return NULL; +} + +static int idxd_vdcm_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd; + struct vdcm_idxd_type *type; + struct device *dev, *parent; + struct idxd_device *idxd; + struct idxd_wq *wq; + + parent = mdev_parent_dev(mdev); + idxd = dev_get_drvdata(parent); + dev = mdev_dev(mdev); + + mdev_set_iommu_device(dev, parent); + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) { + dev_err(dev, "failed to find type %s to create\n", + kobject_name(kobj)); + return -EINVAL; + } + + vidxd = vdcm_vidxd_create(idxd, mdev, type); + if (IS_ERR(vidxd)) { + dev_err(dev, "failed to create vidxd: %ld\n", PTR_ERR(vidxd)); + return PTR_ERR(vidxd); + } + + wq = vidxd->wq; + mutex_lock(&wq->wq_lock); + list_add(&vidxd->list, &wq->vdcm_list); + mutex_unlock(&wq->wq_lock); + dev_dbg(dev, "mdev creation success: %s\n", dev_name(mdev_dev(mdev))); + + return 0; +} + +static int idxd_vdcm_remove(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + struct device *dev = &idxd->pdev->dev; + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s: removing for wq %d\n", __func__, vidxd->wq->id); + + mutex_lock(&wq->wq_lock); + list_del(&vidxd->list); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + + kfree(vidxd); + return 0; +} + +static int idxd_vdcm_group_notifier(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct vdcm_idxd *vidxd = container_of(nb, struct vdcm_idxd, + vdev.group_notifier); + + /* The only action we care about */ + if (action == VFIO_GROUP_NOTIFY_SET_KVM) + if (!data) + schedule_work(&vidxd->vdev.release_work); + + return NOTIFY_OK; +} + +static int idxd_vdcm_open(struct mdev_device *mdev) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long events; + int rc; + struct vdcm_idxd_type *type = vidxd->type; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s: type: %d\n", __func__, type->type); + + mutex_lock(&vidxd->dev_lock); + vidxd->vdev.group_notifier.notifier_call = idxd_vdcm_group_notifier; + events = VFIO_GROUP_NOTIFY_SET_KVM; + rc = vfio_register_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &events, + &vidxd->vdev.group_notifier); + if (rc < 0) { + mutex_unlock(&vidxd->dev_lock); + dev_err(dev, "vfio_register_notifier for group failed: %d\n", rc); + return rc; + } + + /* allocate and setup IMS entries */ + rc = vidxd_setup_ims_entries(vidxd); + if (rc < 0) + goto undo_group; + + atomic_set(&vidxd->vdev.released, 0); + mutex_unlock(&vidxd->dev_lock); + + return rc; + + undo_group: + mutex_unlock(&vidxd->dev_lock); + vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &vidxd->vdev.group_notifier); + return rc; +} + +static ssize_t idxd_vdcm_rw(struct mdev_device *mdev, char *buf, size_t count, loff_t *ppos, + enum idxd_vdcm_rw mode) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + u64 pos = *ppos & VFIO_PCI_OFFSET_MASK; + struct device *dev = mdev_dev(mdev); + int rc = -EINVAL; + + if (index >= VFIO_PCI_NUM_REGIONS) { + dev_err(dev, "invalid index: %u\n", index); + return -EINVAL; + } + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + if (mode == 
IDXD_VDCM_WRITE) + rc = vidxd_cfg_write(vidxd, pos, buf, count); + else + rc = vidxd_cfg_read(vidxd, pos, buf, count); + break; + case VFIO_PCI_BAR0_REGION_INDEX: + case VFIO_PCI_BAR1_REGION_INDEX: + if (mode == IDXD_VDCM_WRITE) + rc = vidxd_mmio_write(vidxd, vidxd->bar_val[0] + pos, buf, count); + else + rc = vidxd_mmio_read(vidxd, vidxd->bar_val[0] + pos, buf, count); + break; + case VFIO_PCI_BAR2_REGION_INDEX: + case VFIO_PCI_BAR3_REGION_INDEX: + case VFIO_PCI_BAR4_REGION_INDEX: + case VFIO_PCI_BAR5_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + case VFIO_PCI_ROM_REGION_INDEX: + default: + dev_err(dev, "unsupported region: %u\n", index); + } + + return rc == 0 ? count : rc; +} + +static ssize_t idxd_vdcm_read(struct mdev_device *mdev, char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 2; + } else { + u8 val; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), ppos, + IDXD_VDCM_READ); + if (rc <= 0) + goto read_err; + + if (copy_to_user(buf, &val, sizeof(val))) + goto read_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + + read_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static ssize_t idxd_vdcm_write(struct mdev_device *mdev, const char __user *buf, size_t count, + loff_t *ppos) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned int done = 0; + int rc; + + mutex_lock(&vidxd->dev_lock); + while (count) { + size_t filled; + + if (count >= 4 && !(*ppos % 4)) { + u32 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 4; + } else if (count >= 2 && !(*ppos % 2)) { + u16 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, (char *)&val, + sizeof(val), ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 2; + } else { + u8 val; + + if (copy_from_user(&val, buf, sizeof(val))) + goto write_err; + + rc = idxd_vdcm_rw(mdev, &val, sizeof(val), + ppos, IDXD_VDCM_WRITE); + if (rc <= 0) + goto write_err; + + filled = 1; + } + + count -= filled; + done += filled; + *ppos += filled; + buf += filled; + } + + mutex_unlock(&vidxd->dev_lock); + return done; + +write_err: + mutex_unlock(&vidxd->dev_lock); + return -EFAULT; +} + +static int check_vma(struct idxd_wq *wq, struct vm_area_struct *vma) +{ + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if (!(vma->vm_flags & VM_SHARED)) + return -EINVAL; + + return 0; +} + +static int idxd_vdcm_mmap(struct mdev_device *mdev, struct vm_area_struct *vma) +{ + unsigned int wq_idx, rc; + unsigned long req_size, pgoff = 0, offset; + pgprot_t pg_prot; + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_wq *wq = vidxd->wq; + struct idxd_device *idxd = vidxd->idxd; + enum 
idxd_portal_prot virt_portal, phys_portal; + phys_addr_t base = pci_resource_start(idxd->pdev, IDXD_WQ_BAR); + struct device *dev = mdev_dev(mdev); + + rc = check_vma(wq, vma); + if (rc) + return rc; + + pg_prot = vma->vm_page_prot; + req_size = vma->vm_end - vma->vm_start; + vma->vm_flags |= VM_DONTCOPY; + + offset = (vma->vm_pgoff << PAGE_SHIFT) & + ((1ULL << VFIO_PCI_OFFSET_SHIFT) - 1); + + wq_idx = offset >> (PAGE_SHIFT + 2); + if (wq_idx >= 1) { + dev_err(dev, "mapping invalid wq %d off %lx\n", + wq_idx, offset); + return -EINVAL; + } + + /* + * Check and see if the guest wants to map to the limited or unlimited portal. + * The driver will allow mapping to the unlimited portal only if the wq is a + * dedicated wq. Otherwise, it goes to limited. + */ + virt_portal = ((offset >> PAGE_SHIFT) & 0x3) == 1; + phys_portal = IDXD_PORTAL_LIMITED; + if (virt_portal == IDXD_PORTAL_UNLIMITED && wq_dedicated(wq)) + phys_portal = IDXD_PORTAL_UNLIMITED; + + /* We always map IMS portals to the guest */ + pgoff = (base + idxd_get_wq_portal_full_offset(wq->id, phys_portal, + IDXD_IRQ_IMS)) >> PAGE_SHIFT; + + dev_dbg(dev, "mmap %lx %lx %lx %lx\n", vma->vm_start, pgoff, req_size, + pgprot_val(pg_prot)); + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_private_data = mdev; + vma->vm_pgoff = pgoff; + + return remap_pfn_range(vma, vma->vm_start, pgoff, req_size, pg_prot); +} + +static int idxd_vdcm_get_irq_count(struct vdcm_idxd *vidxd, int type) +{ + /* + * Even though the number of MSIX vectors supported is not tied to the number of + * wqs being exported, the current design is to allow 1 vector per WQ for the guest. + * So here we end up with num of wqs plus 1 that handles the misc interrupts. + */ + if (type == VFIO_PCI_MSI_IRQ_INDEX || type == VFIO_PCI_MSIX_IRQ_INDEX) + return VIDXD_MAX_MSIX_VECS; + + return 0; +} + +static irqreturn_t idxd_guest_wq_completion(int irq, void *data) +{ + struct ims_irq_entry *irq_entry = data; + struct vdcm_idxd *vidxd = irq_entry->vidxd; + int msix_idx = irq_entry->int_src; + + vidxd_send_interrupt(vidxd, msix_idx + 1); + return IRQ_HANDLED; +} + +static int msix_trigger_unregister(struct vdcm_idxd *vidxd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + int rc; + + if (!vidxd->vdev.msix_trigger[index]) + return 0; + + dev_dbg(dev, "disable MSIX trigger %d\n", index); + if (index) { + irq_entry = &vidxd->irq_entries[index - 1]; + if (irq_entry->irq_set) { + free_irq(irq_entry->irq, irq_entry); + irq_entry->irq_set = false; + } + rc = vidxd_disable_host_ims_pasid(vidxd, index - 1); + if (rc) + return rc; + } + eventfd_ctx_put(vidxd->vdev.msix_trigger[index]); + vidxd->vdev.msix_trigger[index] = NULL; + + return 0; +} + +static int msix_trigger_register(struct vdcm_idxd *vidxd, u32 fd, int index) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct ims_irq_entry *irq_entry; + struct eventfd_ctx *trigger; + int rc; + + rc = msix_trigger_unregister(vidxd, index); + if (rc < 0) + return rc; + + dev_dbg(dev, "enable MSIX trigger %d\n", index); + trigger = eventfd_ctx_fdget(fd); + if (IS_ERR(trigger)) { + dev_warn(dev, "eventfd_ctx_fdget failed %d\n", index); + return PTR_ERR(trigger); + } + + /* + * The MSIX vector 0 is emulated by the mdev. Starting with vector 1 + * the interrupt is backed by IMS and needs to be set up, but we + * will be setting up entry 0 of the IMS vectors. 
So here we pass + * in i - 1 to the host setup and irq_entries. + */ + if (index) { + irq_entry = &vidxd->irq_entries[index - 1]; + rc = vidxd_enable_host_ims_pasid(vidxd, index - 1); + if (rc) { + dev_warn(dev, "failed to enable host ims pasid\n"); + eventfd_ctx_put(trigger); + return rc; + } + + rc = request_irq(irq_entry->irq, idxd_guest_wq_completion, 0, "idxd-ims", irq_entry); + if (rc) { + dev_warn(dev, "failed to request ims irq\n"); + eventfd_ctx_put(trigger); + vidxd_disable_host_ims_pasid(vidxd, index - 1); + return rc; + } + irq_entry->irq_set = true; + } + + vidxd->vdev.msix_trigger[index] = trigger; + return 0; +} + +static int vdcm_idxd_set_msix_trigger(struct vdcm_idxd *vidxd, + unsigned int index, unsigned int start, + unsigned int count, uint32_t flags, + void *data) +{ + int i, rc = 0; + + if (count > VIDXD_MAX_MSIX_ENTRIES - 1) + count = VIDXD_MAX_MSIX_ENTRIES - 1; + + /* + * The MSIX vector 0 is emulated by the mdev. Starting with vector 1 + * the interrupt is backed by IMS and needs to be set up, but we + * will be setting up entry 0 of the IMS vectors. So here we pass + * in i - 1 to the host setup and irq_entries. + */ + if (count == 0 && (flags & VFIO_IRQ_SET_DATA_NONE)) { + /* Disable all MSIX entries */ + for (i = 0; i < VIDXD_MAX_MSIX_ENTRIES; i++) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + return 0; + } + + for (i = 0; i < count; i++) { + if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { + u32 fd = *(u32 *)(data + i * sizeof(u32)); + + rc = msix_trigger_register(vidxd, fd, i); + if (rc < 0) + return rc; + } else if (flags & VFIO_IRQ_SET_DATA_NONE) { + rc = msix_trigger_unregister(vidxd, i); + if (rc < 0) + return rc; + } + } + return rc; +} + +static int idxd_vdcm_set_irqs(struct vdcm_idxd *vidxd, uint32_t flags, + unsigned int index, unsigned int start, + unsigned int count, void *data) +{ + int (*func)(struct vdcm_idxd *vidxd, unsigned int index, + unsigned int start, unsigned int count, uint32_t flags, + void *data) = NULL; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + switch (index) { + case VFIO_PCI_INTX_IRQ_INDEX: + dev_warn(dev, "intx interrupts not supported.\n"); + break; + case VFIO_PCI_MSI_IRQ_INDEX: + dev_dbg(dev, "msi interrupt.\n"); + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + case VFIO_PCI_MSIX_IRQ_INDEX: + switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) { + case VFIO_IRQ_SET_ACTION_MASK: + case VFIO_IRQ_SET_ACTION_UNMASK: + break; + case VFIO_IRQ_SET_ACTION_TRIGGER: + func = vdcm_idxd_set_msix_trigger; + break; + } + break; + default: + return -ENOTTY; + } + + if (!func) + return -ENOTTY; + + return func(vidxd, index, start, count, flags, data); +} + +static void vidxd_vdcm_reset(struct vdcm_idxd *vidxd) +{ + vidxd_reset(vidxd); +} + +static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + unsigned long minsz; + int rc = -EINVAL; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "vidxd %p ioctl, cmd: %d\n", vidxd, cmd); + + mutex_lock(&vidxd->dev_lock); + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc 
= -EINVAL; + goto out; + } + + info.flags = VFIO_DEVICE_FLAGS_PCI; + info.flags |= VFIO_DEVICE_FLAGS_RESET; + info.num_regions = VFIO_PCI_NUM_REGIONS; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; + size_t size; + int nr_areas = 1; + int cap_type_id = 0; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz) { + rc = -EINVAL; + goto out; + } + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = VIDXD_MAX_CFG_SPACE_SZ; + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vidxd->bar_size[info.index]; + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR1_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + break; + case VFIO_PCI_BAR2_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = VFIO_REGION_INFO_FLAG_CAPS | VFIO_REGION_INFO_FLAG_MMAP | + VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; + info.size = vidxd->bar_size[1]; + + /* + * Every WQ has two areas for unlimited and limited + * MSI-X portals. IMS portals are not reported + */ + nr_areas = 2; + + size = sizeof(*sparse) + (nr_areas * sizeof(*sparse->areas)); + sparse = kzalloc(size, GFP_KERNEL); + if (!sparse) { + rc = -ENOMEM; + goto out; + } + + sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + sparse->header.version = 1; + sparse->nr_areas = nr_areas; + cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; + + sparse->areas[0].offset = 0; + sparse->areas[0].size = PAGE_SIZE; + + sparse->areas[1].offset = PAGE_SIZE; + sparse->areas[1].size = PAGE_SIZE; + break; + + case VFIO_PCI_BAR3_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0; + info.flags = 0; + dev_dbg(dev, "get region info bar:%d\n", info.index); + break; + + case VFIO_PCI_ROM_REGION_INDEX: + case VFIO_PCI_VGA_REGION_INDEX: + dev_dbg(dev, "get region info index:%d\n", info.index); + break; + default: { + if (info.index >= VFIO_PCI_NUM_REGIONS) + rc = -EINVAL; + else + rc = 0; + goto out; + } /* default */ + } /* info.index switch */ + + if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { + if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) { + rc = vfio_info_add_capability(&caps, &sparse->header, + sizeof(*sparse) + (sparse->nr_areas * + sizeof(*sparse->areas))); + kfree(sparse); + if (rc) + goto out; + } + } + + if (caps.size) { + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + sizeof(info), + caps.buf, caps.size)) { + kfree(caps.buf); + rc = -EFAULT; + goto out; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) { + rc = -EINVAL; + goto out; + } + + switch (info.index) { + case VFIO_PCI_MSI_IRQ_INDEX: + case VFIO_PCI_MSIX_IRQ_INDEX: + default: + rc = -EINVAL; + goto out; + } /* switch(info.index) */ + + info.flags = VFIO_IRQ_INFO_EVENTFD | VFIO_IRQ_INFO_NORESIZE; + info.count = idxd_vdcm_get_irq_count(vidxd, info.index); + + if (copy_to_user((void __user *)arg, &info, minsz)) + rc = -EFAULT; + else + rc = 0; + goto out; + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) { + rc = -EFAULT; + goto out; + } + + if (!(hdr.flags & VFIO_IRQ_SET_DATA_NONE)) { + int max = idxd_vdcm_get_irq_count(vidxd, hdr.index); + + rc = vfio_set_irqs_validate_and_prepare(&hdr, max, VFIO_PCI_NUM_IRQS, + &data_size); + if (rc) { + dev_err(dev, "intel:vfio_set_irqs_validate_and_prepare failed\n"); + rc = -EINVAL; + goto out; + } + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), data_size); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out; + } + } + } + + if (!data) { + rc = -EINVAL; + goto out; + } + + rc = idxd_vdcm_set_irqs(vidxd, hdr.flags, hdr.index, hdr.start, hdr.count, data); + kfree(data); + goto out; + } else if (cmd == VFIO_DEVICE_RESET) { + vidxd_vdcm_reset(vidxd); + } + + out: + mutex_unlock(&vidxd->dev_lock); + return rc; +} + +static const struct mdev_parent_ops idxd_vdcm_ops = { + .create = idxd_vdcm_create, + .remove = idxd_vdcm_remove, + .open = idxd_vdcm_open, + .release = idxd_vdcm_release, + .read = idxd_vdcm_read, + .write = idxd_vdcm_write, + .mmap = idxd_vdcm_mmap, + .ioctl = idxd_vdcm_ioctl, +}; + +int idxd_mdev_host_init(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + if (!test_bit(IDXD_FLAG_SIOV_SUPPORTED, &idxd->flags)) + return -EOPNOTSUPP; + + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) { + 
dev_warn(dev, "Failed to enable aux-domain: %d\n", rc); + return rc; + } + } else { + dev_warn(dev, "No aux-domain feature.\n"); + return -EOPNOTSUPP; + } + + return mdev_register_device(dev, &idxd_vdcm_ops); +} + +void idxd_mdev_host_release(struct idxd_device *idxd) +{ + struct device *dev = &idxd->pdev->dev; + int rc; + + mdev_unregister_device(dev); + if (iommu_dev_has_feature(dev, IOMMU_DEV_FEAT_AUX)) { + rc = iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_AUX); + if (rc < 0) + dev_warn(dev, "Failed to disable aux-domain: %d\n", + rc); + } +} diff --git a/drivers/dma/idxd/mdev.h b/drivers/dma/idxd/mdev.h new file mode 100644 index 000000000000..328055435cea --- /dev/null +++ b/drivers/dma/idxd/mdev.h @@ -0,0 +1,118 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_MDEV_H_ +#define _IDXD_MDEV_H_ + +/* two 64-bit BARs implemented */ +#define VIDXD_MAX_BARS 2 +#define VIDXD_MAX_CFG_SPACE_SZ 4096 +#define VIDXD_MAX_MMIO_SPACE_SZ 8192 +#define VIDXD_MSIX_TBL_SZ_OFFSET 0x42 +#define VIDXD_CAP_CTRL_SZ 0x100 +#define VIDXD_GRP_CTRL_SZ 0x100 +#define VIDXD_WQ_CTRL_SZ 0x100 +#define VIDXD_WQ_OCPY_INT_SZ 0x20 +#define VIDXD_MSIX_TBL_SZ 0x90 +#define VIDXD_MSIX_PERM_TBL_SZ 0x48 + +#define VIDXD_MSIX_TABLE_OFFSET 0x600 +#define VIDXD_MSIX_PERM_OFFSET 0x300 +#define VIDXD_GRPCFG_OFFSET 0x400 +#define VIDXD_WQCFG_OFFSET 0x500 +#define VIDXD_IMS_OFFSET 0x1000 + +#define VIDXD_BAR0_SIZE 0x2000 +#define VIDXD_BAR2_SIZE 0x20000 +#define VIDXD_MAX_MSIX_ENTRIES (VIDXD_MSIX_TBL_SZ / 0x10) +#define VIDXD_MAX_WQS 1 +#define VIDXD_MAX_MSIX_VECS 2 + +#define VIDXD_ATS_OFFSET 0x100 +#define VIDXD_PRS_OFFSET 0x110 +#define VIDXD_PASID_OFFSET 0x120 +#define VIDXD_MSIX_PBA_OFFSET 0x700 + +struct ims_irq_entry { + struct vdcm_idxd *vidxd; + int int_src; + unsigned int irq; + bool irq_set; +}; + +struct idxd_vdev { + struct mdev_device *mdev; + struct eventfd_ctx *msix_trigger[VIDXD_MAX_MSIX_ENTRIES]; + struct notifier_block group_notifier; + struct work_struct release_work; + atomic_t released; +}; + +struct vdcm_idxd { + struct idxd_device *idxd; + struct idxd_wq *wq; + struct idxd_vdev vdev; + struct vdcm_idxd_type *type; + int num_wqs; + u64 ims_index[VIDXD_MAX_MSIX_VECS - 1]; + struct msix_entry ims_entry; + struct ims_irq_entry irq_entries[VIDXD_MAX_WQS]; + + /* For VM use case */ + u64 bar_val[VIDXD_MAX_BARS]; + u64 bar_size[VIDXD_MAX_BARS]; + u8 cfg[VIDXD_MAX_CFG_SPACE_SZ]; + u8 bar0[VIDXD_MAX_MMIO_SPACE_SZ]; + struct list_head list; + struct mutex dev_lock; /* lock for vidxd resources */ +}; + +static inline struct vdcm_idxd *to_vidxd(struct idxd_vdev *vdev) +{ + return container_of(vdev, struct vdcm_idxd, vdev); +} + +#define IDXD_MDEV_NAME_LEN 16 +#define IDXD_MDEV_DESCRIPTION_LEN 64 + +enum idxd_mdev_type { + IDXD_MDEV_TYPE_1_DWQ = 0, +}; + +#define IDXD_MDEV_TYPES 1 + +struct vdcm_idxd_type { + char name[IDXD_MDEV_NAME_LEN]; + char description[IDXD_MDEV_DESCRIPTION_LEN]; + enum idxd_mdev_type type; + unsigned int avail_instance; +}; + +enum idxd_vdcm_rw { + IDXD_VDCM_READ = 0, + IDXD_VDCM_WRITE, +}; + +static inline u64 get_reg_val(void *buf, int size) +{ + u64 val = 0; + + switch (size) { + case 8: + val = *(uint64_t *)buf; + break; + case 4: + val = *(uint32_t *)buf; + break; + case 2: + val = *(uint16_t *)buf; + break; + case 1: + val = *(uint8_t *)buf; + break; + } + + return val; +} + +#endif diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c new file mode 100644 index 000000000000..af421852cc51 --- /dev/null +++ 
b/drivers/dma/idxd/vdev.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "registers.h" +#include "idxd.h" +#include "../../vfio/pci/vfio_pci_private.h" +#include "mdev.h" +#include "vdev.h" + +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx) +{ + /* PLACE HOLDER */ + return 0; +} + +int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) +{ + /* PLACEHOLDER */ + return 0; +} + +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) +{ + /* PLACEHOLDER */ + return 0; +} + +void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} + +void vidxd_reset(struct vdcm_idxd *vidxd) +{ + /* PLACEHOLDER */ +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h new file mode 100644 index 000000000000..1a2fdda271e8 --- /dev/null +++ b/drivers/dma/idxd/vdev.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2019,2020 Intel Corporation. All rights rsvd. */ + +#ifndef _IDXD_VDEV_H_ +#define _IDXD_VDEV_H_ + +#include "mdev.h" + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); +int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); +int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size); +void vidxd_mmio_init(struct vdcm_idxd *vidxd); +void vidxd_reset(struct vdcm_idxd *vidxd); +int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); +int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); +int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); + +#endif From patchwork Tue Jul 21 16:03:21 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676055 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83AC3138C for ; Tue, 21 Jul 2020 16:08:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 675E92077D for ; Tue, 21 Jul 2020 16:08:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726029AbgGUQI1 (ORCPT ); Tue, 21 Jul 2020 12:08:27 -0400 Received: from mga17.intel.com ([192.55.52.151]:14338 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730232AbgGUQI1 (ORCPT ); Tue, 21 Jul 2020 12:08:27 -0400 IronPort-SDR: uVPfVJsri2qymBnqRn346uu+cqCWi76WU5QiAyFha6eEE07uuwQTkdmxXT5kfHswYnyi13qtbn 66e+XGQHP0og== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="130239682" X-IronPort-AV: 
E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="130239682" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:22 -0700 IronPort-SDR: 5RET2/3EwcwyZOujwYUVfWvRl+pg4mL8PaRTOGam4jQ4SfIlh+3bM7TchIsJq9M5iK1Ha4IGL6 FrQtdMsDFE7Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="319952124" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002.fm.intel.com with ESMTP; 21 Jul 2020 09:03:21 -0700 Subject: [PATCH RFC v2 10/18] dmaengine: idxd: add emulation rw routines From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:21 -0700 Message-ID: <159534740139.28840.8782422568949418156.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add emulation routines for PCI config read/write, MMIO read/write, and an interrupt handling routine for the emulated device. The rw routines are called when PCI config reads/writes or BAR0 MMIO reads/writes are issued by the guest kernel through KVM/QEMU. Because the configuration is presented as read-only, most of the MMIO emulation is a simple memory copy, except for cases such as handling device commands and interrupts.
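To illustrate the general shape of these routines, a read of the read-only state reduces to a bounds- and alignment-checked copy out of the shadow register file. This is a minimal sketch only; vdev_reg_read and its parameters are made-up names for illustration, not code from this patch:

static int vdev_reg_read(u8 *shadow, unsigned int shadow_sz,
			 unsigned int pos, void *buf, unsigned int size)
{
	/* only naturally aligned, power-of-two accesses up to 8 bytes */
	if (size > 8 || !is_power_of_2(size) || !IS_ALIGNED(pos, size))
		return -EINVAL;

	/* stay within the emulated register file */
	if (pos + size > shadow_sz)
		return -EINVAL;

	/* read-only emulation: mirror the shadow state into the guest buffer */
	memcpy(buf, shadow + pos, size);
	return 0;
}

Writes follow the same checks but must dispatch on the offset first, since registers such as the device command register have side effects beyond updating the shadow copy.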
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/registers.h | 4 drivers/dma/idxd/vdev.c | 428 +++++++++++++++++++++++++++++++++++++++++- drivers/dma/idxd/vdev.h | 8 + include/uapi/linux/idxd.h | 2 4 files changed, 434 insertions(+), 8 deletions(-) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index ace7248ee195..f8e4dd10a738 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -268,6 +268,10 @@ union msix_perm { u32 bits; } __packed; +#define IDXD_MSIX_PERM_MASK 0xfffff00c +#define IDXD_MSIX_PERM_IGNORE 0x3 +#define MSIX_ENTRY_MASK_INT 0x1 + union group_flags { struct { u32 tc_a:3; diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index af421852cc51..b4eace02199e 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -25,8 +25,23 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx) { - /* PLACE HOLDER */ - return 0; + int rc = -1; + struct device *dev = &vidxd->idxd->pdev->dev; + + dev_dbg(dev, "%s interrupt %d\n", __func__, msix_idx); + + if (!vidxd->vdev.msix_trigger[msix_idx]) { + dev_warn(dev, "%s: intr evtfd not found %d\n", __func__, msix_idx); + return -EINVAL; + } + + rc = eventfd_signal(vidxd->vdev.msix_trigger[msix_idx], 1); + if (rc != 1) + dev_err(dev, "eventfd signal failed (%d)\n", rc); + else + dev_dbg(dev, "vidxd interrupt triggered wq(%d) %d\n", vidxd->wq->id, msix_idx); + + return rc; } int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) @@ -41,31 +56,423 @@ int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) return 0; } -int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error) { - /* PLACEHOLDER */ - return 0; + u8 *bar0 = vidxd->bar0; + union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET); + union genctrl_reg *genctrl; + bool send = false; + + if (!swerr->valid) { + memset(swerr, 0, sizeof(*swerr)); + swerr->valid = 1; + swerr->error = error; + send = true; + } else if (swerr->valid && !swerr->overflow) { + swerr->overflow = 1; + } + + genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET); + if (send && genctrl->softerr_int_en) { + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + + *intcause |= IDXD_INTC_ERR; + vidxd_send_interrupt(vidxd, 0); + } } int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { - /* PLACEHOLDER */ + u32 offset = pos & (vidxd->bar_size[0] - 1); + u8 *bar0 = vidxd->bar0; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + dev_dbg(dev, "vidxd mmio W %d %x %x: %llx\n", vidxd->wq->id, size, + offset, get_reg_val(buf, size)); + + if (((size & (size - 1)) != 0) || (offset & (size - 1)) != 0) + return -EINVAL; + + /* If we don't limit this, we potentially can write out of bounds */ + if (size > 4) + return -EINVAL; + + switch (offset) { + case IDXD_GENCFG_OFFSET ... IDXD_GENCFG_OFFSET + 3: + /* Write only when device is disabled.
*/ + if (vidxd_state(vidxd) == IDXD_DEVICE_STATE_DISABLED) + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_GENCTRL_OFFSET: + memcpy(bar0 + offset, buf, size); + break; + + case IDXD_INTCAUSE_OFFSET: + bar0[offset] &= ~(get_reg_val(buf, 1) & 0x1f); + break; + + case IDXD_CMD_OFFSET: { + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 val = get_reg_val(buf, size); + + if (size != 4) + return -EINVAL; + + /* Check and set command in progress */ + if (test_and_set_bit(31, (unsigned long *)cmdsts) == 0) + vidxd_do_command(vidxd, val); + else + vidxd_report_error(vidxd, DSA_ERR_CMD_REG); + break; + } + + case IDXD_SWERR_OFFSET: + /* W1C */ + bar0[offset] &= ~(get_reg_val(buf, 1) & 3); + break; + + case VIDXD_WQCFG_OFFSET ... VIDXD_WQCFG_OFFSET + VIDXD_WQ_CTRL_SZ - 1: + case VIDXD_GRPCFG_OFFSET ... VIDXD_GRPCFG_OFFSET + VIDXD_GRP_CTRL_SZ - 1: + /* Nothing is written. Should be all RO */ + break; + + case VIDXD_MSIX_TABLE_OFFSET ... VIDXD_MSIX_TABLE_OFFSET + VIDXD_MSIX_TBL_SZ - 1: { + int index = (offset - VIDXD_MSIX_TABLE_OFFSET) / 0x10; + u8 *msix_entry = &bar0[VIDXD_MSIX_TABLE_OFFSET + index * 0x10]; + u64 *pba = (u64 *)(bar0 + VIDXD_MSIX_PBA_OFFSET); + u8 cvec_byte; + + cvec_byte = msix_entry[12]; + memcpy(bar0 + offset, buf, size); + /* Handle clearing of UNMASK bit */ + if (!(msix_entry[12] & MSIX_ENTRY_MASK_INT) && cvec_byte & MSIX_ENTRY_MASK_INT) + if (test_and_clear_bit(index, (unsigned long *)pba)) + vidxd_send_interrupt(vidxd, index); + break; + } + + case VIDXD_MSIX_PERM_OFFSET ... VIDXD_MSIX_PERM_OFFSET + VIDXD_MSIX_PERM_TBL_SZ - 1: + memcpy(bar0 + offset, buf, size); + break; + } /* offset */ + + return 0; +} + +int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) +{ + u32 offset = pos & (vidxd->bar_size[0] - 1); + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, vidxd->bar0 + offset, size); + + dev_dbg(dev, "vidxd mmio R %d %x %x: %llx\n", + vidxd->wq->id, size, offset, get_reg_val(buf, size)); return 0; } int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count) { - /* PLACEHOLDER */ + u32 offset = pos & 0xfff; + struct device *dev = mdev_dev(vidxd->vdev.mdev); + + memcpy(buf, &vidxd->cfg[offset], count); + + dev_dbg(dev, "vidxd pci R %d %x %x: %llx\n", + vidxd->wq->id, count, offset, get_reg_val(buf, count)); + + return 0; +} + +/* + * Much of the emulation code has been borrowed from Intel i915 cfg space + * emulation code. + * drivers/gpu/drm/i915/gvt/cfg_space.c: + */ + +/* + * Bitmap for writable bits (RW or RW1C bits, but cannot co-exist in one + * byte) byte by byte in standard pci configuration space. (not the full + * 256 bytes.) + */ +static const u8 pci_cfg_space_rw_bmp[PCI_INTERRUPT_LINE + 4] = { + [PCI_COMMAND] = 0xff, 0x07, + [PCI_STATUS] = 0x00, 0xf9, /* the only one RW1C byte */ + [PCI_CACHE_LINE_SIZE] = 0xff, + [PCI_BASE_ADDRESS_0 ... PCI_CARDBUS_CIS - 1] = 0xff, + [PCI_ROM_ADDRESS] = 0x01, 0xf8, 0xff, 0xff, + [PCI_INTERRUPT_LINE] = 0xff, +}; + +static void _pci_cfg_mem_write(struct vdcm_idxd *vidxd, unsigned int off, u8 *src, + unsigned int bytes) +{ + u8 *cfg_base = vidxd->cfg; + u8 mask, new, old; + int i = 0; + + for (; i < bytes && (off + i < sizeof(pci_cfg_space_rw_bmp)); i++) { + mask = pci_cfg_space_rw_bmp[off + i]; + old = cfg_base[off + i]; + new = src[i] & mask; + + /** + * The PCI_STATUS high byte has RW1C bits, here + * emulates clear by writing 1 for these bits. + * Writing a 0b to RW1C bits has no effect. 
+ */ + if (off + i == PCI_STATUS + 1) + new = (~new & old) & mask; + + cfg_base[off + i] = (old & ~mask) | new; + } + + /* For other configuration space directly copy as it is. */ + if (i < bytes) + memcpy(cfg_base + off + i, src + i, bytes - i); +} + +static inline void _write_pci_bar(struct vdcm_idxd *vidxd, u32 offset, u32 val, bool low) +{ + u32 *pval; + + /* BAR offset should be 32-bit aligned */ + offset = rounddown(offset, 4); + pval = (u32 *)(vidxd->cfg + offset); + + if (low) { + /* + * only update bit 31 - bit 4, + * leave the bit 3 - bit 0 unchanged. + */ + *pval = (val & GENMASK(31, 4)) | (*pval & GENMASK(3, 0)); + } else { + *pval = val; + } +} + +static int _pci_cfg_bar_write(struct vdcm_idxd *vidxd, unsigned int offset, void *p_data, + unsigned int bytes) +{ + u32 new = *(u32 *)(p_data); + bool lo = IS_ALIGNED(offset, 8); + u64 size; + unsigned int bar_id; + + /* + * Power-up software can determine how much address + * space the device requires by writing a value of + * all 1's to the register and then reading the value + * back. The device will return 0's in all don't-care + * address bits. + */ + if (new == 0xffffffff) { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + bar_id = (offset - PCI_BASE_ADDRESS_0) / 8; + size = vidxd->bar_size[bar_id]; + _write_pci_bar(vidxd, offset, size >> (lo ? 0 : 32), lo); + break; + default: + /* Unimplemented BARs */ + _write_pci_bar(vidxd, offset, 0x0, false); + } + } else { + switch (offset) { + case PCI_BASE_ADDRESS_0: + case PCI_BASE_ADDRESS_1: + case PCI_BASE_ADDRESS_2: + case PCI_BASE_ADDRESS_3: + _write_pci_bar(vidxd, offset, new, lo); + break; + default: + break; + } + } return 0; } int vidxd_cfg_write(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int size) { - /* PLACEHOLDER */ + struct device *dev = &vidxd->idxd->pdev->dev; + + if (size > 4) + return -EINVAL; + + if (pos + size > VIDXD_MAX_CFG_SPACE_SZ) + return -EINVAL; + + dev_dbg(dev, "vidxd pci W %d %x %x: %llx\n", vidxd->wq->id, size, pos, + get_reg_val(buf, size)); + + /* First check if it's PCI_COMMAND */ + if (IS_ALIGNED(pos, 2) && pos == PCI_COMMAND) { + bool new_bme; + bool bme; + + if (size > 2) + return -EINVAL; + + new_bme = !!(get_reg_val(buf, 2) & PCI_COMMAND_MASTER); + bme = !!(vidxd->cfg[pos] & PCI_COMMAND_MASTER); + _pci_cfg_mem_write(vidxd, pos, buf, size); + + /* Flag error if turning off BME while device is enabled */ + if ((bme && !new_bme) && vidxd_state(vidxd) == IDXD_DEVICE_STATE_ENABLED) + vidxd_report_error(vidxd, DSA_ERR_PCI_CFG); + return 0; + } + + switch (rounddown(pos, 4)) { + case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_5: + if (!IS_ALIGNED(pos, 4)) + return -EINVAL; + return _pci_cfg_bar_write(vidxd, pos, buf, size); + + default: + _pci_cfg_mem_write(vidxd, pos, buf, size); + } return 0; } +static void vidxd_mmio_init_grpcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union group_cap_reg *grp_cap = (union group_cap_reg *)(bar0 + IDXD_GRPCAP_OFFSET); + + /* single group for current implementation */ + grp_cap->token_en = 0; + grp_cap->token_limit = 0; + grp_cap->num_groups = 1; +} + +static void vidxd_mmio_init_grpcfg(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct grpcfg *grpcfg = (struct grpcfg *)(bar0 + VIDXD_GRPCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + int i; + + /* + * At this point, we are only exporting a single workqueue for + * each mdev.
So we need to just fake it as first workqueue + * and also mark the available engines in this group. + */ + + /* Set single workqueue and the first one */ + grpcfg->wqs[0] = 0x1; + grpcfg->engines = 0; + for (i = 0; i < group->num_engines; i++) + grpcfg->engines |= BIT(i); + grpcfg->flags.bits = group->grpcfg.flags.bits; +} + +static void vidxd_mmio_init_wqcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + struct idxd_wq *wq = vidxd->wq; + union wq_cap_reg *wq_cap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + + wq_cap->occupancy_int = 0; + wq_cap->occupancy = 0; + wq_cap->priority = 0; + wq_cap->total_wq_size = wq->size; + wq_cap->num_wqs = VIDXD_MAX_WQS; + if (wq_dedicated(wq)) + wq_cap->dedicated_mode = 1; +} + +static void vidxd_mmio_init_wqcfg(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct idxd_wq *wq = vidxd->wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + + wqcfg->wq_size = wq->size; + wqcfg->wq_thresh = wq->threshold; + + if (wq_dedicated(wq)) + wqcfg->mode = 1; + + if (idxd->hw.gen_cap.block_on_fault && + test_bit(WQ_FLAG_BLOCK_ON_FAULT, &wq->flags)) + wqcfg->bof = 1; + + wqcfg->priority = wq->priority; + wqcfg->max_xfer_shift = idxd->hw.gen_cap.max_xfer_shift; + wqcfg->max_batch_shift = idxd->hw.gen_cap.max_batch_shift; + /* make mode change read-only */ + wqcfg->mode_support = 0; +} + +static void vidxd_mmio_init_engcap(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union engine_cap_reg *engcap = (union engine_cap_reg *)(bar0 + IDXD_ENGCAP_OFFSET); + struct idxd_wq *wq = vidxd->wq; + struct idxd_group *group = wq->group; + + engcap->num_engines = group->num_engines; +} + +static void vidxd_mmio_init_gencap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union gen_cap_reg *gencap = (union gen_cap_reg *)(bar0 + IDXD_GENCAP_OFFSET); + + gencap->bits = idxd->hw.gen_cap.bits; + gencap->config_en = 0; + gencap->max_ims_mult = 0; + gencap->cmd_cap = 1; +} + +static void vidxd_mmio_init_cmdcap(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + u32 *cmdcap = (u32 *)(bar0 + IDXD_CMDCAP_OFFSET); + + if (idxd->hw.cmd_cap) + *cmdcap = idxd->hw.cmd_cap; + else + *cmdcap = 0x1ffe; + + *cmdcap |= BIT(IDXD_CMD_REQUEST_INT_HANDLE); +} + void vidxd_mmio_init(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union offsets_reg *offsets; + + /* Copy up to where table offset is */ + memcpy_fromio(vidxd->bar0, idxd->reg_base, IDXD_TABLE_OFFSET); + + vidxd_mmio_init_gencap(vidxd); + vidxd_mmio_init_cmdcap(vidxd); + vidxd_mmio_init_wqcap(vidxd); + vidxd_mmio_init_wqcfg(vidxd); + vidxd_mmio_init_grpcap(vidxd); + vidxd_mmio_init_grpcfg(vidxd); + vidxd_mmio_init_engcap(vidxd); + + offsets = (union offsets_reg *)(bar0 + IDXD_TABLE_OFFSET); + offsets->grpcfg = VIDXD_GRPCFG_OFFSET / 0x100; + offsets->wqcfg = VIDXD_WQCFG_OFFSET / 0x100; + offsets->msix_perm = VIDXD_MSIX_PERM_OFFSET / 0x100; + + memset(bar0 + VIDXD_MSIX_PERM_OFFSET, 0, VIDXD_MSIX_PERM_TBL_SZ); +} + +static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { /* PLACEHOLDER */ } @@ -74,3 +481,8 @@ void vidxd_reset(struct vdcm_idxd *vidxd) { /* PLACEHOLDER */ } + +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) +{ + /* PLACEHOLDER */ +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h index 1a2fdda271e8..2dc8d22d3ea7 100644 --- a/drivers/dma/idxd/vdev.h 
+++ b/drivers/dma/idxd/vdev.h @@ -6,6 +6,13 @@ #include "mdev.h" +static inline u8 vidxd_state(struct vdcm_idxd *vidxd) +{ + union gensts_reg *gensts = (union gensts_reg *)(vidxd->bar0 + IDXD_GENSTATS_OFFSET); + + return gensts->state; +} + int vidxd_mmio_read(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size); int vidxd_cfg_read(struct vdcm_idxd *vidxd, unsigned int pos, void *buf, unsigned int count); @@ -15,5 +22,6 @@ void vidxd_reset(struct vdcm_idxd *vidxd); int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); +void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); #endif diff --git a/include/uapi/linux/idxd.h b/include/uapi/linux/idxd.h index fdcdfe414223..a0c0475a4626 100644 --- a/include/uapi/linux/idxd.h +++ b/include/uapi/linux/idxd.h @@ -78,6 +78,8 @@ enum dsa_completion_status { DSA_COMP_HW_ERR1, DSA_COMP_HW_ERR_DRB, DSA_COMP_TRANSLATION_FAIL, + DSA_ERR_PCI_CFG = 0x51, + DSA_ERR_CMD_REG, }; #define DSA_COMP_STATUS_MASK 0x7f From patchwork Tue Jul 21 16:03:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676007 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E6E11392 for ; Tue, 21 Jul 2020 16:03:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 4720C20792 for ; Tue, 21 Jul 2020 16:03:33 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730249AbgGUQD3 (ORCPT ); Tue, 21 Jul 2020 12:03:29 -0400 Received: from mga01.intel.com ([192.55.52.88]:47466 "EHLO mga01.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727955AbgGUQD3 (ORCPT ); Tue, 21 Jul 2020 12:03:29 -0400 IronPort-SDR: pO/6pGyztBeAFHb74czFJfr7GfgdoPgGc07APRQqzf7m9RwKKaKqBslNrR5S5MJWTfIfpQELvJ UBMguRpg5n9g== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="168306447" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="168306447" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:29 -0700 IronPort-SDR: ASrS9/qD8oK1UyWLwVuwEoMxpt85pzbdqQLA7S9J7zPRF750YAgh21npH4EPGnZJ7JHsmTb2Yv IZYV0e2Sf23w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="270476342" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007.fm.intel.com with ESMTP; 21 Jul 2020 09:03:27 -0700 Subject: [PATCH RFC v2 11/18] dmaengine: idxd: prep for virtual device commands From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, 
shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:27 -0700 Message-ID: <159534740776.28840.4232881471224728330.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Update some of the device commands in order to support usage by the virtual device commands emulated by the vdcm. Expose some of the commands' raw status so the virtual commands can utilize them accordingly. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/cdev.c | 2 + drivers/dma/idxd/device.c | 69 +++++++++++++++++++++++++++++---------------- drivers/dma/idxd/idxd.h | 8 +++-- drivers/dma/idxd/irq.c | 2 + drivers/dma/idxd/mdev.c | 2 + drivers/dma/idxd/sysfs.c | 8 +++-- 6 files changed, 56 insertions(+), 35 deletions(-) diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c index ffeae646f947..90266c0bdef3 100644 --- a/drivers/dma/idxd/cdev.c +++ b/drivers/dma/idxd/cdev.c @@ -158,7 +158,7 @@ static int idxd_cdev_release(struct inode *node, struct file *filep) if (rc < 0) dev_err(dev, "wq disable pasid failed.\n"); } - idxd_wq_drain(wq); + idxd_wq_drain(wq, NULL); } if (ctx->sva) diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 7443ffb5d3c3..6e9d27f59638 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -197,22 +197,25 @@ void idxd_wq_free_resources(struct idxd_wq *wq) sbitmap_queue_free(&wq->sbq); } -int idxd_wq_enable(struct idxd_wq *wq) +int idxd_wq_enable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status; + u32 stat; if (wq->state == IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d already enabled\n", wq->id); return -ENXIO; } - idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &status); + idxd_cmd_exec(idxd, IDXD_CMD_ENABLE_WQ, wq->id, &stat); - if (status != IDXD_CMDSTS_SUCCESS && - status != IDXD_CMDSTS_ERR_WQ_ENABLED) { - dev_dbg(dev, "WQ enable failed: %#x\n", status); + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS && + stat != IDXD_CMDSTS_ERR_WQ_ENABLED) { + dev_dbg(dev, "WQ enable failed: %#x\n", stat); return -ENXIO; } @@ -221,11 +224,11 @@ int idxd_wq_enable(struct idxd_wq *wq) return 0; } -int idxd_wq_disable(struct idxd_wq *wq) +int idxd_wq_disable(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 status, operand; + u32 stat, operand; dev_dbg(dev, "Disabling WQ %d\n", wq->id); @@ -235,10 +238,13 @@ int idxd_wq_disable(struct idxd_wq *wq) } operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &status); + idxd_cmd_exec(idxd, IDXD_CMD_DISABLE_WQ, operand, &stat); + + if (status) + *status = stat; - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ disable failed: %#x\n", status); + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ disable failed: %#x\n", stat); return -ENXIO; } @@ -247,20 +253,31 @@ int idxd_wq_disable(struct idxd_wq *wq) return 0; } -void idxd_wq_drain(struct idxd_wq *wq) +int idxd_wq_drain(struct 
idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand; + u32 operand, stat; if (wq->state != IDXD_WQ_ENABLED) { dev_dbg(dev, "WQ %d in wrong state: %d\n", wq->id, wq->state); - return; + return 0; } dev_dbg(dev, "Draining WQ %d\n", wq->id); operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); - idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, NULL); + idxd_cmd_exec(idxd, IDXD_CMD_DRAIN_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ drain failed: %#x\n", stat); + return -ENXIO; + } + + dev_dbg(dev, "WQ %d drained\n", wq->id); + return 0; } int idxd_wq_map_portal(struct idxd_wq *wq) @@ -287,11 +304,11 @@ void idxd_wq_unmap_portal(struct idxd_wq *wq) devm_iounmap(dev, wq->portal); } -int idxd_wq_abort(struct idxd_wq *wq) +int idxd_wq_abort(struct idxd_wq *wq, u32 *status) { struct idxd_device *idxd = wq->idxd; struct device *dev = &idxd->pdev->dev; - u32 operand, status; + u32 operand, stat; dev_dbg(dev, "Abort WQ %d\n", wq->id); if (wq->state != IDXD_WQ_ENABLED) { @@ -301,9 +318,13 @@ int idxd_wq_abort(struct idxd_wq *wq) operand = BIT(wq->id % 16) | ((wq->id / 16) << 16); dev_dbg(dev, "cmd: %u operand: %#x\n", IDXD_CMD_ABORT_WQ, operand); - idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &status); - if (status != IDXD_CMDSTS_SUCCESS) { - dev_dbg(dev, "WQ abort failed: %#x\n", status); + idxd_cmd_exec(idxd, IDXD_CMD_ABORT_WQ, operand, &stat); + + if (status) + *status = stat; + + if (stat != IDXD_CMDSTS_SUCCESS) { + dev_dbg(dev, "WQ abort failed: %#x\n", stat); return -ENXIO; } @@ -319,7 +340,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -331,7 +352,7 @@ int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid) iowrite32(wqcfg.bits[2], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; @@ -346,7 +367,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) unsigned int offset; unsigned long flags; - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) return rc; @@ -358,7 +379,7 @@ int idxd_wq_disable_pasid(struct idxd_wq *wq) iowrite32(wqcfg.bits[2], idxd->reg_base + offset); spin_unlock_irqrestore(&idxd->dev_lock, flags); - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) return rc; diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 9588872cd273..1e4d9ec9b00d 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -336,15 +336,15 @@ int idxd_device_request_int_handle(struct idxd_device *idxd, int idx, int *handl /* work queue control */ int idxd_wq_alloc_resources(struct idxd_wq *wq); void idxd_wq_free_resources(struct idxd_wq *wq); -int idxd_wq_enable(struct idxd_wq *wq); -int idxd_wq_disable(struct idxd_wq *wq); -void idxd_wq_drain(struct idxd_wq *wq); +int idxd_wq_enable(struct idxd_wq *wq, u32 *status); +int idxd_wq_disable(struct idxd_wq *wq, u32 *status); +int idxd_wq_drain(struct idxd_wq *wq, u32 *status); int idxd_wq_map_portal(struct idxd_wq *wq); void idxd_wq_unmap_portal(struct idxd_wq *wq); void idxd_wq_disable_cleanup(struct idxd_wq *wq); int idxd_wq_set_pasid(struct idxd_wq *wq, int pasid); int idxd_wq_disable_pasid(struct idxd_wq *wq); -int idxd_wq_abort(struct idxd_wq *wq); +int idxd_wq_abort(struct idxd_wq *wq, u32 *status); void 
idxd_wq_setup_pasid(struct idxd_wq *wq, int pasid); void idxd_wq_setup_priv(struct idxd_wq *wq, int priv); diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index c97e08480323..c818acb34a14 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -60,7 +60,7 @@ static void idxd_device_reinit(struct work_struct *work) struct idxd_wq *wq = &idxd->wqs[i]; if (wq->state == IDXD_WQ_ENABLED) { - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { dev_warn(dev, "Unable to re-enable wq %s\n", dev_name(&wq->conf_dev)); diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index f9cc2909b1cf..01207ca42a79 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -70,7 +70,7 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) - idxd_wq_disable(wq); + idxd_wq_disable(wq, NULL); } static void __idxd_vdcm_release(struct vdcm_idxd *vidxd) diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index 501a1d489ce3..cdf2b5aac314 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -218,7 +218,7 @@ static int idxd_config_bus_probe(struct device *dev) return rc; } - rc = idxd_wq_enable(wq); + rc = idxd_wq_enable(wq, NULL); if (rc < 0) { mutex_unlock(&wq->wq_lock); dev_warn(dev, "WQ %d enabling failed: %d\n", @@ -229,7 +229,7 @@ static int idxd_config_bus_probe(struct device *dev) rc = idxd_wq_map_portal(wq); if (rc < 0) { dev_warn(dev, "wq portal mapping failed: %d\n", rc); - rc = idxd_wq_disable(wq); + rc = idxd_wq_disable(wq, NULL); if (rc < 0) dev_warn(dev, "IDXD wq disable failed\n"); mutex_unlock(&wq->wq_lock); @@ -287,8 +287,8 @@ static void disable_wq(struct idxd_wq *wq) idxd_wq_unmap_portal(wq); - idxd_wq_drain(wq); - rc = idxd_wq_disable(wq); + idxd_wq_drain(wq, NULL); + rc = idxd_wq_disable(wq, NULL); idxd_wq_free_resources(wq); wq->client_count = 0; From patchwork Tue Jul 21 16:03:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676013 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A54D1392 for ; Tue, 21 Jul 2020 16:03:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 009FC22BF5 for ; Tue, 21 Jul 2020 16:03:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730266AbgGUQDi (ORCPT ); Tue, 21 Jul 2020 12:03:38 -0400 Received: from mga14.intel.com ([192.55.52.115]:23829 "EHLO mga14.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730231AbgGUQDh (ORCPT ); Tue, 21 Jul 2020 12:03:37 -0400 IronPort-SDR: Cknbz0e0+UpYw6xTUkuI6366IEPi8/I+4AI98lcfA5FaCwAB+P9QpMuIUwMO6XuUz6O7h5EXGA htsfCE+5oLyg== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="149318114" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="149318114" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:35 -0700 IronPort-SDR: WwALxcDgIyEIWAy29vnbkFC+MbqzGgPPQTawD8e0X/hOpMDP2Es/UCnriRHAELcViQEE6EMbZ9 CNMeBEhmGPqA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="362407149" Received: from djiang5-desk3.ch.intel.com 
([143.182.136.137]) by orsmga001.jf.intel.com with ESMTP; 21 Jul 2020 09:03:34 -0700 Subject: [PATCH RFC v2 12/18] dmaengine: idxd: virtual device commands emulation From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:34 -0700 Message-ID: <159534741398.28840.9899761904385942767.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add all the helper functions that support the emulation of the commands that are submitted to the device command register. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/registers.h | 16 +- drivers/dma/idxd/vdev.c | 398 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 409 insertions(+), 5 deletions(-) diff --git a/drivers/dma/idxd/registers.h b/drivers/dma/idxd/registers.h index f8e4dd10a738..6531a40fad2e 100644 --- a/drivers/dma/idxd/registers.h +++ b/drivers/dma/idxd/registers.h @@ -115,7 +115,8 @@ union gencfg_reg { union genctrl_reg { struct { u32 softerr_int_en:1; - u32 rsvd:31; + u32 halt_state_int_en:1; + u32 rsvd:30; }; u32 bits; } __packed; @@ -137,6 +138,8 @@ enum idxd_device_status_state { IDXD_DEVICE_STATE_HALT, }; +#define IDXD_GENSTATS_MASK 0x03 + enum idxd_device_reset_type { IDXD_DEVICE_RESET_SOFTWARE = 0, IDXD_DEVICE_RESET_FLR, @@ -149,6 +152,7 @@ enum idxd_device_reset_type { #define IDXD_INTC_CMD 0x02 #define IDXD_INTC_OCCUPY 0x04 #define IDXD_INTC_PERFMON_OVFL 0x08 +#define IDXD_INTC_HALT_STATE 0x10 #define IDXD_CMD_OFFSET 0xa0 union idxd_command_reg { @@ -160,6 +164,7 @@ union idxd_command_reg { }; u32 bits; } __packed; +#define IDXD_CMD_INT_MASK 0x80000000 enum idxd_cmd { IDXD_CMD_ENABLE_DEVICE = 1, @@ -217,7 +222,7 @@ enum idxd_cmdsts_err { /* disable device errors */ IDXD_CMDSTS_ERR_DIS_DEV_EN = 0x31, /* disable WQ, drain WQ, abort WQ, reset WQ */ - IDXD_CMDSTS_ERR_DEV_NOT_EN, + IDXD_CMDSTS_ERR_WQ_NOT_EN, /* request interrupt handle */ IDXD_CMDSTS_ERR_INVAL_INT_IDX = 0x41, IDXD_CMDSTS_ERR_NO_HANDLE, @@ -353,4 +358,11 @@ union wqcfg { #define WQCFG_OFFSET(idxd_dev, n, ofs) ((idxd_dev)->wqcfg_offset +\ (n) * sizeof(union wqcfg) +\ sizeof(u32) * (ofs)) + +enum idxd_wq_hw_state { + IDXD_WQ_DEV_DISABLED = 0, + IDXD_WQ_DEV_ENABLED, + IDXD_WQ_DEV_BUSY, +}; + #endif diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index b4eace02199e..df99d0bce5e9 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -81,6 +81,18 @@ static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error) } } +static int
idxd_get_mdev_pasid(struct mdev_device *mdev) +{ + struct iommu_domain *domain; + struct device *dev = mdev_dev(mdev); + + domain = mdev_get_iommu_domain(dev); + if (!domain) + return -EINVAL; + + return iommu_aux_get_pasid(domain, dev->parent); +} + int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { u32 offset = pos & (vidxd->bar_size[0] - 1); @@ -474,15 +486,395 @@ void vidxd_mmio_init(struct vdcm_idxd *vidxd) static void idxd_complete_command(struct vdcm_idxd *vidxd, enum idxd_cmdsts_err val) { - /* PLACEHOLDER */ + u8 *bar0 = vidxd->bar0; + u32 *cmd = (u32 *)(bar0 + IDXD_CMD_OFFSET); + u32 *cmdsts = (u32 *)(bar0 + IDXD_CMDSTS_OFFSET); + u32 *intcause = (u32 *)(bar0 + IDXD_INTCAUSE_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + *cmdsts = val; + dev_dbg(dev, "%s: cmd: %#x status: %#x\n", __func__, *cmd, val); + + if (*cmd & IDXD_CMD_INT_MASK) { + *intcause |= IDXD_INTC_CMD; + vidxd_send_interrupt(vidxd, 0); + } +} + +static void vidxd_enable(struct vdcm_idxd *vidxd) +{ + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_ENABLED) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_ENABLED); + + /* Check PCI configuration */ + if (!(vidxd->cfg[PCI_COMMAND] & PCI_COMMAND_MASTER)) + return idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_BUSMASTER_EN); + + gensts->state = IDXD_DEVICE_STATE_ENABLED; + + return idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_disable(struct vdcm_idxd *vidxd) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (gensts->state == IDXD_DEVICE_STATE_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + wq = vidxd->wq; + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable (wq disable) failed: %#x\n", status); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DIS_DEV_EN); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) + dev_warn(dev, "vidxd disable (wq drain) failed: %#x\n", status); + } + + wqcfg->wq_state = 0; + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_drain_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + + idxd_wq_drain(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_drain(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_drain(wq, &status); + if 
(status) { + dev_dbg(dev, "wq drain failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_abort_all(struct vdcm_idxd *vidxd) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_wq *wq = vidxd->wq; + + dev_dbg(dev, "%s\n", __func__); + idxd_wq_abort(wq, NULL); + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_abort(struct vdcm_idxd *vidxd, int val) +{ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct idxd_wq *wq = vidxd->wq; + u32 status; + + dev_dbg(dev, "%s\n", __func__); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "wq abort failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_reset(struct vdcm_idxd *vidxd) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u8 *bar0 = vidxd->bar0; + union gensts_reg *gensts = (union gensts_reg *)(bar0 + IDXD_GENSTATS_OFFSET); + struct idxd_wq *wq; + + dev_dbg(dev, "%s\n", __func__); + gensts->state = IDXD_DEVICE_STATE_DRAIN; + wq = vidxd->wq; + + if (wq->state == IDXD_WQ_ENABLED) { + idxd_wq_abort(wq, NULL); + idxd_wq_disable(wq, NULL); + } + + vidxd_mmio_init(vidxd); + gensts->state = IDXD_DEVICE_STATE_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_reset(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wqcfg *wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + dev_dbg(dev, "vidxd reset wq %u:%u\n", 0, wq->id); + + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + idxd_wq_abort(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to abort: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + idxd_wq_disable(wq, &status); + if (status) { + dev_dbg(dev, "vidxd reset wq failed to disable: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_alloc_int_handle(struct vdcm_idxd *vidxd, int vidx) +{ + bool ims = (vidx >> 16) & 1; + u32 cmdsts; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + vidx = vidx & 0xffff; + + dev_dbg(dev, "allocating int handle for %x\n", vidx); + + if (vidx != 1) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_INVAL_INT_IDX); + return; + } + + if (ims) { + dev_warn(dev, "IMS allocation is not implemented yet\n"); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_NO_HANDLE); + } else { + vidx--; /* MSIX idx 0 is a slow path interrupt */ + cmdsts = vidxd->ims_index[vidx] << 8; + dev_dbg(dev, "int handle %d:%lld\n", vidx, vidxd->ims_index[vidx]); + idxd_complete_command(vidxd, cmdsts); + } +} + +static void vidxd_wq_enable(struct vdcm_idxd *vidxd, int wq_id) +{ + struct idxd_wq *wq; + u8 *bar0 = vidxd->bar0; + union wq_cap_reg *wqcap; + 
struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct idxd_device *idxd; + union wqcfg *vwqcfg, *wqcfg; + unsigned long flags; + int wq_pasid; + u32 status; + int priv; + + if (wq_id >= VIDXD_MAX_WQS) { + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_WQIDX); + return; + } + + idxd = vidxd->idxd; + wq = vidxd->wq; + + dev_dbg(dev, "%s: wq %u:%u\n", __func__, wq_id, wq->id); + + vwqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET + wq_id * 32); + wqcap = (union wq_cap_reg *)(bar0 + IDXD_WQCAP_OFFSET); + wqcfg = &wq->wqcfg; + + if (vidxd_state(vidxd) != IDXD_DEVICE_STATE_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_DEV_NOTEN); + return; + } + + if (vwqcfg->wq_state != IDXD_WQ_DEV_DISABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_ENABLED); + return; + } + + if (wq_dedicated(wq) && wqcap->dedicated_mode == 0) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_MODE); + return; + } + + wq_pasid = idxd_get_mdev_pasid(mdev); + priv = 1; + + if (wq_pasid >= 0) { + wqcfg->bits[2] &= ~0x3fffff00; + wqcfg->priv = priv; + wqcfg->pasid_en = 1; + wqcfg->pasid = wq_pasid; + dev_dbg(dev, "program pasid %d in wq %d\n", wq_pasid, wq->id); + spin_lock_irqsave(&idxd->dev_lock, flags); + idxd_wq_setup_pasid(wq, wq_pasid); + idxd_wq_setup_priv(wq, priv); + spin_unlock_irqrestore(&idxd->dev_lock, flags); + idxd_wq_enable(wq, &status); + if (status) { + dev_err(dev, "vidxd enable wq %d failed\n", wq->id); + idxd_complete_command(vidxd, status); + return; + } + } else { + dev_err(dev, "idxd pasid setup failed wq %d wq_pasid %d\n", wq->id, wq_pasid); + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_PASID_EN); + return; + } + + vwqcfg->wq_state = IDXD_WQ_DEV_ENABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); +} + +static void vidxd_wq_disable(struct vdcm_idxd *vidxd, int wq_id_mask) +{ + struct idxd_wq *wq; + union wqcfg *wqcfg; + u8 *bar0 = vidxd->bar0; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + u32 status; + + wq = vidxd->wq; + + dev_dbg(dev, "vidxd disable wq %u:%u\n", 0, wq->id); + + wqcfg = (union wqcfg *)(bar0 + VIDXD_WQCFG_OFFSET); + if (wqcfg->wq_state != IDXD_WQ_DEV_ENABLED) { + idxd_complete_command(vidxd, IDXD_CMDSTS_ERR_WQ_NOT_EN); + return; + } + + /* If it is a DWQ, need to disable the DWQ as well */ + if (wq_dedicated(wq)) { + idxd_wq_disable(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } else { + idxd_wq_drain(wq, &status); + if (status) { + dev_warn(dev, "vidxd disable drain wq failed: %#x\n", status); + idxd_complete_command(vidxd, status); + return; + } + } + + wqcfg->wq_state = IDXD_WQ_DEV_DISABLED; + idxd_complete_command(vidxd, IDXD_CMDSTS_SUCCESS); } void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) { - /* PLACEHOLDER */ + union idxd_command_reg *reg = (union idxd_command_reg *)(vidxd->bar0 + IDXD_CMD_OFFSET); + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + + reg->bits = val; + + dev_dbg(dev, "%s: cmd code: %u reg: %x\n", __func__, reg->cmd, reg->bits); + + switch (reg->cmd) { + case IDXD_CMD_ENABLE_DEVICE: + vidxd_enable(vidxd); + break; + case IDXD_CMD_DISABLE_DEVICE: + vidxd_disable(vidxd); + break; + case IDXD_CMD_DRAIN_ALL: + vidxd_drain_all(vidxd); + break; + case IDXD_CMD_ABORT_ALL: + vidxd_abort_all(vidxd); + break; + case IDXD_CMD_RESET_DEVICE: + vidxd_reset(vidxd); + break; + case IDXD_CMD_ENABLE_WQ: + vidxd_wq_enable(vidxd, 
reg->operand); + break; + case IDXD_CMD_DISABLE_WQ: + vidxd_wq_disable(vidxd, reg->operand); + break; + case IDXD_CMD_DRAIN_WQ: + vidxd_wq_drain(vidxd, reg->operand); + break; + case IDXD_CMD_ABORT_WQ: + vidxd_wq_abort(vidxd, reg->operand); + break; + case IDXD_CMD_RESET_WQ: + vidxd_wq_reset(vidxd, reg->operand); + break; + case IDXD_CMD_REQUEST_INT_HANDLE: + vidxd_alloc_int_handle(vidxd, reg->operand); + break; + default: + idxd_complete_command(vidxd, IDXD_CMDSTS_INVAL_CMD); + break; + } } From patchwork Tue Jul 21 16:03:40 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676025 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 79FDF1392 for ; Tue, 21 Jul 2020 16:03:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 637F02077D for ; Tue, 21 Jul 2020 16:03:59 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730295AbgGUQD4 (ORCPT ); Tue, 21 Jul 2020 12:03:56 -0400 Received: from mga05.intel.com ([192.55.52.43]:35066 "EHLO mga05.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727903AbgGUQD4 (ORCPT ); Tue, 21 Jul 2020 12:03:56 -0400 IronPort-SDR: IkDxihq4sFVhBpu0oUT4xZ3vRJYe2T2oxm+XXugJQ8VEoH7fcVSX/Q57x++x89l7XHo1vwAb6b xtFIIl1ZmEHQ== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="235019338" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="235019338" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:44 -0700 IronPort-SDR: TbQmM6QJK1TY8Upp2xDxQXhNIQ2dVdqFM8BiRJGD3v9de8WoGxfSkxdKyxfi2hiqj7h5tdPcXU HW4+G/zm6bTg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="318379937" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga008.jf.intel.com with ESMTP; 21 Jul 2020 09:03:40 -0700 Subject: [PATCH RFC v2 13/18] dmaengine: idxd: ims setup for the vdcm From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:40 -0700 Message-ID: <159534742073.28840.5432268637638647551.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add support for IMS enabling on the mediated device. On the actual hardware, MSIX vector 0 is the misc interrupt and handles events such as administrative command completion, error reporting, and performance monitor overflow. MSIX vectors 1...N are used for descriptor completion interrupts. On the guest kernel, the MSIX interrupts are backed by the mediated device through emulation or IMS vectors. Vector 0 is handled through emulation by the host vdcm. Vector 1 (and more may be supported later) is backed by IMS. IMS can be set up with interrupt handlers via request_irq() just like MSIX interrupts once the relevant IRQ domain is set with dev_msi_domain_alloc_irqs().
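As an illustration of that flow, a host-side consumer could wire handlers to the IMS-backed vectors as below. This is a sketch under the assumptions of this series; ims_completion_handler and irq_entry are placeholder names, not code from this patch:

	struct msi_desc *desc;
	int rc;

	/* back VIDXD_MAX_MSIX_VECS - 1 guest vectors with IMS slots */
	rc = dev_msi_domain_alloc_irqs(dev, VIDXD_MAX_MSIX_VECS - 1, &idxd_ims_ops);
	if (rc < 0)
		return rc;

	for_each_msi_entry(desc, dev) {
		/* desc->irq is now a regular Linux IRQ backed by an IMS entry */
		rc = request_irq(desc->irq, ims_completion_handler, 0,
				 "idxd-ims", irq_entry);
		if (rc)
			break;
	}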
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/Kconfig | 1 drivers/dma/idxd/ims.c | 142 +++++++++++++++++++++++++++++++++++++++++------ drivers/dma/idxd/ims.h | 7 ++ drivers/dma/idxd/vdev.c | 76 +++++++++++++++++++++---- 4 files changed, 195 insertions(+), 31 deletions(-) diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig index 69c1ae72df86..a19e5dbeab9b 100644 --- a/drivers/dma/Kconfig +++ b/drivers/dma/Kconfig @@ -311,6 +311,7 @@ config INTEL_IDXD_MDEV depends on INTEL_IDXD depends on VFIO_MDEV depends on VFIO_MDEV_DEVICE + depends on DEV_MSI config INTEL_IOATDMA tristate "Intel I/OAT DMA support" diff --git a/drivers/dma/idxd/ims.c b/drivers/dma/idxd/ims.c index bffc74c2b305..f9b7fbcb61df 100644 --- a/drivers/dma/idxd/ims.c +++ b/drivers/dma/idxd/ims.c @@ -7,22 +7,13 @@ #include #include #include +#include #include #include "registers.h" #include "idxd.h" #include "mdev.h" - -int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ - return 0; -} - -int vidxd_free_ims_entries(struct vdcm_idxd *vidxd) -{ - /* PLACEHOLDER */ - return 0; -} +#include "ims.h" +#include "vdev.h" static void idxd_free_ims_index(struct idxd_device *idxd, unsigned long ims_idx) @@ -42,21 +33,65 @@ static int idxd_alloc_ims_index(struct idxd_device *idxd) static unsigned int idxd_ims_irq_mask(struct msi_desc *desc) { - // Filled out later when VDCM is introduced. + int ims_offset; + u32 mask_bits; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; - return 0; + dev_dbg(dev, "idxd irq mask: %d\n", ims_id); + + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + mask_bits = ioread32(base + IMS_ENTRY_VECTOR_CTRL); + mask_bits |= IMS_ENTRY_CTRL_MASKBIT; + iowrite32(mask_bits, base + IMS_ENTRY_VECTOR_CTRL); + + return mask_bits; } static unsigned int idxd_ims_irq_unmask(struct msi_desc *desc) { - // Filled out later when VDCM is introduced. + int ims_offset; + u32 mask_bits; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; - return 0; + dev_dbg(dev, "idxd irq unmask: %d\n", ims_id); + + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + mask_bits = ioread32(base + IMS_ENTRY_VECTOR_CTRL); + mask_bits &= ~IMS_ENTRY_CTRL_MASKBIT; + iowrite32(mask_bits, base + IMS_ENTRY_VECTOR_CTRL); + + return mask_bits; } static void idxd_ims_write_msg(struct msi_desc *desc, struct msi_msg *msg) { - // Filled out later when VDCM is introduced.
+ int ims_offset; + struct device *dev = desc->dev; + struct mdev_device *mdev = mdev_from_dev(dev); + struct vdcm_idxd *vidxd = mdev_get_drvdata(mdev); + struct idxd_device *idxd = vidxd->idxd; + void __iomem *base; + int ims_id = desc->platform.msi_index; + + dev_dbg(dev, "ims_write: %d %x\n", ims_id, msg->address_lo); + + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_id] * 0x10; + base = idxd->reg_base + ims_offset; + iowrite32(msg->address_lo, base + IMS_ENTRY_LOWER_ADDR); + iowrite32(msg->address_hi, base + IMS_ENTRY_UPPER_ADDR); + iowrite32(msg->data, base + IMS_ENTRY_DATA); } static struct platform_msi_ops idxd_ims_ops = { @@ -64,3 +99,76 @@ static struct platform_msi_ops idxd_ims_ops = { .irq_unmask = idxd_ims_irq_unmask, .write_msg = idxd_ims_write_msg, }; + +int vidxd_free_ims_entries(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct ims_irq_entry *irq_entry; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct msi_desc *desc; + int i = 0; + + for_each_msi_entry(desc, dev) { + irq_entry = &vidxd->irq_entries[i]; + /* + * When qemu dies unexpectedly, it does not call VFIO_IRQ_SET_DATA_NONE ioctl + * to free up the interrupts. We need to free the interrupts here as clean up + * if they haven't been freed. + */ + if (irq_entry->irq_set) + free_irq(irq_entry->irq, irq_entry); + idxd_free_ims_index(idxd, vidxd->ims_index[i]); + vidxd->ims_index[i] = -1; + memset(irq_entry, 0, sizeof(*irq_entry)); + i++; + } + + dev_msi_domain_free_irqs(dev); + return 0; +} + +int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + struct ims_irq_entry *irq_entry; + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + struct msi_desc *desc; + int err, i = 0; + int index; + + /* + * MSIX vec 0 is emulated by the vdcm and does not take up an IMS. The total MSIX vecs used + * by the mdev will be total IMS + 1. vec 0 is used for misc interrupts such as command + * completion, error notification, PMU, etc. The other vectors are used for descriptor + * completion. Thus only the number of IMS vectors need to be allocated, which is + * VIDXD_MAX_MSIX_VECS - 1. + */ + err = dev_msi_domain_alloc_irqs(dev, VIDXD_MAX_MSIX_VECS - 1, &idxd_ims_ops); + if (err < 0) { + dev_dbg(dev, "Failed to allocate IMS entries: %d\n", err);
%d\n", err); + return err; + } + + i = 0; + for_each_msi_entry(desc, dev) { + index = idxd_alloc_ims_index(idxd); + if (index < 0) { + err = index; + break; + } + vidxd->ims_index[i] = index; + + irq_entry = &vidxd->irq_entries[i]; + irq_entry->vidxd = vidxd; + irq_entry->int_src = i; + irq_entry->irq = desc->irq; + i++; + } + + if (err) { + vidxd_free_ims_entries(vidxd); + return err; + } + + return 0; +} diff --git a/drivers/dma/idxd/ims.h b/drivers/dma/idxd/ims.h index 3d823606e3a3..97826abf1163 100644 --- a/drivers/dma/idxd/ims.h +++ b/drivers/dma/idxd/ims.h @@ -4,6 +4,13 @@ #ifndef _IDXD_IMS_H_ #define _IDXD_IMS_H_ +/* IMS entry format */ +#define IMS_ENTRY_LOWER_ADDR 0 /* Message Address */ +#define IMS_ENTRY_UPPER_ADDR 4 /* Message Upper Address */ +#define IMS_ENTRY_DATA 8 /* Message Data */ +#define IMS_ENTRY_VECTOR_CTRL 12 /* Vector Control */ +#define IMS_ENTRY_CTRL_MASKBIT 0x00000001 + int vidxd_setup_ims_entries(struct vdcm_idxd *vidxd); int vidxd_free_ims_entries(struct vdcm_idxd *vidxd); diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index df99d0bce5e9..66e59cb02635 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -44,15 +44,75 @@ int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx) return rc; } +static int idxd_get_mdev_pasid(struct mdev_device *mdev) +{ + struct iommu_domain *domain; + struct device *dev = mdev_dev(mdev); + + domain = mdev_get_iommu_domain(dev); + if (!domain) + return -EINVAL; + + return iommu_aux_get_pasid(domain, dev->parent); +} + +#define IMS_PASID_ENABLE 0x8 int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + unsigned int ims_offset; + struct idxd_device *idxd = vidxd->idxd; + u32 val; + + /* + * Current implementation limits to 1 WQ for the vdev and therefore + * also only 1 IMS interrupt for that vdev. + */ + if (ims_idx >= VIDXD_MAX_WQS) { + dev_warn(dev, "ims_idx greater than vidxd allowed: %d\n", ims_idx); + return -EINVAL; + } + + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10; + val = ioread32(idxd->reg_base + ims_offset + 12); + val &= ~IMS_PASID_ENABLE; + iowrite32(val, idxd->reg_base + ims_offset + 12); + return 0; } int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx) { - /* PLACEHOLDER */ + struct mdev_device *mdev = vidxd->vdev.mdev; + struct device *dev = mdev_dev(mdev); + int pasid; + unsigned int ims_offset; + struct idxd_device *idxd = vidxd->idxd; + u32 val; + + /* + * Current implementation limits to 1 WQ for the vdev and therefore + * also only 1 IMS interrupt for that vdev.
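+	 *
+	 * The vector-control write below follows the IMS entry layout used
+	 * earlier in this series: bit 0 is the mask bit, bit 3 (0x8,
+	 * IMS_PASID_ENABLE) turns PASID filtering on, and the PASID itself
+	 * occupies the upper bits (hence "pasid << 12"), so the entry only
+	 * fires for work submitted under this mdev's PASID.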
+ */ + if (ims_idx >= VIDXD_MAX_WQS) { + dev_warn(dev, "ims_idx greater than vidxd allowed: %d\n", ims_idx); + return -EINVAL; + } + + /* Setup the PASID filtering */ + pasid = idxd_get_mdev_pasid(mdev); + + if (pasid >= 0) { + ims_offset = idxd->ims_offset + vidxd->ims_index[ims_idx] * 0x10; + val = ioread32(idxd->reg_base + ims_offset + 12); + val |= IMS_PASID_ENABLE | (pasid << 12) | (val & 0x7); + iowrite32(val, idxd->reg_base + ims_offset + 12); + } else { + dev_warn(dev, "pasid setup failed for ims entry %lld\n", vidxd->ims_index[ims_idx]); + return -ENXIO; + } + return 0; } @@ -81,18 +141,6 @@ static void vidxd_report_error(struct vdcm_idxd *vidxd, unsigned int error) } } -static int idxd_get_mdev_pasid(struct mdev_device *mdev) -{ - struct iommu_domain *domain; - struct device *dev = mdev_dev(mdev); - - domain = mdev_get_iommu_domain(dev); - if (!domain) - return -EINVAL; - - return iommu_aux_get_pasid(domain, dev->parent); -} - int vidxd_mmio_write(struct vdcm_idxd *vidxd, u64 pos, void *buf, unsigned int size) { u32 offset = pos & (vidxd->bar_size[0] - 1); From patchwork Tue Jul 21 16:03:47 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676019 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9ECF3618 for ; Tue, 21 Jul 2020 16:03:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 910B82077D for ; Tue, 21 Jul 2020 16:03:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730280AbgGUQDt (ORCPT ); Tue, 21 Jul 2020 12:03:49 -0400 Received: from mga11.intel.com ([192.55.52.93]:9860 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727115AbgGUQDt (ORCPT ); Tue, 21 Jul 2020 12:03:49 -0400 IronPort-SDR: mT8yMac5VWnWmqcNRDp7w8T5XNLSma1VsrjjGNI8gsY6K5qZLaIWmqeygrQPkjnipyaO91Mqt/ N8+YtquiNJww== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="148100754" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="148100754" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:48 -0700 IronPort-SDR: 0oyIdad1lxDFJNYjQZK8aIhpAdXIs04RAE6PYdWPXVFD8P0lQpJHwvS0/LaL43Mmz4E/cYPjqi /e3oekyyyIRg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="310293846" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga004.fm.intel.com with ESMTP; 21 Jul 2020 09:03:47 -0700 Subject: [PATCH RFC v2 14/18] dmaengine: idxd: add mdev type as a new wq type From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: 
dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:47 -0700 Message-ID: <159534742756.28840.2462836305382720334.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add "mdev" wq type and support helpers. The mdev wq type marks the wq to be utilized as a VFIO mediated device. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/idxd.h | 2 ++ drivers/dma/idxd/sysfs.c | 13 +++++++++++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 1e4d9ec9b00d..1f03019bb45d 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -71,6 +71,7 @@ enum idxd_wq_type { IDXD_WQT_NONE = 0, IDXD_WQT_KERNEL, IDXD_WQT_USER, + IDXD_WQT_MDEV, }; struct idxd_cdev { @@ -308,6 +309,7 @@ void idxd_cleanup_sysfs(struct idxd_device *idxd); int idxd_register_driver(void); void idxd_unregister_driver(void); struct bus_type *idxd_get_bus_type(struct idxd_device *idxd); +bool is_idxd_wq_mdev(struct idxd_wq *wq); /* device interrupt control */ irqreturn_t idxd_irq_handler(int vec, void *data); diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index cdf2b5aac314..d3b0a95b0d1d 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -14,6 +14,7 @@ static char *idxd_wq_type_names[] = { [IDXD_WQT_NONE] = "none", [IDXD_WQT_KERNEL] = "kernel", [IDXD_WQT_USER] = "user", + [IDXD_WQT_MDEV] = "mdev", }; static void idxd_conf_device_release(struct device *dev) @@ -69,6 +70,11 @@ static inline bool is_idxd_wq_cdev(struct idxd_wq *wq) return wq->type == IDXD_WQT_USER; } +inline bool is_idxd_wq_mdev(struct idxd_wq *wq) +{ + return wq->type == IDXD_WQT_MDEV ? 
true : false; +} + static int idxd_config_bus_match(struct device *dev, struct device_driver *drv) { @@ -1095,8 +1101,9 @@ static ssize_t wq_type_show(struct device *dev, return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_KERNEL]); case IDXD_WQT_USER: - return sprintf(buf, "%s\n", - idxd_wq_type_names[IDXD_WQT_USER]); + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_USER]); + case IDXD_WQT_MDEV: + return sprintf(buf, "%s\n", idxd_wq_type_names[IDXD_WQT_MDEV]); case IDXD_WQT_NONE: default: return sprintf(buf, "%s\n", @@ -1123,6 +1130,8 @@ static ssize_t wq_type_store(struct device *dev, wq->type = IDXD_WQT_KERNEL; else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_USER])) wq->type = IDXD_WQT_USER; + else if (sysfs_streq(buf, idxd_wq_type_names[IDXD_WQT_MDEV])) + wq->type = IDXD_WQT_MDEV; else return -EINVAL; From patchwork Tue Jul 21 16:03:53 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CDFF8618 for ; Tue, 21 Jul 2020 16:04:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C0FAC207DD for ; Tue, 21 Jul 2020 16:04:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730311AbgGUQD7 (ORCPT ); Tue, 21 Jul 2020 12:03:59 -0400 Received: from mga07.intel.com ([134.134.136.100]:3141 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730287AbgGUQD5 (ORCPT ); Tue, 21 Jul 2020 12:03:57 -0400 IronPort-SDR: coDlyFPbzNL+V/zlp/4xO0mrBliQ5GCo3bRWyrWKOct78WIgi0ioQBH/e05goH/Bjq1QmZ/hRI xB0GFVy1wnJg== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="214815456" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="214815456" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:03:55 -0700 IronPort-SDR: dRgRTHoVhoULeTzKBuAHpfKKzxGpg7J54u4aGWontPX3WG9vAPwwXxjs1q2mqHlJkOWegq526N xC6FCJzZ9Wig== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="432045318" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga004.jf.intel.com with ESMTP; 21 Jul 2020 09:03:53 -0700 Subject: [PATCH RFC v2 15/18] dmaengine: idxd: add dedicated wq mdev type From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:03:53 -0700 Message-ID: <159534743378.28840.14343631681400866758.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: 
<159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Add the support code for the "1dwq" mdev type. This mdev type follows the standard VFIO mdev flow. The "1dwq" type exports a single dedicated wq to the mdev. The dwq has a read-only configuration that is set up by the host. The mdev type does not support PASID or SVA, and matches the stage 1 driver in functional support. For backward compatibility, the mdev will maintain the DSA spec definition of this mdev type once the commit goes upstream. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/mdev.c | 142 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 133 insertions(+), 9 deletions(-) diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 01207ca42a79..744adfdc06cd 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -113,21 +113,58 @@ static void idxd_vdcm_release_work(struct work_struct *work) __idxd_vdcm_release(vidxd); } +static struct idxd_wq *find_any_dwq(struct idxd_device *idxd) +{ + int i; + struct idxd_wq *wq; + unsigned long flags; + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + wq = &idxd->wqs[i]; + + if (wq->state != IDXD_WQ_ENABLED) + continue; + + if (!wq_dedicated(wq)) + continue; + + if (idxd_wq_refcount(wq) != 0) + continue; + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + mutex_lock(&wq->wq_lock); + if (idxd_wq_refcount(wq)) { + mutex_unlock(&wq->wq_lock); + spin_lock_irqsave(&idxd->dev_lock, flags); + continue; + } + + idxd_wq_get(wq); + mutex_unlock(&wq->wq_lock); + return wq; + } + + spin_unlock_irqrestore(&idxd->dev_lock, flags); + return NULL; +} + static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev_device *mdev, struct vdcm_idxd_type *type) { struct vdcm_idxd *vidxd; struct idxd_wq *wq = NULL; - int i; - - /* PLACEHOLDER, wq matching comes later */ + int i, rc; + if (type->type == IDXD_MDEV_TYPE_1_DWQ) + wq = find_any_dwq(idxd); if (!wq) return ERR_PTR(-ENODEV); vidxd = kzalloc(sizeof(*vidxd), GFP_KERNEL); - if (!vidxd) - return ERR_PTR(-ENOMEM); + if (!vidxd) { + rc = -ENOMEM; + goto err; + } mutex_init(&vidxd->dev_lock); vidxd->idxd = idxd; @@ -142,14 +179,23 @@ static struct vdcm_idxd *vdcm_vidxd_create(struct idxd_device *idxd, struct mdev INIT_WORK(&vidxd->vdev.release_work, idxd_vdcm_release_work); idxd_vdcm_init(vidxd); - mutex_lock(&wq->wq_lock); - idxd_wq_get(wq); - mutex_unlock(&wq->wq_lock); return vidxd; + + err: + mutex_lock(&wq->wq_lock); + idxd_wq_put(wq); + mutex_unlock(&wq->wq_lock); + return ERR_PTR(rc); } -static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES]; +static struct vdcm_idxd_type idxd_mdev_types[IDXD_MDEV_TYPES] = { + { + .name = "1dwq", + .description = "IDXD MDEV with 1 dedicated workqueue", + .type = IDXD_MDEV_TYPE_1_DWQ, + }, +}; static struct vdcm_idxd_type *idxd_vdcm_find_vidxd_type(struct device *dev, const char *name) @@ -932,7 +978,85 @@ static long idxd_vdcm_ioctl(struct mdev_device *mdev, unsigned int cmd, return rc; } +static ssize_t name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + + if (type) + return sprintf(buf, "%s\n", type->description); + + return -EINVAL; +} +static 
MDEV_TYPE_ATTR_RO(name); + +static int find_available_mdev_instances(struct idxd_device *idxd, struct vdcm_idxd_type *type) +{ + int count = 0, i; + unsigned long flags; + + if (type->type != IDXD_MDEV_TYPE_1_DWQ) + return 0; + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < idxd->max_wqs; i++) { + struct idxd_wq *wq; + + wq = &idxd->wqs[i]; + if (!is_idxd_wq_mdev(wq) || !wq_dedicated(wq) || idxd_wq_refcount(wq)) + continue; + + count++; + } + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + return count; +} + +static ssize_t available_instances_show(struct kobject *kobj, + struct device *dev, char *buf) +{ + int count; + struct idxd_device *idxd = dev_get_drvdata(dev); + struct vdcm_idxd_type *type; + + type = idxd_vdcm_find_vidxd_type(dev, kobject_name(kobj)); + if (!type) + return -EINVAL; + + count = find_available_mdev_instances(idxd, type); + + return sprintf(buf, "%d\n", count); +} +static MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} +static MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *idxd_mdev_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group idxd_mdev_type_group0 = { + .name = "1dwq", + .attrs = idxd_mdev_types_attrs, +}; + +static struct attribute_group *idxd_mdev_type_groups[] = { + &idxd_mdev_type_group0, + NULL, +}; + static const struct mdev_parent_ops idxd_vdcm_ops = { + .supported_type_groups = idxd_mdev_type_groups, .create = idxd_vdcm_create, .remove = idxd_vdcm_remove, .open = idxd_vdcm_open, From patchwork Tue Jul 21 16:04:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676033 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B4680618 for ; Tue, 21 Jul 2020 16:04:07 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A75532077D for ; Tue, 21 Jul 2020 16:04:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730327AbgGUQEF (ORCPT ); Tue, 21 Jul 2020 12:04:05 -0400 Received: from mga17.intel.com ([192.55.52.151]:14425 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730080AbgGUQED (ORCPT ); Tue, 21 Jul 2020 12:04:03 -0400 IronPort-SDR: pXVf3wa2xyMRS1o+iXuBjNzhdtT0c0D20ZR+9oFuKUhqz7/SsJbil8xKam0G3CiXhMFWp6PDXI 6Kyuu7E1mJQw== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="130239905" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="130239905" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga006.jf.intel.com ([10.7.209.51]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:04:03 -0700 IronPort-SDR: di9UGr7zBb4po9MxWVUgchybeldRT6F2HWvHNaXAElWEkdKakEhqHDPOwpurfTC+pBGYZTsv2O Vs3ioLe2++vQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="287968306" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by orsmga006.jf.intel.com with ESMTP; 21 Jul 2020 09:04:01 -0700 Subject: [PATCH RFC v2 16/18] dmaengine: idxd: add new wq state for mdev From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, 
maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:04:00 -0700 Message-ID: <159534744069.28840.5832146066812507209.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When a dedicated wq is enabled as an mdev, the wq must be disabled on the device in order to program the pasid to the wq. Introduce IDXD_WQ_LOCKED, a software-only wq state that covers this window and keeps the user from modifying the wq configuration. A wq in this state is not DISABLED, so configuration changes are rejected, and it is not ENABLED either, so none of the actions permitted in the ENABLED state are allowed. Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/idxd.h | 1 + drivers/dma/idxd/mdev.c | 4 +++- drivers/dma/idxd/sysfs.c | 2 ++ 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index 1f03019bb45d..cc0665335aee 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -60,6 +60,7 @@ struct idxd_group { enum idxd_wq_state { IDXD_WQ_DISABLED = 0, IDXD_WQ_ENABLED, + IDXD_WQ_LOCKED, }; enum idxd_wq_flag { diff --git a/drivers/dma/idxd/mdev.c b/drivers/dma/idxd/mdev.c index 744adfdc06cd..e3c32f9566b5 100644 --- a/drivers/dma/idxd/mdev.c +++ b/drivers/dma/idxd/mdev.c @@ -69,8 +69,10 @@ static void idxd_vdcm_init(struct vdcm_idxd *vidxd) vidxd_mmio_init(vidxd); - if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) + if (wq_dedicated(wq) && wq->state == IDXD_WQ_ENABLED) { idxd_wq_disable(wq, NULL); + wq->state = IDXD_WQ_LOCKED; + } } static void __idxd_vdcm_release(struct vdcm_idxd *vidxd) diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c index d3b0a95b0d1d..6344cc719897 100644 --- a/drivers/dma/idxd/sysfs.c +++ b/drivers/dma/idxd/sysfs.c @@ -822,6 +822,8 @@ static ssize_t wq_state_show(struct device *dev, return sprintf(buf, "disabled\n"); case IDXD_WQ_ENABLED: return sprintf(buf, "enabled\n"); + case IDXD_WQ_LOCKED: + return sprintf(buf, "locked\n"); } return sprintf(buf, "unknown\n"); From patchwork Tue Jul 21 16:04:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676043 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C3685618 for ; Tue, 21 Jul 2020 16:04:28 +0000 (UTC) Received: from vger.kernel.org 
(vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id B58AB2077D for ; Tue, 21 Jul 2020 16:04:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730353AbgGUQE0 (ORCPT ); Tue, 21 Jul 2020 12:04:26 -0400 Received: from mga12.intel.com ([192.55.52.136]:60689 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730190AbgGUQEZ (ORCPT ); Tue, 21 Jul 2020 12:04:25 -0400 IronPort-SDR: BN+IzXx+bvOLUw5wa7acB/+Hm67ILI7AZDBXoVVTJxEmwYKQUoP3DV/MDUgzG3rMr20OH8WJpk GurFtnvQYNiQ== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="129729280" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="129729280" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:04:09 -0700 IronPort-SDR: uIhYV1BtmBsDEUUu449cmsykJiDu6nAJjNvn9+pmdhgXkuSprKae7Mz27UljJwqINePgn6rjvW EvUZsaEQyTaw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="270476704" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga007.fm.intel.com with ESMTP; 21 Jul 2020 09:04:08 -0700 Subject: [PATCH RFC v2 17/18] dmaengine: idxd: add error notification from host driver to mediated device From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:04:07 -0700 Message-ID: <159534744785.28840.7098006109300534327.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org When a device error occurs, the mediated device needs to be notified so that the error can be forwarded to the guest. Add support to notify the specific mdev when an error is wq specific, and to broadcast the error to all mdevs on the device when it is a generic device error. 
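The wq-specific path reuses the per-wq helper added below; the generic path can broadcast by walking every mdev-type wq on the device. A minimal sketch of what that device-level broadcast could look like (the body of idxd_vidxd_send_errors() is not shown in this patch, so this is an assumed shape rather than the posted implementation):

void idxd_vidxd_send_errors(struct idxd_device *idxd)
{
	int i;

	for (i = 0; i < idxd->max_wqs; i++) {
		struct idxd_wq *wq = &idxd->wqs[i];

		/* only mdev-type wqs have mediated devices to notify */
		if (wq->type == IDXD_WQT_MDEV)
			idxd_wq_vidxd_send_errors(wq);
	}
}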
Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- drivers/dma/idxd/idxd.h | 12 ++++++++++++ drivers/dma/idxd/irq.c | 4 ++++ drivers/dma/idxd/vdev.c | 34 ++++++++++++++++++++++++++++++++++ drivers/dma/idxd/vdev.h | 1 + 4 files changed, 51 insertions(+) diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h index cc0665335aee..4a3947b84f20 100644 --- a/drivers/dma/idxd/idxd.h +++ b/drivers/dma/idxd/idxd.h @@ -377,4 +377,16 @@ void idxd_wq_del_cdev(struct idxd_wq *wq); int idxd_mdev_host_init(struct idxd_device *idxd); void idxd_mdev_host_release(struct idxd_device *idxd); +#ifdef CONFIG_INTEL_IDXD_MDEV +void idxd_vidxd_send_errors(struct idxd_device *idxd); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); +#else +static inline void idxd_vidxd_send_errors(struct idxd_device *idxd) +{ +} +static inline void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ +} +#endif /* CONFIG_INTEL_IDXD_MDEV */ + #endif diff --git a/drivers/dma/idxd/irq.c b/drivers/dma/idxd/irq.c index c818acb34a14..32e59ba85cd0 100644 --- a/drivers/dma/idxd/irq.c +++ b/drivers/dma/idxd/irq.c @@ -148,6 +148,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_wq_vidxd_send_errors(wq); } else { int i; @@ -156,6 +158,8 @@ irqreturn_t idxd_misc_thread(int vec, void *data) if (wq->type == IDXD_WQT_USER) wake_up_interruptible(&wq->idxd_cdev.err_queue); + else if (wq->type == IDXD_WQT_MDEV) + idxd_wq_vidxd_send_errors(wq); } } diff --git a/drivers/dma/idxd/vdev.c b/drivers/dma/idxd/vdev.c index 66e59cb02635..d87c5355e0cb 100644 --- a/drivers/dma/idxd/vdev.c +++ b/drivers/dma/idxd/vdev.c @@ -926,3 +926,37 @@ void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val) break; } } + +static void vidxd_send_errors(struct vdcm_idxd *vidxd) +{ + struct idxd_device *idxd = vidxd->idxd; + u8 *bar0 = vidxd->bar0; + union sw_err_reg *swerr = (union sw_err_reg *)(bar0 + IDXD_SWERR_OFFSET); + union genctrl_reg *genctrl = (union genctrl_reg *)(bar0 + IDXD_GENCTRL_OFFSET); + int i; + unsigned long flags; + + if (swerr->valid) { + if (!swerr->overflow) + swerr->overflow = 1; + return; + } + + spin_lock_irqsave(&idxd->dev_lock, flags); + for (i = 0; i < 4; i++) { + swerr->bits[i] = idxd->sw_err.bits[i]; + swerr++; + } + spin_unlock_irqrestore(&idxd->dev_lock, flags); + + if (genctrl->softerr_int_en) + vidxd_send_interrupt(vidxd, 0); +} + +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq) +{ + struct vdcm_idxd *vidxd; + + list_for_each_entry(vidxd, &wq->vdcm_list, list) + vidxd_send_errors(vidxd); +} diff --git a/drivers/dma/idxd/vdev.h b/drivers/dma/idxd/vdev.h index 2dc8d22d3ea7..8ba7cbdb7e8b 100644 --- a/drivers/dma/idxd/vdev.h +++ b/drivers/dma/idxd/vdev.h @@ -23,5 +23,6 @@ int vidxd_disable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); int vidxd_enable_host_ims_pasid(struct vdcm_idxd *vidxd, int ims_idx); int vidxd_send_interrupt(struct vdcm_idxd *vidxd, int msix_idx); void vidxd_do_command(struct vdcm_idxd *vidxd, u32 val); +void idxd_wq_vidxd_send_errors(struct idxd_wq *wq); #endif From patchwork Tue Jul 21 16:04:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Jiang X-Patchwork-Id: 11676049 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D2A291392 for ; Tue, 21 Jul 2020 16:04:30 +0000 (UTC) Received: from 
vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C5E0321702 for ; Tue, 21 Jul 2020 16:04:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730334AbgGUQEY (ORCPT ); Tue, 21 Jul 2020 12:04:24 -0400 Received: from mga09.intel.com ([134.134.136.24]:17661 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730190AbgGUQEX (ORCPT ); Tue, 21 Jul 2020 12:04:23 -0400 IronPort-SDR: XQLv4MZvQY6h/6QqQrwL1HUwPAU2LD/OXVdNX+DaDn+lgzYav59W0Tba6Rup2r/wdjA+QUXJb5 1bpmPBTBCqUA== X-IronPort-AV: E=McAfee;i="6000,8403,9689"; a="151499224" X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="151499224" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Jul 2020 09:04:17 -0700 IronPort-SDR: e1G/riTdKBZmtlGuH16tlJmv6ftO/0opkm3RNbIwWYTxk34fKp+bvBUMSL7lgxcbyvJk8AMqXw dh0oRj/QGmuQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.75,379,1589266800"; d="scan'208";a="319952585" Received: from djiang5-desk3.ch.intel.com ([143.182.136.137]) by fmsmga002.fm.intel.com with ESMTP; 21 Jul 2020 09:04:14 -0700 Subject: [PATCH RFC v2 18/18] dmaengine: idxd: add ABI documentation for mediated device support From: Dave Jiang To: vkoul@kernel.org, megha.dey@intel.com, maz@kernel.org, bhelgaas@google.com, rafael@kernel.org, gregkh@linuxfoundation.org, tglx@linutronix.de, hpa@zytor.com, alex.williamson@redhat.com, jacob.jun.pan@intel.com, ashok.raj@intel.com, jgg@mellanox.com, yi.l.liu@intel.com, baolu.lu@intel.com, kevin.tian@intel.com, sanjay.k.kumar@intel.com, tony.luck@intel.com, jing.lin@intel.com, dan.j.williams@intel.com, kwankhede@nvidia.com, eric.auger@redhat.com, parav@mellanox.com, jgg@mellanox.com, rafael@kernel.org, dave.hansen@intel.com, netanelg@mellanox.com, shahafs@mellanox.com, yan.y.zhao@linux.intel.com, pbonzini@redhat.com, samuel.ortiz@intel.com, mona.hossain@intel.com Cc: dmaengine@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, linux-pci@vger.kernel.org, kvm@vger.kernel.org Date: Tue, 21 Jul 2020 09:04:14 -0700 Message-ID: <159534745419.28840.5543479944375440474.stgit@djiang5-desk3.ch.intel.com> In-Reply-To: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> References: <159534667974.28840.2045034360240786644.stgit@djiang5-desk3.ch.intel.com> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Jing Lin Add the sysfs attribute bits in ABI/stable for mediated device and guest support. Signed-off-by: Jing Lin Signed-off-by: Dave Jiang Reviewed-by: Kevin Tian --- Documentation/ABI/stable/sysfs-driver-dma-idxd | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/ABI/stable/sysfs-driver-dma-idxd b/Documentation/ABI/stable/sysfs-driver-dma-idxd index a336d2b2009a..edbcf8a8796c 100644 --- a/Documentation/ABI/stable/sysfs-driver-dma-idxd +++ b/Documentation/ABI/stable/sysfs-driver-dma-idxd @@ -1,6 +1,6 @@ What: /sys/bus/dsa/devices/dsa/version -Date: Apr 15, 2020 -KernelVersion: 5.8.0 +Date: Jul 5, 2020 +KernelVersion: 5.9.0 Contact: dmaengine@vger.kernel.org Description: The hardware version number. @@ -84,6 +84,12 @@ Contact: dmaengine@vger.kernel.org Description: To indicate if PASID (process address space identifier) is enabled or not for this device. 
+What: /sys/bus/dsa/devices/dsa/ims_size +Date: Jul 5, 2020 +KernelVersion: 5.9.0 +Contact: dmaengine@vger.kernel.org +Description: Number of entries in the interrupt message storage table. + What: /sys/bus/dsa/devices/dsa/state Date: Oct 25, 2019 KernelVersion: 5.6.0 @@ -147,8 +153,9 @@ Date: Oct 25, 2019 KernelVersion: 5.6.0 Contact: dmaengine@vger.kernel.org Description: The type of this work queue, it can be "kernel" type for work - queue usages in the kernel space or "user" type for work queue - usages by applications in user space. + queue usages in the kernel space, "user" type for work queue + usages by applications in user space, or "mdev" type for + VFIO mediated devices. What: /sys/bus/dsa/devices/wq./cdev_minor Date: Oct 25, 2019
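As a usage illustration, the host marks a dedicated wq for mediated-device use through the new "mdev" type value before enabling the wq and creating mdev instances. A minimal sketch in C, assuming a wq0.0 instance and that the wq "type" attribute lives under the wq<m>.<n> device directory implied by this file (the path and the surrounding enable/create steps are illustrative, not part of the ABI text above):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* assumed sysfs path for the wq type attribute described above */
	const char *path = "/sys/bus/dsa/devices/wq0.0/type";
	ssize_t len = strlen("mdev");
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open wq type attribute");
		return 1;
	}
	/* mark the wq for use as a VFIO mediated device */
	if (write(fd, "mdev", len) != len)
		perror("write");
	close(fd);
	return 0;
}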