From patchwork Thu Sep 5 07:59:18 2019
From: Liu Yi L <yi.l.liu@intel.com>
To: alex.williamson@redhat.com, kwankhede@nvidia.com
Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com,
	yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com,
	chenbo.xia@intel.com, jun.j.tian@intel.com
Subject: [PATCH v2 01/13] vfio_pci: move vfio_pci_is_vga/vfio_vga_disabled to header
Date: Thu, 5 Sep 2019 15:59:18 +0800
Message-Id: <1567670370-4484-2-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>
References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>

This patch fixes a build failure related to always_inline, e.g.:
"error: inlining failed in call to always_inline ‘vfio_pci_is_vga’:
function body not available".

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 14 --------------
 drivers/vfio/pci/vfio_pci_private.h | 14 ++++++++++++++
 2 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 703948c..38271df 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -54,15 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(disable_idle_d3,
		 "Disable using the PCI D3 low power state for idle, unused devices");
 
-static inline bool vfio_vga_disabled(void)
-{
-#ifdef CONFIG_VFIO_PCI_VGA
-	return disable_vga;
-#else
-	return true;
-#endif
-}
-
 /*
  * Our VGA arbiter participation is limited since we don't know anything
  * about the device itself. However, if the device is the only VGA device
@@ -102,11 +93,6 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 	return decodes;
 }
 
-static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
-{
-	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
-}
-
 static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 {
 	struct resource *res;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ee6ee91..f12d92c 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -130,6 +130,20 @@ struct vfio_pci_device {
 #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
 #define irq_is(vdev, type) (vdev->irq_type == type)
 
+static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
+{
+	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
+}
+
+static inline bool vfio_vga_disabled(void)
+{
+#ifdef CONFIG_VFIO_PCI_VGA
+	return disable_vga;
+#else
+	return true;
+#endif
+}
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
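
For context on the failure mode: the kernel can map inline to
__attribute__((always_inline)), and gcc refuses to compile a call to an
always_inline function whose body is not visible in the calling
translation unit. A minimal sketch of the problem, with hypothetical
file names that are not part of the patch:

  /* helpers.h -- declares the helper, but the body lives in one .c file */
  #include <stdbool.h>

  static inline __attribute__((always_inline))
  bool is_vga(unsigned int class);	/* body defined elsewhere */

  /* caller.c -- a second translation unit using the helper */
  #include "helpers.h"

  int main(void)
  {
  	/* gcc rejects this call with the diagnostic quoted in the
  	 * commit message, because only the declaration is in scope
  	 * here and the body cannot be inlined.
  	 */
  	return is_vga(0x030000) ? 0 : 1;
  }

Moving the full definitions of vfio_pci_is_vga() and vfio_vga_disabled()
into vfio_pci_private.h, as done above, makes the body visible to every
file that includes the header, which later patches in this series rely
on when callers move into a second source file.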
From patchwork Thu Sep 5 07:59:19 2019
From: Liu Yi L <yi.l.liu@intel.com>
To: alex.williamson@redhat.com, kwankhede@nvidia.com
Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com,
	yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com,
	chenbo.xia@intel.com, jun.j.tian@intel.com
Subject: [PATCH v2 02/13] vfio_pci: refine user config reference in vfio-pci module
Date: Thu, 5 Sep 2019 15:59:19 +0800
Message-Id: <1567670370-4484-3-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>
References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>

This patch adds three fields to struct vfio_pci_device so that the
vfio-pci module's user configuration can be passed to functions that
may become common code in the future.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 24 +++++++++++++++---------
 drivers/vfio/pci/vfio_pci_private.h |  9 +++++++--
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 38271df..fed2687 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -69,7 +69,8 @@ static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 	unsigned char max_busnr;
 	unsigned int decodes;
 
-	if (single_vga || !vfio_vga_disabled() || pci_is_root_bus(pdev->bus))
+	if (single_vga || !vfio_vga_disabled(vdev) ||
+	    pci_is_root_bus(pdev->bus))
 		return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
 		       VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM;
 
@@ -273,7 +274,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	if (!vdev->pci_saved_state)
 		pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__);
 
-	if (likely(!nointxmask)) {
+	if (likely(!vdev->nointxmask)) {
 		if (vfio_pci_nointx(pdev)) {
 			pci_info(pdev, "Masking broken INTx support\n");
 			vdev->nointx = true;
@@ -310,7 +311,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	} else
 		vdev->msix_bar = 0xFF;
 
-	if (!vfio_vga_disabled() && vfio_pci_is_vga(pdev))
+	if (!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev))
 		vdev->has_vga = true;
 
 
@@ -436,7 +437,7 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 
 	vfio_pci_try_bus_reset(vdev);
 
-	if (!disable_idle_d3)
+	if (!vdev->disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D3hot);
 }
 
@@ -1304,6 +1305,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	spin_lock_init(&vdev->irqlock);
 	mutex_init(&vdev->ioeventfds_lock);
 	INIT_LIST_HEAD(&vdev->ioeventfds_list);
+	vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	vdev->disable_vga = disable_vga;
+#endif
+	vdev->disable_idle_d3 = disable_idle_d3;
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret) {
@@ -1328,7 +1334,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	vfio_pci_probe_power_state(vdev);
 
-	if (!disable_idle_d3) {
+	if (!vdev->disable_idle_d3) {
 		/*
		 * pci-core sets the device power state to an unknown value at
		 * bootup and after being removed from a driver. The only
@@ -1359,7 +1365,7 @@ static void vfio_pci_remove(struct pci_dev *pdev)
 	kfree(vdev->region);
 	mutex_destroy(&vdev->ioeventfds_lock);
 
-	if (!disable_idle_d3)
+	if (!vdev->disable_idle_d3)
 		vfio_pci_set_power_state(vdev, PCI_D0);
 
 	kfree(vdev->pm_save);
@@ -1594,7 +1600,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 
 		if (!ret) {
 			tmp->needs_reset = false;
-			if (tmp != vdev && !disable_idle_d3)
+			if (tmp != vdev && !tmp->disable_idle_d3)
				vfio_pci_set_power_state(tmp, PCI_D3hot);
 		}
 
@@ -1610,7 +1616,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(void)
+static void __init vfio_pci_fill_ids(char *ids)
 {
 	char *p, *id;
 	int rc;
@@ -1665,7 +1671,7 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
-	vfio_pci_fill_ids();
+	vfio_pci_fill_ids(&ids[0]);
 
 	return 0;
 
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index f12d92c..68521d2 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -122,6 +122,11 @@ struct vfio_pci_device {
 	struct list_head	dummy_resources_list;
 	struct mutex		ioeventfds_lock;
 	struct list_head	ioeventfds_list;
+	bool			nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+	bool			disable_vga;
+#endif
+	bool			disable_idle_d3;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
@@ -135,10 +140,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
-static inline bool vfio_vga_disabled(void)
+static inline bool vfio_vga_disabled(struct vfio_pci_device *vdev)
 {
 #ifdef CONFIG_VFIO_PCI_VGA
-	return disable_vga;
+	return vdev->disable_vga;
 #else
 	return true;
 #endif
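
The pattern in this patch: vfio_pci_probe() snapshots the module-scoped
parameters into per-device state, so the helpers consult only the
vfio_pci_device and never touch a particular module's globals. A
condensed sketch of the resulting shape (field and function names from
the patch, the surrounding kernel glue elided):

  static bool nointxmask;		/* module_param, private to vfio_pci.c */

  struct vfio_pci_device {
  	/* ... */
  	bool	nointxmask;		/* per-device copy of the user config */
  };

  static int vfio_pci_probe(struct pci_dev *pdev,
  			    const struct pci_device_id *id)
  {
  	struct vfio_pci_device *vdev;	/* allocated and initialized above */

  	vdev->nointxmask = nointxmask;	/* snapshot at bind time */
  	/* ... */
  }

  /* Helpers destined for common code now consult only the device: */
  int vfio_pci_enable(struct vfio_pci_device *vdev)
  {
  	if (likely(!vdev->nointxmask)) {
  		/* ... */
  	}
  	/* ... */
  }

One side effect worth noting: each device latches the parameter values
in effect when it was bound, so a writable parameter such as nointxmask
now changes behavior only for subsequently bound devices.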
From patchwork Thu Sep 5 07:59:20 2019
From: Liu Yi L <yi.l.liu@intel.com>
To: alex.williamson@redhat.com, kwankhede@nvidia.com
Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com,
	yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com,
	chenbo.xia@intel.com, jun.j.tian@intel.com
Subject: [PATCH v2 03/13] vfio_pci: refine vfio_pci_driver reference in vfio_pci.c
Date: Thu, 5 Sep 2019 15:59:20 +0800
Message-Id: <1567670370-4484-4-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>
References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>

This patch replaces the vfio_pci_driver references in vfio_pci.c with
pci_dev_driver(vdev->pdev), which helps make the functions generic
across module types.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index fed2687..722c041 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1442,24 +1442,25 @@ static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck)
 
 static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 {
-	struct vfio_pci_reflck **preflck = data;
+	struct vfio_pci_device *vdev = data;
+	struct vfio_pci_reflck **preflck = &vdev->reflck;
 	struct vfio_device *device;
-	struct vfio_pci_device *vdev;
+	struct vfio_pci_device *tmp;
 
 	device = vfio_device_get_from_dev(&pdev->dev);
 	if (!device)
 		return 0;
 
-	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+	if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) {
 		vfio_device_put(device);
 		return 0;
 	}
 
-	vdev = vfio_device_data(device);
+	tmp = vfio_device_data(device);
 
-	if (vdev->reflck) {
-		vfio_pci_reflck_get(vdev->reflck);
-		*preflck = vdev->reflck;
+	if (tmp->reflck) {
+		vfio_pci_reflck_get(tmp->reflck);
+		*preflck = tmp->reflck;
 		vfio_device_put(device);
 		return 1;
 	}
@@ -1476,7 +1477,7 @@ static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 
 	if (pci_is_root_bus(vdev->pdev->bus) ||
 	    vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find,
-					  &vdev->reflck, slot) <= 0)
+					  vdev, slot) <= 0)
 		vdev->reflck = vfio_pci_reflck_alloc();
 
 	mutex_unlock(&reflck_lock);
@@ -1501,6 +1502,7 @@ static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 
 struct vfio_devices {
 	struct vfio_device **devices;
+	struct vfio_pci_device *vdev;
 	int cur_index;
 	int max_index;
 };
@@ -1509,7 +1511,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 {
 	struct vfio_devices *devs = data;
 	struct vfio_device *device;
-	struct vfio_pci_device *vdev;
+	struct vfio_pci_device *tmp;
 
 	if (devs->cur_index == devs->max_index)
 		return -ENOSPC;
@@ -1518,15 +1520,15 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
 	if (!device)
 		return -EINVAL;
 
-	if (pci_dev_driver(pdev) != &vfio_pci_driver) {
+	if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
 
-	vdev = vfio_device_data(device);
+	tmp = vfio_device_data(device);
 
 	/* Fault if the device is not unused */
-	if (vdev->refcnt) {
+	if (tmp->refcnt) {
 		vfio_device_put(device);
 		return -EBUSY;
 	}
@@ -1572,6 +1574,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 	if (!devs.devices)
 		return;
 
+	devs.vdev = vdev;
 	if (vfio_pci_for_each_slot_or_bus(vdev->pdev,
					  vfio_pci_get_unused_devs,
					  &devs, slot))
@@ -1616,7 +1619,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(char *ids)
+static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
 	int rc;
@@ -1644,7 +1647,7 @@ static void __init vfio_pci_fill_ids(char *ids)
 			continue;
 		}
 
-		rc = pci_add_dynid(&vfio_pci_driver, vendor, device,
+		rc = pci_add_dynid(driver, vendor, device,
				   subvendor, subdevice, class, class_mask, 0);
 		if (rc)
			pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n",
@@ -1671,7 +1674,7 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
-	vfio_pci_fill_ids(&ids[0]);
+	vfio_pci_fill_ids(&ids[0], &vfio_pci_driver);
 
 	return 0;
 
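The reason for dropping the hard-coded &vfio_pci_driver comparisons:
pci_dev_driver(vdev->pdev) resolves to whichever pci_driver the
triggering device is bound to, so the same bus/slot walk works for any
module built from this code. Together with the ids/driver parameters
now taken by vfio_pci_fill_ids(), a hypothetical sibling module (all
names below invented purely for illustration) could reuse the shared
code unchanged:

  /* vfio-pci-foo.c -- hypothetical driver built on the common code */
  static char foo_ids[1024] __initdata;
  module_param_string(ids, foo_ids, sizeof(foo_ids), 0);

  static struct pci_driver vfio_pci_foo_driver = {
  	.name		= "vfio-pci-foo",
  	.id_table	= NULL,		/* only dynamic ids */
  	.probe		= vfio_pci_foo_probe,
  	.remove		= vfio_pci_remove,
  };

  static int __init vfio_pci_foo_init(void)
  {
  	int ret = pci_register_driver(&vfio_pci_foo_driver);

  	if (ret)
  		return ret;

  	/* dynamic IDs land on this driver, not on vfio-pci */
  	vfio_pci_fill_ids(&foo_ids[0], &vfio_pci_foo_driver);
  	return 0;
  }

Inside the shared helpers, pci_dev_driver(pdev) !=
pci_dev_driver(vdev->pdev) then correctly skips devices owned by other
drivers, whichever module the walk started from.
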
From patchwork Thu Sep 5 07:59:21 2019
From: Liu Yi L <yi.l.liu@intel.com>
To: alex.williamson@redhat.com, kwankhede@nvidia.com
Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com,
	yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com,
	chenbo.xia@intel.com, jun.j.tian@intel.com
Subject: [PATCH v2 04/13] vfio_pci: make common functions be extern
Date: Thu, 5 Sep 2019 15:59:21 +0800
Message-Id: <1567670370-4484-5-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>
References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>

This patch makes the common (module-agnostic) functions in vfio_pci.c
extern, so that they can be moved to a common source file.

*) vfio_pci_set_vga_decode
*) vfio_pci_enable
*) vfio_pci_disable
*) vfio_pci_ioctl
*) vfio_pci_read
*) vfio_pci_write
*) vfio_pci_mmap
*) vfio_pci_request
*) vfio_pci_fill_ids
*) vfio_pci_reflck_attach
*) vfio_pci_reflck_put
*) vfio_pci_probe_power_state

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci.c         | 30 +++++++++++++-----------------
 drivers/vfio/pci/vfio_pci_private.h | 15 +++++++++++++++
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 722c041..90a44a7 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -62,7 +62,7 @@
  * has no way to get to it and routing can be disabled externally at the
  * bridge.
  */
-static unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
+unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga)
 {
 	struct vfio_pci_device *vdev = opaque;
 	struct pci_dev *tmp = NULL, *pdev = vdev->pdev;
@@ -163,7 +163,6 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
 }
 
 static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
-static void vfio_pci_disable(struct vfio_pci_device *vdev);
 
 /*
  * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -194,7 +193,7 @@ static bool vfio_pci_nointx(struct pci_dev *pdev)
 	return false;
 }
 
-static void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
+void vfio_pci_probe_power_state(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	u16 pmcsr;
@@ -245,7 +244,7 @@ int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state)
 	return ret;
 }
 
-static int vfio_pci_enable(struct vfio_pci_device *vdev)
+int vfio_pci_enable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int ret;
@@ -352,7 +351,7 @@ static int vfio_pci_enable(struct vfio_pci_device *vdev)
 	return ret;
 }
 
-static void vfio_pci_disable(struct vfio_pci_device *vdev)
+void vfio_pci_disable(struct vfio_pci_device *vdev)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	struct vfio_pci_dummy_resource *dummy_res, *tmp;
@@ -666,8 +665,8 @@ int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 	return 0;
 }
 
-static long vfio_pci_ioctl(void *device_data,
-			   unsigned int cmd, unsigned long arg)
+long vfio_pci_ioctl(void *device_data,
+		    unsigned int cmd, unsigned long arg)
 {
 	struct vfio_pci_device *vdev = device_data;
 	unsigned long minsz;
@@ -1152,7 +1151,7 @@ static ssize_t vfio_pci_rw(void *device_data, char __user *buf,
 	return -EINVAL;
 }
 
-static ssize_t vfio_pci_read(void *device_data, char __user *buf,
+ssize_t vfio_pci_read(void *device_data, char __user *buf,
		      size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1161,7 +1160,7 @@ static ssize_t vfio_pci_read(void *device_data, char __user *buf,
 	return vfio_pci_rw(device_data, buf, count, ppos, false);
 }
 
-static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+ssize_t vfio_pci_write(void *device_data, const char __user *buf,
		       size_t count, loff_t *ppos)
 {
 	if (!count)
@@ -1170,7 +1169,7 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
 	return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
 }
 
-static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
+int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
@@ -1232,7 +1231,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
			       req_len, vma->vm_page_prot);
 }
 
-static void vfio_pci_request(void *device_data, unsigned int count)
+void vfio_pci_request(void *device_data, unsigned int count)
 {
 	struct vfio_pci_device *vdev = device_data;
 	struct pci_dev *pdev = vdev->pdev;
@@ -1264,9 +1263,6 @@ static const struct vfio_device_ops vfio_pci_ops = {
 	.request	= vfio_pci_request,
 };
 
-static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
-static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
-
 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct vfio_pci_device *vdev;
@@ -1469,7 +1465,7 @@ static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data)
 	return 0;
 }
 
-static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
+int vfio_pci_reflck_attach(struct vfio_pci_device *vdev)
 {
 	bool slot = !pci_probe_reset_slot(vdev->pdev->slot);
 
@@ -1495,7 +1491,7 @@ static void vfio_pci_reflck_release(struct kref *kref)
 	mutex_unlock(&reflck_lock);
 }
 
-static void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
+void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck)
 {
 	kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock);
 }
@@ -1619,7 +1615,7 @@ static void __exit vfio_pci_cleanup(void)
 	vfio_pci_uninit_perm_bits();
 }
 
-static void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
+void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver)
 {
 	char *p, *id;
 	int rc;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 68521d2..53d0385 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -182,6 +182,21 @@ extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
 
 extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
				    pci_power_t state);
+extern unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga);
+extern int vfio_pci_enable(struct vfio_pci_device *vdev);
+extern void vfio_pci_disable(struct vfio_pci_device *vdev);
+extern long vfio_pci_ioctl(void *device_data,
+			   unsigned int cmd, unsigned long arg);
+extern ssize_t vfio_pci_read(void *device_data, char __user *buf,
+			     size_t count, loff_t *ppos);
+extern ssize_t vfio_pci_write(void *device_data, const char __user *buf,
+			      size_t count, loff_t *ppos);
+extern int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma);
+extern void vfio_pci_request(void *device_data, unsigned int count);
+extern void vfio_pci_fill_ids(char *ids, struct pci_driver *driver);
+extern int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
+extern void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck);
+extern void vfio_pci_probe_power_state(struct vfio_pci_device *vdev);
 
 #ifdef CONFIG_VFIO_PCI_IGD
 extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
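
With these functions extern and declared in vfio_pci_private.h, the
split the series is driving toward looks roughly like this (a sketch of
the intended layout, not code from the patch):

  /* vfio_pci_common.c -- definitions, no longer static */
  long vfio_pci_ioctl(void *device_data, unsigned int cmd, unsigned long arg)
  {
  	/* ... */
  }

  /* vfio_pci.c -- module-specific: wires the common entry points
   * into this module's vfio_device_ops and pci_driver */
  static const struct vfio_device_ops vfio_pci_ops = {
  	.name		= "vfio-pci",
  	.open		= vfio_pci_open,	/* stays static/local */
  	.release	= vfio_pci_release,	/* stays static/local */
  	.ioctl		= vfio_pci_ioctl,	/* from common code */
  	.read		= vfio_pci_read,
  	.write		= vfio_pci_write,
  	.mmap		= vfio_pci_mmap,
  	.request	= vfio_pci_request,
  };

Keeping the declarations in the private header rather than a public one
also keeps the surface area contained: only files that include
vfio_pci_private.h can reach these symbols, and nothing here is
exported, so the linkage stays within the objects that are eventually
linked together.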
From patchwork Thu Sep 5 07:59:22 2019
From: Liu Yi L <yi.l.liu@intel.com>
To: alex.williamson@redhat.com, kwankhede@nvidia.com
Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com,
	yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com,
	chenbo.xia@intel.com, jun.j.tian@intel.com
Subject: [PATCH v2 05/13] vfio_pci: duplicate vfio_pci.c
Date: Thu, 5 Sep 2019 15:59:22 +0800
Message-Id: <1567670370-4484-6-git-send-email-yi.l.liu@intel.com>
In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>
References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com>

This patch has no code change; it is just a file copy. In the following
patches, vfio_pci_common.c will be modified to include only the common
functions and related static functions from the original vfio_pci.c,
while vfio_pci.c will be reduced to the vfio-pci module-specific code.

Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
---
 drivers/vfio/pci/vfio_pci_common.c | 1688 ++++++++++++++++++++++++++++++++++++
 1 file changed, 1688 insertions(+)
 create mode 100644 drivers/vfio/pci/vfio_pci_common.c

diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c
new file mode 100644
index 0000000..90a44a7
--- /dev/null
+++ b/drivers/vfio/pci/vfio_pci_common.c
@@ -0,0 +1,1688 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ *     Author: Alex Williamson <alex.williamson@redhat.com>
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#define dev_fmt pr_fmt
+
+#include <linux/device.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/interrupt.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <linux/pci.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+#include <linux/vgaarb.h>
+#include <linux/nospec.h>
+
+#include "vfio_pci_private.h"
+
+#define DRIVER_VERSION  "0.2"
+#define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
+#define DRIVER_DESC     "VFIO PCI - User Level meta-driver"
+
+static char ids[1024] __initdata;
+module_param_string(ids, ids, sizeof(ids), 0);
+MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified");
+
+static bool nointxmask;
+module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(nointxmask,
+		  "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag.");
+
+#ifdef CONFIG_VFIO_PCI_VGA
+static bool disable_vga;
+module_param(disable_vga, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci");
+#endif
+
+static bool disable_idle_d3;
+module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(disable_idle_d3,
+		 "Disable using the PCI D3 low power state for idle, unused devices");
+
+/*
+ * Our VGA arbiter participation is limited since we don't know anything
+ * about the device itself. However, if the device is the only VGA device
+ * downstream of a bridge and VFIO VGA support is disabled, then we can
+ * safely return legacy VGA IO and memory as not decoded since the user
+ * has no way to get to it and routing can be disabled externally at the
+ * bridge.
+ */ +unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) +{ + struct vfio_pci_device *vdev = opaque; + struct pci_dev *tmp = NULL, *pdev = vdev->pdev; + unsigned char max_busnr; + unsigned int decodes; + + if (single_vga || !vfio_vga_disabled(vdev) || + pci_is_root_bus(pdev->bus)) + return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + + max_busnr = pci_bus_max_busnr(pdev->bus); + decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM; + + while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) { + if (tmp == pdev || + pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) || + pci_is_root_bus(tmp->bus)) + continue; + + if (tmp->bus->number >= pdev->bus->number && + tmp->bus->number <= max_busnr) { + pci_dev_put(tmp); + decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; + break; + } + } + + return decodes; +} + +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) +{ + struct resource *res; + int bar; + struct vfio_pci_dummy_resource *dummy_res; + + INIT_LIST_HEAD(&vdev->dummy_resources_list); + + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { + res = vdev->pdev->resource + bar; + + if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) + goto no_mmap; + + if (!(res->flags & IORESOURCE_MEM)) + goto no_mmap; + + /* + * The PCI core shouldn't set up a resource with a + * type but zero size. But there may be bugs that + * cause us to do that. + */ + if (!resource_size(res)) + goto no_mmap; + + if (resource_size(res) >= PAGE_SIZE) { + vdev->bar_mmap_supported[bar] = true; + continue; + } + + if (!(res->start & ~PAGE_MASK)) { + /* + * Add a dummy resource to reserve the remainder + * of the exclusive page in case that hot-add + * device's bar is assigned into it. + */ + dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL); + if (dummy_res == NULL) + goto no_mmap; + + dummy_res->resource.name = "vfio sub-page reserved"; + dummy_res->resource.start = res->end + 1; + dummy_res->resource.end = res->start + PAGE_SIZE - 1; + dummy_res->resource.flags = res->flags; + if (request_resource(res->parent, + &dummy_res->resource)) { + kfree(dummy_res); + goto no_mmap; + } + dummy_res->index = bar; + list_add(&dummy_res->res_next, + &vdev->dummy_resources_list); + vdev->bar_mmap_supported[bar] = true; + continue; + } + /* + * Here we don't handle the case when the BAR is not page + * aligned because we can't expect the BAR will be + * assigned into the same location in a page in guest + * when we passthrough the BAR. And it's hard to access + * this BAR in userspace because we have no way to get + * the BAR's location in a page. + */ +no_mmap: + vdev->bar_mmap_supported[bar] = false; + } +} + +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); + +/* + * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND + * _and_ the ability detect when the device is asserting INTx via PCI_STATUS. + * If a device implements the former but not the latter we would typically + * expect broken_intx_masking be set and require an exclusive interrupt. + * However since we do have control of the device's ability to assert INTx, + * we can instead pretend that the device does not implement INTx, virtualizing + * the pin register to report zero and maintaining DisINTx set on the host. 
+ */ +static bool vfio_pci_nointx(struct pci_dev *pdev) +{ + switch (pdev->vendor) { + case PCI_VENDOR_ID_INTEL: + switch (pdev->device) { + /* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */ + case 0x1572: + case 0x1574: + case 0x1580 ... 0x1581: + case 0x1583 ... 0x158b: + case 0x37d0 ... 0x37d2: + return true; + default: + return false; + } + } + + return false; +} + +void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + u16 pmcsr; + + if (!pdev->pm_cap) + return; + + pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr); + + vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET); +} + +/* + * pci_set_power_state() wrapper handling devices which perform a soft reset on + * D3->D0 transition. Save state prior to D0/1/2->D3, stash it on the vdev, + * restore when returned to D0. Saved separately from pci_saved_state for use + * by PM capability emulation and separately from pci_dev internal saved state + * to avoid it being overwritten and consumed around other resets. + */ +int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) +{ + struct pci_dev *pdev = vdev->pdev; + bool needs_restore = false, needs_save = false; + int ret; + + if (vdev->needs_pm_restore) { + if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) { + pci_save_state(pdev); + needs_save = true; + } + + if (pdev->current_state >= PCI_D3hot && state <= PCI_D0) + needs_restore = true; + } + + ret = pci_set_power_state(pdev, state); + + if (!ret) { + /* D3 might be unsupported via quirk, skip unless in D3 */ + if (needs_save && pdev->current_state >= PCI_D3hot) { + vdev->pm_save = pci_store_saved_state(pdev); + } else if (needs_restore) { + pci_load_and_free_saved_state(pdev, &vdev->pm_save); + pci_restore_state(pdev); + } + } + + return ret; +} + +int vfio_pci_enable(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + int ret; + u16 cmd; + u8 msix_pos; + + vfio_pci_set_power_state(vdev, PCI_D0); + + /* Don't allow our initial saved state to include busmaster */ + pci_clear_master(pdev); + + ret = pci_enable_device(pdev); + if (ret) + return ret; + + /* If reset fails because of the device lock, fail this path entirely */ + ret = pci_try_reset_function(pdev); + if (ret == -EAGAIN) { + pci_disable_device(pdev); + return ret; + } + + vdev->reset_works = !ret; + pci_save_state(pdev); + vdev->pci_saved_state = pci_store_saved_state(pdev); + if (!vdev->pci_saved_state) + pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__); + + if (likely(!vdev->nointxmask)) { + if (vfio_pci_nointx(pdev)) { + pci_info(pdev, "Masking broken INTx support\n"); + vdev->nointx = true; + pci_intx(pdev, 0); + } else + vdev->pci_2_3 = pci_intx_mask_supported(pdev); + } + + pci_read_config_word(pdev, PCI_COMMAND, &cmd); + if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) { + cmd &= ~PCI_COMMAND_INTX_DISABLE; + pci_write_config_word(pdev, PCI_COMMAND, cmd); + } + + ret = vfio_config_init(vdev); + if (ret) { + kfree(vdev->pci_saved_state); + vdev->pci_saved_state = NULL; + pci_disable_device(pdev); + return ret; + } + + msix_pos = pdev->msix_cap; + if (msix_pos) { + u16 flags; + u32 table; + + pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags); + pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table); + + vdev->msix_bar = table & PCI_MSIX_TABLE_BIR; + vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET; + vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16; + } else + vdev->msix_bar = 0xFF; + + if 
(!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev)) + vdev->has_vga = true; + + + if (vfio_pci_is_vga(pdev) && + pdev->vendor == PCI_VENDOR_ID_INTEL && + IS_ENABLED(CONFIG_VFIO_PCI_IGD)) { + ret = vfio_pci_igd_init(vdev); + if (ret) { + pci_warn(pdev, "Failed to setup Intel IGD regions\n"); + goto disable_exit; + } + } + + if (pdev->vendor == PCI_VENDOR_ID_NVIDIA && + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { + ret = vfio_pci_nvdia_v100_nvlink2_init(vdev); + if (ret && ret != -ENODEV) { + pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n"); + goto disable_exit; + } + } + + if (pdev->vendor == PCI_VENDOR_ID_IBM && + IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { + ret = vfio_pci_ibm_npu2_init(vdev); + if (ret && ret != -ENODEV) { + pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n"); + goto disable_exit; + } + } + + vfio_pci_probe_mmaps(vdev); + + return 0; + +disable_exit: + vfio_pci_disable(vdev); + return ret; +} + +void vfio_pci_disable(struct vfio_pci_device *vdev) +{ + struct pci_dev *pdev = vdev->pdev; + struct vfio_pci_dummy_resource *dummy_res, *tmp; + struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; + int i, bar; + + /* Stop the device from further DMA */ + pci_clear_master(pdev); + + vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | + VFIO_IRQ_SET_ACTION_TRIGGER, + vdev->irq_type, 0, 0, NULL); + + /* Device closed, don't need mutex here */ + list_for_each_entry_safe(ioeventfd, ioeventfd_tmp, + &vdev->ioeventfds_list, next) { + vfio_virqfd_disable(&ioeventfd->virqfd); + list_del(&ioeventfd->next); + kfree(ioeventfd); + } + vdev->ioeventfds_nr = 0; + + vdev->virq_disabled = false; + + for (i = 0; i < vdev->num_regions; i++) + vdev->region[i].ops->release(vdev, &vdev->region[i]); + + vdev->num_regions = 0; + kfree(vdev->region); + vdev->region = NULL; /* don't krealloc a freed pointer */ + + vfio_config_free(vdev); + + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { + if (!vdev->barmap[bar]) + continue; + pci_iounmap(pdev, vdev->barmap[bar]); + pci_release_selected_regions(pdev, 1 << bar); + vdev->barmap[bar] = NULL; + } + + list_for_each_entry_safe(dummy_res, tmp, + &vdev->dummy_resources_list, res_next) { + list_del(&dummy_res->res_next); + release_resource(&dummy_res->resource); + kfree(dummy_res); + } + + vdev->needs_reset = true; + + /* + * If we have saved state, restore it. If we can reset the device, + * even better. Resetting with current state seems better than + * nothing, but saving and restoring current state without reset + * is just busy work. + */ + if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) { + pci_info(pdev, "%s: Couldn't reload saved state\n", __func__); + + if (!vdev->reset_works) + goto out; + + pci_save_state(pdev); + } + + /* + * Disable INTx and MSI, presumably to avoid spurious interrupts + * during reset. Stolen from pci_reset_function() + */ + pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); + + /* + * Try to reset the device. The success of this is dependent on + * being able to lock the device, which is not always possible. 
+ */ + if (vdev->reset_works && !pci_try_reset_function(pdev)) + vdev->needs_reset = false; + + pci_restore_state(pdev); +out: + pci_disable_device(pdev); + + vfio_pci_try_bus_reset(vdev); + + if (!vdev->disable_idle_d3) + vfio_pci_set_power_state(vdev, PCI_D3hot); +} + +static void vfio_pci_release(void *device_data) +{ + struct vfio_pci_device *vdev = device_data; + + mutex_lock(&vdev->reflck->lock); + + if (!(--vdev->refcnt)) { + vfio_spapr_pci_eeh_release(vdev->pdev); + vfio_pci_disable(vdev); + } + + mutex_unlock(&vdev->reflck->lock); + + module_put(THIS_MODULE); +} + +static int vfio_pci_open(void *device_data) +{ + struct vfio_pci_device *vdev = device_data; + int ret = 0; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + mutex_lock(&vdev->reflck->lock); + + if (!vdev->refcnt) { + ret = vfio_pci_enable(vdev); + if (ret) + goto error; + + vfio_spapr_pci_eeh_open(vdev->pdev); + } + vdev->refcnt++; +error: + mutex_unlock(&vdev->reflck->lock); + if (ret) + module_put(THIS_MODULE); + return ret; +} + +static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) +{ + if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { + u8 pin; + + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || + vdev->nointx || vdev->pdev->is_virtfn) + return 0; + + pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin); + + return pin ? 1 : 0; + } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) { + u8 pos; + u16 flags; + + pos = vdev->pdev->msi_cap; + if (pos) { + pci_read_config_word(vdev->pdev, + pos + PCI_MSI_FLAGS, &flags); + return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1); + } + } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) { + u8 pos; + u16 flags; + + pos = vdev->pdev->msix_cap; + if (pos) { + pci_read_config_word(vdev->pdev, + pos + PCI_MSIX_FLAGS, &flags); + + return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; + } + } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { + if (pci_is_pcie(vdev->pdev)) + return 1; + } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { + return 1; + } + + return 0; +} + +static int vfio_pci_count_devs(struct pci_dev *pdev, void *data) +{ + (*(int *)data)++; + return 0; +} + +struct vfio_pci_fill_info { + int max; + int cur; + struct vfio_pci_dependent_device *devices; +}; + +static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_fill_info *fill = data; + struct iommu_group *iommu_group; + + if (fill->cur == fill->max) + return -EAGAIN; /* Something changed, try again */ + + iommu_group = iommu_group_get(&pdev->dev); + if (!iommu_group) + return -EPERM; /* Cannot reset non-isolated devices */ + + fill->devices[fill->cur].group_id = iommu_group_id(iommu_group); + fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus); + fill->devices[fill->cur].bus = pdev->bus->number; + fill->devices[fill->cur].devfn = pdev->devfn; + fill->cur++; + iommu_group_put(iommu_group); + return 0; +} + +struct vfio_pci_group_entry { + struct vfio_group *group; + int id; +}; + +struct vfio_pci_group_info { + int count; + struct vfio_pci_group_entry *groups; +}; + +static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_group_info *info = data; + struct iommu_group *group; + int id, i; + + group = iommu_group_get(&pdev->dev); + if (!group) + return -EPERM; + + id = iommu_group_id(group); + + for (i = 0; i < info->count; i++) + if (info->groups[i].id == id) + break; + + iommu_group_put(group); + + return (i == info->count) ? 
-EINVAL : 0; +} + +static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot) +{ + for (; pdev; pdev = pdev->bus->self) + if (pdev->bus == slot->bus) + return (pdev->slot == slot); + return false; +} + +struct vfio_pci_walk_info { + int (*fn)(struct pci_dev *, void *data); + void *data; + struct pci_dev *pdev; + bool slot; + int ret; +}; + +static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_walk_info *walk = data; + + if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot)) + walk->ret = walk->fn(pdev, walk->data); + + return walk->ret; +} + +static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev, + int (*fn)(struct pci_dev *, + void *data), void *data, + bool slot) +{ + struct vfio_pci_walk_info walk = { + .fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0, + }; + + pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk); + + return walk.ret; +} + +static int msix_mmappable_cap(struct vfio_pci_device *vdev, + struct vfio_info_cap *caps) +{ + struct vfio_info_cap_header header = { + .id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, + .version = 1 + }; + + return vfio_info_add_capability(caps, &header, sizeof(header)); +} + +int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, + unsigned int type, unsigned int subtype, + const struct vfio_pci_regops *ops, + size_t size, u32 flags, void *data) +{ + struct vfio_pci_region *region; + + region = krealloc(vdev->region, + (vdev->num_regions + 1) * sizeof(*region), + GFP_KERNEL); + if (!region) + return -ENOMEM; + + vdev->region = region; + vdev->region[vdev->num_regions].type = type; + vdev->region[vdev->num_regions].subtype = subtype; + vdev->region[vdev->num_regions].ops = ops; + vdev->region[vdev->num_regions].size = size; + vdev->region[vdev->num_regions].flags = flags; + vdev->region[vdev->num_regions].data = data; + + vdev->num_regions++; + + return 0; +} + +long vfio_pci_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) +{ + struct vfio_pci_device *vdev = device_data; + unsigned long minsz; + + if (cmd == VFIO_DEVICE_GET_INFO) { + struct vfio_device_info info; + + minsz = offsetofend(struct vfio_device_info, num_irqs); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + info.flags = VFIO_DEVICE_FLAGS_PCI; + + if (vdev->reset_works) + info.flags |= VFIO_DEVICE_FLAGS_RESET; + + info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions; + info.num_irqs = VFIO_PCI_NUM_IRQS; + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { + struct pci_dev *pdev = vdev->pdev; + struct vfio_region_info info; + struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; + int i, ret; + + minsz = offsetofend(struct vfio_region_info, offset); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = pdev->cfg_size; + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + break; + case VFIO_PCI_BAR0_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = pci_resource_len(pdev, info.index); + if (!info.size) { + info.flags = 0; + break; + } + + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + if (vdev->bar_mmap_supported[info.index]) { + info.flags |= VFIO_REGION_INFO_FLAG_MMAP; + if (info.index == vdev->msix_bar) { + ret = msix_mmappable_cap(vdev, &caps); + if (ret) + return ret; + } + } + + break; + case VFIO_PCI_ROM_REGION_INDEX: + { + void __iomem *io; + size_t size; + u16 orig_cmd; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.flags = 0; + + /* Report the BAR size, not the ROM size */ + info.size = pci_resource_len(pdev, info.index); + if (!info.size) { + /* Shadow ROMs appear as PCI option ROMs */ + if (pdev->resource[PCI_ROM_RESOURCE].flags & + IORESOURCE_ROM_SHADOW) + info.size = 0x20000; + else + break; + } + + /* + * Is it really there? Enable memory decode for + * implicit access in pci_map_rom(). + */ + pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd); + pci_write_config_word(pdev, PCI_COMMAND, + orig_cmd | PCI_COMMAND_MEMORY); + + io = pci_map_rom(pdev, &size); + if (io) { + info.flags = VFIO_REGION_INFO_FLAG_READ; + pci_unmap_rom(pdev, io); + } else { + info.size = 0; + } + + pci_write_config_word(pdev, PCI_COMMAND, orig_cmd); + break; + } + case VFIO_PCI_VGA_REGION_INDEX: + if (!vdev->has_vga) + return -EINVAL; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = 0xc0000; + info.flags = VFIO_REGION_INFO_FLAG_READ | + VFIO_REGION_INFO_FLAG_WRITE; + + break; + default: + { + struct vfio_region_info_cap_type cap_type = { + .header.id = VFIO_REGION_INFO_CAP_TYPE, + .header.version = 1 }; + + if (info.index >= + VFIO_PCI_NUM_REGIONS + vdev->num_regions) + return -EINVAL; + info.index = array_index_nospec(info.index, + VFIO_PCI_NUM_REGIONS + + vdev->num_regions); + + i = info.index - VFIO_PCI_NUM_REGIONS; + + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); + info.size = vdev->region[i].size; + info.flags = vdev->region[i].flags; + + cap_type.type = vdev->region[i].type; + cap_type.subtype = vdev->region[i].subtype; + + ret = vfio_info_add_capability(&caps, &cap_type.header, + sizeof(cap_type)); + if (ret) + return ret; + + if (vdev->region[i].ops->add_capability) { + ret = vdev->region[i].ops->add_capability(vdev, + &vdev->region[i], &caps); + if (ret) + return ret; + } + } + } + + if (caps.size) { + info.flags |= VFIO_REGION_INFO_FLAG_CAPS; + if (info.argsz < sizeof(info) + caps.size) { + info.argsz = sizeof(info) + caps.size; + info.cap_offset = 0; + } else { + vfio_info_cap_shift(&caps, sizeof(info)); + if (copy_to_user((void __user *)arg + + sizeof(info), caps.buf, + caps.size)) { + kfree(caps.buf); + return -EFAULT; + } + info.cap_offset = sizeof(info); + } + + kfree(caps.buf); + } + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { + struct vfio_irq_info info; + + minsz = offsetofend(struct vfio_irq_info, count); + + if (copy_from_user(&info, (void __user *)arg, minsz)) + return -EFAULT; + + if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) + return -EINVAL; + + switch (info.index) { + case VFIO_PCI_INTX_IRQ_INDEX ... 
VFIO_PCI_MSIX_IRQ_INDEX: + case VFIO_PCI_REQ_IRQ_INDEX: + break; + case VFIO_PCI_ERR_IRQ_INDEX: + if (pci_is_pcie(vdev->pdev)) + break; + /* fall through */ + default: + return -EINVAL; + } + + info.flags = VFIO_IRQ_INFO_EVENTFD; + + info.count = vfio_pci_get_irq_count(vdev, info.index); + + if (info.index == VFIO_PCI_INTX_IRQ_INDEX) + info.flags |= (VFIO_IRQ_INFO_MASKABLE | + VFIO_IRQ_INFO_AUTOMASKED); + else + info.flags |= VFIO_IRQ_INFO_NORESIZE; + + return copy_to_user((void __user *)arg, &info, minsz) ? + -EFAULT : 0; + + } else if (cmd == VFIO_DEVICE_SET_IRQS) { + struct vfio_irq_set hdr; + u8 *data = NULL; + int max, ret = 0; + size_t data_size = 0; + + minsz = offsetofend(struct vfio_irq_set, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + max = vfio_pci_get_irq_count(vdev, hdr.index); + + ret = vfio_set_irqs_validate_and_prepare(&hdr, max, + VFIO_PCI_NUM_IRQS, &data_size); + if (ret) + return ret; + + if (data_size) { + data = memdup_user((void __user *)(arg + minsz), + data_size); + if (IS_ERR(data)) + return PTR_ERR(data); + } + + mutex_lock(&vdev->igate); + + ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, + hdr.start, hdr.count, data); + + mutex_unlock(&vdev->igate); + kfree(data); + + return ret; + + } else if (cmd == VFIO_DEVICE_RESET) { + return vdev->reset_works ? + pci_try_reset_function(vdev->pdev) : -EINVAL; + + } else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) { + struct vfio_pci_hot_reset_info hdr; + struct vfio_pci_fill_info fill = { 0 }; + struct vfio_pci_dependent_device *devices = NULL; + bool slot = false; + int ret = 0; + + minsz = offsetofend(struct vfio_pci_hot_reset_info, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + if (hdr.argsz < minsz) + return -EINVAL; + + hdr.flags = 0; + + /* Can we do a slot or bus reset or neither? */ + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return -ENODEV; + + /* How many devices are affected? */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_count_devs, + &fill.max, slot); + if (ret) + return ret; + + WARN_ON(!fill.max); /* Should always be at least one */ + + /* + * If there's enough space, fill it now, otherwise return + * -ENOSPC and the number of devices affected. + */ + if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) { + ret = -ENOSPC; + hdr.count = fill.max; + goto reset_info_exit; + } + + devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL); + if (!devices) + return -ENOMEM; + + fill.devices = devices; + + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_fill_devs, + &fill, slot); + + /* + * If a device was removed between counting and filling, + * we may come up short of fill.max. If a device was + * added, we'll have a return of -EAGAIN above. 
+ */ + if (!ret) + hdr.count = fill.cur; + +reset_info_exit: + if (copy_to_user((void __user *)arg, &hdr, minsz)) + ret = -EFAULT; + + if (!ret) { + if (copy_to_user((void __user *)(arg + minsz), devices, + hdr.count * sizeof(*devices))) + ret = -EFAULT; + } + + kfree(devices); + return ret; + + } else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) { + struct vfio_pci_hot_reset hdr; + int32_t *group_fds; + struct vfio_pci_group_entry *groups; + struct vfio_pci_group_info info; + bool slot = false; + int i, count = 0, ret = 0; + + minsz = offsetofend(struct vfio_pci_hot_reset, count); + + if (copy_from_user(&hdr, (void __user *)arg, minsz)) + return -EFAULT; + + if (hdr.argsz < minsz || hdr.flags) + return -EINVAL; + + /* Can we do a slot or bus reset or neither? */ + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return -ENODEV; + + /* + * We can't let userspace give us an arbitrarily large + * buffer to copy, so verify how many we think there + * could be. Note groups can have multiple devices so + * one group per device is the max. + */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_count_devs, + &count, slot); + if (ret) + return ret; + + /* Somewhere between 1 and count is OK */ + if (!hdr.count || hdr.count > count) + return -EINVAL; + + group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL); + groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL); + if (!group_fds || !groups) { + kfree(group_fds); + kfree(groups); + return -ENOMEM; + } + + if (copy_from_user(group_fds, (void __user *)(arg + minsz), + hdr.count * sizeof(*group_fds))) { + kfree(group_fds); + kfree(groups); + return -EFAULT; + } + + /* + * For each group_fd, get the group through the vfio external + * user interface and store the group and iommu ID. This + * ensures the group is held across the reset. + */ + for (i = 0; i < hdr.count; i++) { + struct vfio_group *group; + struct fd f = fdget(group_fds[i]); + if (!f.file) { + ret = -EBADF; + break; + } + + group = vfio_group_get_external_user(f.file); + fdput(f); + if (IS_ERR(group)) { + ret = PTR_ERR(group); + break; + } + + groups[i].group = group; + groups[i].id = vfio_external_user_iommu_id(group); + } + + kfree(group_fds); + + /* release reference to groups on error */ + if (ret) + goto hot_reset_release; + + info.count = hdr.count; + info.groups = groups; + + /* + * Test whether all the affected devices are contained + * by the set of groups provided by the user. 
+ */ + ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_validate_devs, + &info, slot); + if (!ret) + /* User has access, do the reset */ + ret = pci_reset_bus(vdev->pdev); + +hot_reset_release: + for (i--; i >= 0; i--) + vfio_group_put_external_user(groups[i].group); + + kfree(groups); + return ret; + } else if (cmd == VFIO_DEVICE_IOEVENTFD) { + struct vfio_device_ioeventfd ioeventfd; + int count; + + minsz = offsetofend(struct vfio_device_ioeventfd, fd); + + if (copy_from_user(&ioeventfd, (void __user *)arg, minsz)) + return -EFAULT; + + if (ioeventfd.argsz < minsz) + return -EINVAL; + + if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK) + return -EINVAL; + + count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK; + + if (hweight8(count) != 1 || ioeventfd.fd < -1) + return -EINVAL; + + return vfio_pci_ioeventfd(vdev, ioeventfd.offset, + ioeventfd.data, count, ioeventfd.fd); + } + + return -ENOTTY; +} + +static ssize_t vfio_pci_rw(void *device_data, char __user *buf, + size_t count, loff_t *ppos, bool iswrite) +{ + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); + struct vfio_pci_device *vdev = device_data; + + if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) + return -EINVAL; + + switch (index) { + case VFIO_PCI_CONFIG_REGION_INDEX: + return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite); + + case VFIO_PCI_ROM_REGION_INDEX: + if (iswrite) + return -EINVAL; + return vfio_pci_bar_rw(vdev, buf, count, ppos, false); + + case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: + return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite); + + case VFIO_PCI_VGA_REGION_INDEX: + return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite); + default: + index -= VFIO_PCI_NUM_REGIONS; + return vdev->region[index].ops->rw(vdev, buf, + count, ppos, iswrite); + } + + return -EINVAL; +} + +ssize_t vfio_pci_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos) +{ + if (!count) + return 0; + + return vfio_pci_rw(device_data, buf, count, ppos, false); +} + +ssize_t vfio_pci_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos) +{ + if (!count) + return 0; + + return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); +} + +int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) +{ + struct vfio_pci_device *vdev = device_data; + struct pci_dev *pdev = vdev->pdev; + unsigned int index; + u64 phys_len, req_len, pgoff, req_start; + int ret; + + index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); + + if (vma->vm_end < vma->vm_start) + return -EINVAL; + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + if (index >= VFIO_PCI_NUM_REGIONS) { + int regnum = index - VFIO_PCI_NUM_REGIONS; + struct vfio_pci_region *region = vdev->region + regnum; + + if (region && region->ops && region->ops->mmap && + (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) + return region->ops->mmap(vdev, region, vma); + return -EINVAL; + } + if (index >= VFIO_PCI_ROM_REGION_INDEX) + return -EINVAL; + if (!vdev->bar_mmap_supported[index]) + return -EINVAL; + + phys_len = PAGE_ALIGN(pci_resource_len(pdev, index)); + req_len = vma->vm_end - vma->vm_start; + pgoff = vma->vm_pgoff & + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); + req_start = pgoff << PAGE_SHIFT; + + if (req_start + req_len > phys_len) + return -EINVAL; + + /* + * Even though we don't make use of the barmap for the mmap, + * we need to request the region and the barmap tracks that. 
+ */ + if (!vdev->barmap[index]) { + ret = pci_request_selected_regions(pdev, + 1 << index, "vfio-pci"); + if (ret) + return ret; + + vdev->barmap[index] = pci_iomap(pdev, index, 0); + if (!vdev->barmap[index]) { + pci_release_selected_regions(pdev, 1 << index); + return -ENOMEM; + } + } + + vma->vm_private_data = vdev; + vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); + vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; + + return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, + req_len, vma->vm_page_prot); +} + +void vfio_pci_request(void *device_data, unsigned int count) +{ + struct vfio_pci_device *vdev = device_data; + struct pci_dev *pdev = vdev->pdev; + + mutex_lock(&vdev->igate); + + if (vdev->req_trigger) { + if (!(count % 10)) + pci_notice_ratelimited(pdev, + "Relaying device request to user (#%u)\n", + count); + eventfd_signal(vdev->req_trigger, 1); + } else if (count == 0) { + pci_warn(pdev, + "No device request channel registered, blocked until released by user\n"); + } + + mutex_unlock(&vdev->igate); +} + +static const struct vfio_device_ops vfio_pci_ops = { + .name = "vfio-pci", + .open = vfio_pci_open, + .release = vfio_pci_release, + .ioctl = vfio_pci_ioctl, + .read = vfio_pci_read, + .write = vfio_pci_write, + .mmap = vfio_pci_mmap, + .request = vfio_pci_request, +}; + +static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct vfio_pci_device *vdev; + struct iommu_group *group; + int ret; + + if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) + return -EINVAL; + + /* + * Prevent binding to PFs with VFs enabled, this too easily allows + * userspace instance with VFs and PFs from the same device, which + * cannot work. Disabling SR-IOV here would initiate removing the + * VFs, which would unbind the driver, which is prone to blocking + * if that VF is also in use by vfio-pci. Just reject these PFs + * and let the user sort it out. + */ + if (pci_num_vf(pdev)) { + pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); + return -EBUSY; + } + + group = vfio_iommu_group_get(&pdev->dev); + if (!group) + return -EINVAL; + + vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); + if (!vdev) { + vfio_iommu_group_put(group, &pdev->dev); + return -ENOMEM; + } + + vdev->pdev = pdev; + vdev->irq_type = VFIO_PCI_NUM_IRQS; + mutex_init(&vdev->igate); + spin_lock_init(&vdev->irqlock); + mutex_init(&vdev->ioeventfds_lock); + INIT_LIST_HEAD(&vdev->ioeventfds_list); + vdev->nointxmask = nointxmask; +#ifdef CONFIG_VFIO_PCI_VGA + vdev->disable_vga = disable_vga; +#endif + vdev->disable_idle_d3 = disable_idle_d3; + + ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); + if (ret) { + vfio_iommu_group_put(group, &pdev->dev); + kfree(vdev); + return ret; + } + + ret = vfio_pci_reflck_attach(vdev); + if (ret) { + vfio_del_group_dev(&pdev->dev); + vfio_iommu_group_put(group, &pdev->dev); + kfree(vdev); + return ret; + } + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode); + vga_set_legacy_decoding(pdev, + vfio_pci_set_vga_decode(vdev, false)); + } + + vfio_pci_probe_power_state(vdev); + + if (!vdev->disable_idle_d3) { + /* + * pci-core sets the device power state to an unknown value at + * bootup and after being removed from a driver. The only + * transition it allows from this unknown state is to D0, which + * typically happens when a driver calls pci_enable_device(). + * We're not ready to enable the device yet, but we do want to + * be able to get to D3. 
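+ * (Added note: a direct unknown->D3hot request would not take
+ * effect for the same reason.)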
Therefore first do a D0 transition + * before going to D3. + */ + vfio_pci_set_power_state(vdev, PCI_D0); + vfio_pci_set_power_state(vdev, PCI_D3hot); + } + + return ret; +} + +static void vfio_pci_remove(struct pci_dev *pdev) +{ + struct vfio_pci_device *vdev; + + vdev = vfio_del_group_dev(&pdev->dev); + if (!vdev) + return; + + vfio_pci_reflck_put(vdev->reflck); + + vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev); + kfree(vdev->region); + mutex_destroy(&vdev->ioeventfds_lock); + + if (!vdev->disable_idle_d3) + vfio_pci_set_power_state(vdev, PCI_D0); + + kfree(vdev->pm_save); + kfree(vdev); + + if (vfio_pci_is_vga(pdev)) { + vga_client_register(pdev, NULL, NULL, NULL); + vga_set_legacy_decoding(pdev, + VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | + VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); + } +} + +static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, + pci_channel_state_t state) +{ + struct vfio_pci_device *vdev; + struct vfio_device *device; + + device = vfio_device_get_from_dev(&pdev->dev); + if (device == NULL) + return PCI_ERS_RESULT_DISCONNECT; + + vdev = vfio_device_data(device); + if (vdev == NULL) { + vfio_device_put(device); + return PCI_ERS_RESULT_DISCONNECT; + } + + mutex_lock(&vdev->igate); + + if (vdev->err_trigger) + eventfd_signal(vdev->err_trigger, 1); + + mutex_unlock(&vdev->igate); + + vfio_device_put(device); + + return PCI_ERS_RESULT_CAN_RECOVER; +} + +static const struct pci_error_handlers vfio_err_handlers = { + .error_detected = vfio_pci_aer_err_detected, +}; + +static struct pci_driver vfio_pci_driver = { + .name = "vfio-pci", + .id_table = NULL, /* only dynamic ids */ + .probe = vfio_pci_probe, + .remove = vfio_pci_remove, + .err_handler = &vfio_err_handlers, +}; + +static DEFINE_MUTEX(reflck_lock); + +static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) +{ + struct vfio_pci_reflck *reflck; + + reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); + if (!reflck) + return ERR_PTR(-ENOMEM); + + kref_init(&reflck->kref); + mutex_init(&reflck->lock); + + return reflck; +} + +static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) +{ + kref_get(&reflck->kref); +} + +static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) +{ + struct vfio_pci_device *vdev = data; + struct vfio_pci_reflck **preflck = &vdev->reflck; + struct vfio_device *device; + struct vfio_pci_device *tmp; + + device = vfio_device_get_from_dev(&pdev->dev); + if (!device) + return 0; + + if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) { + vfio_device_put(device); + return 0; + } + + tmp = vfio_device_data(device); + + if (tmp->reflck) { + vfio_pci_reflck_get(tmp->reflck); + *preflck = tmp->reflck; + vfio_device_put(device); + return 1; + } + + vfio_device_put(device); + return 0; +} + +int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) +{ + bool slot = !pci_probe_reset_slot(vdev->pdev->slot); + + mutex_lock(&reflck_lock); + + if (pci_is_root_bus(vdev->pdev->bus) || + vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, + vdev, slot) <= 0) + vdev->reflck = vfio_pci_reflck_alloc(); + + mutex_unlock(&reflck_lock); + + return PTR_ERR_OR_ZERO(vdev->reflck); +} + +static void vfio_pci_reflck_release(struct kref *kref) +{ + struct vfio_pci_reflck *reflck = container_of(kref, + struct vfio_pci_reflck, + kref); + + kfree(reflck); + mutex_unlock(&reflck_lock); +} + +void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) +{ + kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); +} + +struct vfio_devices { + struct vfio_device 
**devices; + struct vfio_pci_device *vdev; + int cur_index; + int max_index; +}; + +static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) +{ + struct vfio_devices *devs = data; + struct vfio_device *device; + struct vfio_pci_device *tmp; + + if (devs->cur_index == devs->max_index) + return -ENOSPC; + + device = vfio_device_get_from_dev(&pdev->dev); + if (!device) + return -EINVAL; + + if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) { + vfio_device_put(device); + return -EBUSY; + } + + tmp = vfio_device_data(device); + + /* Fault if the device is not unused */ + if (tmp->refcnt) { + vfio_device_put(device); + return -EBUSY; + } + + devs->devices[devs->cur_index++] = device; + return 0; +} + +/* + * If a bus or slot reset is available for the provided device and: + * - All of the devices affected by that bus or slot reset are unused + * (!refcnt) + * - At least one of the affected devices is marked dirty via + * needs_reset (such as by lack of FLR support) + * Then attempt to perform that bus or slot reset. Callers are required + * to hold vdev->reflck->lock, protecting the bus/slot reset group from + * concurrent opens. A vfio_device reference is acquired for each device + * to prevent unbinds during the reset operation. + * + * NB: vfio-core considers a group to be viable even if some devices are + * bound to drivers like pci-stub or pcieport. Here we require all devices + * to be bound to vfio_pci since that's the only way we can be sure they + * stay put. + */ +static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) +{ + struct vfio_devices devs = { .cur_index = 0 }; + int i = 0, ret = -EINVAL; + bool slot = false; + struct vfio_pci_device *tmp; + + if (!pci_probe_reset_slot(vdev->pdev->slot)) + slot = true; + else if (pci_probe_reset_bus(vdev->pdev->bus)) + return; + + if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, + &i, slot) || !i) + return; + + devs.max_index = i; + devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); + if (!devs.devices) + return; + + devs.vdev = vdev; + if (vfio_pci_for_each_slot_or_bus(vdev->pdev, + vfio_pci_get_unused_devs, + &devs, slot)) + goto put_devs; + + /* Does at least one need a reset? */ + for (i = 0; i < devs.cur_index; i++) { + tmp = vfio_device_data(devs.devices[i]); + if (tmp->needs_reset) { + ret = pci_reset_bus(vdev->pdev); + break; + } + } + +put_devs: + for (i = 0; i < devs.cur_index; i++) { + tmp = vfio_device_data(devs.devices[i]); + + /* + * If reset was successful, affected devices no longer need + * a reset and we should return all the collateral devices + * to low power. If not successful, we either didn't reset + * the bus or timed out waiting for it, so let's not touch + * the power state. 
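+ * (Added note: vdev itself is skipped below; the caller,
+ * vfio_pci_disable(), transitions it to D3hot after this function
+ * returns, unless disable_idle_d3 is set.)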
+ */ + if (!ret) { + tmp->needs_reset = false; + + if (tmp != vdev && !tmp->disable_idle_d3) + vfio_pci_set_power_state(tmp, PCI_D3hot); + } + + vfio_device_put(devs.devices[i]); + } + + kfree(devs.devices); +} + +static void __exit vfio_pci_cleanup(void) +{ + pci_unregister_driver(&vfio_pci_driver); + vfio_pci_uninit_perm_bits(); +} + +void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) +{ + char *p, *id; + int rc; + + /* no ids passed actually */ + if (ids[0] == '\0') + return; + + /* add ids specified in the module parameter */ + p = ids; + while ((id = strsep(&p, ","))) { + unsigned int vendor, device, subvendor = PCI_ANY_ID, + subdevice = PCI_ANY_ID, class = 0, class_mask = 0; + int fields; + + if (!strlen(id)) + continue; + + fields = sscanf(id, "%x:%x:%x:%x:%x:%x", + &vendor, &device, &subvendor, &subdevice, + &class, &class_mask); + + if (fields < 2) { + pr_warn("invalid id string \"%s\"\n", id); + continue; + } + + rc = pci_add_dynid(driver, vendor, device, + subvendor, subdevice, class, class_mask, 0); + if (rc) + pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n", + vendor, device, subvendor, subdevice, + class, class_mask, rc); + else + pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n", + vendor, device, subvendor, subdevice, + class, class_mask); + } +} + +static int __init vfio_pci_init(void) +{ + int ret; + + /* Allocate shared config space permision data used by all devices */ + ret = vfio_pci_init_perm_bits(); + if (ret) + return ret; + + /* Register and scan for devices */ + ret = pci_register_driver(&vfio_pci_driver); + if (ret) + goto out_driver; + + vfio_pci_fill_ids(&ids[0], &vfio_pci_driver); + + return 0; + +out_driver: + vfio_pci_uninit_perm_bits(); + return ret; +} + +module_init(vfio_pci_init); +module_exit(vfio_pci_cleanup); + +MODULE_VERSION(DRIVER_VERSION); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR(DRIVER_AUTHOR); +MODULE_DESCRIPTION(DRIVER_DESC); From patchwork Thu Sep 5 07:59:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134717 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2125914ED for ; Fri, 6 Sep 2019 08:18:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 02F162087E for ; Fri, 6 Sep 2019 08:18:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391294AbfIFIRm (ORCPT ); Fri, 6 Sep 2019 04:17:42 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391182AbfIFIRi (ORCPT ); Fri, 6 Sep 2019 04:17:38 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186156" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:35 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, 
shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 06/13] vfio_pci: shrink vfio_pci_common.c Date: Thu, 5 Sep 2019 15:59:23 +0800 Message-Id: <1567670370-4484-7-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch removes the vfio-pci module specific codes in vfio_pci_common.c to make vfio_pci_common.c be a common source file. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci_common.c | 233 ------------------------------------- 1 file changed, 233 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c index 90a44a7..11365f1 100644 --- a/drivers/vfio/pci/vfio_pci_common.c +++ b/drivers/vfio/pci/vfio_pci_common.c @@ -30,30 +30,6 @@ #include "vfio_pci_private.h" -#define DRIVER_VERSION "0.2" -#define DRIVER_AUTHOR "Alex Williamson " -#define DRIVER_DESC "VFIO PCI - User Level meta-driver" - -static char ids[1024] __initdata; -module_param_string(ids, ids, sizeof(ids), 0); -MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified"); - -static bool nointxmask; -module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(nointxmask, - "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); - -#ifdef CONFIG_VFIO_PCI_VGA -static bool disable_vga; -module_param(disable_vga, bool, S_IRUGO); -MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-pci"); -#endif - -static bool disable_idle_d3; -module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); -MODULE_PARM_DESC(disable_idle_d3, - "Disable using the PCI D3 low power state for idle, unused devices"); - /* * Our VGA arbiter participation is limited since we don't know anything * about the device itself. 
However, if the device is the only VGA device @@ -440,47 +416,6 @@ void vfio_pci_disable(struct vfio_pci_device *vdev) vfio_pci_set_power_state(vdev, PCI_D3hot); } -static void vfio_pci_release(void *device_data) -{ - struct vfio_pci_device *vdev = device_data; - - mutex_lock(&vdev->reflck->lock); - - if (!(--vdev->refcnt)) { - vfio_spapr_pci_eeh_release(vdev->pdev); - vfio_pci_disable(vdev); - } - - mutex_unlock(&vdev->reflck->lock); - - module_put(THIS_MODULE); -} - -static int vfio_pci_open(void *device_data) -{ - struct vfio_pci_device *vdev = device_data; - int ret = 0; - - if (!try_module_get(THIS_MODULE)) - return -ENODEV; - - mutex_lock(&vdev->reflck->lock); - - if (!vdev->refcnt) { - ret = vfio_pci_enable(vdev); - if (ret) - goto error; - - vfio_spapr_pci_eeh_open(vdev->pdev); - } - vdev->refcnt++; -error: - mutex_unlock(&vdev->reflck->lock); - if (ret) - module_put(THIS_MODULE); - return ret; -} - static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) { if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { @@ -1252,129 +1187,6 @@ void vfio_pci_request(void *device_data, unsigned int count) mutex_unlock(&vdev->igate); } -static const struct vfio_device_ops vfio_pci_ops = { - .name = "vfio-pci", - .open = vfio_pci_open, - .release = vfio_pci_release, - .ioctl = vfio_pci_ioctl, - .read = vfio_pci_read, - .write = vfio_pci_write, - .mmap = vfio_pci_mmap, - .request = vfio_pci_request, -}; - -static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) -{ - struct vfio_pci_device *vdev; - struct iommu_group *group; - int ret; - - if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) - return -EINVAL; - - /* - * Prevent binding to PFs with VFs enabled, this too easily allows - * userspace instance with VFs and PFs from the same device, which - * cannot work. Disabling SR-IOV here would initiate removing the - * VFs, which would unbind the driver, which is prone to blocking - * if that VF is also in use by vfio-pci. Just reject these PFs - * and let the user sort it out. - */ - if (pci_num_vf(pdev)) { - pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n"); - return -EBUSY; - } - - group = vfio_iommu_group_get(&pdev->dev); - if (!group) - return -EINVAL; - - vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); - if (!vdev) { - vfio_iommu_group_put(group, &pdev->dev); - return -ENOMEM; - } - - vdev->pdev = pdev; - vdev->irq_type = VFIO_PCI_NUM_IRQS; - mutex_init(&vdev->igate); - spin_lock_init(&vdev->irqlock); - mutex_init(&vdev->ioeventfds_lock); - INIT_LIST_HEAD(&vdev->ioeventfds_list); - vdev->nointxmask = nointxmask; -#ifdef CONFIG_VFIO_PCI_VGA - vdev->disable_vga = disable_vga; -#endif - vdev->disable_idle_d3 = disable_idle_d3; - - ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); - if (ret) { - vfio_iommu_group_put(group, &pdev->dev); - kfree(vdev); - return ret; - } - - ret = vfio_pci_reflck_attach(vdev); - if (ret) { - vfio_del_group_dev(&pdev->dev); - vfio_iommu_group_put(group, &pdev->dev); - kfree(vdev); - return ret; - } - - if (vfio_pci_is_vga(pdev)) { - vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode); - vga_set_legacy_decoding(pdev, - vfio_pci_set_vga_decode(vdev, false)); - } - - vfio_pci_probe_power_state(vdev); - - if (!vdev->disable_idle_d3) { - /* - * pci-core sets the device power state to an unknown value at - * bootup and after being removed from a driver. The only - * transition it allows from this unknown state is to D0, which - * typically happens when a driver calls pci_enable_device(). 
- * We're not ready to enable the device yet, but we do want to - * be able to get to D3. Therefore first do a D0 transition - * before going to D3. - */ - vfio_pci_set_power_state(vdev, PCI_D0); - vfio_pci_set_power_state(vdev, PCI_D3hot); - } - - return ret; -} - -static void vfio_pci_remove(struct pci_dev *pdev) -{ - struct vfio_pci_device *vdev; - - vdev = vfio_del_group_dev(&pdev->dev); - if (!vdev) - return; - - vfio_pci_reflck_put(vdev->reflck); - - vfio_iommu_group_put(pdev->dev.iommu_group, &pdev->dev); - kfree(vdev->region); - mutex_destroy(&vdev->ioeventfds_lock); - - if (!vdev->disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D0); - - kfree(vdev->pm_save); - kfree(vdev); - - if (vfio_pci_is_vga(pdev)) { - vga_client_register(pdev, NULL, NULL, NULL); - vga_set_legacy_decoding(pdev, - VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | - VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); - } -} - static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, pci_channel_state_t state) { @@ -1407,14 +1219,6 @@ static const struct pci_error_handlers vfio_err_handlers = { .error_detected = vfio_pci_aer_err_detected, }; -static struct pci_driver vfio_pci_driver = { - .name = "vfio-pci", - .id_table = NULL, /* only dynamic ids */ - .probe = vfio_pci_probe, - .remove = vfio_pci_remove, - .err_handler = &vfio_err_handlers, -}; - static DEFINE_MUTEX(reflck_lock); static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) @@ -1609,12 +1413,6 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) kfree(devs.devices); } -static void __exit vfio_pci_cleanup(void) -{ - pci_unregister_driver(&vfio_pci_driver); - vfio_pci_uninit_perm_bits(); -} - void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) { char *p, *id; @@ -1655,34 +1453,3 @@ void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) class, class_mask); } } - -static int __init vfio_pci_init(void) -{ - int ret; - - /* Allocate shared config space permision data used by all devices */ - ret = vfio_pci_init_perm_bits(); - if (ret) - return ret; - - /* Register and scan for devices */ - ret = pci_register_driver(&vfio_pci_driver); - if (ret) - goto out_driver; - - vfio_pci_fill_ids(&ids[0], &vfio_pci_driver); - - return 0; - -out_driver: - vfio_pci_uninit_perm_bits(); - return ret; -} - -module_init(vfio_pci_init); -module_exit(vfio_pci_cleanup); - -MODULE_VERSION(DRIVER_VERSION); -MODULE_LICENSE("GPL v2"); -MODULE_AUTHOR(DRIVER_AUTHOR); -MODULE_DESCRIPTION(DRIVER_DESC); From patchwork Thu Sep 5 07:59:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134715 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 81C101398 for ; Fri, 6 Sep 2019 08:18:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 36F2C214DE for ; Fri, 6 Sep 2019 08:18:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391353AbfIFIRm (ORCPT ); Fri, 6 Sep 2019 04:17:42 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391156AbfIFIRl (ORCPT ); Fri, 6 Sep 2019 04:17:41 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with 
ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186165" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:37 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 07/13] vfio_pci: shrink vfio_pci.c Date: Thu, 5 Sep 2019 15:59:24 +0800 Message-Id: <1567670370-4484-8-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch removes the common codes in vfio_pci.c, instead, vfio-pci module will leverage the common functions implemented in vfio_pci_common.c. Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/Makefile | 3 +- drivers/vfio/pci/vfio_pci.c | 1424 ----------------------------------- drivers/vfio/pci/vfio_pci_common.c | 2 +- drivers/vfio/pci/vfio_pci_private.h | 2 + 4 files changed, 5 insertions(+), 1426 deletions(-) diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index f027f8a..d94317a 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0-only -vfio-pci-y := vfio_pci.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o +vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \ + vfio_pci_rdwr.o vfio_pci_config.o vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 90a44a7..c6c39e3 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -54,392 +54,6 @@ module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); MODULE_PARM_DESC(disable_idle_d3, "Disable using the PCI D3 low power state for idle, unused devices"); -/* - * Our VGA arbiter participation is limited since we don't know anything - * about the device itself. However, if the device is the only VGA device - * downstream of a bridge and VFIO VGA support is disabled, then we can - * safely return legacy VGA IO and memory as not decoded since the user - * has no way to get to it and routing can be disabled externally at the - * bridge. 
- */ -unsigned int vfio_pci_set_vga_decode(void *opaque, bool single_vga) -{ - struct vfio_pci_device *vdev = opaque; - struct pci_dev *tmp = NULL, *pdev = vdev->pdev; - unsigned char max_busnr; - unsigned int decodes; - - if (single_vga || !vfio_vga_disabled(vdev) || - pci_is_root_bus(pdev->bus)) - return VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM | - VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; - - max_busnr = pci_bus_max_busnr(pdev->bus); - decodes = VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM; - - while ((tmp = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, tmp)) != NULL) { - if (tmp == pdev || - pci_domain_nr(tmp->bus) != pci_domain_nr(pdev->bus) || - pci_is_root_bus(tmp->bus)) - continue; - - if (tmp->bus->number >= pdev->bus->number && - tmp->bus->number <= max_busnr) { - pci_dev_put(tmp); - decodes |= VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM; - break; - } - } - - return decodes; -} - -static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) -{ - struct resource *res; - int bar; - struct vfio_pci_dummy_resource *dummy_res; - - INIT_LIST_HEAD(&vdev->dummy_resources_list); - - for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { - res = vdev->pdev->resource + bar; - - if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) - goto no_mmap; - - if (!(res->flags & IORESOURCE_MEM)) - goto no_mmap; - - /* - * The PCI core shouldn't set up a resource with a - * type but zero size. But there may be bugs that - * cause us to do that. - */ - if (!resource_size(res)) - goto no_mmap; - - if (resource_size(res) >= PAGE_SIZE) { - vdev->bar_mmap_supported[bar] = true; - continue; - } - - if (!(res->start & ~PAGE_MASK)) { - /* - * Add a dummy resource to reserve the remainder - * of the exclusive page in case that hot-add - * device's bar is assigned into it. - */ - dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL); - if (dummy_res == NULL) - goto no_mmap; - - dummy_res->resource.name = "vfio sub-page reserved"; - dummy_res->resource.start = res->end + 1; - dummy_res->resource.end = res->start + PAGE_SIZE - 1; - dummy_res->resource.flags = res->flags; - if (request_resource(res->parent, - &dummy_res->resource)) { - kfree(dummy_res); - goto no_mmap; - } - dummy_res->index = bar; - list_add(&dummy_res->res_next, - &vdev->dummy_resources_list); - vdev->bar_mmap_supported[bar] = true; - continue; - } - /* - * Here we don't handle the case when the BAR is not page - * aligned because we can't expect the BAR will be - * assigned into the same location in a page in guest - * when we passthrough the BAR. And it's hard to access - * this BAR in userspace because we have no way to get - * the BAR's location in a page. - */ -no_mmap: - vdev->bar_mmap_supported[bar] = false; - } -} - -static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); - -/* - * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND - * _and_ the ability detect when the device is asserting INTx via PCI_STATUS. - * If a device implements the former but not the latter we would typically - * expect broken_intx_masking be set and require an exclusive interrupt. - * However since we do have control of the device's ability to assert INTx, - * we can instead pretend that the device does not implement INTx, virtualizing - * the pin register to report zero and maintaining DisINTx set on the host. 
- */ -static bool vfio_pci_nointx(struct pci_dev *pdev) -{ - switch (pdev->vendor) { - case PCI_VENDOR_ID_INTEL: - switch (pdev->device) { - /* All i40e (XL710/X710/XXV710) 10/20/25/40GbE NICs */ - case 0x1572: - case 0x1574: - case 0x1580 ... 0x1581: - case 0x1583 ... 0x158b: - case 0x37d0 ... 0x37d2: - return true; - default: - return false; - } - } - - return false; -} - -void vfio_pci_probe_power_state(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - u16 pmcsr; - - if (!pdev->pm_cap) - return; - - pci_read_config_word(pdev, pdev->pm_cap + PCI_PM_CTRL, &pmcsr); - - vdev->needs_pm_restore = !(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET); -} - -/* - * pci_set_power_state() wrapper handling devices which perform a soft reset on - * D3->D0 transition. Save state prior to D0/1/2->D3, stash it on the vdev, - * restore when returned to D0. Saved separately from pci_saved_state for use - * by PM capability emulation and separately from pci_dev internal saved state - * to avoid it being overwritten and consumed around other resets. - */ -int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state) -{ - struct pci_dev *pdev = vdev->pdev; - bool needs_restore = false, needs_save = false; - int ret; - - if (vdev->needs_pm_restore) { - if (pdev->current_state < PCI_D3hot && state >= PCI_D3hot) { - pci_save_state(pdev); - needs_save = true; - } - - if (pdev->current_state >= PCI_D3hot && state <= PCI_D0) - needs_restore = true; - } - - ret = pci_set_power_state(pdev, state); - - if (!ret) { - /* D3 might be unsupported via quirk, skip unless in D3 */ - if (needs_save && pdev->current_state >= PCI_D3hot) { - vdev->pm_save = pci_store_saved_state(pdev); - } else if (needs_restore) { - pci_load_and_free_saved_state(pdev, &vdev->pm_save); - pci_restore_state(pdev); - } - } - - return ret; -} - -int vfio_pci_enable(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - int ret; - u16 cmd; - u8 msix_pos; - - vfio_pci_set_power_state(vdev, PCI_D0); - - /* Don't allow our initial saved state to include busmaster */ - pci_clear_master(pdev); - - ret = pci_enable_device(pdev); - if (ret) - return ret; - - /* If reset fails because of the device lock, fail this path entirely */ - ret = pci_try_reset_function(pdev); - if (ret == -EAGAIN) { - pci_disable_device(pdev); - return ret; - } - - vdev->reset_works = !ret; - pci_save_state(pdev); - vdev->pci_saved_state = pci_store_saved_state(pdev); - if (!vdev->pci_saved_state) - pci_dbg(pdev, "%s: Couldn't store saved state\n", __func__); - - if (likely(!vdev->nointxmask)) { - if (vfio_pci_nointx(pdev)) { - pci_info(pdev, "Masking broken INTx support\n"); - vdev->nointx = true; - pci_intx(pdev, 0); - } else - vdev->pci_2_3 = pci_intx_mask_supported(pdev); - } - - pci_read_config_word(pdev, PCI_COMMAND, &cmd); - if (vdev->pci_2_3 && (cmd & PCI_COMMAND_INTX_DISABLE)) { - cmd &= ~PCI_COMMAND_INTX_DISABLE; - pci_write_config_word(pdev, PCI_COMMAND, cmd); - } - - ret = vfio_config_init(vdev); - if (ret) { - kfree(vdev->pci_saved_state); - vdev->pci_saved_state = NULL; - pci_disable_device(pdev); - return ret; - } - - msix_pos = pdev->msix_cap; - if (msix_pos) { - u16 flags; - u32 table; - - pci_read_config_word(pdev, msix_pos + PCI_MSIX_FLAGS, &flags); - pci_read_config_dword(pdev, msix_pos + PCI_MSIX_TABLE, &table); - - vdev->msix_bar = table & PCI_MSIX_TABLE_BIR; - vdev->msix_offset = table & PCI_MSIX_TABLE_OFFSET; - vdev->msix_size = ((flags & PCI_MSIX_FLAGS_QSIZE) + 1) * 16; - } else - vdev->msix_bar = 0xFF; - - if 
(!vfio_vga_disabled(vdev) && vfio_pci_is_vga(pdev)) - vdev->has_vga = true; - - - if (vfio_pci_is_vga(pdev) && - pdev->vendor == PCI_VENDOR_ID_INTEL && - IS_ENABLED(CONFIG_VFIO_PCI_IGD)) { - ret = vfio_pci_igd_init(vdev); - if (ret) { - pci_warn(pdev, "Failed to setup Intel IGD regions\n"); - goto disable_exit; - } - } - - if (pdev->vendor == PCI_VENDOR_ID_NVIDIA && - IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { - ret = vfio_pci_nvdia_v100_nvlink2_init(vdev); - if (ret && ret != -ENODEV) { - pci_warn(pdev, "Failed to setup NVIDIA NV2 RAM region\n"); - goto disable_exit; - } - } - - if (pdev->vendor == PCI_VENDOR_ID_IBM && - IS_ENABLED(CONFIG_VFIO_PCI_NVLINK2)) { - ret = vfio_pci_ibm_npu2_init(vdev); - if (ret && ret != -ENODEV) { - pci_warn(pdev, "Failed to setup NVIDIA NV2 ATSD region\n"); - goto disable_exit; - } - } - - vfio_pci_probe_mmaps(vdev); - - return 0; - -disable_exit: - vfio_pci_disable(vdev); - return ret; -} - -void vfio_pci_disable(struct vfio_pci_device *vdev) -{ - struct pci_dev *pdev = vdev->pdev; - struct vfio_pci_dummy_resource *dummy_res, *tmp; - struct vfio_pci_ioeventfd *ioeventfd, *ioeventfd_tmp; - int i, bar; - - /* Stop the device from further DMA */ - pci_clear_master(pdev); - - vfio_pci_set_irqs_ioctl(vdev, VFIO_IRQ_SET_DATA_NONE | - VFIO_IRQ_SET_ACTION_TRIGGER, - vdev->irq_type, 0, 0, NULL); - - /* Device closed, don't need mutex here */ - list_for_each_entry_safe(ioeventfd, ioeventfd_tmp, - &vdev->ioeventfds_list, next) { - vfio_virqfd_disable(&ioeventfd->virqfd); - list_del(&ioeventfd->next); - kfree(ioeventfd); - } - vdev->ioeventfds_nr = 0; - - vdev->virq_disabled = false; - - for (i = 0; i < vdev->num_regions; i++) - vdev->region[i].ops->release(vdev, &vdev->region[i]); - - vdev->num_regions = 0; - kfree(vdev->region); - vdev->region = NULL; /* don't krealloc a freed pointer */ - - vfio_config_free(vdev); - - for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { - if (!vdev->barmap[bar]) - continue; - pci_iounmap(pdev, vdev->barmap[bar]); - pci_release_selected_regions(pdev, 1 << bar); - vdev->barmap[bar] = NULL; - } - - list_for_each_entry_safe(dummy_res, tmp, - &vdev->dummy_resources_list, res_next) { - list_del(&dummy_res->res_next); - release_resource(&dummy_res->resource); - kfree(dummy_res); - } - - vdev->needs_reset = true; - - /* - * If we have saved state, restore it. If we can reset the device, - * even better. Resetting with current state seems better than - * nothing, but saving and restoring current state without reset - * is just busy work. - */ - if (pci_load_and_free_saved_state(pdev, &vdev->pci_saved_state)) { - pci_info(pdev, "%s: Couldn't reload saved state\n", __func__); - - if (!vdev->reset_works) - goto out; - - pci_save_state(pdev); - } - - /* - * Disable INTx and MSI, presumably to avoid spurious interrupts - * during reset. Stolen from pci_reset_function() - */ - pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE); - - /* - * Try to reset the device. The success of this is dependent on - * being able to lock the device, which is not always possible. 
- */ - if (vdev->reset_works && !pci_try_reset_function(pdev)) - vdev->needs_reset = false; - - pci_restore_state(pdev); -out: - pci_disable_device(pdev); - - vfio_pci_try_bus_reset(vdev); - - if (!vdev->disable_idle_d3) - vfio_pci_set_power_state(vdev, PCI_D3hot); -} - static void vfio_pci_release(void *device_data) { struct vfio_pci_device *vdev = device_data; @@ -481,777 +95,6 @@ static int vfio_pci_open(void *device_data) return ret; } -static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type) -{ - if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) { - u8 pin; - - if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || - vdev->nointx || vdev->pdev->is_virtfn) - return 0; - - pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin); - - return pin ? 1 : 0; - } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) { - u8 pos; - u16 flags; - - pos = vdev->pdev->msi_cap; - if (pos) { - pci_read_config_word(vdev->pdev, - pos + PCI_MSI_FLAGS, &flags); - return 1 << ((flags & PCI_MSI_FLAGS_QMASK) >> 1); - } - } else if (irq_type == VFIO_PCI_MSIX_IRQ_INDEX) { - u8 pos; - u16 flags; - - pos = vdev->pdev->msix_cap; - if (pos) { - pci_read_config_word(vdev->pdev, - pos + PCI_MSIX_FLAGS, &flags); - - return (flags & PCI_MSIX_FLAGS_QSIZE) + 1; - } - } else if (irq_type == VFIO_PCI_ERR_IRQ_INDEX) { - if (pci_is_pcie(vdev->pdev)) - return 1; - } else if (irq_type == VFIO_PCI_REQ_IRQ_INDEX) { - return 1; - } - - return 0; -} - -static int vfio_pci_count_devs(struct pci_dev *pdev, void *data) -{ - (*(int *)data)++; - return 0; -} - -struct vfio_pci_fill_info { - int max; - int cur; - struct vfio_pci_dependent_device *devices; -}; - -static int vfio_pci_fill_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_fill_info *fill = data; - struct iommu_group *iommu_group; - - if (fill->cur == fill->max) - return -EAGAIN; /* Something changed, try again */ - - iommu_group = iommu_group_get(&pdev->dev); - if (!iommu_group) - return -EPERM; /* Cannot reset non-isolated devices */ - - fill->devices[fill->cur].group_id = iommu_group_id(iommu_group); - fill->devices[fill->cur].segment = pci_domain_nr(pdev->bus); - fill->devices[fill->cur].bus = pdev->bus->number; - fill->devices[fill->cur].devfn = pdev->devfn; - fill->cur++; - iommu_group_put(iommu_group); - return 0; -} - -struct vfio_pci_group_entry { - struct vfio_group *group; - int id; -}; - -struct vfio_pci_group_info { - int count; - struct vfio_pci_group_entry *groups; -}; - -static int vfio_pci_validate_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_group_info *info = data; - struct iommu_group *group; - int id, i; - - group = iommu_group_get(&pdev->dev); - if (!group) - return -EPERM; - - id = iommu_group_id(group); - - for (i = 0; i < info->count; i++) - if (info->groups[i].id == id) - break; - - iommu_group_put(group); - - return (i == info->count) ? 
-EINVAL : 0; -} - -static bool vfio_pci_dev_below_slot(struct pci_dev *pdev, struct pci_slot *slot) -{ - for (; pdev; pdev = pdev->bus->self) - if (pdev->bus == slot->bus) - return (pdev->slot == slot); - return false; -} - -struct vfio_pci_walk_info { - int (*fn)(struct pci_dev *, void *data); - void *data; - struct pci_dev *pdev; - bool slot; - int ret; -}; - -static int vfio_pci_walk_wrapper(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_walk_info *walk = data; - - if (!walk->slot || vfio_pci_dev_below_slot(pdev, walk->pdev->slot)) - walk->ret = walk->fn(pdev, walk->data); - - return walk->ret; -} - -static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev, - int (*fn)(struct pci_dev *, - void *data), void *data, - bool slot) -{ - struct vfio_pci_walk_info walk = { - .fn = fn, .data = data, .pdev = pdev, .slot = slot, .ret = 0, - }; - - pci_walk_bus(pdev->bus, vfio_pci_walk_wrapper, &walk); - - return walk.ret; -} - -static int msix_mmappable_cap(struct vfio_pci_device *vdev, - struct vfio_info_cap *caps) -{ - struct vfio_info_cap_header header = { - .id = VFIO_REGION_INFO_CAP_MSIX_MAPPABLE, - .version = 1 - }; - - return vfio_info_add_capability(caps, &header, sizeof(header)); -} - -int vfio_pci_register_dev_region(struct vfio_pci_device *vdev, - unsigned int type, unsigned int subtype, - const struct vfio_pci_regops *ops, - size_t size, u32 flags, void *data) -{ - struct vfio_pci_region *region; - - region = krealloc(vdev->region, - (vdev->num_regions + 1) * sizeof(*region), - GFP_KERNEL); - if (!region) - return -ENOMEM; - - vdev->region = region; - vdev->region[vdev->num_regions].type = type; - vdev->region[vdev->num_regions].subtype = subtype; - vdev->region[vdev->num_regions].ops = ops; - vdev->region[vdev->num_regions].size = size; - vdev->region[vdev->num_regions].flags = flags; - vdev->region[vdev->num_regions].data = data; - - vdev->num_regions++; - - return 0; -} - -long vfio_pci_ioctl(void *device_data, - unsigned int cmd, unsigned long arg) -{ - struct vfio_pci_device *vdev = device_data; - unsigned long minsz; - - if (cmd == VFIO_DEVICE_GET_INFO) { - struct vfio_device_info info; - - minsz = offsetofend(struct vfio_device_info, num_irqs); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz) - return -EINVAL; - - info.flags = VFIO_DEVICE_FLAGS_PCI; - - if (vdev->reset_works) - info.flags |= VFIO_DEVICE_FLAGS_RESET; - - info.num_regions = VFIO_PCI_NUM_REGIONS + vdev->num_regions; - info.num_irqs = VFIO_PCI_NUM_IRQS; - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { - struct pci_dev *pdev = vdev->pdev; - struct vfio_region_info info; - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; - int i, ret; - - minsz = offsetofend(struct vfio_region_info, offset); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz) - return -EINVAL; - - switch (info.index) { - case VFIO_PCI_CONFIG_REGION_INDEX: - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = pdev->cfg_size; - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - break; - case VFIO_PCI_BAR0_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = pci_resource_len(pdev, info.index); - if (!info.size) { - info.flags = 0; - break; - } - - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - if (vdev->bar_mmap_supported[info.index]) { - info.flags |= VFIO_REGION_INFO_FLAG_MMAP; - if (info.index == vdev->msix_bar) { - ret = msix_mmappable_cap(vdev, &caps); - if (ret) - return ret; - } - } - - break; - case VFIO_PCI_ROM_REGION_INDEX: - { - void __iomem *io; - size_t size; - u16 orig_cmd; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.flags = 0; - - /* Report the BAR size, not the ROM size */ - info.size = pci_resource_len(pdev, info.index); - if (!info.size) { - /* Shadow ROMs appear as PCI option ROMs */ - if (pdev->resource[PCI_ROM_RESOURCE].flags & - IORESOURCE_ROM_SHADOW) - info.size = 0x20000; - else - break; - } - - /* - * Is it really there? Enable memory decode for - * implicit access in pci_map_rom(). - */ - pci_read_config_word(pdev, PCI_COMMAND, &orig_cmd); - pci_write_config_word(pdev, PCI_COMMAND, - orig_cmd | PCI_COMMAND_MEMORY); - - io = pci_map_rom(pdev, &size); - if (io) { - info.flags = VFIO_REGION_INFO_FLAG_READ; - pci_unmap_rom(pdev, io); - } else { - info.size = 0; - } - - pci_write_config_word(pdev, PCI_COMMAND, orig_cmd); - break; - } - case VFIO_PCI_VGA_REGION_INDEX: - if (!vdev->has_vga) - return -EINVAL; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = 0xc0000; - info.flags = VFIO_REGION_INFO_FLAG_READ | - VFIO_REGION_INFO_FLAG_WRITE; - - break; - default: - { - struct vfio_region_info_cap_type cap_type = { - .header.id = VFIO_REGION_INFO_CAP_TYPE, - .header.version = 1 }; - - if (info.index >= - VFIO_PCI_NUM_REGIONS + vdev->num_regions) - return -EINVAL; - info.index = array_index_nospec(info.index, - VFIO_PCI_NUM_REGIONS + - vdev->num_regions); - - i = info.index - VFIO_PCI_NUM_REGIONS; - - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); - info.size = vdev->region[i].size; - info.flags = vdev->region[i].flags; - - cap_type.type = vdev->region[i].type; - cap_type.subtype = vdev->region[i].subtype; - - ret = vfio_info_add_capability(&caps, &cap_type.header, - sizeof(cap_type)); - if (ret) - return ret; - - if (vdev->region[i].ops->add_capability) { - ret = vdev->region[i].ops->add_capability(vdev, - &vdev->region[i], &caps); - if (ret) - return ret; - } - } - } - - if (caps.size) { - info.flags |= VFIO_REGION_INFO_FLAG_CAPS; - if (info.argsz < sizeof(info) + caps.size) { - info.argsz = sizeof(info) + caps.size; - info.cap_offset = 0; - } else { - vfio_info_cap_shift(&caps, sizeof(info)); - if (copy_to_user((void __user *)arg + - sizeof(info), caps.buf, - caps.size)) { - kfree(caps.buf); - return -EFAULT; - } - info.cap_offset = sizeof(info); - } - - kfree(caps.buf); - } - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { - struct vfio_irq_info info; - - minsz = offsetofend(struct vfio_irq_info, count); - - if (copy_from_user(&info, (void __user *)arg, minsz)) - return -EFAULT; - - if (info.argsz < minsz || info.index >= VFIO_PCI_NUM_IRQS) - return -EINVAL; - - switch (info.index) { - case VFIO_PCI_INTX_IRQ_INDEX ... 
VFIO_PCI_MSIX_IRQ_INDEX: - case VFIO_PCI_REQ_IRQ_INDEX: - break; - case VFIO_PCI_ERR_IRQ_INDEX: - if (pci_is_pcie(vdev->pdev)) - break; - /* fall through */ - default: - return -EINVAL; - } - - info.flags = VFIO_IRQ_INFO_EVENTFD; - - info.count = vfio_pci_get_irq_count(vdev, info.index); - - if (info.index == VFIO_PCI_INTX_IRQ_INDEX) - info.flags |= (VFIO_IRQ_INFO_MASKABLE | - VFIO_IRQ_INFO_AUTOMASKED); - else - info.flags |= VFIO_IRQ_INFO_NORESIZE; - - return copy_to_user((void __user *)arg, &info, minsz) ? - -EFAULT : 0; - - } else if (cmd == VFIO_DEVICE_SET_IRQS) { - struct vfio_irq_set hdr; - u8 *data = NULL; - int max, ret = 0; - size_t data_size = 0; - - minsz = offsetofend(struct vfio_irq_set, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - max = vfio_pci_get_irq_count(vdev, hdr.index); - - ret = vfio_set_irqs_validate_and_prepare(&hdr, max, - VFIO_PCI_NUM_IRQS, &data_size); - if (ret) - return ret; - - if (data_size) { - data = memdup_user((void __user *)(arg + minsz), - data_size); - if (IS_ERR(data)) - return PTR_ERR(data); - } - - mutex_lock(&vdev->igate); - - ret = vfio_pci_set_irqs_ioctl(vdev, hdr.flags, hdr.index, - hdr.start, hdr.count, data); - - mutex_unlock(&vdev->igate); - kfree(data); - - return ret; - - } else if (cmd == VFIO_DEVICE_RESET) { - return vdev->reset_works ? - pci_try_reset_function(vdev->pdev) : -EINVAL; - - } else if (cmd == VFIO_DEVICE_GET_PCI_HOT_RESET_INFO) { - struct vfio_pci_hot_reset_info hdr; - struct vfio_pci_fill_info fill = { 0 }; - struct vfio_pci_dependent_device *devices = NULL; - bool slot = false; - int ret = 0; - - minsz = offsetofend(struct vfio_pci_hot_reset_info, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - if (hdr.argsz < minsz) - return -EINVAL; - - hdr.flags = 0; - - /* Can we do a slot or bus reset or neither? */ - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return -ENODEV; - - /* How many devices are affected? */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_count_devs, - &fill.max, slot); - if (ret) - return ret; - - WARN_ON(!fill.max); /* Should always be at least one */ - - /* - * If there's enough space, fill it now, otherwise return - * -ENOSPC and the number of devices affected. - */ - if (hdr.argsz < sizeof(hdr) + (fill.max * sizeof(*devices))) { - ret = -ENOSPC; - hdr.count = fill.max; - goto reset_info_exit; - } - - devices = kcalloc(fill.max, sizeof(*devices), GFP_KERNEL); - if (!devices) - return -ENOMEM; - - fill.devices = devices; - - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_fill_devs, - &fill, slot); - - /* - * If a device was removed between counting and filling, - * we may come up short of fill.max. If a device was - * added, we'll have a return of -EAGAIN above. 
- */ - if (!ret) - hdr.count = fill.cur; - -reset_info_exit: - if (copy_to_user((void __user *)arg, &hdr, minsz)) - ret = -EFAULT; - - if (!ret) { - if (copy_to_user((void __user *)(arg + minsz), devices, - hdr.count * sizeof(*devices))) - ret = -EFAULT; - } - - kfree(devices); - return ret; - - } else if (cmd == VFIO_DEVICE_PCI_HOT_RESET) { - struct vfio_pci_hot_reset hdr; - int32_t *group_fds; - struct vfio_pci_group_entry *groups; - struct vfio_pci_group_info info; - bool slot = false; - int i, count = 0, ret = 0; - - minsz = offsetofend(struct vfio_pci_hot_reset, count); - - if (copy_from_user(&hdr, (void __user *)arg, minsz)) - return -EFAULT; - - if (hdr.argsz < minsz || hdr.flags) - return -EINVAL; - - /* Can we do a slot or bus reset or neither? */ - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return -ENODEV; - - /* - * We can't let userspace give us an arbitrarily large - * buffer to copy, so verify how many we think there - * could be. Note groups can have multiple devices so - * one group per device is the max. - */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_count_devs, - &count, slot); - if (ret) - return ret; - - /* Somewhere between 1 and count is OK */ - if (!hdr.count || hdr.count > count) - return -EINVAL; - - group_fds = kcalloc(hdr.count, sizeof(*group_fds), GFP_KERNEL); - groups = kcalloc(hdr.count, sizeof(*groups), GFP_KERNEL); - if (!group_fds || !groups) { - kfree(group_fds); - kfree(groups); - return -ENOMEM; - } - - if (copy_from_user(group_fds, (void __user *)(arg + minsz), - hdr.count * sizeof(*group_fds))) { - kfree(group_fds); - kfree(groups); - return -EFAULT; - } - - /* - * For each group_fd, get the group through the vfio external - * user interface and store the group and iommu ID. This - * ensures the group is held across the reset. - */ - for (i = 0; i < hdr.count; i++) { - struct vfio_group *group; - struct fd f = fdget(group_fds[i]); - if (!f.file) { - ret = -EBADF; - break; - } - - group = vfio_group_get_external_user(f.file); - fdput(f); - if (IS_ERR(group)) { - ret = PTR_ERR(group); - break; - } - - groups[i].group = group; - groups[i].id = vfio_external_user_iommu_id(group); - } - - kfree(group_fds); - - /* release reference to groups on error */ - if (ret) - goto hot_reset_release; - - info.count = hdr.count; - info.groups = groups; - - /* - * Test whether all the affected devices are contained - * by the set of groups provided by the user. 
- */ - ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_validate_devs, - &info, slot); - if (!ret) - /* User has access, do the reset */ - ret = pci_reset_bus(vdev->pdev); - -hot_reset_release: - for (i--; i >= 0; i--) - vfio_group_put_external_user(groups[i].group); - - kfree(groups); - return ret; - } else if (cmd == VFIO_DEVICE_IOEVENTFD) { - struct vfio_device_ioeventfd ioeventfd; - int count; - - minsz = offsetofend(struct vfio_device_ioeventfd, fd); - - if (copy_from_user(&ioeventfd, (void __user *)arg, minsz)) - return -EFAULT; - - if (ioeventfd.argsz < minsz) - return -EINVAL; - - if (ioeventfd.flags & ~VFIO_DEVICE_IOEVENTFD_SIZE_MASK) - return -EINVAL; - - count = ioeventfd.flags & VFIO_DEVICE_IOEVENTFD_SIZE_MASK; - - if (hweight8(count) != 1 || ioeventfd.fd < -1) - return -EINVAL; - - return vfio_pci_ioeventfd(vdev, ioeventfd.offset, - ioeventfd.data, count, ioeventfd.fd); - } - - return -ENOTTY; -} - -static ssize_t vfio_pci_rw(void *device_data, char __user *buf, - size_t count, loff_t *ppos, bool iswrite) -{ - unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); - struct vfio_pci_device *vdev = device_data; - - if (index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) - return -EINVAL; - - switch (index) { - case VFIO_PCI_CONFIG_REGION_INDEX: - return vfio_pci_config_rw(vdev, buf, count, ppos, iswrite); - - case VFIO_PCI_ROM_REGION_INDEX: - if (iswrite) - return -EINVAL; - return vfio_pci_bar_rw(vdev, buf, count, ppos, false); - - case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: - return vfio_pci_bar_rw(vdev, buf, count, ppos, iswrite); - - case VFIO_PCI_VGA_REGION_INDEX: - return vfio_pci_vga_rw(vdev, buf, count, ppos, iswrite); - default: - index -= VFIO_PCI_NUM_REGIONS; - return vdev->region[index].ops->rw(vdev, buf, - count, ppos, iswrite); - } - - return -EINVAL; -} - -ssize_t vfio_pci_read(void *device_data, char __user *buf, - size_t count, loff_t *ppos) -{ - if (!count) - return 0; - - return vfio_pci_rw(device_data, buf, count, ppos, false); -} - -ssize_t vfio_pci_write(void *device_data, const char __user *buf, - size_t count, loff_t *ppos) -{ - if (!count) - return 0; - - return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); -} - -int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) -{ - struct vfio_pci_device *vdev = device_data; - struct pci_dev *pdev = vdev->pdev; - unsigned int index; - u64 phys_len, req_len, pgoff, req_start; - int ret; - - index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); - - if (vma->vm_end < vma->vm_start) - return -EINVAL; - if ((vma->vm_flags & VM_SHARED) == 0) - return -EINVAL; - if (index >= VFIO_PCI_NUM_REGIONS) { - int regnum = index - VFIO_PCI_NUM_REGIONS; - struct vfio_pci_region *region = vdev->region + regnum; - - if (region && region->ops && region->ops->mmap && - (region->flags & VFIO_REGION_INFO_FLAG_MMAP)) - return region->ops->mmap(vdev, region, vma); - return -EINVAL; - } - if (index >= VFIO_PCI_ROM_REGION_INDEX) - return -EINVAL; - if (!vdev->bar_mmap_supported[index]) - return -EINVAL; - - phys_len = PAGE_ALIGN(pci_resource_len(pdev, index)); - req_len = vma->vm_end - vma->vm_start; - pgoff = vma->vm_pgoff & - ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); - req_start = pgoff << PAGE_SHIFT; - - if (req_start + req_len > phys_len) - return -EINVAL; - - /* - * Even though we don't make use of the barmap for the mmap, - * we need to request the region and the barmap tracks that. 
- */ - if (!vdev->barmap[index]) { - ret = pci_request_selected_regions(pdev, - 1 << index, "vfio-pci"); - if (ret) - return ret; - - vdev->barmap[index] = pci_iomap(pdev, index, 0); - if (!vdev->barmap[index]) { - pci_release_selected_regions(pdev, 1 << index); - return -ENOMEM; - } - } - - vma->vm_private_data = vdev; - vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); - vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; - - return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - req_len, vma->vm_page_prot); -} - -void vfio_pci_request(void *device_data, unsigned int count) -{ - struct vfio_pci_device *vdev = device_data; - struct pci_dev *pdev = vdev->pdev; - - mutex_lock(&vdev->igate); - - if (vdev->req_trigger) { - if (!(count % 10)) - pci_notice_ratelimited(pdev, - "Relaying device request to user (#%u)\n", - count); - eventfd_signal(vdev->req_trigger, 1); - } else if (count == 0) { - pci_warn(pdev, - "No device request channel registered, blocked until released by user\n"); - } - - mutex_unlock(&vdev->igate); -} - static const struct vfio_device_ops vfio_pci_ops = { .name = "vfio-pci", .open = vfio_pci_open, @@ -1375,38 +218,6 @@ static void vfio_pci_remove(struct pci_dev *pdev) } } -static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, - pci_channel_state_t state) -{ - struct vfio_pci_device *vdev; - struct vfio_device *device; - - device = vfio_device_get_from_dev(&pdev->dev); - if (device == NULL) - return PCI_ERS_RESULT_DISCONNECT; - - vdev = vfio_device_data(device); - if (vdev == NULL) { - vfio_device_put(device); - return PCI_ERS_RESULT_DISCONNECT; - } - - mutex_lock(&vdev->igate); - - if (vdev->err_trigger) - eventfd_signal(vdev->err_trigger, 1); - - mutex_unlock(&vdev->igate); - - vfio_device_put(device); - - return PCI_ERS_RESULT_CAN_RECOVER; -} - -static const struct pci_error_handlers vfio_err_handlers = { - .error_detected = vfio_pci_aer_err_detected, -}; - static struct pci_driver vfio_pci_driver = { .name = "vfio-pci", .id_table = NULL, /* only dynamic ids */ @@ -1415,247 +226,12 @@ static struct pci_driver vfio_pci_driver = { .err_handler = &vfio_err_handlers, }; -static DEFINE_MUTEX(reflck_lock); - -static struct vfio_pci_reflck *vfio_pci_reflck_alloc(void) -{ - struct vfio_pci_reflck *reflck; - - reflck = kzalloc(sizeof(*reflck), GFP_KERNEL); - if (!reflck) - return ERR_PTR(-ENOMEM); - - kref_init(&reflck->kref); - mutex_init(&reflck->lock); - - return reflck; -} - -static void vfio_pci_reflck_get(struct vfio_pci_reflck *reflck) -{ - kref_get(&reflck->kref); -} - -static int vfio_pci_reflck_find(struct pci_dev *pdev, void *data) -{ - struct vfio_pci_device *vdev = data; - struct vfio_pci_reflck **preflck = &vdev->reflck; - struct vfio_device *device; - struct vfio_pci_device *tmp; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return 0; - - if (pci_dev_driver(pdev) != pci_dev_driver(vdev->pdev)) { - vfio_device_put(device); - return 0; - } - - tmp = vfio_device_data(device); - - if (tmp->reflck) { - vfio_pci_reflck_get(tmp->reflck); - *preflck = tmp->reflck; - vfio_device_put(device); - return 1; - } - - vfio_device_put(device); - return 0; -} - -int vfio_pci_reflck_attach(struct vfio_pci_device *vdev) -{ - bool slot = !pci_probe_reset_slot(vdev->pdev->slot); - - mutex_lock(&reflck_lock); - - if (pci_is_root_bus(vdev->pdev->bus) || - vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_reflck_find, - vdev, slot) <= 0) - vdev->reflck = vfio_pci_reflck_alloc(); - - mutex_unlock(&reflck_lock); - - 
return PTR_ERR_OR_ZERO(vdev->reflck); -} - -static void vfio_pci_reflck_release(struct kref *kref) -{ - struct vfio_pci_reflck *reflck = container_of(kref, - struct vfio_pci_reflck, - kref); - - kfree(reflck); - mutex_unlock(&reflck_lock); -} - -void vfio_pci_reflck_put(struct vfio_pci_reflck *reflck) -{ - kref_put_mutex(&reflck->kref, vfio_pci_reflck_release, &reflck_lock); -} - -struct vfio_devices { - struct vfio_device **devices; - struct vfio_pci_device *vdev; - int cur_index; - int max_index; -}; - -static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) -{ - struct vfio_devices *devs = data; - struct vfio_device *device; - struct vfio_pci_device *tmp; - - if (devs->cur_index == devs->max_index) - return -ENOSPC; - - device = vfio_device_get_from_dev(&pdev->dev); - if (!device) - return -EINVAL; - - if (pci_dev_driver(pdev) != pci_dev_driver(devs->vdev->pdev)) { - vfio_device_put(device); - return -EBUSY; - } - - tmp = vfio_device_data(device); - - /* Fault if the device is not unused */ - if (tmp->refcnt) { - vfio_device_put(device); - return -EBUSY; - } - - devs->devices[devs->cur_index++] = device; - return 0; -} - -/* - * If a bus or slot reset is available for the provided device and: - * - All of the devices affected by that bus or slot reset are unused - * (!refcnt) - * - At least one of the affected devices is marked dirty via - * needs_reset (such as by lack of FLR support) - * Then attempt to perform that bus or slot reset. Callers are required - * to hold vdev->reflck->lock, protecting the bus/slot reset group from - * concurrent opens. A vfio_device reference is acquired for each device - * to prevent unbinds during the reset operation. - * - * NB: vfio-core considers a group to be viable even if some devices are - * bound to drivers like pci-stub or pcieport. Here we require all devices - * to be bound to vfio_pci since that's the only way we can be sure they - * stay put. - */ -static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev) -{ - struct vfio_devices devs = { .cur_index = 0 }; - int i = 0, ret = -EINVAL; - bool slot = false; - struct vfio_pci_device *tmp; - - if (!pci_probe_reset_slot(vdev->pdev->slot)) - slot = true; - else if (pci_probe_reset_bus(vdev->pdev->bus)) - return; - - if (vfio_pci_for_each_slot_or_bus(vdev->pdev, vfio_pci_count_devs, - &i, slot) || !i) - return; - - devs.max_index = i; - devs.devices = kcalloc(i, sizeof(struct vfio_device *), GFP_KERNEL); - if (!devs.devices) - return; - - devs.vdev = vdev; - if (vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_get_unused_devs, - &devs, slot)) - goto put_devs; - - /* Does at least one need a reset? */ - for (i = 0; i < devs.cur_index; i++) { - tmp = vfio_device_data(devs.devices[i]); - if (tmp->needs_reset) { - ret = pci_reset_bus(vdev->pdev); - break; - } - } - -put_devs: - for (i = 0; i < devs.cur_index; i++) { - tmp = vfio_device_data(devs.devices[i]); - - /* - * If reset was successful, affected devices no longer need - * a reset and we should return all the collateral devices - * to low power. If not successful, we either didn't reset - * the bus or timed out waiting for it, so let's not touch - * the power state. 
- */ - if (!ret) { - tmp->needs_reset = false; - - if (tmp != vdev && !tmp->disable_idle_d3) - vfio_pci_set_power_state(tmp, PCI_D3hot); - } - - vfio_device_put(devs.devices[i]); - } - - kfree(devs.devices); -} - static void __exit vfio_pci_cleanup(void) { pci_unregister_driver(&vfio_pci_driver); vfio_pci_uninit_perm_bits(); } -void __init vfio_pci_fill_ids(char *ids, struct pci_driver *driver) -{ - char *p, *id; - int rc; - - /* no ids passed actually */ - if (ids[0] == '\0') - return; - - /* add ids specified in the module parameter */ - p = ids; - while ((id = strsep(&p, ","))) { - unsigned int vendor, device, subvendor = PCI_ANY_ID, - subdevice = PCI_ANY_ID, class = 0, class_mask = 0; - int fields; - - if (!strlen(id)) - continue; - - fields = sscanf(id, "%x:%x:%x:%x:%x:%x", - &vendor, &device, &subvendor, &subdevice, - &class, &class_mask); - - if (fields < 2) { - pr_warn("invalid id string \"%s\"\n", id); - continue; - } - - rc = pci_add_dynid(driver, vendor, device, - subvendor, subdevice, class, class_mask, 0); - if (rc) - pr_warn("failed to add dynamic id [%04x:%04x[%04x:%04x]] class %#08x/%08x (%d)\n", - vendor, device, subvendor, subdevice, - class, class_mask, rc); - else - pr_info("add [%04x:%04x[%04x:%04x]] class %#08x/%08x\n", - vendor, device, subvendor, subdevice, - class, class_mask); - } -} - static int __init vfio_pci_init(void) { int ret; diff --git a/drivers/vfio/pci/vfio_pci_common.c b/drivers/vfio/pci/vfio_pci_common.c index 11365f1..318ad29 100644 --- a/drivers/vfio/pci/vfio_pci_common.c +++ b/drivers/vfio/pci/vfio_pci_common.c @@ -1215,7 +1215,7 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, return PCI_ERS_RESULT_CAN_RECOVER; } -static const struct pci_error_handlers vfio_err_handlers = { +const struct pci_error_handlers vfio_err_handlers = { .error_detected = vfio_pci_aer_err_detected, }; diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 53d0385..b64c4e9 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -135,6 +135,8 @@ struct vfio_pci_device { #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev))) #define irq_is(vdev, type) (vdev->irq_type == type) +extern const struct pci_error_handlers vfio_err_handlers; + static inline bool vfio_pci_is_vga(struct pci_dev *pdev) { return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; From patchwork Thu Sep 5 07:59:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134713 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2ED2114ED for ; Fri, 6 Sep 2019 08:18:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0D96C2087E for ; Fri, 6 Sep 2019 08:18:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2391688AbfIFIRo (ORCPT ); Fri, 6 Sep 2019 04:17:44 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2391410AbfIFIRn (ORCPT ); Fri, 6 Sep 2019 04:17:43 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:43 -0700 X-ExtLoop1: 1 
X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186173" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:40 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 08/13] vfio/pci: protect cap/ecap_perm bits alloc/free with atomic op Date: Thu, 5 Sep 2019 15:59:25 +0800 Message-Id: <1567670370-4484-9-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

There is a case in which cap_perms and ecap_perms can be reallocated by different modules, e.g. by the vfio-mdev-pci sample driver. To make the initialization of cap_perms and ecap_perms safe, this patch adds an atomic variable to track the users of the cap/ecap_perms bits: the first caller of vfio_pci_init_perm_bits() initializes the bits, and the last caller of vfio_pci_uninit_perm_bits() frees them.

Cc: Kevin Tian Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_pci_config.c | 9 +++++++++ 1 file changed, 9 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index f0891bd..1b3e6e5 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -992,11 +992,17 @@ static int __init init_pci_ext_cap_pwr_perm(struct perm_bits *perm)
 return 0;
 }
+/* Track the number of users of the cap/ecap perm_bits */
+atomic_t vfio_pci_perm_bits_users = ATOMIC_INIT(0);
+
 /*
 * Initialize the shared permission tables
 */
 void vfio_pci_uninit_perm_bits(void)
 {
+ if (atomic_dec_return(&vfio_pci_perm_bits_users))
+ return;
+
 free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 free_perm_bits(&cap_perms[PCI_CAP_ID_PM]);
@@ -1013,6 +1019,9 @@ int __init vfio_pci_init_perm_bits(void)
 {
 int ret;
+ if (atomic_inc_return(&vfio_pci_perm_bits_users) != 1)
+ return 0;
+
 /* Basic config space */
 ret = init_pci_cap_basic_perm(&cap_perms[PCI_CAP_ID_BASIC]);

From patchwork Thu Sep 5 07:59:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134711 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3706714ED for ; Fri, 6 Sep 2019 08:18:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0FB542082C for ; Fri, 6 Sep 2019 08:18:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392619AbfIFIRs (ORCPT ); Fri, 6 Sep 2019 04:17:48 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392610AbfIFIRq (ORCPT ); Fri, 6 Sep 2019 04:17:46 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:45 -0700 X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186188" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:43 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com, Masahiro Yamada Subject: [PATCH v2 09/13] samples: add vfio-mdev-pci driver Date: Thu, 5 Sep 2019 15:59:26 +0800 Message-Id: <1567670370-4484-10-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> MIME-Version: 1.0 Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds a sample driver named vfio-mdev-pci, which wraps a PCI device as a mediated device. Once a PCI device is bound to the vfio-mdev-pci driver, user-space access to the device goes through the vfio mdev framework. The device then follows the usual mdev management method, e.g. the user should create an mdev before exposing the device to user space.

The benefit of this new driver is that it acts as a sample for the recent changes from the "vfio/mdev: IOMMU aware mediated device" patchset. It could also be a good experimental driver for future device-specific mdev migration support.

To use this driver:

a) build and load the vfio-mdev-pci.ko module: run "make menuconfig", enable CONFIG_SAMPLE_VFIO_MDEV_PCI, then load it with the following commands
> sudo modprobe vfio
> sudo modprobe vfio-pci
> sudo insmod drivers/vfio/pci/vfio-mdev-pci.ko

b) unbind the original device driver, e.g. with the following command
> echo $dev_bdf > /sys/bus/pci/devices/$dev_bdf/driver/unbind

c) bind the vfio-mdev-pci driver to the physical device
> echo $vend_id $dev_id > /sys/bus/pci/drivers/vfio-mdev-pci/new_id

d) check the supported mdev types
> ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/
vfio-mdev-pci-type_name
> ls /sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
vfio-mdev-pci-type_name/
available_instances create device_api devices name

e) create an mdev on this physical device (only one instance is supported)
> echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1003" > \
/sys/bus/pci/devices/$dev_bdf/mdev_supported_types/\
vfio-mdev-pci-type_name/create

f) pass the mdev through to the guest by adding the following line to the QEMU boot command
-device vfio-pci,\
sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003

g) destroy the mdev
> echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1003/\
remove

Cc: Kevin Tian Cc: Lu Baolu Cc: Masahiro Yamada Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/pci/Makefile | 6 + drivers/vfio/pci/vfio_mdev_pci.c | 403 +++++++++++++++++++++++++++++++++++++++ samples/Kconfig | 11 ++ 3 files changed, 420 insertions(+) create mode 100644 drivers/vfio/pci/vfio_mdev_pci.c

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index d94317a..ac118ef 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -5,4 +5,10 @@ vfio-pci-y := vfio_pci.o vfio_pci_common.o vfio_pci_intrs.o \
 vfio-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
 vfio-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+vfio-mdev-pci-y := vfio_mdev_pci.o vfio_pci_common.o vfio_pci_intrs.o \
+ vfio_pci_rdwr.o vfio_pci_config.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_IGD) += vfio_pci_igd.o
+vfio-mdev-pci-$(CONFIG_VFIO_PCI_NVLINK2) += vfio_pci_nvlink2.o
+
 obj-$(CONFIG_VFIO_PCI) += vfio-pci.o
+obj-$(CONFIG_SAMPLE_VFIO_MDEV_PCI) += vfio-mdev-pci.o
diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c
new file mode 100644
index 0000000..07c8067
--- /dev/null
+++ b/drivers/vfio/pci/vfio_mdev_pci.c
@@ -0,0 +1,403 @@
+/*
+ * Copyright © 2019 Intel Corporation.
+ * Author: Liu, Yi L
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Derived from original vfio_pci.c:
+ * Copyright (C) 2012 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson
+ *
+ * Derived from original vfio:
+ * Copyright 2010 Cisco Systems, Inc. All rights reserved.
+ * Author: Tom Lyon, pugs@cisco.com + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "vfio_pci_private.h" + +#define DRIVER_VERSION "0.1" +#define DRIVER_AUTHOR "Liu, Yi L " +#define DRIVER_DESC "VFIO Mdev PCI - Sample driver for PCI device as a mdev" + +#define VFIO_MDEV_PCI_NAME "vfio-mdev-pci" + +static char ids[1024] __initdata; +module_param_string(ids, ids, sizeof(ids), 0); +MODULE_PARM_DESC(ids, "Initial PCI IDs to add to the vfio-mdev-pci driver, format is \"vendor:device[:subvendor[:subdevice[:class[:class_mask]]]]\" and multiple comma separated entries can be specified"); + +static bool nointxmask; +module_param_named(nointxmask, nointxmask, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(nointxmask, + "Disable support for PCI 2.3 style INTx masking. If this resolves problems for specific devices, report lspci -vvvxxx to linux-pci@vger.kernel.org so the device can be fixed automatically via the broken_intx_masking flag."); + +#ifdef CONFIG_VFIO_PCI_VGA +static bool disable_vga; +module_param(disable_vga, bool, S_IRUGO); +MODULE_PARM_DESC(disable_vga, "Disable VGA resource access through vfio-mdev-pci"); +#endif + +static bool disable_idle_d3; +module_param(disable_idle_d3, bool, S_IRUGO | S_IWUSR); +MODULE_PARM_DESC(disable_idle_d3, + "Disable using the PCI D3 low power state for idle, unused devices"); + +static struct pci_driver vfio_mdev_pci_driver; + +static ssize_t +name_show(struct kobject *kobj, struct device *dev, char *buf) +{ + return sprintf(buf, "%s-type1\n", dev_name(dev)); +} + +MDEV_TYPE_ATTR_RO(name); + +static ssize_t +available_instances_show(struct kobject *kobj, struct device *dev, char *buf) +{ + return sprintf(buf, "%d\n", 1); +} + +MDEV_TYPE_ATTR_RO(available_instances); + +static ssize_t device_api_show(struct kobject *kobj, struct device *dev, + char *buf) +{ + return sprintf(buf, "%s\n", VFIO_DEVICE_API_PCI_STRING); +} + +MDEV_TYPE_ATTR_RO(device_api); + +static struct attribute *vfio_mdev_pci_types_attrs[] = { + &mdev_type_attr_name.attr, + &mdev_type_attr_device_api.attr, + &mdev_type_attr_available_instances.attr, + NULL, +}; + +static struct attribute_group vfio_mdev_pci_type_group1 = { + .name = "type1", + .attrs = vfio_mdev_pci_types_attrs, +}; + +struct attribute_group *vfio_mdev_pci_type_groups[] = { + &vfio_mdev_pci_type_group1, + NULL, +}; + +struct vfio_mdev_pci { + struct vfio_pci_device *vdev; + struct mdev_device *mdev; + unsigned long handle; +}; + +static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) +{ + struct device *pdev; + struct vfio_pci_device *vdev; + struct vfio_mdev_pci *pmdev; + int ret; + + pdev = mdev_parent_dev(mdev); + vdev = dev_get_drvdata(pdev); + pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL); + if (pmdev == NULL) { + ret = -EBUSY; + goto out; + } + + pmdev->mdev = mdev; + pmdev->vdev = vdev; + mdev_set_drvdata(mdev, pmdev); + ret = mdev_set_iommu_device(mdev_dev(mdev), pdev); + if (ret) { + pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n", + __func__, dev_name(mdev_dev(mdev)), dev_name(pdev)); + goto out; + } + +out: + return ret; +} + +static int vfio_mdev_pci_remove(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + kfree(pmdev); + pr_info("%s, succeeded for mdev: %s\n", __func__, + dev_name(mdev_dev(mdev))); + + return 0; +} + +static int 
vfio_mdev_pci_open(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_pci_device *vdev = pmdev->vdev; + int ret = 0; + + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + mutex_lock(&vdev->reflck->lock); + + if (!vdev->refcnt) { + ret = vfio_pci_enable(vdev); + if (ret) + goto error; + + vfio_spapr_pci_eeh_open(vdev->pdev); + } + vdev->refcnt++; +error: + mutex_unlock(&vdev->reflck->lock); + if (!ret) + pr_info("Succeeded to open mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + else { + pr_info("Failed to open mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + module_put(THIS_MODULE); + } + return ret; +} + +static void vfio_mdev_pci_release(struct mdev_device *mdev) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_pci_device *vdev = pmdev->vdev; + + pr_info("Release mdev: %s on pf: %s\n", + dev_name(mdev_dev(mdev)), dev_name(&pmdev->vdev->pdev->dev)); + + mutex_lock(&vdev->reflck->lock); + + if (!(--vdev->refcnt)) { + vfio_spapr_pci_eeh_release(vdev->pdev); + vfio_pci_disable(vdev); + } + + mutex_unlock(&vdev->reflck->lock); + + module_put(THIS_MODULE); +} + +static long vfio_mdev_pci_ioctl(struct mdev_device *mdev, unsigned int cmd, + unsigned long arg) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_ioctl(pmdev->vdev, cmd, arg); +} + +static int vfio_mdev_pci_mmap(struct mdev_device *mdev, + struct vm_area_struct *vma) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_mmap(pmdev->vdev, vma); +} + +static ssize_t vfio_mdev_pci_read(struct mdev_device *mdev, char __user *buf, + size_t count, loff_t *ppos) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_read(pmdev->vdev, buf, count, ppos); +} + +static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev, + const char __user *buf, + size_t count, loff_t *ppos) +{ + struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + + return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos); +} + +static const struct mdev_parent_ops vfio_mdev_pci_ops = { + .supported_type_groups = vfio_mdev_pci_type_groups, + .create = vfio_mdev_pci_create, + .remove = vfio_mdev_pci_remove, + + .open = vfio_mdev_pci_open, + .release = vfio_mdev_pci_release, + + .read = vfio_mdev_pci_read, + .write = vfio_mdev_pci_write, + .mmap = vfio_mdev_pci_mmap, + .ioctl = vfio_mdev_pci_ioctl, +}; + +static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, + const struct pci_device_id *id) +{ + struct vfio_pci_device *vdev; + int ret; + + if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) + return -EINVAL; + + /* + * Prevent binding to PFs with VFs enabled, this too easily allows + * userspace instance with VFs and PFs from the same device, which + * cannot work. Disabling SR-IOV here would initiate removing the + * VFs, which would unbind the driver, which is prone to blocking + * if that VF is also in use by vfio-pci or vfio-mdev-pci. Just + * reject these PFs and let the user sort it out. 
+ */
+ if (pci_num_vf(pdev)) {
+ pci_warn(pdev, "Cannot bind to PF with SR-IOV enabled\n");
+ return -EBUSY;
+ }
+
+ vdev = kzalloc(sizeof(*vdev), GFP_KERNEL);
+ if (!vdev)
+ return -ENOMEM;
+
+ vdev->pdev = pdev;
+ vdev->irq_type = VFIO_PCI_NUM_IRQS;
+ mutex_init(&vdev->igate);
+ spin_lock_init(&vdev->irqlock);
+ mutex_init(&vdev->ioeventfds_lock);
+ INIT_LIST_HEAD(&vdev->ioeventfds_list);
+ vdev->nointxmask = nointxmask;
+#ifdef CONFIG_VFIO_PCI_VGA
+ vdev->disable_vga = disable_vga;
+#endif
+ vdev->disable_idle_d3 = disable_idle_d3;
+
+ pci_set_drvdata(pdev, vdev);
+
+ ret = vfio_pci_reflck_attach(vdev);
+ if (ret) {
+ pci_set_drvdata(pdev, NULL);
+ kfree(vdev);
+ return ret;
+ }
+
+ if (vfio_pci_is_vga(pdev)) {
+ vga_client_register(pdev, vdev, NULL, vfio_pci_set_vga_decode);
+ vga_set_legacy_decoding(pdev,
+ vfio_pci_set_vga_decode(vdev, false));
+ }
+
+ vfio_pci_probe_power_state(vdev);
+
+ if (!vdev->disable_idle_d3) {
+ /*
+ * pci-core sets the device power state to an unknown value at
+ * bootup and after being removed from a driver. The only
+ * transition it allows from this unknown state is to D0, which
+ * typically happens when a driver calls pci_enable_device().
+ * We're not ready to enable the device yet, but we do want to
+ * be able to get to D3. Therefore first do a D0 transition
+ * before going to D3.
+ */
+ vfio_pci_set_power_state(vdev, PCI_D0);
+ vfio_pci_set_power_state(vdev, PCI_D3hot);
+ }
+
+ ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops);
+ if (ret)
+ pr_err("Cannot register mdev for device %s\n",
+ dev_name(&pdev->dev));
+ else
+ pr_info("Wrap device %s as a mdev\n", dev_name(&pdev->dev));
+
+ return ret;
+}
+
+static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
+{
+ struct vfio_pci_device *vdev;
+
+ vdev = pci_get_drvdata(pdev);
+ if (!vdev)
+ return;
+
+ vfio_pci_reflck_put(vdev->reflck);
+
+ kfree(vdev->region);
+ mutex_destroy(&vdev->ioeventfds_lock);
+
+ if (!disable_idle_d3)
+ vfio_pci_set_power_state(vdev, PCI_D0);
+
+ kfree(vdev->pm_save);
+
+ if (vfio_pci_is_vga(pdev)) {
+ vga_client_register(pdev, NULL, NULL, NULL);
+ vga_set_legacy_decoding(pdev,
+ VGA_RSRC_NORMAL_IO | VGA_RSRC_NORMAL_MEM |
+ VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM);
+ }
+
+ kfree(vdev);
+}
+
+static struct pci_driver vfio_mdev_pci_driver = {
+ .name = VFIO_MDEV_PCI_NAME,
+ .id_table = NULL, /* only dynamic ids */
+ .probe = vfio_mdev_pci_driver_probe,
+ .remove = vfio_mdev_pci_driver_remove,
+ .err_handler = &vfio_err_handlers,
+};
+
+static void __exit vfio_mdev_pci_cleanup(void)
+{
+ pci_unregister_driver(&vfio_mdev_pci_driver);
+ vfio_pci_uninit_perm_bits();
+}
+
+static int __init vfio_mdev_pci_init(void)
+{
+ int ret;
+
+ /* Allocate shared config space permission data used by all devices */
+ ret = vfio_pci_init_perm_bits();
+ if (ret)
+ return ret;
+
+ /* Register and scan for devices */
+ ret = pci_register_driver(&vfio_mdev_pci_driver);
+ if (ret)
+ goto out_driver;
+
+ vfio_pci_fill_ids(ids, &vfio_mdev_pci_driver);
+
+ return 0;
+out_driver:
+ vfio_pci_uninit_perm_bits();
+ return ret;
+}
+
+module_init(vfio_mdev_pci_init);
+module_exit(vfio_mdev_pci_cleanup);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/samples/Kconfig b/samples/Kconfig
index c8dacb4..1513fef 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -169,4 +169,15 @@ config SAMPLE_VFS
 as mount API and statx().
Note that this is restricted to the x86 arch whilst it accesses system calls that aren't yet in all arches.
+config SAMPLE_VFIO_MDEV_PCI
+ tristate "Sample driver for wrapping PCI device as a mdev"
+ depends on PCI && EVENTFD && VFIO_MDEV_DEVICE
+ select VFIO_VIRQFD
+ select IRQ_BYPASS_MANAGER
+ help
+ Sample driver for wrapping a PCI device as a mdev. Once bound to
+ this driver, device passthrough goes through the mdev path.
+
+ If you don't know what to do here, say N.
+
 endif # SAMPLES

From patchwork Thu Sep 5 07:59:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134699 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA63C1398 for ; Fri, 6 Sep 2019 08:17:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id ABBE920854 for ; Fri, 6 Sep 2019 08:17:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392625AbfIFIRt (ORCPT ); Fri, 6 Sep 2019 04:17:49 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392615AbfIFIRs (ORCPT ); Fri, 6 Sep 2019 04:17:48 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:48 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186201" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:46 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 10/13] samples: refine vfio-mdev-pci driver Date: Thu, 5 Sep 2019 15:59:27 +0800 Message-Id: <1567670370-4484-11-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Alex Williamson

This patch refines the implementation of the original vfio-mdev-pci driver. The vfio-mdev-pci type name is now generated per the following rule:

vmdev->attr.name = kasprintf(GFP_KERNEL, "%04x:%04x:%04x:%04x:%06x:%02x",
			     pdev->vendor, pdev->device,
			     pdev->subsystem_vendor, pdev->subsystem_device,
			     pdev->class, pdev->revision);

Before use, check /sys/bus/pci/devices/$bdf/mdev_supported_types/ to determine the type names the device actually exposes.
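As an illustration (not part of the patch), the small userspace C sketch below applies the same format string to a made-up set of IDs to show what a generated type name looks like; all device ID values in it are hypothetical.

#include <stdio.h>

int main(void)
{
	/* hypothetical example IDs, chosen only to show the format */
	unsigned int vendor = 0x8086, device = 0x10d3;
	unsigned int sub_vendor = 0x8086, sub_device = 0x0001;
	unsigned int class = 0x020000, revision = 0x00;

	/* same format string as the kasprintf() call above */
	printf("%04x:%04x:%04x:%04x:%06x:%02x\n",
	       vendor, device, sub_vendor, sub_device, class, revision);
	/* prints "8086:10d3:8086:0001:020000:00"; the mdev type directory
	 * would then appear as vfio-mdev-pci-8086:10d3:8086:0001:020000:00 */
	return 0;
}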
Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_mdev_pci.c | 123 +++++++++++++++++++++++---------------- 1 file changed, 72 insertions(+), 51 deletions(-) diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c index 07c8067..09143d3 100644 --- a/drivers/vfio/pci/vfio_mdev_pci.c +++ b/drivers/vfio/pci/vfio_mdev_pci.c @@ -65,18 +65,22 @@ MODULE_PARM_DESC(disable_idle_d3, static struct pci_driver vfio_mdev_pci_driver; -static ssize_t -name_show(struct kobject *kobj, struct device *dev, char *buf) -{ - return sprintf(buf, "%s-type1\n", dev_name(dev)); -} - -MDEV_TYPE_ATTR_RO(name); +struct vfio_mdev_pci_device { + struct vfio_pci_device vdev; + struct mdev_parent_ops ops; + struct attribute_group *groups[2]; + struct attribute_group attr; + atomic_t avail; +}; static ssize_t available_instances_show(struct kobject *kobj, struct device *dev, char *buf) { - return sprintf(buf, "%d\n", 1); + struct vfio_mdev_pci_device *vmdev; + + vmdev = pci_get_drvdata(to_pci_dev(dev)); + + return sprintf(buf, "%d\n", atomic_read(&vmdev->avail)); } MDEV_TYPE_ATTR_RO(available_instances); @@ -90,62 +94,57 @@ static ssize_t device_api_show(struct kobject *kobj, struct device *dev, MDEV_TYPE_ATTR_RO(device_api); static struct attribute *vfio_mdev_pci_types_attrs[] = { - &mdev_type_attr_name.attr, &mdev_type_attr_device_api.attr, &mdev_type_attr_available_instances.attr, NULL, }; -static struct attribute_group vfio_mdev_pci_type_group1 = { - .name = "type1", - .attrs = vfio_mdev_pci_types_attrs, -}; - -struct attribute_group *vfio_mdev_pci_type_groups[] = { - &vfio_mdev_pci_type_group1, - NULL, -}; - struct vfio_mdev_pci { struct vfio_pci_device *vdev; struct mdev_device *mdev; - unsigned long handle; }; static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) { struct device *pdev; - struct vfio_pci_device *vdev; + struct vfio_mdev_pci_device *vmdev; struct vfio_mdev_pci *pmdev; int ret; pdev = mdev_parent_dev(mdev); - vdev = dev_get_drvdata(pdev); + vmdev = dev_get_drvdata(pdev); + + if (atomic_dec_if_positive(&vmdev->avail) < 0) + return -ENOSPC; + pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL); - if (pmdev == NULL) { - ret = -EBUSY; - goto out; - } + if (!pmdev) + return -ENOMEM; pmdev->mdev = mdev; - pmdev->vdev = vdev; + pmdev->vdev = &vmdev->vdev; mdev_set_drvdata(mdev, pmdev); ret = mdev_set_iommu_device(mdev_dev(mdev), pdev); if (ret) { pr_info("%s, failed to config iommu isolation for mdev: %s on pf: %s\n", __func__, dev_name(mdev_dev(mdev)), dev_name(pdev)); - goto out; + kfree(pmdev); + atomic_inc(&vmdev->avail); + return ret; } -out: - return ret; + return 0; } static int vfio_mdev_pci_remove(struct mdev_device *mdev) { struct vfio_mdev_pci *pmdev = mdev_get_drvdata(mdev); + struct vfio_mdev_pci_device *vmdev; + + vmdev = container_of(pmdev->vdev, struct vfio_mdev_pci_device, vdev); kfree(pmdev); + atomic_inc(&vmdev->avail); pr_info("%s, succeeded for mdev: %s\n", __func__, dev_name(mdev_dev(mdev))); @@ -237,24 +236,12 @@ static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev, return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos); } -static const struct mdev_parent_ops vfio_mdev_pci_ops = { - .supported_type_groups = vfio_mdev_pci_type_groups, - .create = vfio_mdev_pci_create, - .remove = vfio_mdev_pci_remove, - - .open = vfio_mdev_pci_open, - .release = vfio_mdev_pci_release, - - .read = vfio_mdev_pci_read, - .write = vfio_mdev_pci_write, - .mmap = 
vfio_mdev_pci_mmap, - .ioctl = vfio_mdev_pci_ioctl, -}; - static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id) { + struct vfio_mdev_pci_device *vmdev; struct vfio_pci_device *vdev; + const struct mdev_parent_ops *ops; int ret; if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) @@ -273,10 +260,38 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, return -EBUSY; } - vdev = kzalloc(sizeof(*vdev), GFP_KERNEL); - if (!vdev) + vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL); + if (!vmdev) return -ENOMEM; + vmdev->attr.name = kasprintf(GFP_KERNEL, + "%04x:%04x:%04x:%04x:%06x:%02x", + pdev->vendor, pdev->device, + pdev->subsystem_vendor, + pdev->subsystem_device, pdev->class, + pdev->revision); + if (!vmdev->attr.name) { + kfree(vmdev); + return -ENOMEM; + } + + atomic_set(&vmdev->avail, 1); + + vmdev->attr.attrs = vfio_mdev_pci_types_attrs; + vmdev->groups[0] = &vmdev->attr; + + vmdev->ops.supported_type_groups = vmdev->groups; + vmdev->ops.create = vfio_mdev_pci_create; + vmdev->ops.remove = vfio_mdev_pci_remove; + vmdev->ops.open = vfio_mdev_pci_open; + vmdev->ops.release = vfio_mdev_pci_release; + vmdev->ops.read = vfio_mdev_pci_read; + vmdev->ops.write = vfio_mdev_pci_write; + vmdev->ops.mmap = vfio_mdev_pci_mmap; + vmdev->ops.ioctl = vfio_mdev_pci_ioctl; + ops = &vmdev->ops; + + vdev = &vmdev->vdev; vdev->pdev = pdev; vdev->irq_type = VFIO_PCI_NUM_IRQS; mutex_init(&vdev->igate); @@ -289,7 +304,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, #endif vdev->disable_idle_d3 = disable_idle_d3; - pci_set_drvdata(pdev, vdev); + pci_set_drvdata(pdev, vmdev); ret = vfio_pci_reflck_attach(vdev); if (ret) { @@ -320,7 +335,7 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, vfio_pci_set_power_state(vdev, PCI_D3hot); } - ret = mdev_register_device(&pdev->dev, &vfio_mdev_pci_ops); + ret = mdev_register_device(&pdev->dev, ops); if (ret) pr_err("Cannot register mdev for device %s\n", dev_name(&pdev->dev)); @@ -332,12 +347,17 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev) { + struct vfio_mdev_pci_device *vmdev; struct vfio_pci_device *vdev; - vdev = pci_get_drvdata(pdev); - if (!vdev) + mdev_unregister_device(&pdev->dev); + + vmdev = pci_get_drvdata(pdev); + if (!vmdev) return; + vdev = &vmdev->vdev; + vfio_pci_reflck_put(vdev->reflck); kfree(vdev->region); @@ -355,7 +375,8 @@ static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev) VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM); } - kfree(vdev); + kfree(vmdev->attr.name); + kfree(vmdev); } static struct pci_driver vfio_mdev_pci_driver = { From patchwork Thu Sep 5 07:59:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134709 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 285E91398 for ; Fri, 6 Sep 2019 08:18:11 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 0A9F32087E for ; Fri, 6 Sep 2019 08:18:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392678AbfIFISF (ORCPT ); Fri, 6 Sep 2019 04:18:05 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392636AbfIFIRv (ORCPT ); Fri, 6 Sep 2019 04:17:51 
-0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186210" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:48 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 11/13] samples/vfio-mdev-pci: call vfio_add_group_dev() Date: Thu, 5 Sep 2019 15:59:28 +0800 Message-Id: <1567670370-4484-12-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch adds a vfio_add_group_dev() call in probe() so that vfio-mdev-pci works well with non-singleton iommu groups. The user can bind the devices of a non-singleton iommu group to either the vfio-pci driver or this sample driver, and the existing passthrough policy still holds for such a group.

This is actually a policy choice: a device driver can make this call if it wants to be vfio-viable, and it then needs to provide the dummy vfio_device_ops required by the vfio framework. To prevent the user from opening the device through the iommu backed group fd, the open callback of the dummy vfio_device_ops should return -ENODEV, failing the VFIO_GROUP_GET_DEVICE_FD request from userspace.
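As an illustration (not part of the patch), a hedged userspace sketch of the request that the dummy .open callback rejects; the group number and BDF below are placeholders, and the container and VFIO_SET_IOMMU setup of a complete sequence is elided.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int main(void)
{
	/* "/dev/vfio/42" is a placeholder group path; a real sequence
	 * would first bind the group to a container and set the IOMMU
	 * model before requesting a device fd. */
	int group = open("/dev/vfio/42", O_RDWR);
	int device;

	if (group < 0) {
		perror("open group");
		return 1;
	}

	/* With the dummy vfio_device_ops installed by this patch, this
	 * step is expected to fail: the dummy .open returns -ENODEV,
	 * steering the user to the mdev path instead. */
	device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:03:00.0");
	if (device < 0)
		perror("VFIO_GROUP_GET_DEVICE_FD");

	return 0;
}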
Cc: Kevin Tian Cc: Lu Baolu Signed-off-by: Liu Yi L --- drivers/vfio/pci/vfio_mdev_pci.c | 91 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 82 insertions(+), 9 deletions(-) diff --git a/drivers/vfio/pci/vfio_mdev_pci.c b/drivers/vfio/pci/vfio_mdev_pci.c index 09143d3..a61c20d 100644 --- a/drivers/vfio/pci/vfio_mdev_pci.c +++ b/drivers/vfio/pci/vfio_mdev_pci.c @@ -107,19 +107,27 @@ struct vfio_mdev_pci { static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) { struct device *pdev; + struct vfio_device *device; struct vfio_mdev_pci_device *vmdev; struct vfio_mdev_pci *pmdev; int ret; pdev = mdev_parent_dev(mdev); - vmdev = dev_get_drvdata(pdev); + device = vfio_device_get_from_dev(pdev); + vmdev = vfio_device_data(device); - if (atomic_dec_if_positive(&vmdev->avail) < 0) - return -ENOSPC; + if (atomic_dec_if_positive(&vmdev->avail) < 0) { + ret = -ENOSPC; + goto out; + } + pr_info("%s, available instance: %d\n", + __func__, atomic_read(&vmdev->avail)); pmdev = kzalloc(sizeof(struct vfio_mdev_pci), GFP_KERNEL); - if (!pmdev) - return -ENOMEM; + if (!pmdev) { + ret = -ENOMEM; + goto out; + } pmdev->mdev = mdev; pmdev->vdev = &vmdev->vdev; @@ -130,10 +138,11 @@ static int vfio_mdev_pci_create(struct kobject *kobj, struct mdev_device *mdev) __func__, dev_name(mdev_dev(mdev)), dev_name(pdev)); kfree(pmdev); atomic_inc(&vmdev->avail); - return ret; } - return 0; +out: + vfio_device_put(device); + return ret; } static int vfio_mdev_pci_remove(struct mdev_device *mdev) @@ -145,6 +154,8 @@ static int vfio_mdev_pci_remove(struct mdev_device *mdev) kfree(pmdev); atomic_inc(&vmdev->avail); + pr_info("%s, available instance: %d\n", + __func__, atomic_read(&vmdev->avail)); pr_info("%s, succeeded for mdev: %s\n", __func__, dev_name(mdev_dev(mdev))); @@ -236,12 +247,65 @@ static ssize_t vfio_mdev_pci_write(struct mdev_device *mdev, return vfio_pci_write(pmdev->vdev, (char __user *)buf, count, ppos); } +static int vfio_pci_dummy_open(void *device_data) +{ + struct vfio_mdev_pci_device *vmdev = + (struct vfio_mdev_pci_device *) device_data; + pr_warn("Device %s is not viable for vfio-pci passthru, please follow" + " vfio-mdev passthru path as it has been wrapped as mdev!!!\n", + dev_name(&vmdev->vdev.pdev->dev)); + return -ENODEV; +} + +static void vfio_pci_dummy_release(void *device_data) +{ +} + +long vfio_pci_dummy_ioctl(void *device_data, + unsigned int cmd, unsigned long arg) +{ + return 0; +} + +ssize_t vfio_pci_dummy_read(void *device_data, char __user *buf, + size_t count, loff_t *ppos) +{ + return 0; +} + +ssize_t vfio_pci_dummy_write(void *device_data, const char __user *buf, + size_t count, loff_t *ppos) +{ + return 0; +} + +int vfio_pci_dummy_mmap(void *device_data, struct vm_area_struct *vma) +{ + return 0; +} + +void vfio_pci_dummy_request(void *device_data, unsigned int count) +{ +} + +static const struct vfio_device_ops vfio_pci_dummy_ops = { + .name = "vfio-pci", + .open = vfio_pci_dummy_open, + .release = vfio_pci_dummy_release, + .ioctl = vfio_pci_dummy_ioctl, + .read = vfio_pci_dummy_read, + .write = vfio_pci_dummy_write, + .mmap = vfio_pci_dummy_mmap, + .request = vfio_pci_dummy_request, +}; + static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct vfio_mdev_pci_device *vmdev; struct vfio_pci_device *vdev; const struct mdev_parent_ops *ops; + struct iommu_group *group; int ret; if (pdev->hdr_type != PCI_HEADER_TYPE_NORMAL) @@ -260,6 +324,10 @@ static int vfio_mdev_pci_driver_probe(struct pci_dev *pdev, return 
-EBUSY; }
+ group = vfio_iommu_group_get(&pdev->dev);
+ if (!group)
+ return -EINVAL;
+
 vmdev = kzalloc(sizeof(*vmdev), GFP_KERNEL);
 if (!vmdev)
 return -ENOMEM;
@@ -304,7 +372,12 @@
 #endif
 vdev->disable_idle_d3 = disable_idle_d3;
- pci_set_drvdata(pdev, vmdev);
+ ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_dummy_ops, vmdev);
+ if (ret) {
+ vfio_iommu_group_put(group, &pdev->dev);
+ kfree(vmdev);
+ return ret;
+ }
 ret = vfio_pci_reflck_attach(vdev);
 if (ret) {
@@ -352,7 +425,7 @@ static void vfio_mdev_pci_driver_remove(struct pci_dev *pdev)
 mdev_unregister_device(&pdev->dev);
- vmdev = pci_get_drvdata(pdev);
+ vmdev = vfio_del_group_dev(&pdev->dev);
 if (!vmdev)
 return;

From patchwork Thu Sep 5 07:59:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134701 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A6EF71398 for ; Fri, 6 Sep 2019 08:17:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 9292F2087E for ; Fri, 6 Sep 2019 08:17:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392650AbfIFIRy (ORCPT ); Fri, 6 Sep 2019 04:17:54 -0400 Received: from mga11.intel.com ([192.55.52.93]:23720 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392615AbfIFIRx (ORCPT ); Fri, 6 Sep 2019 04:17:53 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186216" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:51 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 12/13] vfio/type1: use iommu_attach_group() for wrapping PF/VF as mdev Date: Thu, 5 Sep 2019 15:59:29 +0800 Message-Id: <1567670370-4484-13-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

This patch uses iommu_attach_group() to do the group attach when wrapping a PF/VF as an mdev, because iommu_attach_device() does not support attaching devices that live in non-singleton iommu groups. With this change, wrapping a PF/VF as an mdev works on non-singleton iommu groups.
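For clarity, a minimal sketch of the attach pattern this change switches to, condensed from the diff below rather than quoted verbatim; unlike the posted code, the sketch also releases the group reference it takes.

static int attach_via_group(struct iommu_domain *domain,
			    struct device *iommu_device)
{
	struct iommu_group *group;
	int ret;

	/* look up the device's iommu group ... */
	group = iommu_group_get(iommu_device);
	if (!group)
		return -EINVAL;

	/* ... and attach the whole group; this works for non-singleton
	 * groups, which iommu_attach_device() refuses. */
	ret = iommu_attach_group(domain, group);

	iommu_group_put(group);	/* balance iommu_group_get() */
	return ret;
}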
Cc: Kevin Tian Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 054391f..317430d 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -1312,13 +1312,20 @@ static int vfio_mdev_attach_domain(struct device *dev, void *data) { struct iommu_domain *domain = data; struct device *iommu_device; + struct iommu_group *group; iommu_device = vfio_mdev_get_iommu_device(dev); if (iommu_device) { if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) return iommu_aux_attach_device(domain, iommu_device); - else - return iommu_attach_device(domain, iommu_device); + else { + group = iommu_group_get(iommu_device); + if (!group) { + WARN_ON(1); + return -EINVAL; + } + return iommu_attach_group(domain, group); + } } return -EINVAL; @@ -1328,13 +1335,20 @@ static int vfio_mdev_detach_domain(struct device *dev, void *data) { struct iommu_domain *domain = data; struct device *iommu_device; + struct iommu_group *group; iommu_device = vfio_mdev_get_iommu_device(dev); if (iommu_device) { if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) iommu_aux_detach_device(domain, iommu_device); - else - iommu_detach_device(domain, iommu_device); + else { + group = iommu_group_get(iommu_device); + if (!group) { + WARN_ON(1); + return -EINVAL; + } + iommu_detach_group(domain, group); + } } return 0; From patchwork Thu Sep 5 07:59:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11134705 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 158A414ED for ; Fri, 6 Sep 2019 08:18:02 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id E30D02082C for ; Fri, 6 Sep 2019 08:18:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2392660AbfIFIR5 (ORCPT ); Fri, 6 Sep 2019 04:17:57 -0400 Received: from mga11.intel.com ([192.55.52.93]:23747 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2392657AbfIFIR4 (ORCPT ); Fri, 6 Sep 2019 04:17:56 -0400 X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga102.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 06 Sep 2019 01:17:55 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,472,1559545200"; d="scan'208";a="383186227" Received: from yiliu-dev.bj.intel.com ([10.238.156.139]) by fmsmga005.fm.intel.com with ESMTP; 06 Sep 2019 01:17:53 -0700 From: Liu Yi L To: alex.williamson@redhat.com, kwankhede@nvidia.com Cc: kevin.tian@intel.com, baolu.lu@linux.intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com, joro@8bytes.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, yan.y.zhao@intel.com, shaopeng.he@intel.com, chenbo.xia@intel.com, jun.j.tian@intel.com Subject: [PATCH v2 13/13] vfio/type1: track iommu backed group attach Date: Thu, 5 Sep 2019 15:59:30 +0800 Message-Id: <1567670370-4484-14-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> References: <1567670370-4484-1-git-send-email-yi.l.liu@intel.com> Sender: 
kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org

With the introduction of iommu-aware mdev groups, the user may wrap a PF/VF as an mdev. Such mdevs are referred to as wrapped PF/VF mdevs below. If this is applied to a non-singleton iommu group, there can be multiple domain attaches on one iommu_device group (i.e. the iommu backed group). The reason is that an mdev group attach is ultimately an iommu_device group attach, and the existing vfio_domain.group_list knows nothing about that, so multiple attaches happen. What's more, under the default domain policy, a group attach is allowed only when the group's in-use domain equals its default domain, as the code below shows:

static int __iommu_attach_group(struct iommu_domain *domain, ..)
{
..
	if (group->default_domain &&
	    group->domain != group->default_domain)
		return -EBUSY;
...
}

So for the above scenario, only the first group attach on the non-singleton iommu group succeeds; subsequent group attaches fail. However, assigning the wrapped PF/VF mdevs and the other devices of the group to a single VM is a fairly valid usage case, which we may not want to prevent. In other words, the subsequent group attaches should return success before ever reaching __iommu_attach_group(). However, if the user tries to assign the wrapped PF/VF mdevs and the other devices to different VMs, the subsequent group attaches on a single iommu_device group should fail; that is, such a subsequent group attach should actually call into __iommu_attach_group() and fail there.

To meet the above requirements, this patch introduces a vfio_group_object structure to track the group attach of an iommu_device group (a.k.a. iommu backed group). Each vfio_domain has a group_obj_list to record the vfio_group_objects. A search of the group_obj_list should use the iommu_device group if the group is an mdev group.

struct vfio_group_object {
	atomic_t count;
	struct iommu_group *iommu_group;
	struct vfio_domain *domain;
	struct list_head next;
};

Each successful group attach either creates a new vfio_group_object or increases the count of an existing vfio_group_object instance; details can be found in vfio_domain_attach_group_object(). A group detach decreases the count accordingly; see vfio_domain_detach_group_object().

As vfio_domain.group_obj_list is scoped to a vfio container (vfio_iommu), passing a non-singleton group through to multiple VMs fails, since the VMs use separate vfio containers. Also, if a vIOMMU is exposed, attempts to assign multiple devices of such a group (via vfio-pci or wrapped PF/VF mdevs) to a single VM fail as well. This is aligned with the current vfio passthrough rules.
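For clarity, a condensed restatement of the attach-side refcounting from the diff below, not a verbatim excerpt; a kzalloc() failure check that the posted diff omits is added here.

static int vfio_domain_attach_group_object_sketch(struct vfio_domain *domain,
		struct iommu_group *iommu_group)
{
	struct vfio_group_object *group_obj;

	group_obj = find_iommu_group_object(domain, iommu_group);
	if (group_obj) {
		/* group already attached through this domain: just count it */
		atomic_inc(&group_obj->count);
		return 0;
	}

	group_obj = kzalloc(sizeof(*group_obj), GFP_KERNEL);
	if (!group_obj)
		return -ENOMEM;	/* check not present in the posted diff */

	atomic_set(&group_obj->count, 1);
	group_obj->iommu_group = iommu_group;
	group_obj->domain = domain;
	list_add(&group_obj->next, &domain->group_obj_list);

	/* only the first user performs the real iommu group attach */
	return iommu_attach_group(domain->domain, iommu_group);
}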
Cc: Kevin Tian Cc: Lu Baolu Suggested-by: Alex Williamson Signed-off-by: Liu Yi L --- drivers/vfio/vfio_iommu_type1.c | 167 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 154 insertions(+), 13 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 317430d..6a67bd6 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -75,6 +75,7 @@ struct vfio_domain { struct iommu_domain *domain; struct list_head next; struct list_head group_list; + struct list_head group_obj_list; int prot; /* IOMMU_CACHE */ bool fgsp; /* Fine-grained super pages */ }; @@ -97,6 +98,13 @@ struct vfio_group { bool mdev_group; /* An mdev group */ }; +struct vfio_group_object { + atomic_t count; + struct iommu_group *iommu_group; + struct vfio_domain *domain; + struct list_head next; +}; + /* * Guest RAM pinning working set or DMA target */ @@ -1263,6 +1271,85 @@ static struct vfio_group *find_iommu_group(struct vfio_domain *domain, return NULL; } +static struct vfio_group_object *find_iommu_group_object( + struct vfio_domain *domain, struct iommu_group *iommu_group) +{ + struct vfio_group_object *g; + + list_for_each_entry(g, &domain->group_obj_list, next) { + if (g->iommu_group == iommu_group) + return g; + } + + return NULL; +} + +static void vfio_init_iommu_group_object(struct vfio_group_object *group_obj, + struct vfio_domain *domain, struct iommu_group *iommu_group) +{ + if (!group_obj || !domain || !iommu_group) { + WARN_ON(1); + return; + } + atomic_set(&group_obj->count, 1); + group_obj->iommu_group = iommu_group; + group_obj->domain = domain; + list_add(&group_obj->next, &domain->group_obj_list); +} + +static int vfio_domain_attach_group_object( + struct vfio_domain *domain, struct iommu_group *iommu_group) +{ + struct vfio_group_object *group_obj; + + group_obj = find_iommu_group_object(domain, iommu_group); + if (group_obj) { + atomic_inc(&group_obj->count); + return 0; + } + group_obj = kzalloc(sizeof(*group_obj), GFP_KERNEL); + vfio_init_iommu_group_object(group_obj, domain, iommu_group); + return iommu_attach_group(domain->domain, iommu_group); +} + +static int vfio_domain_detach_group_object( + struct vfio_domain *domain, struct iommu_group *iommu_group) +{ + struct vfio_group_object *group_obj; + + group_obj = find_iommu_group_object(domain, iommu_group); + if (!group_obj) { + WARN_ON(1); + return -EINVAL; + } + if (atomic_dec_if_positive(&group_obj->count) == 0) { + list_del(&group_obj->next); + kfree(group_obj); + } + iommu_detach_group(domain->domain, iommu_group); + return 0; +} + +/* + * Check if an iommu backed group has been attached to a domain within + * a specific container (vfio_iommu). If yes, return the vfio_group_object + * which tracks the previous domain attach for this group. Caller of this + * function should hold vfio_iommu->lock. 
+ */ +static struct vfio_group_object *vfio_iommu_group_object_check( + struct vfio_iommu *iommu, struct iommu_group *iommu_group) +{ + struct vfio_domain *d; + struct vfio_group_object *group_obj; + + list_for_each_entry(d, &iommu->domain_list, next) { + group_obj = find_iommu_group_object(d, iommu_group); + if (group_obj) + return group_obj; + } + return NULL; +} + static bool vfio_iommu_has_sw_msi(struct iommu_group *group, phys_addr_t *base) { struct list_head group_resv_regions; @@ -1310,21 +1397,23 @@ static struct device *vfio_mdev_get_iommu_device(struct device *dev) static int vfio_mdev_attach_domain(struct device *dev, void *data) { - struct iommu_domain *domain = data; + struct vfio_domain *domain = data; struct device *iommu_device; struct iommu_group *group; iommu_device = vfio_mdev_get_iommu_device(dev); if (iommu_device) { if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) - return iommu_aux_attach_device(domain, iommu_device); + return iommu_aux_attach_device(domain->domain, + iommu_device); else { group = iommu_group_get(iommu_device); if (!group) { WARN_ON(1); return -EINVAL; } - return iommu_attach_group(domain, group); + return vfio_domain_attach_group_object( + domain, group); } } @@ -1333,21 +1422,22 @@ static int vfio_mdev_attach_domain(struct device *dev, void *data) static int vfio_mdev_detach_domain(struct device *dev, void *data) { - struct iommu_domain *domain = data; + struct vfio_domain *domain = data; struct device *iommu_device; struct iommu_group *group; iommu_device = vfio_mdev_get_iommu_device(dev); if (iommu_device) { if (iommu_dev_feature_enabled(iommu_device, IOMMU_DEV_FEAT_AUX)) - iommu_aux_detach_device(domain, iommu_device); + iommu_aux_detach_device(domain->domain, iommu_device); else { group = iommu_group_get(iommu_device); if (!group) { WARN_ON(1); return -EINVAL; } - iommu_detach_group(domain, group); + return vfio_domain_detach_group_object( + domain, group); } } @@ -1359,20 +1449,27 @@ static int vfio_iommu_attach_group(struct vfio_domain *domain, { if (group->mdev_group) return iommu_group_for_each_dev(group->iommu_group, - domain->domain, + domain, vfio_mdev_attach_domain); else - return iommu_attach_group(domain->domain, group->iommu_group); + return vfio_domain_attach_group_object(domain, + group->iommu_group); } static void vfio_iommu_detach_group(struct vfio_domain *domain, struct vfio_group *group) { + int ret; + if (group->mdev_group) - iommu_group_for_each_dev(group->iommu_group, domain->domain, + iommu_group_for_each_dev(group->iommu_group, domain, vfio_mdev_detach_domain); - else - iommu_detach_group(domain->domain, group->iommu_group); + else { + ret = vfio_domain_detach_group_object( + domain, group->iommu_group); + if (ret) + pr_warn("%s, deatch failed!! 
ret: %d", __func__, ret); + } } static bool vfio_bus_is_mdev(struct bus_type *bus) @@ -1412,6 +1509,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, int ret; bool resv_msi, msi_remap; phys_addr_t resv_msi_base; + struct vfio_group_object *group_obj = NULL; + struct device *iommu_device = NULL; + struct iommu_group *iommu_device_group; + mutex_lock(&iommu->lock); @@ -1438,14 +1539,20 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, group->iommu_group = iommu_group; + group_obj = vfio_iommu_group_object_check(iommu, group->iommu_group); + if (group_obj) { + atomic_inc(&group_obj->count); + list_add(&group->next, &group_obj->domain->group_list); + mutex_unlock(&iommu->lock); + return 0; + } + /* Determine bus_type in order to allocate a domain */ ret = iommu_group_for_each_dev(iommu_group, &bus, vfio_bus_type); if (ret) goto out_free; if (vfio_bus_is_mdev(bus)) { - struct device *iommu_device = NULL; - group->mdev_group = true; /* Determine the isolation type */ @@ -1469,6 +1576,39 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, bus = iommu_device->bus; } + /* + * Check if iommu backed group attached to a domain within current + * container. If yes, increase the count; If no, go ahead with a + * new domain attach process. + */ + group_obj = NULL; + if (iommu_device) { + iommu_device_group = iommu_group_get(iommu_device); + if (!iommu_device_group) { + WARN_ON(1); + kfree(domain); + mutex_unlock(&iommu->lock); + return -EINVAL; + } + group_obj = vfio_iommu_group_object_check(iommu, + iommu_device_group); + } else + group_obj = vfio_iommu_group_object_check(iommu, + group->iommu_group); + + if (group_obj) { + atomic_inc(&group_obj->count); + list_add(&group->next, &group_obj->domain->group_list); + kfree(domain); + mutex_unlock(&iommu->lock); + return 0; + } + + /* + * Now we are sure we want to initialize a new vfio_domain. + * First step is to alloc an iommu_domain from iommu abstract + * layer. + */ domain->domain = iommu_domain_alloc(bus); if (!domain->domain) { ret = -EIO; @@ -1484,6 +1624,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data, goto out_domain; } + INIT_LIST_HEAD(&domain->group_obj_list); ret = vfio_iommu_attach_group(domain, group); if (ret) goto out_domain;