From patchwork Thu Sep 10 10:56:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767601 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id DA1EC59D for ; Thu, 10 Sep 2020 11:02:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE4B821941 for ; Thu, 10 Sep 2020 11:02:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730538AbgIJLB7 (ORCPT ); Thu, 10 Sep 2020 07:01:59 -0400 Received: from mga12.intel.com ([192.55.52.136]:21743 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730636AbgIJK7j (ORCPT ); Thu, 10 Sep 2020 06:59:39 -0400 IronPort-SDR: XhRo6+qsk4Hzwac//UAWxyiB7v9skDl9kIIpI84Y48JQNkpwUzoodiNncdflfa1kV80/ZgXw/a vU6/ZxjorQOg== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="138025850" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="138025850" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:39 -0700 IronPort-SDR: KYL+DBA7/CqRG61xGOcwFf2KrovP/8NEtBv7A9NJBAUou9bME+cOo+X8tBNE+nCSpQHT7S4X/h 1/Khj93vKeuw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140036" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, 
hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Cornelia Huck Subject: [RFC v10 01/25] scripts/update-linux-headers: Import iommu.h Date: Thu, 10 Sep 2020 03:56:14 -0700 Message-Id: <1599735398-6829-2-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org From: Eric Auger Update the script to import the new iommu.h uapi header. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Michael S. Tsirkin Cc: Cornelia Huck Cc: Paolo Bonzini Acked-by: Cornelia Huck Signed-off-by: Eric Auger --- scripts/update-linux-headers.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 29c27f4..5b64ee3 100755 --- a/scripts/update-linux-headers.sh +++ b/scripts/update-linux-headers.sh @@ -141,7 +141,7 @@ done rm -rf "$output/linux-headers/linux" mkdir -p "$output/linux-headers/linux" -for header in kvm.h vfio.h vfio_ccw.h vhost.h \ +for header in kvm.h vfio.h vfio_ccw.h vhost.h iommu.h \ psci.h psp-sev.h userfaultfd.h mman.h; do cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux" done From patchwork Thu Sep 10 10:56:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767609 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BB1B3112E for ; Thu, 10 Sep 2020 11:02:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A72BD2145D for ; Thu, 10 Sep 2020 11:02:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730650AbgIJLCT 
(ORCPT ); Thu, 10 Sep 2020 07:02:19 -0400 Received: from mga12.intel.com ([192.55.52.136]:21746 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729521AbgIJK7l (ORCPT ); Thu, 10 Sep 2020 06:59:41 -0400 IronPort-SDR: 1ky0HrZTS8ByaeP4UTaHb1p3tPjJqboNSFERVRw7om4PsRWqJywh1AiFnyV4VSBsL4iDd6lpKj /yogLl+m92qA== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="138025854" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="138025854" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: iZ25jyoEMpNtXQgOp6LiOrZ0DnCiVgf/frtDmaCfxWV7B6PGdTtwbhbK+xhaP8aFcShTJinit7 jOBd6xopM14A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140044" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 04/25] hw/pci: introduce pci_device_get_iommu_attr() Date: Thu, 10 Sep 2020 03:56:17 -0700 Message-Id: <1599735398-6829-5-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds pci_device_get_iommu_attr() to query vIOMMU attributes, e.g. whether the vIOMMU wants nested IOMMU translation. 
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Michael S. Tsirkin Signed-off-by: Liu Yi L --- hw/pci/pci.c | 35 ++++++++++++++++++++++++++++++----- include/hw/pci/pci.h | 7 +++++++ 2 files changed, 37 insertions(+), 5 deletions(-) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 1967746..1886f8e 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2659,7 +2659,8 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data) } } -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) +static void pci_device_get_iommu_bus_devfn(PCIDevice *dev, + PCIBus **pbus, uint8_t *pdevfn) { PCIBus *bus = pci_get_bus(dev); PCIBus *iommu_bus = bus; @@ -2710,14 +2711,38 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) iommu_bus = parent_bus; } - if (iommu_bus && iommu_bus->iommu_ops && - iommu_bus->iommu_ops->get_address_space) { - return iommu_bus->iommu_ops->get_address_space(bus, - iommu_bus->iommu_opaque, devfn); + *pbus = iommu_bus; + *pdevfn = devfn; +} + +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev) +{ + PCIBus *bus; + uint8_t devfn; + + pci_device_get_iommu_bus_devfn(dev, &bus, &devfn); + if (bus && bus->iommu_ops && + bus->iommu_ops->get_address_space) { + return bus->iommu_ops->get_address_space(bus, + bus->iommu_opaque, devfn); } return &address_space_memory; } +int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data) +{ + PCIBus *bus; + uint8_t devfn; + + pci_device_get_iommu_bus_devfn(dev, &bus, &devfn); + if (bus && bus->iommu_ops && + bus->iommu_ops->get_iommu_attr) { + return bus->iommu_ops->get_iommu_attr(bus, bus->iommu_opaque, + devfn, attr, data); + } + return -ENOENT; +} + void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque) { bus->iommu_ops = ops; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 7c46a78..18b51dd 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -487,13 +487,20 @@ void pci_bus_get_w64_range(PCIBus 
*bus, Range *range); void pci_device_deassert_intx(PCIDevice *dev); +typedef enum IOMMUAttr { + IOMMU_WANT_NESTING, +} IOMMUAttr; + typedef struct PCIIOMMUOps PCIIOMMUOps; struct PCIIOMMUOps { AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int32_t devfn); + int (*get_iommu_attr)(PCIBus *bus, void *opaque, int32_t devfn, + IOMMUAttr attr, void *data); }; AddressSpace *pci_device_iommu_address_space(PCIDevice *dev); +int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data); void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque); static inline void From patchwork Thu Sep 10 10:56:22 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767647 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 01752618 for ; Thu, 10 Sep 2020 11:10:23 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DD74621941 for ; Thu, 10 Sep 2020 11:10:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730615AbgIJLKR (ORCPT ); Thu, 10 Sep 2020 07:10:17 -0400 Received: from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730511AbgIJKyl (ORCPT ); Thu, 10 Sep 2020 06:54:41 -0400 IronPort-SDR: OmNKNWq48s2P+FuxRZf7BjHZky3e0v4uyblf5csqRHprxYvsqyUCeeTAAZFBGVJBa/J9XG9BjZ 8eicy1OxCc6w== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459139" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459139" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 
+7U6P3wxXumwv3NXvJyLC/60aFJ20BSsqjAHJmlIqrCJCVm+nyKeM7DvD8n++efJcM7kolDTjf p/W1NWC4xExg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140060" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 09/25] hw/pci: introduce pci_device_set/unset_iommu_context() Date: Thu, 10 Sep 2020 03:56:22 -0700 Message-Id: <1599735398-6829-10-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org On platforms capable of nested IOMMU translation, a vIOMMU can be implemented on top of the physical IOMMU's nested paging (VFIO case). The vIOMMU advertises this by exposing the "want_nested" attribute to PCIe devices (e.g. VFIO PCI). Once "want_nested" is satisfied, the device (VFIO case) should set a HostIOMMUContext on the vIOMMU so that the vIOMMU can manage stage-1 translation. DMA from such devices is then protected by the stage-1 page tables owned by the guest together with the stage-2 page tables owned by the host. This patch adds pci_device_set/unset_iommu_context() to set/unset the HostIOMMUContext for a given PCIe device (VFIO case). The caller of set should fail if the set operation fails. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Michael S. 
Tsirkin Reviewed-by: Peter Xu Signed-off-by: Liu Yi L --- rfcv5 (v2) -> rfcv6: *) pci_device_set_iommu_context() returns 0 if callback is not implemented. --- hw/pci/pci.c | 28 ++++++++++++++++++++++++++++ include/hw/pci/pci.h | 10 ++++++++++ 2 files changed, 38 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 1886f8e..e1b2f05 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2743,6 +2743,34 @@ int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data) return -ENOENT; } +int pci_device_set_iommu_context(PCIDevice *dev, + HostIOMMUContext *iommu_ctx) +{ + PCIBus *bus; + uint8_t devfn; + + pci_device_get_iommu_bus_devfn(dev, &bus, &devfn); + if (bus && bus->iommu_ops && + bus->iommu_ops->set_iommu_context) { + return bus->iommu_ops->set_iommu_context(bus, + bus->iommu_opaque, devfn, iommu_ctx); + } + return 0; +} + +void pci_device_unset_iommu_context(PCIDevice *dev) +{ + PCIBus *bus; + uint8_t devfn; + + pci_device_get_iommu_bus_devfn(dev, &bus, &devfn); + if (bus && bus->iommu_ops && + bus->iommu_ops->unset_iommu_context) { + bus->iommu_ops->unset_iommu_context(bus, + bus->iommu_opaque, devfn); + } +} + void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque) { bus->iommu_ops = ops; diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index 18b51dd..9348560 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -9,6 +9,8 @@ #include "hw/pci/pcie.h" +#include "hw/iommu/host_iommu_context.h" + extern bool pci_available; /* PCI bus */ @@ -497,10 +499,18 @@ struct PCIIOMMUOps { void *opaque, int32_t devfn); int (*get_iommu_attr)(PCIBus *bus, void *opaque, int32_t devfn, IOMMUAttr attr, void *data); + int (*set_iommu_context)(PCIBus *bus, void *opaque, + int32_t devfn, + HostIOMMUContext *iommu_ctx); + void (*unset_iommu_context)(PCIBus *bus, void *opaque, + int32_t devfn); }; AddressSpace *pci_device_iommu_address_space(PCIDevice *dev); int pci_device_get_iommu_attr(PCIDevice *dev, IOMMUAttr attr, void *data); +int 
pci_device_set_iommu_context(PCIDevice *dev, + HostIOMMUContext *iommu_ctx); +void pci_device_unset_iommu_context(PCIDevice *dev); void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *iommu_ops, void *opaque); static inline void From patchwork Thu Sep 10 10:56:23 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767557 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A9FAB59D for ; Thu, 10 Sep 2020 10:56:46 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 95252214F1 for ; Thu, 10 Sep 2020 10:56:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730187AbgIJK4l (ORCPT ); Thu, 10 Sep 2020 06:56:41 -0400 Received: from mga09.intel.com ([134.134.136.24]:6916 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730577AbgIJK41 (ORCPT ); Thu, 10 Sep 2020 06:56:27 -0400 IronPort-SDR: azyWXw/TkVYw/NqBpYuXyLijNBlsOPru8/nXG6OIaz9NQybdbUBAohkWxUCktgM/zXMvw2QcTy Tjrg+TW0RE/w== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459140" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459140" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: mKdvn83Plwe/RFcK17KsPNCB2uv1Za3rCq/uvW/TAZLG0CAS+n/NNCEbB/8ybDOMWqHm5zEysv gKd1APzIrL4Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140064" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, 
jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 10/25] intel_iommu: add set/unset_iommu_context callback Date: Thu, 10 Sep 2020 03:56:23 -0700 Message-Id: <1599735398-6829-11-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds the set/unset_iommu_context() implementation to the Intel vIOMMU. A PCIe device (VFIO case) sets its HostIOMMUContext on the vIOMMU to acknowledge the vIOMMU's "want_nested" attribute, so that the vIOMMU can build DMA protection on top of the host IOMMU's nested paging. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 71 ++++++++++++++++++++++++++++++++++++++++--- include/hw/i386/intel_iommu.h | 21 ++++++++++--- 2 files changed, 83 insertions(+), 9 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 333f172..bf496f7 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3358,23 +3358,33 @@ static const MemoryRegionOps vtd_mem_ir_ops = { }, }; -VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn) +/** + * Fetch a VTDBus instance for given PCIBus. If no existing instance, + * allocate one. 
+ */ +static VTDBus *vtd_find_add_bus(IntelIOMMUState *s, PCIBus *bus) { uintptr_t key = (uintptr_t)bus; VTDBus *vtd_bus = g_hash_table_lookup(s->vtd_as_by_busptr, &key); - VTDAddressSpace *vtd_dev_as; - char name[128]; if (!vtd_bus) { uintptr_t *new_key = g_malloc(sizeof(*new_key)); *new_key = (uintptr_t)bus; /* No corresponding free() */ - vtd_bus = g_malloc0(sizeof(VTDBus) + sizeof(VTDAddressSpace *) * \ - PCI_DEVFN_MAX); + vtd_bus = g_malloc0(sizeof(VTDBus)); vtd_bus->bus = bus; g_hash_table_insert(s->vtd_as_by_busptr, new_key, vtd_bus); } + return vtd_bus; +} +VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn) +{ + VTDBus *vtd_bus; + VTDAddressSpace *vtd_dev_as; + char name[128]; + + vtd_bus = vtd_find_add_bus(s, bus); vtd_dev_as = vtd_bus->dev_as[devfn]; if (!vtd_dev_as) { @@ -3462,6 +3472,55 @@ static int vtd_dev_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn, return ret; } +static int vtd_dev_set_iommu_context(PCIBus *bus, void *opaque, + int devfn, + HostIOMMUContext *iommu_ctx) +{ + IntelIOMMUState *s = opaque; + VTDBus *vtd_bus; + VTDHostIOMMUContext *vtd_dev_icx; + + assert(0 <= devfn && devfn < PCI_DEVFN_MAX); + + vtd_bus = vtd_find_add_bus(s, bus); + + vtd_iommu_lock(s); + + vtd_dev_icx = vtd_bus->dev_icx[devfn]; + + assert(!vtd_dev_icx); + + vtd_bus->dev_icx[devfn] = vtd_dev_icx = + g_malloc0(sizeof(VTDHostIOMMUContext)); + vtd_dev_icx->vtd_bus = vtd_bus; + vtd_dev_icx->devfn = (uint8_t)devfn; + vtd_dev_icx->iommu_state = s; + vtd_dev_icx->iommu_ctx = iommu_ctx; + + vtd_iommu_unlock(s); + + return 0; +} + +static void vtd_dev_unset_iommu_context(PCIBus *bus, void *opaque, int devfn) +{ + IntelIOMMUState *s = opaque; + VTDBus *vtd_bus; + VTDHostIOMMUContext *vtd_dev_icx; + + assert(0 <= devfn && devfn < PCI_DEVFN_MAX); + + vtd_bus = vtd_find_add_bus(s, bus); + + vtd_iommu_lock(s); + + vtd_dev_icx = vtd_bus->dev_icx[devfn]; + g_free(vtd_dev_icx); + vtd_bus->dev_icx[devfn] = NULL; + + vtd_iommu_unlock(s); +} + static 
uint64_t get_naturally_aligned_size(uint64_t start, uint64_t size, int gaw) { @@ -3758,6 +3817,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn) static PCIIOMMUOps vtd_iommu_ops = { .get_address_space = vtd_host_dma_iommu, .get_iommu_attr = vtd_dev_get_iommu_attr, + .set_iommu_context = vtd_dev_set_iommu_context, + .unset_iommu_context = vtd_dev_unset_iommu_context, }; static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 3870052..b5fefb9 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -64,6 +64,7 @@ typedef union VTD_IR_TableEntry VTD_IR_TableEntry; typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress; typedef struct VTDPASIDDirEntry VTDPASIDDirEntry; typedef struct VTDPASIDEntry VTDPASIDEntry; +typedef struct VTDHostIOMMUContext VTDHostIOMMUContext; /* Context-Entry */ struct VTDContextEntry { @@ -112,10 +113,20 @@ struct VTDAddressSpace { IOVATree *iova_tree; /* Traces mapped IOVA ranges */ }; +struct VTDHostIOMMUContext { + VTDBus *vtd_bus; + uint8_t devfn; + HostIOMMUContext *iommu_ctx; + IntelIOMMUState *iommu_state; +}; + struct VTDBus { - PCIBus* bus; /* A reference to the bus to provide translation for */ + /* A reference to the bus to provide translation for */ + PCIBus *bus; /* A table of VTDAddressSpace objects indexed by devfn */ - VTDAddressSpace *dev_as[]; + VTDAddressSpace *dev_as[PCI_DEVFN_MAX]; + /* A table of VTDHostIOMMUContext objects indexed by devfn */ + VTDHostIOMMUContext *dev_icx[PCI_DEVFN_MAX]; }; struct VTDIOTLBEntry { @@ -269,8 +280,10 @@ struct IntelIOMMUState { bool dma_drain; /* Whether DMA r/w draining enabled */ /* - * Protects IOMMU states in general. Currently it protects the - * per-IOMMU IOTLB cache, and context entry cache in VTDAddressSpace. 
+ * iommu_lock protects below: + * - per-IOMMU IOTLB caches + * - context entry cache in VTDAddressSpace + * - HostIOMMUContext pointer cached in vIOMMU */ QemuMutex iommu_lock; }; From patchwork Thu Sep 10 10:56:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767565 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9137414F6 for ; Thu, 10 Sep 2020 10:57:25 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7A33B214F1 for ; Thu, 10 Sep 2020 10:57:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730674AbgIJK5S (ORCPT ); Thu, 10 Sep 2020 06:57:18 -0400 Received: from mga09.intel.com ([134.134.136.24]:7070 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730418AbgIJKzb (ORCPT ); Thu, 10 Sep 2020 06:55:31 -0400 IronPort-SDR: Mj2HtKFwWvjWAIzSWS1I31uoLgj/nYiCmTaWHMqKYQf8IQgLXZmSBtaMJ8Iq6+ze0EoQwXcafn FPfu7M+KCf5g== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459141" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459141" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: DJg/B4V+A0ZW2xMGSeo8hX/KKCHOWTy1jeiRSly21QzvKQH2nXRxKxv4foSGb0CB+GBg4gmkJz k9p9toUCuuOQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140066" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, 
pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 11/25] vfio/common: provide PASID alloc/free hooks Date: Thu, 10 Sep 2020 03:56:24 -0700 Message-Id: <1599735398-6829-12-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch defines vfio_host_iommu_context_info, implements the PASID alloc/free hooks defined in HostIOMMUContextClass. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L --- hw/vfio/common.c | 66 +++++++++++++++++++++++++++++++++++ include/hw/iommu/host_iommu_context.h | 3 ++ include/hw/vfio/vfio-common.h | 4 +++ 3 files changed, 73 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index af91eca..41aaf41 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1183,6 +1183,50 @@ static int vfio_get_iommu_type(VFIOContainer *container, return ret; } +static int vfio_host_iommu_ctx_pasid_alloc(HostIOMMUContext *iommu_ctx, + uint32_t min, uint32_t max, + uint32_t *pasid) +{ + VFIOContainer *container = container_of(iommu_ctx, + VFIOContainer, iommu_ctx); + struct vfio_iommu_type1_pasid_request req; + int ret = 0; + + req.argsz = sizeof(req); + req.flags = VFIO_IOMMU_FLAG_ALLOC_PASID; + req.range.min = min; + req.range.max = max; + + ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req); + if (ret < 0) { + error_report("%s: alloc failed (%m)", __func__); + return ret; + } + *pasid = ret; + return 0; +} + +static int vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, + uint32_t pasid) +{ + VFIOContainer 
*container = container_of(iommu_ctx, + VFIOContainer, iommu_ctx); + struct vfio_iommu_type1_pasid_request req; + + int ret = 0; + + req.argsz = sizeof(req); + req.flags = VFIO_IOMMU_FLAG_FREE_PASID; + req.range.min = pasid; + req.range.max = pasid + 1; + + ret = ioctl(container->fd, VFIO_IOMMU_PASID_REQUEST, &req); + if (ret) { + error_report("%s: free failed (%m)", __func__); + } + return ret; +} + static int vfio_init_container(VFIOContainer *container, int group_fd, bool want_nested, Error **errp) { @@ -1802,3 +1846,25 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op) } return vfio_eeh_container_op(container, op); } + +static void vfio_host_iommu_context_class_init(ObjectClass *klass, + void *data) +{ + HostIOMMUContextClass *hicxc = HOST_IOMMU_CONTEXT_CLASS(klass); + + hicxc->pasid_alloc = vfio_host_iommu_ctx_pasid_alloc; + hicxc->pasid_free = vfio_host_iommu_ctx_pasid_free; +} + +static const TypeInfo vfio_host_iommu_context_info = { + .parent = TYPE_HOST_IOMMU_CONTEXT, + .name = TYPE_VFIO_HOST_IOMMU_CONTEXT, + .class_init = vfio_host_iommu_context_class_init, +}; + +static void vfio_register_types(void) +{ + type_register_static(&vfio_host_iommu_context_info); +} + +type_init(vfio_register_types) diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index 35c4861..227c433 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -33,6 +33,9 @@ #define TYPE_HOST_IOMMU_CONTEXT "qemu:host-iommu-context" #define HOST_IOMMU_CONTEXT(obj) \ OBJECT_CHECK(HostIOMMUContext, (obj), TYPE_HOST_IOMMU_CONTEXT) +#define HOST_IOMMU_CONTEXT_CLASS(klass) \ + OBJECT_CLASS_CHECK(HostIOMMUContextClass, (klass), \ + TYPE_HOST_IOMMU_CONTEXT) #define HOST_IOMMU_CONTEXT_GET_CLASS(obj) \ OBJECT_GET_CLASS(HostIOMMUContextClass, (obj), \ TYPE_HOST_IOMMU_CONTEXT) diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index bdb09f4..a5eaf35 100644 --- a/include/hw/vfio/vfio-common.h +++ 
b/include/hw/vfio/vfio-common.h @@ -26,12 +26,15 @@ #include "qemu/notify.h" #include "ui/console.h" #include "hw/display/ramfb.h" +#include "hw/iommu/host_iommu_context.h" #ifdef CONFIG_LINUX #include #endif #define VFIO_MSG_PREFIX "vfio %s: " +#define TYPE_VFIO_HOST_IOMMU_CONTEXT "qemu:vfio-host-iommu-context" + enum { VFIO_DEVICE_TYPE_PCI = 0, VFIO_DEVICE_TYPE_PLATFORM = 1, @@ -71,6 +74,7 @@ typedef struct VFIOContainer { MemoryListener listener; MemoryListener prereg_listener; unsigned iommu_type; + HostIOMMUContext iommu_ctx; Error *error; bool initialized; unsigned long pgsizes; From patchwork Thu Sep 10 10:56:25 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767577 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A6C6139F for ; Thu, 10 Sep 2020 10:59:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 019002145D for ; Thu, 10 Sep 2020 10:59:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730591AbgIJK4v (ORCPT ); Thu, 10 Sep 2020 06:56:51 -0400 Received: from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729824AbgIJK4h (ORCPT ); Thu, 10 Sep 2020 06:56:37 -0400 IronPort-SDR: 7ekrrdR77qoM6NZUBHiO+ufPhQAHmxfaA6sHc2NsQ1Qidmatw6vl0chhGABN2ruWg2J3XnXQK3 Y2VLE+NBj6Vg== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459143" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459143" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 
wfMaShXIyZHNaTw6w5AzhWQk6Uvyi6b5kFpgPGkjXZRkAo/K0obDG0N9dwdFk+jvn/yviAS9Gi 62ucLIx4zRfg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140070" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:39 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 12/25] vfio: init HostIOMMUContext per-container Date: Thu, 10 Sep 2020 03:56:25 -0700 Message-Id: <1599735398-6829-13-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org In this patch, QEMU first gets the IOMMU info from the kernel to check the capabilities supported by a VFIO_TYPE1_NESTING_IOMMU IOMMU, and then initializes a HostIOMMUContext instance. vfio-pci devices can use pci_device_set/unset_iommu_context() to expose the host IOMMU context to vIOMMU emulators, which can then use the methods the host IOMMU context provides, e.g. to propagate requests to the host IOMMU. 
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Signed-off-by: Liu Yi L --- hw/vfio/common.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ hw/vfio/pci.c | 17 +++++++++ 2 files changed, 130 insertions(+) diff --git a/hw/vfio/common.c b/hw/vfio/common.c index 41aaf41..f41deeb 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1227,10 +1227,102 @@ static int vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, return ret; } +/** + * Get iommu info from host. Caller of this funcion should free + * the memory pointed by the returned pointer stored in @info + * after a successful calling when finished its usage. + */ +static int vfio_get_iommu_info(VFIOContainer *container, + struct vfio_iommu_type1_info **info) +{ + + size_t argsz = sizeof(struct vfio_iommu_type1_info); + + *info = g_malloc0(argsz); + +retry: + (*info)->argsz = argsz; + + if (ioctl(container->fd, VFIO_IOMMU_GET_INFO, *info)) { + g_free(*info); + *info = NULL; + return -errno; + } + + if (((*info)->argsz > argsz)) { + argsz = (*info)->argsz; + *info = g_realloc(*info, argsz); + goto retry; + } + + return 0; +} + +static struct vfio_info_cap_header * +vfio_get_iommu_info_cap(struct vfio_iommu_type1_info *info, uint16_t id) +{ + struct vfio_info_cap_header *hdr; + void *ptr = info; + + if (!(info->flags & VFIO_IOMMU_INFO_CAPS)) { + return NULL; + } + + for (hdr = ptr + info->cap_offset; hdr != ptr; hdr = ptr + hdr->next) { + if (hdr->id == id) { + return hdr; + } + } + + return NULL; +} + +static int vfio_get_nesting_iommu_cap(VFIOContainer *container, + struct vfio_iommu_type1_info_cap_nesting **cap_nesting) +{ + struct vfio_iommu_type1_info *info; + struct vfio_info_cap_header *hdr; + struct vfio_iommu_type1_info_cap_nesting *cap; + struct iommu_nesting_info *nest_info; + int ret; + uint32_t minsz, cap_size; + + ret = vfio_get_iommu_info(container, &info); + if (ret) { + return ret; + } + + hdr = 
vfio_get_iommu_info_cap(info, + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING); + if (!hdr) { + g_free(info); + return -EINVAL; + } + + cap = container_of(hdr, + struct vfio_iommu_type1_info_cap_nesting, header); + + nest_info = &cap->info; + minsz = offsetof(struct iommu_nesting_info, vendor); + if (nest_info->argsz < minsz) { + g_free(info); + return -EINVAL; + } + + cap_size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) + + nest_info->argsz; + *cap_nesting = g_malloc0(cap_size); + memcpy(*cap_nesting, cap, cap_size); + + g_free(info); + return 0; +} + static int vfio_init_container(VFIOContainer *container, int group_fd, bool want_nested, Error **errp) { int iommu_type, ret; + uint64_t flags = 0; iommu_type = vfio_get_iommu_type(container, want_nested, errp); if (iommu_type < 0) { @@ -1258,6 +1350,27 @@ static int vfio_init_container(VFIOContainer *container, int group_fd, return -errno; } + if (iommu_type == VFIO_TYPE1_NESTING_IOMMU) { + struct vfio_iommu_type1_info_cap_nesting *nesting = NULL; + struct iommu_nesting_info *nest_info; + + ret = vfio_get_nesting_iommu_cap(container, &nesting); + if (ret) { + error_setg_errno(errp, -ret, + "Failed to get nesting iommu cap"); + return ret; + } + + nest_info = (struct iommu_nesting_info *) &nesting->info; + flags |= (nest_info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) ? 
+ HOST_IOMMU_PASID_REQUEST : 0; + host_iommu_ctx_init(&container->iommu_ctx, + sizeof(container->iommu_ctx), + TYPE_VFIO_HOST_IOMMU_CONTEXT, + flags); + g_free(nesting); + } + container->iommu_type = iommu_type; return 0; } diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index d33fb89..3907c4f 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -2704,6 +2704,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) VFIOPCIDevice *vdev = PCI_VFIO(pdev); VFIODevice *vbasedev_iter; VFIOGroup *group; + VFIOContainer *container; char *tmp, *subsys, group_path[PATH_MAX], *group_name; Error *err = NULL; ssize_t len; @@ -2780,6 +2781,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp) goto error; } + container = group->container; + if (container->iommu_ctx.initialized && + pci_device_set_iommu_context(pdev, &container->iommu_ctx)) { + error_setg(errp, "device attachment is denied by vIOMMU, " + "please check host IOMMU nesting capability"); + vfio_put_group(group); + goto error; + } + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) { if (strcmp(vbasedev_iter->name, vdev->vbasedev.name) == 0) { error_setg(errp, "device is already attached"); @@ -3065,9 +3075,16 @@ static void vfio_instance_finalize(Object *obj) static void vfio_exitfn(PCIDevice *pdev) { VFIOPCIDevice *vdev = PCI_VFIO(pdev); + VFIOContainer *container; vfio_unregister_req_notifier(vdev); vfio_unregister_err_notifier(vdev); + + container = vdev->vbasedev.group->container; + if (container->iommu_ctx.initialized) { + pci_device_unset_iommu_context(pdev); + } + pci_device_set_intx_routing_notifier(&vdev->pdev, NULL); if (vdev->irqchip_change_notifier.notify) { kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier); From patchwork Thu Sep 10 10:56:27 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767561 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org 
[172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9EE7959D for ; Thu, 10 Sep 2020 10:57:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 88E132145D for ; Thu, 10 Sep 2020 10:57:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730108AbgIJK5I (ORCPT ); Thu, 10 Sep 2020 06:57:08 -0400 Received: from mga09.intel.com ([134.134.136.24]:7070 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730330AbgIJK4g (ORCPT ); Thu, 10 Sep 2020 06:56:36 -0400 IronPort-SDR: VjUz+vzhkfngbD5cAdLL443xrLbykN2uIN7hWs15W0ofpxu16hX2SJcxfCOHgXUzfkw+cXGiqy F2w6P0aHAc8w== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459144" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459144" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 6zzGrK2ZKPMcXF+YXY0iZIU0Yzw4P4NOAg4BVliCv/6aCFh5qTQKt6+T1NKAqeilQYCrMiSg3X 4jGXY7cVzi1A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140076" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 14/25] intel_iommu: process PASID cache invalidation Date: Thu, 10 Sep 2020 03:56:27 -0700 Message-Id: <1599735398-6829-15-git-send-email-yi.l.liu@intel.com> 
X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds PASID cache invalidation handling. When a guest enables PASID usage (e.g. SVA), guest software should issue the proper PASID cache invalidation when caching-mode is exposed. This patch adds only skeleton handling of PASID cache invalidation; detailed handling will be added in subsequent patches. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L --- rfcv4 (v1) -> rfcv5 (v2): *) remove vtd_pasid_cache_gsi(), vtd_pasid_cache_psi() and vtd_pasid_cache_dsi() --- hw/i386/intel_iommu.c | 40 +++++++++++++++++++++++++++++++++++----- hw/i386/intel_iommu_internal.h | 12 ++++++++++++ hw/i386/trace-events | 3 +++ 3 files changed, 50 insertions(+), 5 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index f6353c7..b110c7f 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -2395,6 +2395,37 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) return true; } +static bool vtd_process_pasid_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) || + (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) || + (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) || + (inv_desc->val[3] & VTD_INV_DESC_PASIDC_RSVD_VAL3)) { + error_report_once("non-zero-field-in-pc_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) { + case VTD_INV_DESC_PASIDC_DSI: + break; + + case VTD_INV_DESC_PASIDC_PASID_SI: + break; + + case VTD_INV_DESC_PASIDC_GLOBAL: + break; + + default: +
error_report_once("invalid-inv-granu-in-pc_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + return true; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -2501,12 +2532,11 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; - /* - * TODO: the entity of below two cases will be implemented in future series. - * To make guest (which integrates scalable mode support patch set in - * iommu driver) work, just return true is enough so far. - */ case VTD_INV_DESC_PC: + trace_vtd_inv_desc("pasid-cache", inv_desc.val[1], inv_desc.val[0]); + if (!vtd_process_pasid_desc(s, &inv_desc)) { + return false; + } break; case VTD_INV_DESC_PIOTLB: diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 64ac0a8..22d0bc5 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -445,6 +445,18 @@ typedef union VTDInvDesc VTDInvDesc; (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM | VTD_SL_TM)) : \ (0x3ffff800ULL | ~(VTD_HAW_MASK(aw) | VTD_SL_IGN_COM)) +#define VTD_INV_DESC_PASIDC_G (3ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PASIDC_DID(val) (((val) >> 16) & VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PASIDC_RSVD_VAL0 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL1 0xffffffffffffffffULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL2 0xffffffffffffffffULL +#define VTD_INV_DESC_PASIDC_RSVD_VAL3 0xffffffffffffffffULL + +#define VTD_INV_DESC_PASIDC_DSI (0ULL << 4) +#define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) +#define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 71536a7..f7cd4e5 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -22,6 +22,9 @@ vtd_inv_qi_head(uint16_t head) "read head %d" 
vtd_inv_qi_tail(uint16_t head) "write tail %d" vtd_inv_qi_fetch(void) "" vtd_context_cache_reset(void) "" +vtd_pasid_cache_gsi(void) "" +vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16 +vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16 From patchwork Thu Sep 10 10:56:28 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767617 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 264A559D for ; Thu, 10 Sep 2020 11:04:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E93AA20C09 for ; Thu, 10 Sep 2020 11:04:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730552AbgIJLER (ORCPT ); Thu, 10 Sep 2020 07:04:17 -0400 Received: from mga09.intel.com ([134.134.136.24]:7070 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730376AbgIJLAp (ORCPT ); Thu, 10 Sep 2020 07:00:45 -0400 IronPort-SDR: +PNLBwPx7qo6wEnbcYQDMuQsZeGGMHRYd20GYgfjk1S2uyuhW1xe622l0cgaYCN7LzfocdsttL FsXQaTfHpEDg== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459146" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459146" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with 
ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: HU/eE5+5MsD+A3ZZgal7IhHIBhSvZyAc6macIJ/uI5dehA6L6gGwnqY7MSe2tn0Jc4zQb/3kky aO0D05VAu55g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140079" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 15/25] intel_iommu: add PASID cache management infrastructure Date: Thu, 10 Sep 2020 03:56:28 -0700 Message-Id: <1599735398-6829-16-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds a PASID cache management infrastructure based on the newly added structure VTDPASIDAddressSpace, which is used to track PASID usage and to support future PASID-tagged DMA address translation in the vIOMMU. struct VTDPASIDAddressSpace { VTDBus *vtd_bus; uint8_t devfn; AddressSpace as; uint32_t pasid; IntelIOMMUState *iommu_state; VTDContextCacheEntry context_cache_entry; QLIST_ENTRY(VTDPASIDAddressSpace) next; VTDPASIDCacheEntry pasid_cache_entry; }; Ideally, a VTDPASIDAddressSpace instance is created when a PASID is bound to a DMA AddressSpace. The Intel VT-d spec requires guest software to issue a PASID cache invalidation when it binds or unbinds a PASID with an address space under caching-mode.
However, as VTDPASIDAddressSpace instances also act as the PASID cache in this implementation, they are also created during vIOMMU PASID-tagged DMA translation. Creation in that path is not added in this patch since there are no PASID-capable emulated devices for now. The implementation in this patch manages VTDPASIDAddressSpace instances per PASID+BDF (lookup and insert use PASID and BDF) since the Intel VT-d spec allows a per-BDF PASID table. When a guest binds a PASID to an AddressSpace, QEMU captures the guest's PASID-selective PASID cache invalidation and allocates or removes a VTDPASIDAddressSpace instance according to the reason for the invalidation: *) a present pasid entry moved to non-present *) a present pasid entry modified to another present entry *) a non-present pasid entry moved to present The vIOMMU emulator can determine the reason by fetching the latest guest pasid entry. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- rfcv4 (v1) -> rfcv5 (v2): *) merged this patch with the former replay binding patch, so that PSI/DSI/GSI use a unified function for cache invalidation and pasid binding replay. *) dropped pasid_cache_gen from both iommu_state and vtd_pasid_as as it is not necessary so far; we may want it if an emulated SVA-capable device is introduced one day.
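The three invalidation reasons in the commit message reduce to a decision on two facts: whether a cached entry already exists, and whether the latest guest pasid entry is present. A minimal sketch of that decision follows, assuming hypothetical names (pasid_sync_action, pasid_sync_classify) that do not appear in the patch.

```c
#include <stdbool.h>

/* Hypothetical action codes, for illustration only; not part of the patch. */
enum pasid_sync_action {
    PASID_SYNC_NONE,    /* nothing cached and nothing present in the guest */
    PASID_SYNC_REMOVE,  /* present -> non-present: drop the cached entry */
    PASID_SYNC_UPDATE,  /* present -> present: refresh the cached entry */
    PASID_SYNC_ADD,     /* non-present -> present: create a cache entry */
};

/*
 * Classify one (device, PASID) pair after an invalidation.
 * cached:        a cache entry already exists for this pair
 * guest_present: the pasid entry fetched from guest memory is present
 */
static enum pasid_sync_action
pasid_sync_classify(bool cached, bool guest_present)
{
    if (cached) {
        return guest_present ? PASID_SYNC_UPDATE : PASID_SYNC_REMOVE;
    }
    return guest_present ? PASID_SYNC_ADD : PASID_SYNC_NONE;
}
```

In the patch itself, the remove and update cases are handled while iterating over existing instances (vtd_flush_pasid), and the add case is handled during the replay walk of the guest PASID table; the sketch only condenses the decision into one place.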
--- hw/i386/intel_iommu.c | 464 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 21 ++ hw/i386/trace-events | 1 + include/hw/i386/intel_iommu.h | 24 +++ 4 files changed, 510 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index b110c7f..1fce772 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -40,6 +40,7 @@ #include "kvm_i386.h" #include "migration/vmstate.h" #include "trace.h" +#include "qemu/jhash.h" /* context entry operations */ #define VTD_CE_GET_RID2PASID(ce) \ @@ -65,6 +66,8 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); +static void vtd_pasid_cache_reset(IntelIOMMUState *s); + static void vtd_panic_require_caching_mode(void) { error_report("We need to set caching-mode=on for intel-iommu to enable " @@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s) vtd_iommu_lock(s); vtd_reset_iotlb_locked(s); vtd_reset_context_cache_locked(s); + vtd_pasid_cache_reset(s); vtd_iommu_unlock(s); } @@ -686,6 +690,16 @@ static inline bool vtd_pe_type_check(X86IOMMUState *x86_iommu, return true; } +static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe) +{ + return VTD_SM_PASID_ENTRY_DID((pe)->val[1]); +} + +static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce) +{ + return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7); +} + static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire) { return pdire->val & 1; @@ -2395,9 +2409,443 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) return true; } +static inline void vtd_init_pasid_key(uint32_t pasid, + uint16_t sid, + struct pasid_key *key) +{ + key->pasid = pasid; + key->sid = sid; +} + +static guint vtd_pasid_as_key_hash(gconstpointer v) +{ + struct pasid_key *key = (struct pasid_key *)v; + uint32_t a, b, c; + + /* Jenkins hash */ + a = b = c = JHASH_INITVAL + sizeof(*key); + a += key->sid; + b 
+= extract32(key->pasid, 0, 16); + c += extract32(key->pasid, 16, 16); + + __jhash_mix(a, b, c); + __jhash_final(a, b, c); + + return c; +} + +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2) +{ + const struct pasid_key *k1 = v1; + const struct pasid_key *k2 = v2; + + return (k1->pasid == k2->pasid) && (k1->sid == k2->sid); +} + +static inline int vtd_dev_get_pe_from_pasid(IntelIOMMUState *s, + uint8_t bus_num, + uint8_t devfn, + uint32_t pasid, + VTDPASIDEntry *pe) +{ + VTDContextEntry ce; + int ret; + dma_addr_t pasid_dir_base; + + if (!s->root_scalable) { + return -VTD_FR_PASID_TABLE_INV; + } + + ret = vtd_dev_to_context_entry(s, bus_num, devfn, &ce); + if (ret) { + return ret; + } + + pasid_dir_base = VTD_CE_GET_PASID_DIR_TABLE(&ce); + ret = vtd_get_pe_from_pasid_table(s, + pasid_dir_base, pasid, pe); + + return ret; +} + +static bool vtd_pasid_entry_compare(VTDPASIDEntry *p1, VTDPASIDEntry *p2) +{ + return !memcmp(p1, p2, sizeof(*p1)); +} + +/** + * This function fills in the pasid entry in &vtd_pasid_as. Caller + * of this function should hold iommu_lock. + */ +static void vtd_fill_pe_in_cache(IntelIOMMUState *s, + VTDPASIDAddressSpace *vtd_pasid_as, + VTDPASIDEntry *pe) +{ + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + + if (vtd_pasid_entry_compare(pe, &pc_entry->pasid_entry)) { + /* No need to go further as cached pasid entry is latest */ + return; + } + + pc_entry->pasid_entry = *pe; + /* + * TODO: + * - send pasid bind to host for passthru devices + */ +} + +/** + * This function is used to clear cached pasid entry in vtd_pasid_as + * instances. Caller of this function should hold iommu_lock. 
+ */ +static gboolean vtd_flush_pasid(gpointer key, gpointer value, + gpointer user_data) +{ + VTDPASIDCacheInfo *pc_info = user_data; + VTDPASIDAddressSpace *vtd_pasid_as = value; + IntelIOMMUState *s = vtd_pasid_as->iommu_state; + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + VTDBus *vtd_bus = vtd_pasid_as->vtd_bus; + VTDPASIDEntry pe; + uint16_t did; + uint32_t pasid; + uint16_t devfn; + int ret; + + did = vtd_pe_get_domain_id(&pc_entry->pasid_entry); + pasid = vtd_pasid_as->pasid; + devfn = vtd_pasid_as->devfn; + + switch (pc_info->type) { + case VTD_PASID_CACHE_FORCE_RESET: + goto remove; + case VTD_PASID_CACHE_PASIDSI: + if (pc_info->pasid != pasid) { + return false; + } + /* Fall through */ + case VTD_PASID_CACHE_DOMSI: + if (pc_info->domain_id != did) { + return false; + } + /* Fall through */ + case VTD_PASID_CACHE_GLOBAL_INV: + break; + default: + error_report("invalid pc_info->type"); + abort(); + } + + /* + * pasid cache invalidation may indicate a present pasid + * entry to present pasid entry modification. To cover such + * case, vIOMMU emulator needs to fetch latest guest pasid + * entry and check cached pasid entry, then update pasid + * cache and send pasid bind/unbind to host properly. + */ + ret = vtd_dev_get_pe_from_pasid(s, pci_bus_num(vtd_bus->bus), + devfn, pasid, &pe); + if (ret) { + /* + * No valid pasid entry in guest memory. e.g. pasid entry + * was modified to be either all-zero or non-present. Either + * case means existing pasid cache should be removed. + */ + goto remove; + } + + vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe); + /* + * TODO: + * - when pasid-base-iotlb(piotlb) infrastructure is ready, + * should invalidate QEMU piotlb togehter with this change. + */ + return false; +remove: + /* + * TODO: + * - send pasid bind to host for passthru devices + * - when pasid-base-iotlb(piotlb) infrastructure is ready, + * should invalidate QEMU piotlb togehter with this change. 
+ */ + return true; +} + +/** + * This function finds or adds a VTDPASIDAddressSpace for a device + * when it is bound to a pasid. Caller of this function should hold + * iommu_lock. + */ +static VTDPASIDAddressSpace *vtd_add_find_pasid_as(IntelIOMMUState *s, + VTDBus *vtd_bus, + int devfn, + uint32_t pasid) +{ + struct pasid_key key; + struct pasid_key *new_key; + VTDPASIDAddressSpace *vtd_pasid_as; + uint16_t sid; + + sid = vtd_make_source_id(pci_bus_num(vtd_bus->bus), devfn); + vtd_init_pasid_key(pasid, sid, &key); + vtd_pasid_as = g_hash_table_lookup(s->vtd_pasid_as, &key); + + if (!vtd_pasid_as) { + new_key = g_malloc0(sizeof(*new_key)); + vtd_init_pasid_key(pasid, sid, new_key); + /* + * Initiate the vtd_pasid_as structure. + * + * This structure here is used to track the guest pasid + * binding and also serves as pasid-cache mangement entry. + * + * TODO: in future, if wants to support the SVA-aware DMA + * emulation, the vtd_pasid_as should have include + * AddressSpace to support DMA emulation. + */ + vtd_pasid_as = g_malloc0(sizeof(VTDPASIDAddressSpace)); + vtd_pasid_as->iommu_state = s; + vtd_pasid_as->vtd_bus = vtd_bus; + vtd_pasid_as->devfn = devfn; + vtd_pasid_as->pasid = pasid; + g_hash_table_insert(s->vtd_pasid_as, new_key, vtd_pasid_as); + } + return vtd_pasid_as; +} + +/** + * Caller of this function should hold iommu_lock. 
+ */ +static void vtd_sm_pasid_table_walk_one(IntelIOMMUState *s, + dma_addr_t pt_base, + int start, + int end, + VTDPASIDCacheInfo *info) +{ + VTDPASIDEntry pe; + int pasid = start; + int pasid_next; + VTDPASIDAddressSpace *vtd_pasid_as; + + while (pasid < end) { + pasid_next = pasid + 1; + + if (!vtd_get_pe_in_pasid_leaf_table(s, pasid, pt_base, &pe) + && vtd_pe_present(&pe)) { + vtd_pasid_as = vtd_add_find_pasid_as(s, + info->vtd_bus, info->devfn, pasid); + if ((info->type == VTD_PASID_CACHE_DOMSI || + info->type == VTD_PASID_CACHE_PASIDSI) && + !(info->domain_id == vtd_pe_get_domain_id(&pe))) { + /* + * VTD_PASID_CACHE_DOMSI and VTD_PASID_CACHE_PASIDSI + * requires domain ID check. If domain Id check fail, + * go to next pasid. + */ + pasid = pasid_next; + continue; + } + vtd_fill_pe_in_cache(s, vtd_pasid_as, &pe); + } + pasid = pasid_next; + } +} + +/* + * Currently, VT-d scalable mode pasid table is a two level table, + * this function aims to loop a range of PASIDs in a given pasid + * table to identify the pasid config in guest. + * Caller of this function should hold iommu_lock. + */ +static void vtd_sm_pasid_table_walk(IntelIOMMUState *s, + dma_addr_t pdt_base, + int start, + int end, + VTDPASIDCacheInfo *info) +{ + VTDPASIDDirEntry pdire; + int pasid = start; + int pasid_next; + dma_addr_t pt_base; + + while (pasid < end) { + pasid_next = ((end - pasid) > VTD_PASID_TBL_ENTRY_NUM) ? 
+ (pasid + VTD_PASID_TBL_ENTRY_NUM) : end; + if (!vtd_get_pdire_from_pdir_table(pdt_base, pasid, &pdire) + && vtd_pdire_present(&pdire)) { + pt_base = pdire.val & VTD_PASID_TABLE_BASE_ADDR_MASK; + vtd_sm_pasid_table_walk_one(s, pt_base, pasid, pasid_next, info); + } + pasid = pasid_next; + } +} + +static void vtd_replay_pasid_bind_for_dev(IntelIOMMUState *s, + int start, int end, + VTDPASIDCacheInfo *info) +{ + VTDContextEntry ce; + int bus_n, devfn; + + bus_n = pci_bus_num(info->vtd_bus->bus); + devfn = info->devfn; + + if (!vtd_dev_to_context_entry(s, bus_n, devfn, &ce)) { + uint32_t max_pasid; + + max_pasid = vtd_sm_ce_get_pdt_entry_num(&ce) * VTD_PASID_TBL_ENTRY_NUM; + if (end > max_pasid) { + end = max_pasid; + } + vtd_sm_pasid_table_walk(s, + VTD_CE_GET_PASID_DIR_TABLE(&ce), + start, + end, + info); + } +} + +/** + * This function replay the guest pasid bindings to hots by + * walking the guest PASID table. This ensures host will have + * latest guest pasid bindings. Caller should hold iommu_lock. + */ +static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s, + VTDPASIDCacheInfo *pc_info) +{ + VTDHostIOMMUContext *vtd_dev_icx; + int start = 0, end = VTD_HPASID_MAX; + VTDPASIDCacheInfo walk_info; + + switch (pc_info->type) { + case VTD_PASID_CACHE_PASIDSI: + start = pc_info->pasid; + end = pc_info->pasid + 1; + /* + * PASID selective invalidation is within domain, + * thus fall through. + */ + case VTD_PASID_CACHE_DOMSI: + case VTD_PASID_CACHE_GLOBAL_INV: + /* loop all assigned devices */ + break; + case VTD_PASID_CACHE_FORCE_RESET: + /* For force reset, no need to go further replay */ + return; + default: + error_report("invalid pc_info->type for replay"); + abort(); + } + + /* + * In this replay, only needs to care about the devices which + * are backed by host IOMMU. For such devices, their vtd_dev_icx + * instances are in the s->vtd_dev_icx_list. 
For devices which + * are not backed byhost IOMMU, it is not necessary to replay + * the bindings since their cache could be re-created in the future + * DMA address transaltion. + */ + walk_info = *pc_info; + QLIST_FOREACH(vtd_dev_icx, &s->vtd_dev_icx_list, next) { + /* vtd_bus|devfn fields are not identical with pc_info */ + walk_info.vtd_bus = vtd_dev_icx->vtd_bus; + walk_info.devfn = vtd_dev_icx->devfn; + vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info); + } +} + +/** + * This function syncs the pasid bindings between guest and host. + * It includes updating the pasid cache in vIOMMU and updating the + * pasid bindings per guest's latest pasid entry presence. + */ +static void vtd_pasid_cache_sync(IntelIOMMUState *s, + VTDPASIDCacheInfo *pc_info) +{ + /* + * Regards to a pasid cache invalidation, e.g. a PSI. + * it could be either cases of below: + * a) a present pasid entry moved to non-present + * b) a present pasid entry to be a present entry + * c) a non-present pasid entry moved to present + * + * Different invalidation granularity may affect different device + * scope and pasid scope. But for each invalidation granularity, + * it needs to do two steps to sync host and guest pasid binding. + * + * Here is the handling of a PSI: + * 1) loop all the existing vtd_pasid_as instances to update them + * according to the latest guest pasid entry in pasid table. + * this will make sure affected existing vtd_pasid_as instances + * cached the latest pasid entries. Also, during the loop, the + * host should be notified if needed. e.g. pasid unbind or pasid + * update. Should be able to cover case a) and case b). + * + * 2) loop all devices to cover case c) + * - For devices which have HostIOMMUContext instances, + * we loop them and check if guest pasid entry exists. If yes, + * it is case c), we update the pasid cache and also notify + * host. 
+ * - For devices which have no HostIOMMUContext, it is not + * necessary to create pasid cache at this phase since it + * could be created when vIOMMU does DMA address translation. + * This is not yet implemented since there is no emulated + * pasid-capable devices today. If we have such devices in + * future, the pasid cache shall be created there. + * Other granularity follow the same steps, just with different scope + * + */ + + vtd_iommu_lock(s); + /* Step 1: loop all the exisitng vtd_pasid_as instances */ + g_hash_table_foreach_remove(s->vtd_pasid_as, + vtd_flush_pasid, pc_info); + + /* + * Step 2: loop all the exisitng vtd_dev_icx instances. + * Ideally, needs to loop all devices to find if there is any new + * PASID binding regards to the PASID cache invalidation request. + * But it is enough to loop the devices which are backed by host + * IOMMU. For devices backed by vIOMMU (a.k.a emulated devices), + * if new PASID happened on them, their vtd_pasid_as instance could + * be created during future vIOMMU DMA translation. + */ + vtd_replay_guest_pasid_bindings(s, pc_info); + vtd_iommu_unlock(s); +} + +/** + * Caller of this function should hold iommu_lock + */ +static void vtd_pasid_cache_reset(IntelIOMMUState *s) +{ + VTDPASIDCacheInfo pc_info; + + trace_vtd_pasid_cache_reset(); + + pc_info.type = VTD_PASID_CACHE_FORCE_RESET; + + /* + * Reset pasid cache is a big hammer, so use + * g_hash_table_foreach_remove which will free + * the vtd_pasid_as instances. Also, as a big + * hammer, use VTD_PASID_CACHE_FORCE_RESET to + * ensure all the vtd_pasid_as instances are + * dropped, meanwhile the change will be pass + * to host if HostIOMMUContext is available. 
+ */ + g_hash_table_foreach_remove(s->vtd_pasid_as, + vtd_flush_pasid, &pc_info); +} + static bool vtd_process_pasid_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { + uint16_t domain_id; + uint32_t pasid; + VTDPASIDCacheInfo pc_info; + if ((inv_desc->val[0] & VTD_INV_DESC_PASIDC_RSVD_VAL0) || (inv_desc->val[1] & VTD_INV_DESC_PASIDC_RSVD_VAL1) || (inv_desc->val[2] & VTD_INV_DESC_PASIDC_RSVD_VAL2) || @@ -2407,14 +2855,26 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return false; } + domain_id = VTD_INV_DESC_PASIDC_DID(inv_desc->val[0]); + pasid = VTD_INV_DESC_PASIDC_PASID(inv_desc->val[0]); + switch (inv_desc->val[0] & VTD_INV_DESC_PASIDC_G) { case VTD_INV_DESC_PASIDC_DSI: + trace_vtd_pasid_cache_dsi(domain_id); + pc_info.type = VTD_PASID_CACHE_DOMSI; + pc_info.domain_id = domain_id; break; case VTD_INV_DESC_PASIDC_PASID_SI: + /* PASID selective implies a DID selective */ + pc_info.type = VTD_PASID_CACHE_PASIDSI; + pc_info.domain_id = domain_id; + pc_info.pasid = pasid; break; case VTD_INV_DESC_PASIDC_GLOBAL: + trace_vtd_pasid_cache_gsi(); + pc_info.type = VTD_PASID_CACHE_GLOBAL_INV; break; default: @@ -2423,6 +2883,7 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return false; } + vtd_pasid_cache_sync(s, &pc_info); return true; } @@ -4112,6 +4573,9 @@ static void vtd_realize(DeviceState *dev, Error **errp) g_free, g_free); s->vtd_as_by_busptr = g_hash_table_new_full(vtd_uint64_hash, vtd_uint64_equal, g_free, g_free); + s->vtd_pasid_as = g_hash_table_new_full(vtd_pasid_as_key_hash, + vtd_pasid_as_key_equal, + g_free, g_free); vtd_init(s); sysbus_mmio_map(SYS_BUS_DEVICE(s), 0, Q35_HOST_BRIDGE_IOMMU_ADDR); pci_setup_iommu(bus, &vtd_iommu_ops, dev); diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 22d0bc5..1829f3a 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -308,6 +308,7 @@ typedef enum VTDFaultReason { VTD_FR_IR_SID_ERR = 0x26, /* Invalid Source-ID */ 
VTD_FR_PASID_TABLE_INV = 0x58, /*Invalid PASID table entry */ + VTD_FR_PASID_ENTRY_P = 0x59, /* The Present(P) field of pasidt-entry is 0 */ /* This is not a normal fault reason. We use this to indicate some faults * that are not referenced by the VT-d specification. @@ -512,10 +513,29 @@ typedef struct VTDRootEntry VTDRootEntry; #define VTD_CTX_ENTRY_LEGACY_SIZE 16 #define VTD_CTX_ENTRY_SCALABLE_SIZE 32 +#define VTD_SM_CONTEXT_ENTRY_PDTS(val) (((val) >> 9) & 0x3) #define VTD_SM_CONTEXT_ENTRY_RID2PASID_MASK 0xfffff #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw)) #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL +typedef enum VTDPCInvType { + /* force reset all */ + VTD_PASID_CACHE_FORCE_RESET = 0, + /* pasid cache invalidation rely on guest PASID entry */ + VTD_PASID_CACHE_GLOBAL_INV, + VTD_PASID_CACHE_DOMSI, + VTD_PASID_CACHE_PASIDSI, +} VTDPCInvType; + +struct VTDPASIDCacheInfo { + VTDPCInvType type; + uint16_t domain_id; + uint32_t pasid; + VTDBus *vtd_bus; + uint16_t devfn; +}; +typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; + /* PASID Table Related Definitions */ #define VTD_PASID_DIR_BASE_ADDR_MASK (~0xfffULL) #define VTD_PASID_TABLE_BASE_ADDR_MASK (~0xfffULL) @@ -527,6 +547,7 @@ typedef struct VTDRootEntry VTDRootEntry; #define VTD_PASID_TABLE_BITS_MASK (0x3fULL) #define VTD_PASID_TABLE_INDEX(pasid) ((pasid) & VTD_PASID_TABLE_BITS_MASK) #define VTD_PASID_ENTRY_FPD (1ULL << 1) /* Fault Processing Disable */ +#define VTD_PASID_TBL_ENTRY_NUM (1ULL << 6) /* PASID Granular Translation Type Mask */ #define VTD_PASID_ENTRY_P 1ULL diff --git a/hw/i386/trace-events b/hw/i386/trace-events index f7cd4e5..60d20c1 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -23,6 +23,7 @@ vtd_inv_qi_tail(uint16_t head) "write tail %d" vtd_inv_qi_fetch(void) "" vtd_context_cache_reset(void) "" vtd_pasid_cache_gsi(void) "" +vtd_pasid_cache_reset(void) "" vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation 
domain 0x%"PRIx16 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 42a58d6..626c1cd 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -65,6 +65,8 @@ typedef union VTD_IR_MSIAddress VTD_IR_MSIAddress; typedef struct VTDPASIDDirEntry VTDPASIDDirEntry; typedef struct VTDPASIDEntry VTDPASIDEntry; typedef struct VTDHostIOMMUContext VTDHostIOMMUContext; +typedef struct VTDPASIDCacheEntry VTDPASIDCacheEntry; +typedef struct VTDPASIDAddressSpace VTDPASIDAddressSpace; /* Context-Entry */ struct VTDContextEntry { @@ -97,6 +99,26 @@ struct VTDPASIDEntry { uint64_t val[8]; }; +struct pasid_key { + uint32_t pasid; + uint16_t sid; +}; + +struct VTDPASIDCacheEntry { + struct VTDPASIDEntry pasid_entry; +}; + +struct VTDPASIDAddressSpace { + VTDBus *vtd_bus; + uint8_t devfn; + AddressSpace as; + uint32_t pasid; + IntelIOMMUState *iommu_state; + VTDContextCacheEntry context_cache_entry; + QLIST_ENTRY(VTDPASIDAddressSpace) next; + VTDPASIDCacheEntry pasid_cache_entry; +}; + struct VTDAddressSpace { PCIBus *bus; uint8_t devfn; @@ -267,6 +289,7 @@ struct IntelIOMMUState { GHashTable *vtd_as_by_busptr; /* VTDBus objects indexed by PCIBus* reference */ VTDBus *vtd_as_by_bus_num[VTD_PCI_BUS_MAX]; /* VTDBus objects indexed by bus number */ + GHashTable *vtd_pasid_as; /* VTDPASIDAddressSpace instances */ /* list of registered notifiers */ QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers; @@ -292,6 +315,7 @@ struct IntelIOMMUState { * - per-IOMMU IOTLB caches * - context entry cache in VTDAddressSpace * - HostIOMMUContext pointer cached in vIOMMU + * - PASID cache in VTDPASIDAddressSpace */ QemuMutex iommu_lock; }; From patchwork Thu Sep 10 10:56:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767589 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0F14059D for ; Thu, 10 Sep 2020 11:00:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EBC2A20C09 for ; Thu, 10 Sep 2020 11:00:07 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730755AbgIJK7p (ORCPT ); Thu, 10 Sep 2020 06:59:45 -0400 Received: from mga09.intel.com ([134.134.136.24]:6916 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730609AbgIJK4k (ORCPT ); Thu, 10 Sep 2020 06:56:40 -0400 IronPort-SDR: joO7FawhN3Ngj0oRA78TXIGCT0bO541ojpg8RSr73TLv2tMwTL1ZGMBXpPiBFAAjTcyjYzDqjt pkMKVsHg4TPg== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459147" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459147" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 7hvIrgrGnxw8CTd+Mee4bnG82zdBDf73syyEouKcS7uzgKaNmeM1/UJhxMK/q9YPpwihu3GBZn 1/TzD9uNXk9g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140082" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 16/25] vfio: add bind stage-1 page 
table support
Date: Thu, 10 Sep 2020 03:56:29 -0700
Message-Id: <1599735398-6829-17-git-send-email-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This patch adds the bind_stage1_pgtbl() and unbind_stage1_pgtbl()
callbacks to HostIOMMUContextClass, together with the corresponding
implementation in VFIO. This gives the vIOMMU a way to set up dual-stage
DMA translation in hardware for passthrough devices.

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Peter Xu
Cc: Eric Auger
Cc: Yi Sun
Cc: David Gibson
Cc: Alex Williamson
Signed-off-by: Liu Yi L
---
 hw/iommu/host_iommu_context.c         | 57 +++++++++++++++++++++++++++++++++-
 hw/vfio/common.c                      | 58 ++++++++++++++++++++++++++++++++++-
 include/hw/iommu/host_iommu_context.h | 19 +++++++++++-
 3 files changed, 131 insertions(+), 3 deletions(-)

diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c
index 5fb2223..c43965c 100644
--- a/hw/iommu/host_iommu_context.c
+++ b/hw/iommu/host_iommu_context.c
@@ -69,23 +69,78 @@ int host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, uint32_t pasid)
     return hicxc->pasid_free(iommu_ctx, pasid);
 }
 
+int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+                         struct iommu_gpasid_bind_data *bind)
+{
+    HostIOMMUContextClass *hicxc;
+
+    if (!iommu_ctx) {
+        return -EINVAL;
+    }
+
+    hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx);
+    if (!hicxc) {
+        return -EINVAL;
+    }
+
+    if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) ||
+        !hicxc->bind_stage1_pgtbl) {
+        return -EINVAL;
+    }
+
+    return hicxc->bind_stage1_pgtbl(iommu_ctx, bind);
+}
+
+int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx,
+                         struct iommu_gpasid_bind_data *unbind)
+{
+    HostIOMMUContextClass *hicxc;
+
+    if (!iommu_ctx) {
+        return -EINVAL;
+    }
+
+    hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx);
+    if (!hicxc) {
+        return
-EINVAL; + } + + if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) || + !hicxc->unbind_stage1_pgtbl) { + return -EINVAL; + } + + return hicxc->unbind_stage1_pgtbl(iommu_ctx, unbind); +} + void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size, const char *mrtypename, - uint64_t flags) + uint64_t flags, + struct iommu_nesting_info *info) { HostIOMMUContext *iommu_ctx; object_initialize(_iommu_ctx, instance_size, mrtypename); iommu_ctx = HOST_IOMMU_CONTEXT(_iommu_ctx); iommu_ctx->flags = flags; + iommu_ctx->info = g_malloc0(info->argsz); + memcpy(iommu_ctx->info, info, info->argsz); iommu_ctx->initialized = true; } +static void host_iommu_ctx_finalize_fn(Object *obj) +{ + HostIOMMUContext *iommu_ctx = HOST_IOMMU_CONTEXT(obj); + + g_free(iommu_ctx->info); +} + static const TypeInfo host_iommu_context_info = { .parent = TYPE_OBJECT, .name = TYPE_HOST_IOMMU_CONTEXT, .class_size = sizeof(HostIOMMUContextClass), .instance_size = sizeof(HostIOMMUContext), + .instance_finalize = host_iommu_ctx_finalize_fn, .abstract = true, }; diff --git a/hw/vfio/common.c b/hw/vfio/common.c index f41deeb..74dbeaf 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1227,6 +1227,54 @@ static int vfio_host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, return ret; } +static int vfio_host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, + struct iommu_gpasid_bind_data *bind) +{ + VFIOContainer *container = container_of(iommu_ctx, + VFIOContainer, iommu_ctx); + struct vfio_iommu_type1_nesting_op *op; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*op) + sizeof(*bind); + op = g_malloc0(argsz); + op->argsz = argsz; + op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL; + memcpy(&op->data, bind, sizeof(*bind)); + + if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) { + ret = -errno; + error_report("%s: pasid (%llu) bind failed: %m", + __func__, bind->hpasid); + } + g_free(op); + return ret; +} + +static int vfio_host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext 
*iommu_ctx, + struct iommu_gpasid_bind_data *unbind) +{ + VFIOContainer *container = container_of(iommu_ctx, + VFIOContainer, iommu_ctx); + struct vfio_iommu_type1_nesting_op *op; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*op) + sizeof(*unbind); + op = g_malloc0(argsz); + op->argsz = argsz; + op->flags = VFIO_IOMMU_NESTING_OP_UNBIND_PGTBL; + memcpy(&op->data, unbind, sizeof(*unbind)); + + if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) { + ret = -errno; + error_report("%s: pasid (%llu) unbind failed: %m", + __func__, unbind->hpasid); + } + g_free(op); + return ret; +} + /** * Get iommu info from host. Caller of this funcion should free * the memory pointed by the returned pointer stored in @info @@ -1364,10 +1412,16 @@ static int vfio_init_container(VFIOContainer *container, int group_fd, nest_info = (struct iommu_nesting_info *) &nesting->info; flags |= (nest_info->features & IOMMU_NESTING_FEAT_SYSWIDE_PASID) ? HOST_IOMMU_PASID_REQUEST : 0; + if ((nest_info->features & IOMMU_NESTING_FEAT_BIND_PGTBL) && + (nest_info->features & IOMMU_NESTING_FEAT_CACHE_INVLD)) { + flags |= HOST_IOMMU_NESTING; + } + host_iommu_ctx_init(&container->iommu_ctx, sizeof(container->iommu_ctx), TYPE_VFIO_HOST_IOMMU_CONTEXT, - flags); + flags, + nest_info); g_free(nesting); } @@ -1967,6 +2021,8 @@ static void vfio_host_iommu_context_class_init(ObjectClass *klass, hicxc->pasid_alloc = vfio_host_iommu_ctx_pasid_alloc; hicxc->pasid_free = vfio_host_iommu_ctx_pasid_free; + hicxc->bind_stage1_pgtbl = vfio_host_iommu_ctx_bind_stage1_pgtbl; + hicxc->unbind_stage1_pgtbl = vfio_host_iommu_ctx_unbind_stage1_pgtbl; } static const TypeInfo vfio_host_iommu_context_info = { diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index 227c433..2883ed8 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -54,6 +54,16 @@ typedef struct HostIOMMUContextClass { /* Reclaim pasid from HostIOMMUContext (a.k.a. 
host software) */ int (*pasid_free)(HostIOMMUContext *iommu_ctx, uint32_t pasid); + /* + * Bind stage-1 page table to a hostIOMMU w/ dual stage + * DMA translation capability. + * @bind specifies the bind configurations. + */ + int (*bind_stage1_pgtbl)(HostIOMMUContext *iommu_ctx, + struct iommu_gpasid_bind_data *bind); + /* Undo a previous bind. @unbind specifies the unbind info. */ + int (*unbind_stage1_pgtbl)(HostIOMMUContext *iommu_ctx, + struct iommu_gpasid_bind_data *unbind); } HostIOMMUContextClass; /* @@ -62,17 +72,24 @@ typedef struct HostIOMMUContextClass { struct HostIOMMUContext { Object parent_obj; #define HOST_IOMMU_PASID_REQUEST (1ULL << 0) +#define HOST_IOMMU_NESTING (1ULL << 1) uint64_t flags; + struct iommu_nesting_info *info; bool initialized; }; int host_iommu_ctx_pasid_alloc(HostIOMMUContext *iommu_ctx, uint32_t min, uint32_t max, uint32_t *pasid); int host_iommu_ctx_pasid_free(HostIOMMUContext *iommu_ctx, uint32_t pasid); +int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, + struct iommu_gpasid_bind_data *bind); +int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, + struct iommu_gpasid_bind_data *unbind); void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size, const char *mrtypename, - uint64_t flags); + uint64_t flags, + struct iommu_nesting_info *info); void host_iommu_ctx_destroy(HostIOMMUContext *iommu_ctx); #endif From patchwork Thu Sep 10 10:56:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767587 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4D8E59D for ; Thu, 10 Sep 2020 10:59:42 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 85E172145D for ; Thu, 10 Sep 2020 10:59:42 +0000 (UTC) Received: 
(majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730462AbgIJK7j (ORCPT ); Thu, 10 Sep 2020 06:59:39 -0400 Received: from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729521AbgIJK4r (ORCPT ); Thu, 10 Sep 2020 06:56:47 -0400 IronPort-SDR: qrTM5+ym/S+d82pHOY21+g7nSSsYFXb3sgsYB7Fb0TyQO5NvOeu+BEyOu7eTqrUlCWEwi1rwMG h0HEW6eAE+AQ== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459148" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459148" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: awgJgjM+1XkfwqXBKrDB0Iu8nuDSuJs2USH2TwyXEXMUuDlXJdJ+pPyZq/USKot8tUrfBZaWi+ UkUgXqRS5vmA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140088" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson Subject: [RFC v10 18/25] intel_iommu: bind/unbind guest page table to host Date: Thu, 10 Sep 2020 03:56:31 -0700 Message-Id: <1599735398-6829-19-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch captures the guest PASID table entry modifications and 
propagates the changes to the host to set up dual-stage DMA translation.
The guest page table is configured as the 1st-level page table
(GVA->GPA), whose translation result then goes through the host VT-d
2nd-level page table (GPA->HPA) under nested translation mode. This is
the key part of vSVA support, and also a key to supporting IOVA over the
1st-level page table for Intel VT-d in virtualized environments.

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Peter Xu
Cc: Yi Sun
Cc: Paolo Bonzini
Cc: Richard Henderson
Signed-off-by: Liu Yi L
---
 hw/i386/intel_iommu.c          | 101 +++++++++++++++++++++++++++++++++++++++--
 hw/i386/intel_iommu_internal.h |  18 ++++++++
 2 files changed, 114 insertions(+), 5 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index af17b36..4f6b80f 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,7 @@
 #include "migration/vmstate.h"
 #include "trace.h"
 #include "qemu/jhash.h"
+#include
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -700,6 +701,16 @@ static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
     return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7);
 }
 
+static inline uint32_t vtd_pe_get_fl_aw(VTDPASIDEntry *pe)
+{
+    return 48 + ((pe->val[2] >> 2) & VTD_SM_PASID_ENTRY_FLPM) * 9;
+}
+
+static inline dma_addr_t vtd_pe_get_flpt_base(VTDPASIDEntry *pe)
+{
+    return pe->val[2] & VTD_SM_PASID_ENTRY_FLPTPTR;
+}
+
 static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
 {
     return pdire->val & 1;
@@ -1861,6 +1872,85 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s)
     vtd_iommu_replay_all(s);
 }
 
+/**
+ * Caller should hold iommu_lock.
+ */
+static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
+                                int devfn, int pasid, VTDPASIDEntry *pe,
+                                VTDPASIDOp op)
+{
+    VTDHostIOMMUContext *vtd_dev_icx;
+    HostIOMMUContext *iommu_ctx;
+    int ret = -1;
+
+    vtd_dev_icx = vtd_bus->dev_icx[devfn];
+    if (!vtd_dev_icx) {
+        /* means no need to go further, e.g.
for emulated devices */ + return 0; + } + + iommu_ctx = vtd_dev_icx->iommu_ctx; + if (!iommu_ctx) { + return -EINVAL; + } + + switch (op) { + case VTD_PASID_BIND: + { + struct iommu_gpasid_bind_data *g_bind_data; + + g_bind_data = g_malloc0(sizeof(*g_bind_data)); + + g_bind_data->argsz = sizeof(*g_bind_data); + g_bind_data->version = IOMMU_GPASID_BIND_VERSION_1; + g_bind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD; + g_bind_data->gpgd = vtd_pe_get_flpt_base(pe); + g_bind_data->addr_width = vtd_pe_get_fl_aw(pe); + g_bind_data->hpasid = pasid; + g_bind_data->gpasid = pasid; + g_bind_data->flags |= IOMMU_SVA_GPASID_VAL; + g_bind_data->vendor.vtd.flags = + (VTD_SM_PASID_ENTRY_SRE_BIT(pe->val[2]) ? + IOMMU_SVA_VTD_GPASID_SRE : 0) + | (VTD_SM_PASID_ENTRY_EAFE_BIT(pe->val[2]) ? + IOMMU_SVA_VTD_GPASID_EAFE : 0) + | (VTD_SM_PASID_ENTRY_PCD_BIT(pe->val[1]) ? + IOMMU_SVA_VTD_GPASID_PCD : 0) + | (VTD_SM_PASID_ENTRY_PWT_BIT(pe->val[1]) ? + IOMMU_SVA_VTD_GPASID_PWT : 0) + | (VTD_SM_PASID_ENTRY_EMTE_BIT(pe->val[1]) ? + IOMMU_SVA_VTD_GPASID_EMTE : 0) + | (VTD_SM_PASID_ENTRY_CD_BIT(pe->val[1]) ? + IOMMU_SVA_VTD_GPASID_CD : 0); + g_bind_data->vendor.vtd.pat = VTD_SM_PASID_ENTRY_PAT(pe->val[1]); + g_bind_data->vendor.vtd.emt = VTD_SM_PASID_ENTRY_EMT(pe->val[1]); + ret = host_iommu_ctx_bind_stage1_pgtbl(iommu_ctx, g_bind_data); + g_free(g_bind_data); + break; + } + case VTD_PASID_UNBIND: + { + struct iommu_gpasid_bind_data *g_unbind_data; + + g_unbind_data = g_malloc0(sizeof(*g_unbind_data)); + + g_unbind_data->argsz = sizeof(*g_unbind_data); + g_unbind_data->version = IOMMU_GPASID_BIND_VERSION_1; + g_unbind_data->format = IOMMU_PASID_FORMAT_INTEL_VTD; + g_unbind_data->hpasid = pasid; + ret = host_iommu_ctx_unbind_stage1_pgtbl(iommu_ctx, g_unbind_data); + g_free(g_unbind_data); + break; + } + default: + error_report_once("Unknown VTDPASIDOp!!!\n"); + break; + } + + + return ret; +} + /* Do a context-cache device-selective invalidation. 
* @func_mask: FM field after shifting */ @@ -2489,10 +2579,10 @@ static void vtd_fill_pe_in_cache(IntelIOMMUState *s, } pc_entry->pasid_entry = *pe; - /* - * TODO: - * - send pasid bind to host for passthru devices - */ + vtd_bind_guest_pasid(s, vtd_pasid_as->vtd_bus, + vtd_pasid_as->devfn, + vtd_pasid_as->pasid, + pe, VTD_PASID_BIND); } /** @@ -2565,10 +2655,11 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value, remove: /* * TODO: - * - send pasid bind to host for passthru devices * - when pasid-base-iotlb(piotlb) infrastructure is ready, * should invalidate QEMU piotlb togehter with this change. */ + vtd_bind_guest_pasid(s, vtd_bus, devfn, + pasid, NULL, VTD_PASID_UNBIND); return true; } diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index a57ef3d..51691d0 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -536,6 +536,13 @@ typedef struct VTDRootEntry VTDRootEntry; #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(aw) (0x1e0ULL | ~VTD_HAW_MASK(aw)) #define VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 0xffffffffffe00000ULL +enum VTDPASIDOp { + VTD_PASID_BIND, + VTD_PASID_UNBIND, + VTD_OP_NUM +}; +typedef enum VTDPASIDOp VTDPASIDOp; + typedef enum VTDPCInvType { /* force reset all */ VTD_PASID_CACHE_FORCE_RESET = 0, @@ -578,6 +585,17 @@ typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; #define VTD_SM_PASID_ENTRY_AW 7ULL /* Adjusted guest-address-width */ #define VTD_SM_PASID_ENTRY_DID(val) ((val) & VTD_DOMAIN_ID_MASK) +#define VTD_SM_PASID_ENTRY_FLPM 3ULL +#define VTD_SM_PASID_ENTRY_FLPTPTR (~0xfffULL) +#define VTD_SM_PASID_ENTRY_SRE_BIT(val) (!!((val) & 1ULL)) +#define VTD_SM_PASID_ENTRY_EAFE_BIT(val) (!!(((val) >> 7) & 1ULL)) +#define VTD_SM_PASID_ENTRY_PCD_BIT(val) (!!(((val) >> 31) & 1ULL)) +#define VTD_SM_PASID_ENTRY_PWT_BIT(val) (!!(((val) >> 30) & 1ULL)) +#define VTD_SM_PASID_ENTRY_EMTE_BIT(val) (!!(((val) >> 26) & 1ULL)) +#define VTD_SM_PASID_ENTRY_CD_BIT(val) (!!(((val) >> 25) & 1ULL)) +#define 
VTD_SM_PASID_ENTRY_PAT(val) (((val) >> 32) & 0xFFFFFFFFULL) +#define VTD_SM_PASID_ENTRY_EMT(val) (((val) >> 27) & 0x7ULL) + /* Second Level Page Translation Pointer*/ #define VTD_SM_PASID_ENTRY_SLPTPTR (~0xfffULL) From patchwork Thu Sep 10 10:56:32 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767579 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6CBF6139F for ; Thu, 10 Sep 2020 10:59:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 57D5C20BED for ; Thu, 10 Sep 2020 10:59:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730200AbgIJK7E (ORCPT ); Thu, 10 Sep 2020 06:59:04 -0400 Received: from mga09.intel.com ([134.134.136.24]:6916 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730619AbgIJK4y (ORCPT ); Thu, 10 Sep 2020 06:56:54 -0400 IronPort-SDR: U4LEZujrqGt0Iss3JhbiUUtFmBDE6thLHY6pHKx4fBk1rY5t3tuhxX15HHlXxMZfBfzfUsH8nd wnao8ZlJAtQA== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459149" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459149" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: zybOB9RYwc92+k4NPT0nbGpj/wjBw+rw4myhZQd6jYcOmPJndX3lxwF72PbNihCDbpcIH+MjLV pgk6sPtuibqA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140091" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, 
jasowang@redhat.com
Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun, Richard Henderson, Eduardo Habkost
Subject: [RFC v10 19/25] intel_iommu: replay pasid binds after context cache invalidation
Date: Thu, 10 Sep 2020 03:56:32 -0700
Message-Id: <1599735398-6829-20-git-send-email-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

This patch replays guest PASID bindings after a context cache
invalidation. This is done as a safety measure. Strictly speaking, the
programmer should issue a PASID cache invalidation with the proper
granularity after issuing a context cache invalidation.
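
To illustrate the granularity involved, here is a minimal sketch of how a device-selective sync narrows its scope to one device while a global sync covers every cached entry. It is a toy model and not the QEMU code: `toy_inv_type`, `toy_pasid_entry` and `toy_pasid_cache_sync()` are hypothetical names invented for this example, and the PASID cache is assumed to be a flat array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the types used by the series. */
enum toy_inv_type {
    TOY_PASID_CACHE_GLOBAL_INV, /* sync every cached PASID entry */
    TOY_PASID_CACHE_DEVSI,      /* sync only the entries of one device */
};

struct toy_pasid_entry {
    uint8_t bus_num;
    uint8_t devfn;
    uint32_t pasid;
    bool valid;
};

/*
 * Drop the cached entries selected by @type and return how many were
 * dropped. For TOY_PASID_CACHE_DEVSI only entries matching
 * @bus_num/@devfn are dropped; both arguments are ignored for a
 * global sync.
 */
size_t toy_pasid_cache_sync(struct toy_pasid_entry *cache, size_t n,
                            enum toy_inv_type type,
                            uint8_t bus_num, uint8_t devfn)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (!cache[i].valid) {
            continue;
        }
        if (type == TOY_PASID_CACHE_DEVSI &&
            (cache[i].bus_num != bus_num || cache[i].devfn != devfn)) {
            continue; /* entry belongs to another device, keep it */
        }
        cache[i].valid = false;
        dropped++;
    }
    return dropped;
}
```

In this model a device-selective context cache invalidation would call the helper with TOY_PASID_CACHE_DEVSI and the affected bus/devfn, while a global one would pass TOY_PASID_CACHE_GLOBAL_INV.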
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 50 ++++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 1 + hw/i386/trace-events | 1 + 3 files changed, 52 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 4f6b80f..7bc9735 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -68,6 +68,10 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s); static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n); static void vtd_pasid_cache_reset(IntelIOMMUState *s); +static void vtd_pasid_cache_sync(IntelIOMMUState *s, + VTDPASIDCacheInfo *pc_info); +static void vtd_pasid_cache_devsi(IntelIOMMUState *s, + VTDBus *vtd_bus, uint16_t devfn); static void vtd_panic_require_caching_mode(void) { @@ -1853,7 +1857,10 @@ static void vtd_iommu_replay_all(IntelIOMMUState *s) static void vtd_context_global_invalidate(IntelIOMMUState *s) { + VTDPASIDCacheInfo pc_info; + trace_vtd_inv_desc_cc_global(); + /* Protects context cache */ vtd_iommu_lock(s); s->context_cache_gen++; @@ -1870,6 +1877,9 @@ static void vtd_context_global_invalidate(IntelIOMMUState *s) * VT-d emulation codes. */ vtd_iommu_replay_all(s); + + pc_info.type = VTD_PASID_CACHE_GLOBAL_INV; + vtd_pasid_cache_sync(s, &pc_info); } /** @@ -2008,6 +2018,21 @@ static void vtd_context_device_invalidate(IntelIOMMUState *s, * happened. */ vtd_sync_shadow_page_table(vtd_as); + /* + * Per spec, context flush should also followed with PASID + * cache and iotlb flush. 
For a device-selective
+                 * context cache invalidation:
+                 *   if (emulated_device)
+                 *       invalidate pasid cache and pasid-based iotlb
+                 *   else if (assigned_device)
+                 *       check if the device has been bound to any pasid
+                 *       invoke pasid_unbind for each bound pasid
+                 * Here, we have vtd_pasid_cache_devsi() to invalidate pasid
+                 * caches, while for piotlb in QEMU, we don't have it yet, so
+                 * no handling. For assigned device, host iommu driver would
+                 * flush piotlb when a pasid unbind is passed down to it.
+                 */
+                vtd_pasid_cache_devsi(s, vtd_bus, devfn_it);
             }
         }
     }
@@ -2622,6 +2647,12 @@ static gboolean vtd_flush_pasid(gpointer key, gpointer value,
         /* Fall through */
     case VTD_PASID_CACHE_GLOBAL_INV:
         break;
+    case VTD_PASID_CACHE_DEVSI:
+        if (pc_info->vtd_bus != vtd_bus ||
+            pc_info->devfn != devfn) {
+            return false;
+        }
+        break;
     default:
         error_report("invalid pc_info->type");
         abort();
@@ -2821,6 +2852,11 @@ static void vtd_replay_guest_pasid_bindings(IntelIOMMUState *s,
     case VTD_PASID_CACHE_GLOBAL_INV:
         /* loop all assigned devices */
         break;
+    case VTD_PASID_CACHE_DEVSI:
+        walk_info.vtd_bus = pc_info->vtd_bus;
+        walk_info.devfn = pc_info->devfn;
+        vtd_replay_pasid_bind_for_dev(s, start, end, &walk_info);
+        return;
     case VTD_PASID_CACHE_FORCE_RESET:
         /* For force reset, no need to go further replay */
         return;
@@ -2906,6 +2942,20 @@ static void vtd_pasid_cache_sync(IntelIOMMUState *s,
     vtd_iommu_unlock(s);
 }
 
+static void vtd_pasid_cache_devsi(IntelIOMMUState *s,
+                                  VTDBus *vtd_bus, uint16_t devfn)
+{
+    VTDPASIDCacheInfo pc_info;
+
+    trace_vtd_pasid_cache_devsi(devfn);
+
+    pc_info.type = VTD_PASID_CACHE_DEVSI;
+    pc_info.vtd_bus = vtd_bus;
+    pc_info.devfn = devfn;
+
+    vtd_pasid_cache_sync(s, &pc_info);
+}
+
 /**
  * Caller of this function should hold iommu_lock
  */
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 51691d0..9805b84 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -548,6 +548,7 @@ typedef enum
VTDPCInvType { VTD_PASID_CACHE_FORCE_RESET = 0, /* pasid cache invalidation rely on guest PASID entry */ VTD_PASID_CACHE_GLOBAL_INV, + VTD_PASID_CACHE_DEVSI, VTD_PASID_CACHE_DOMSI, VTD_PASID_CACHE_PASIDSI, } VTDPCInvType; diff --git a/hw/i386/trace-events b/hw/i386/trace-events index 60d20c1..3853fa8 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -26,6 +26,7 @@ vtd_pasid_cache_gsi(void) "" vtd_pasid_cache_reset(void) "" vtd_pasid_cache_dsi(uint16_t domain) "Domian slective PC invalidation domain 0x%"PRIx16 vtd_pasid_cache_psi(uint16_t domain, uint32_t pasid) "PASID slective PC invalidation domain 0x%"PRIx16" pasid 0x%"PRIx32 +vtd_pasid_cache_devsi(uint16_t devfn) "Dev selective PC invalidation dev: 0x%"PRIx16 vtd_re_not_present(uint8_t bus) "Root entry bus %"PRIu8" not present" vtd_ce_not_present(uint8_t bus, uint8_t devfn) "Context entry bus %"PRIu8" devfn %"PRIu8" not present" vtd_iotlb_page_hit(uint16_t sid, uint64_t addr, uint64_t slpte, uint16_t domain) "IOTLB page hit sid 0x%"PRIx16" iova 0x%"PRIx64" slpte 0x%"PRIx64" domain 0x%"PRIx16 From patchwork Thu Sep 10 10:56:33 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767573 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id F374F59D for ; Thu, 10 Sep 2020 10:58:53 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DE50D20C09 for ; Thu, 10 Sep 2020 10:58:53 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730547AbgIJK6G (ORCPT ); Thu, 10 Sep 2020 06:58:06 -0400 Received: from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728611AbgIJK5J (ORCPT ); Thu, 10 Sep 2020 06:57:09 -0400 IronPort-SDR: 
WUVSZsg6DYjL4WZmSdvfbLbA+m2W6rSnGEtAGmRKXxJsjlGzTtEkRS9M+DTp6mxkEvSp+1dIru nKmOhL7MY+kA==
X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459150"
X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459150"
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700
IronPort-SDR: kXPcE/RYregj474ZLDMdnIXO7mGXl8z5RSZWRB/vBU/49Yfo93LjeSgIcnuv0B1es5baXZ+qgp DONZZRCja8+Q==
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140093"
Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700
From: Liu Yi L
To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com
Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan, Yi Sun, Richard Henderson, Eduardo Habkost
Subject: [RFC v10 20/25] intel_iommu: do not pass down pasid bind for PASID #0
Date: Thu, 10 Sep 2020 03:56:33 -0700
Message-Id: <1599735398-6829-21-git-send-email-yi.l.liu@intel.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: kvm@vger.kernel.org

The RID_PASID field was introduced in the VT-d 3.0 spec. It is used for
DMA requests without PASID in scalable-mode VT-d (such requests are also
known as IOVA). The VT-d 3.1 spec defines it as follows:

"Implementations not supporting RID_PASID capability (ECAP_REG.RPS is 0b),
use a PASID value of 0 to perform address translation for requests without
PASID."
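
The rule quoted above can be sketched as a small selection function. This is an illustrative sketch only: `toy_pasid_for_request_without_pasid()` and its parameters are hypothetical names, and the 20-bit mask is assumed to match the RID2PASID mask (0xfffff) used elsewhere in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 20-bit PASID width, matching the 0xfffff RID2PASID mask. */
#define TOY_RID_PASID_MASK 0xfffffU

/*
 * Hypothetical helper: pick the PASID used to translate a DMA request
 * that carries no PASID. @ecap_rps models ECAP_REG.RPS and
 * @ce_rid_pasid models the RID_PASID field of the context entry.
 */
uint32_t toy_pasid_for_request_without_pasid(bool ecap_rps,
                                             uint32_t ce_rid_pasid)
{
    if (!ecap_rps) {
        /* RID_PASID capability not supported: PASID 0 is used. */
        return 0;
    }
    return ce_rid_pasid & TOY_RID_PASID_MASK;
}
```

With `ecap_rps` false the helper returns 0 whatever the context entry holds, which is why PASID #0 is special for requests without PASID.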
This patch adds a check on the PASIDs that are about to be bound to a
device. For PASID #0, there is no need to pass down a pasid bind request,
since PASID #0 is used as RID_PASID for DMA requests without PASID. A
further reason is that the current Intel vIOMMU supports gIOVA by
shadowing the guest 2nd-level page table. In the future, however, if the
guest IOMMU driver uses the 1st-level page table to store IOVA mappings,
guest IOVA support will also be done via nested translation. When gIOVA
is over FLPT, the vIOMMU should pass down the pasid bind request for
PASID #0 to the host, and the host needs to bind the guest IOVA page
table to a proper PASID, e.g. the PASID value in the RID_PASID field for
a PF/VF if ECAP_REG.RPS is clear, or the default PASID for an ADI
(Assignable Device Interface in the Scalable IOV solution).

IOVA over FLPT support on Intel VT-d:
https://lore.kernel.org/linux-iommu/20191219031634.15168-1-baolu.lu@linux.intel.com/

Cc: Kevin Tian
Cc: Jacob Pan
Cc: Peter Xu
Cc: Yi Sun
Cc: Paolo Bonzini
Cc: Richard Henderson
Cc: Eduardo Habkost
Reviewed-by: Peter Xu
Signed-off-by: Liu Yi L
---
 hw/i386/intel_iommu.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7bc9735..55623e8 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1893,6 +1893,16 @@ static int vtd_bind_guest_pasid(IntelIOMMUState *s, VTDBus *vtd_bus,
     HostIOMMUContext *iommu_ctx;
     int ret = -1;
 
+    if (pasid < VTD_HPASID_MIN) {
+        /*
+         * If pasid < VTD_HPASID_MIN, this pasid is not allocated
+         * from host. No need to pass down the changes on it to host.
+         * TODO: when IOVA over FLPT is ready, this switch should be
+         * refined.
+         */
+        return 0;
+    }
+
     vtd_dev_icx = vtd_bus->dev_icx[devfn];
     if (!vtd_dev_icx) {
         /* means no need to go further, e.g.
for emulated devices */ From patchwork Thu Sep 10 10:56:34 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767653 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 81B9E112E for ; Thu, 10 Sep 2020 11:15:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6A7A121556 for ; Thu, 10 Sep 2020 11:15:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730378AbgIJLPE (ORCPT ); Thu, 10 Sep 2020 07:15:04 -0400 Received: from mga12.intel.com ([192.55.52.136]:22709 "EHLO mga12.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1728626AbgIJLLp (ORCPT ); Thu, 10 Sep 2020 07:11:45 -0400 IronPort-SDR: HuqHmOMY5sTeoRqCfiR6U3DdROYxgER89xUZJI64/7FgR+FA3tetIaiAyqMQMHyF/Jce7yZuC/ dT4PYPXkRufw== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="138025871" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="138025871" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: IrmRRykn86c7wdhllt5PynxDey062DxlBY1cBjwXZDap++lNULp6ZOh1nzgs/gjSPwqKlpHCLD ctBoJBKUvmBw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140096" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, 
yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun Subject: [RFC v10 21/25] vfio: add support for flush iommu stage-1 cache Date: Thu, 10 Sep 2020 03:56:34 -0700 Message-Id: <1599735398-6829-22-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds a flush_stage1_cache() callback to HostIOMMUContextClass and the corresponding implementation in VFIO. It gives the vIOMMU a way to flush the stage-1 cache on the host side, since the guest owns the stage-1 translation structures in a dual stage DMA translation configuration. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Eric Auger Cc: Yi Sun Cc: David Gibson Cc: Alex Williamson Acked-by: Peter Xu Signed-off-by: Liu Yi L --- hw/iommu/host_iommu_context.c | 19 +++++++++++++++++++ hw/vfio/common.c | 24 ++++++++++++++++++++++++ include/hw/iommu/host_iommu_context.h | 8 ++++++++ 3 files changed, 51 insertions(+) diff --git a/hw/iommu/host_iommu_context.c b/hw/iommu/host_iommu_context.c index c43965c..a3f7706 100644 --- a/hw/iommu/host_iommu_context.c +++ b/hw/iommu/host_iommu_context.c @@ -113,6 +113,25 @@ int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, return hicxc->unbind_stage1_pgtbl(iommu_ctx, unbind); } +int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx, + struct iommu_cache_invalidate_info *cache) +{ + HostIOMMUContextClass *hicxc; + + hicxc = HOST_IOMMU_CONTEXT_GET_CLASS(iommu_ctx); + + if (!hicxc) { + return -EINVAL; + } + + if (!(iommu_ctx->flags & HOST_IOMMU_NESTING) || + !hicxc->flush_stage1_cache) { + return -EINVAL; + } + + return hicxc->flush_stage1_cache(iommu_ctx, cache); +} + void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size, const char *mrtypename, uint64_t flags, diff --git
a/hw/vfio/common.c b/hw/vfio/common.c index 74dbeaf..77f88e5 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -1275,6 +1275,29 @@ static int vfio_host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, return ret; } +static int vfio_host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx, + struct iommu_cache_invalidate_info *cache) +{ + VFIOContainer *container = container_of(iommu_ctx, + VFIOContainer, iommu_ctx); + struct vfio_iommu_type1_nesting_op *op; + unsigned long argsz; + int ret = 0; + + argsz = sizeof(*op) + sizeof(*cache); + op = g_malloc0(argsz); + op->argsz = argsz; + op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD; + memcpy(&op->data, cache, sizeof(*cache)); + + if (ioctl(container->fd, VFIO_IOMMU_NESTING_OP, op)) { + ret = -errno; + error_report("%s: iommu cache flush failed: %m", __func__); + } + g_free(op); + return ret; +} + /** * Get iommu info from host. Caller of this funcion should free * the memory pointed by the returned pointer stored in @info @@ -2023,6 +2046,7 @@ static void vfio_host_iommu_context_class_init(ObjectClass *klass, hicxc->pasid_free = vfio_host_iommu_ctx_pasid_free; hicxc->bind_stage1_pgtbl = vfio_host_iommu_ctx_bind_stage1_pgtbl; hicxc->unbind_stage1_pgtbl = vfio_host_iommu_ctx_unbind_stage1_pgtbl; + hicxc->flush_stage1_cache = vfio_host_iommu_ctx_flush_stage1_cache; } static const TypeInfo vfio_host_iommu_context_info = { diff --git a/include/hw/iommu/host_iommu_context.h b/include/hw/iommu/host_iommu_context.h index 2883ed8..40e860a 100644 --- a/include/hw/iommu/host_iommu_context.h +++ b/include/hw/iommu/host_iommu_context.h @@ -64,6 +64,12 @@ typedef struct HostIOMMUContextClass { /* Undo a previous bind. @unbind specifies the unbind info. 
*/ int (*unbind_stage1_pgtbl)(HostIOMMUContext *iommu_ctx, struct iommu_gpasid_bind_data *unbind); + /* + * Propagate stage-1 cache flush to host IOMMU, cache + * info specified in @cache + */ + int (*flush_stage1_cache)(HostIOMMUContext *iommu_ctx, + struct iommu_cache_invalidate_info *cache); } HostIOMMUContextClass; /* @@ -85,6 +91,8 @@ int host_iommu_ctx_bind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, struct iommu_gpasid_bind_data *bind); int host_iommu_ctx_unbind_stage1_pgtbl(HostIOMMUContext *iommu_ctx, struct iommu_gpasid_bind_data *unbind); +int host_iommu_ctx_flush_stage1_cache(HostIOMMUContext *iommu_ctx, + struct iommu_cache_invalidate_info *cache); void host_iommu_ctx_init(void *_iommu_ctx, size_t instance_size, const char *mrtypename, From patchwork Thu Sep 10 10:56:35 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767643 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D76EE618 for ; Thu, 10 Sep 2020 11:09:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BFBCE20BED for ; Thu, 10 Sep 2020 11:09:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730512AbgIJLI5 (ORCPT ); Thu, 10 Sep 2020 07:08:57 -0400 Received: from mga09.intel.com ([134.134.136.24]:6916 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730431AbgIJK5o (ORCPT ); Thu, 10 Sep 2020 06:57:44 -0400 IronPort-SDR: 89BYTqQvHqY2I1dlkYvUCNM6hMgOocc9kdCQg83OnR/jPnGQMeeq9N0v48m6NU4iT4fy9Xa+4f MlpZpTHgksjw== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459151" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459151" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from
orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: KeZAgBZhhbekL9IsHrOavvaG5mVxb7Cs695cbM5uyGiyub6uFPNjv9ftUWqx5vNQ4cXLlnX1n4 qPTiYPIwZWxw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140099" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 22/25] intel_iommu: process PASID-based iotlb invalidation Date: Thu, 10 Sep 2020 03:56:35 -0700 Message-Id: <1599735398-6829-23-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds the basic PASID-based iotlb (piotlb) invalidation support. piotlb is used during walking Intel VT-d 1st level page table. This patch only adds the basic processing. Detailed handling will be added in next patch. 
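The descriptor decoding this patch introduces can be sketched in isolation as follows. This is a minimal illustration, not the QEMU code: the PASID, address, AM and IH extractors and the two granularity values are copied from this patch's intel_iommu_internal.h hunk, while the 16-bit domain-id mask, the granularity mask `PIOTLB_G_MASK`, the `PiotlbFields` struct and the `piotlb_decode()` helper are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field extractors copied from this patch's intel_iommu_internal.h hunk. */
#define PIOTLB_PASID(val)    (((val) >> 32) & 0xfffffULL)
#define PIOTLB_ADDR(val)     ((val) & ~0xfffULL)
#define PIOTLB_AM(val)       ((val) & 0x3fULL)
#define PIOTLB_IH(val)       (((val) >> 6) & 0x1)
#define PIOTLB_ALL_IN_PASID  (2ULL << 4)
#define PIOTLB_PSI_IN_PASID  (3ULL << 4)

/* Assumptions for this sketch, not taken from the patch: */
#define PIOTLB_DID(val)      (((val) >> 16) & 0xffffULL) /* 16-bit domain id */
#define PIOTLB_G_MASK        (3ULL << 4)                 /* granularity bits */

/* Hypothetical result type holding the decoded descriptor fields. */
typedef struct {
    uint16_t domain_id;
    uint32_t pasid;
    int page_selective;   /* 1 for page-selective-within-PASID */
    uint64_t addr;        /* only meaningful when page_selective */
    uint8_t am;           /* address mask, only when page_selective */
    int ih;               /* invalidation hint, only when page_selective */
} PiotlbFields;

/*
 * Hypothetical helper: decode the low (val[0]) and high (val[1]) words
 * of a PASID-based IOTLB invalidation descriptor.
 * Returns 0 on success, -1 for an unsupported granularity.
 */
static int piotlb_decode(uint64_t lo, uint64_t hi, PiotlbFields *out)
{
    assert(out != NULL);
    *out = (PiotlbFields){0};

    out->domain_id = (uint16_t)PIOTLB_DID(lo);
    out->pasid = (uint32_t)PIOTLB_PASID(lo);

    switch (lo & PIOTLB_G_MASK) {
    case PIOTLB_ALL_IN_PASID:
        /* Whole-PASID flush: the high word carries no address info. */
        return 0;
    case PIOTLB_PSI_IN_PASID:
        out->page_selective = 1;
        out->addr = PIOTLB_ADDR(hi);
        out->am = (uint8_t)PIOTLB_AM(hi);
        out->ih = (int)PIOTLB_IH(hi);
        return 0;
    default:
        return -1;
    }
}
```

For example, a low word of `(5ULL << 32) | (3ULL << 16) | (3ULL << 4) | 0x6` with a high word of `0x7f001042` would decode, under the assumptions above, to PASID 5, domain 3, address 0x7f001000, AM 2 and IH set. The reserved-bit check done by vtd_process_piotlb_desc() is left out to keep the sketch short.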
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 53 ++++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 13 +++++++++++ 2 files changed, 66 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 55623e8..516d7ff 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3038,6 +3038,55 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return true; } +static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, + uint16_t domain_id, + uint32_t pasid) +{ +} + +static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, + uint32_t pasid, hwaddr addr, uint8_t am, + bool ih) +{ +} + +static bool vtd_process_piotlb_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + uint16_t domain_id; + uint32_t pasid; + uint8_t am; + hwaddr addr; + + if ((inv_desc->val[0] & VTD_INV_DESC_PIOTLB_RSVD_VAL0) || + (inv_desc->val[1] & VTD_INV_DESC_PIOTLB_RSVD_VAL1)) { + error_report_once("non-zero-field-in-piotlb_inv_desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + + domain_id = VTD_INV_DESC_PIOTLB_DID(inv_desc->val[0]); + pasid = VTD_INV_DESC_PIOTLB_PASID(inv_desc->val[0]); + switch (inv_desc->val[0] & VTD_INV_DESC_IOTLB_G) { + case VTD_INV_DESC_PIOTLB_ALL_IN_PASID: + vtd_piotlb_pasid_invalidate(s, domain_id, pasid); + break; + + case VTD_INV_DESC_PIOTLB_PSI_IN_PASID: + am = VTD_INV_DESC_PIOTLB_AM(inv_desc->val[1]); + addr = (hwaddr) VTD_INV_DESC_PIOTLB_ADDR(inv_desc->val[1]); + vtd_piotlb_page_invalidate(s, domain_id, pasid, addr, am, + VTD_INV_DESC_PIOTLB_IH(inv_desc->val[1])); + break; + + default: + error_report_once("Invalid granularity in P-IOTLB desc hi: 0x%" PRIx64 + " lo: 0x%" PRIx64, inv_desc->val[1], inv_desc->val[0]); + return false; + } + return true; +} + static bool vtd_process_inv_iec_desc(IntelIOMMUState 
*s, VTDInvDesc *inv_desc) { @@ -3152,6 +3201,10 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) break; case VTD_INV_DESC_PIOTLB: + trace_vtd_inv_desc("p-iotlb", inv_desc.val[1], inv_desc.val[0]); + if (!vtd_process_piotlb_desc(s, &inv_desc)) { + return false; + } break; case VTD_INV_DESC_WAIT: diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 9805b84..118d568 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -476,6 +476,19 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_PASIDC_PASID_SI (1ULL << 4) #define VTD_INV_DESC_PASIDC_GLOBAL (3ULL << 4) +#define VTD_INV_DESC_PIOTLB_ALL_IN_PASID (2ULL << 4) +#define VTD_INV_DESC_PIOTLB_PSI_IN_PASID (3ULL << 4) + +#define VTD_INV_DESC_PIOTLB_RSVD_VAL0 0xfff000000000ffc0ULL +#define VTD_INV_DESC_PIOTLB_RSVD_VAL1 0xf80ULL + +#define VTD_INV_DESC_PIOTLB_PASID(val) (((val) >> 32) & 0xfffffULL) +#define VTD_INV_DESC_PIOTLB_DID(val) (((val) >> 16) & \ + VTD_DOMAIN_ID_MASK) +#define VTD_INV_DESC_PIOTLB_ADDR(val) ((val) & ~0xfffULL) +#define VTD_INV_DESC_PIOTLB_AM(val) ((val) & 0x3fULL) +#define VTD_INV_DESC_PIOTLB_IH(val) (((val) >> 6) & 0x1) + /* Information about page-selective IOTLB invalidate */ struct VTDIOTLBPageInvInfo { uint16_t domain_id; From patchwork Thu Sep 10 10:56:36 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767641 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id CA5E9746 for ; Thu, 10 Sep 2020 11:08:57 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id A94612145D for ; Thu, 10 Sep 2020 11:08:57 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727770AbgIJLIy (ORCPT ); Thu, 10 Sep 2020 07:08:54 -0400 Received: 
from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726932AbgIJK54 (ORCPT ); Thu, 10 Sep 2020 06:57:56 -0400 IronPort-SDR: s6+KBXkYF4ICobbyLl7CvWX8jZAZ7BHxhCmMF/u3QjYX+A4SkYdCLZgA2ranq43snSbw8pA8e5 dYTOStX5p+Ow== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459152" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459152" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 4jWBSKUF1SRyGps5IoL+7ouYaJurFXM35bR5i/cKIHJ/PBr/crBw1Tgu7e0Yrqc7bUvSK2bMid ulpmllp0O9GQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140104" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 23/25] intel_iommu: propagate PASID-based iotlb invalidation to host Date: Thu, 10 Sep 2020 03:56:36 -0700 Message-Id: <1599735398-6829-24-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch propagates PASID-based iotlb invalidation to host. Intel VT-d 3.0 supports nested translation in PASID granular. 
Guest SVA support could be implemented by configuring nested translation on a specific PASID. This is also known as dual stage DMA translation. Under such a configuration, the guest owns the GVA->GPA translation, which is configured as the first level page table on the host side for a specific pasid, and the host owns the GPA->HPA translation. As the guest owns the first level translation table, piotlb invalidation should be propagated to the host, since the host IOMMU caches first level page table related mappings during DMA address translation. This patch traps the guest PASID-based iotlb flush and propagates it to the host. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Signed-off-by: Liu Yi L --- rfcv4 (v1) -> rfcv5 (v2): *) removed the valid check to vtd_pasid_as instance as rfcv5 ensures all vtd_pasid_as instances in hash table should be valid. --- hw/i386/intel_iommu.c | 113 +++++++++++++++++++++++++++++++++++++++++ hw/i386/intel_iommu_internal.h | 7 +++ 2 files changed, 120 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 516d7ff..32b0029 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3038,16 +3038,129 @@ static bool vtd_process_pasid_desc(IntelIOMMUState *s, return true; } +/** + * Caller of this function should hold iommu_lock. + */ +static void vtd_invalidate_piotlb(IntelIOMMUState *s, + VTDBus *vtd_bus, + int devfn, + struct iommu_cache_invalidate_info *cache) +{ + VTDHostIOMMUContext *vtd_dev_icx; + HostIOMMUContext *iommu_ctx; + + vtd_dev_icx = vtd_bus->dev_icx[devfn]; + if (!vtd_dev_icx) { + goto out; + } + iommu_ctx = vtd_dev_icx->iommu_ctx; + if (!iommu_ctx) { + goto out; + } + if (host_iommu_ctx_flush_stage1_cache(iommu_ctx, cache)) { + error_report("Cache flush failed"); + } +out: + return; +} + +/** + * This function is a loop function for the s->vtd_pasid_as + * list with VTDPIOTLBInvInfo as execution filter. It propagates + * the piotlb invalidation to host.
Caller of this function + * should hold iommu_lock. + */ +static void vtd_flush_pasid_iotlb(gpointer key, gpointer value, + gpointer user_data) +{ + VTDPIOTLBInvInfo *piotlb_info = user_data; + VTDPASIDAddressSpace *vtd_pasid_as = value; + VTDPASIDCacheEntry *pc_entry = &vtd_pasid_as->pasid_cache_entry; + uint16_t did; + + did = vtd_pe_get_domain_id(&pc_entry->pasid_entry); + + if ((piotlb_info->domain_id == did) && + (piotlb_info->pasid == vtd_pasid_as->pasid)) { + vtd_invalidate_piotlb(vtd_pasid_as->iommu_state, + vtd_pasid_as->vtd_bus, + vtd_pasid_as->devfn, + piotlb_info->cache_info); + } + + /* + * TODO: needs to add QEMU piotlb flush when QEMU piotlb + * infrastructure is ready. For now, it is enough for passthru + * devices. + */ +} + static void vtd_piotlb_pasid_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid) { + VTDPIOTLBInvInfo piotlb_info; + struct iommu_cache_invalidate_info *cache_info; + + cache_info = g_malloc0(sizeof(*cache_info)); + + cache_info->argsz = sizeof(*cache_info); + cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1; + cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB; + cache_info->granularity = IOMMU_INV_GRANU_PASID; + cache_info->granu.pasid_info.pasid = pasid; + cache_info->granu.pasid_info.flags = IOMMU_INV_PASID_FLAGS_PASID; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.cache_info = cache_info; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view. 
+ */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); + g_free(cache_info); } static void vtd_piotlb_page_invalidate(IntelIOMMUState *s, uint16_t domain_id, uint32_t pasid, hwaddr addr, uint8_t am, bool ih) { + VTDPIOTLBInvInfo piotlb_info; + struct iommu_cache_invalidate_info *cache_info; + + cache_info = g_malloc0(sizeof(*cache_info)); + + cache_info->argsz = sizeof(*cache_info); + cache_info->version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1; + cache_info->cache = IOMMU_CACHE_INV_TYPE_IOTLB; + cache_info->granularity = IOMMU_INV_GRANU_ADDR; + cache_info->granu.addr_info.flags = IOMMU_INV_ADDR_FLAGS_PASID; + cache_info->granu.addr_info.flags |= ih ? IOMMU_INV_ADDR_FLAGS_LEAF : 0; + cache_info->granu.addr_info.pasid = pasid; + cache_info->granu.addr_info.addr = addr; + cache_info->granu.addr_info.granule_size = 1ULL << (12 + am); + cache_info->granu.addr_info.nb_granules = 1; + + piotlb_info.domain_id = domain_id; + piotlb_info.pasid = pasid; + piotlb_info.cache_info = cache_info; + + vtd_iommu_lock(s); + /* + * Here loops all the vtd_pasid_as instances in s->vtd_pasid_as + * to find out the affected devices since piotlb invalidation + * should check pasid cache per architecture point of view.
+ */ + g_hash_table_foreach(s->vtd_pasid_as, + vtd_flush_pasid_iotlb, &piotlb_info); + vtd_iommu_unlock(s); + g_free(cache_info); } static bool vtd_process_piotlb_desc(IntelIOMMUState *s, diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 118d568..08ff58e 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -575,6 +575,13 @@ struct VTDPASIDCacheInfo { }; typedef struct VTDPASIDCacheInfo VTDPASIDCacheInfo; +struct VTDPIOTLBInvInfo { + uint16_t domain_id; + uint32_t pasid; + struct iommu_cache_invalidate_info *cache_info; +}; +typedef struct VTDPIOTLBInvInfo VTDPIOTLBInvInfo; + /* PASID Table Related Definitions */ #define VTD_PASID_DIR_BASE_ADDR_MASK (~0xfffULL) #define VTD_PASID_TABLE_BASE_ADDR_MASK (~0xfffULL) From patchwork Thu Sep 10 10:56:37 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767625 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FF93618 for ; Thu, 10 Sep 2020 11:06:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 08F792145D for ; Thu, 10 Sep 2020 11:06:49 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730068AbgIJLGm (ORCPT ); Thu, 10 Sep 2020 07:06:42 -0400 Received: from mga09.intel.com ([134.134.136.24]:6916 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730602AbgIJK7r (ORCPT ); Thu, 10 Sep 2020 06:59:47 -0400 IronPort-SDR: ri32XehcQHoppbV70+6+eWnx0scVNilZm+GirGTyGnq2I4y0wJONB5Z0a/cumwdJQwoCgWSQ8Z 5QZFrBExwaIg== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459153" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459153" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False 
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: pHdzxKbIsgRn66hzsSwFWDvi74Cfl4+a/SEG1uh5UuCghq86WEoDkf+BlhboPnZ9TReVar5sVm SG0Lx5FLzLNA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140106" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC v10 24/25] intel_iommu: process PASID-based Device-TLB invalidation Date: Thu, 10 Sep 2020 03:56:37 -0700 Message-Id: <1599735398-6829-25-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org This patch adds an empty handler for PASID-based Device-TLB invalidation. This is enough for now: the invalidation does not need to be propagated to the host for passthru devices, and no emulated device currently has a device tlb.
Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L --- hw/i386/intel_iommu.c | 18 ++++++++++++++++++ hw/i386/intel_iommu_internal.h | 1 + 2 files changed, 19 insertions(+) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 32b0029..2010c33 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -3213,6 +3213,17 @@ static bool vtd_process_inv_iec_desc(IntelIOMMUState *s, return true; } +static bool vtd_process_device_piotlb_desc(IntelIOMMUState *s, + VTDInvDesc *inv_desc) +{ + /* + * no need to handle it for passthru device, for emulated + * devices with device tlb, it may be required, but for now, + * return is enough + */ + return true; +} + static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s, VTDInvDesc *inv_desc) { @@ -3334,6 +3345,13 @@ static bool vtd_process_inv_desc(IntelIOMMUState *s) } break; + case VTD_INV_DESC_DEV_PIOTLB: + trace_vtd_inv_desc("device-piotlb", inv_desc.hi, inv_desc.lo); + if (!vtd_process_device_piotlb_desc(s, &inv_desc)) { + return false; + } + break; + case VTD_INV_DESC_DEVICE: trace_vtd_inv_desc("device", inv_desc.hi, inv_desc.lo); if (!vtd_process_device_iotlb_desc(s, &inv_desc)) { diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 08ff58e..9b4fc67 100644 --- a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -405,6 +405,7 @@ typedef union VTDInvDesc VTDInvDesc; #define VTD_INV_DESC_WAIT 0x5 /* Invalidation Wait Descriptor */ #define VTD_INV_DESC_PIOTLB 0x6 /* PASID-IOTLB Invalidate Desc */ #define VTD_INV_DESC_PC 0x7 /* PASID-cache Invalidate Desc */ +#define VTD_INV_DESC_DEV_PIOTLB 0x8 /* PASID-based-DIOTLB inv_desc*/ #define VTD_INV_DESC_NONE 0 /* Not an Invalidate Descriptor */ /* Masks for Invalidation Wait Descriptor*/ From patchwork Thu Sep 10 10:56:38 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yi Liu X-Patchwork-Id: 11767627 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 993FF746 for ; Thu, 10 Sep 2020 11:06:55 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7D72C20C09 for ; Thu, 10 Sep 2020 11:06:55 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730551AbgIJLGu (ORCPT ); Thu, 10 Sep 2020 07:06:50 -0400 Received: from mga09.intel.com ([134.134.136.24]:7227 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730616AbgIJLAB (ORCPT ); Thu, 10 Sep 2020 07:00:01 -0400 IronPort-SDR: wuyaBBNURxRYDzE1iQv2w9ROh3+rq1BUMfJi/cbemn7kPGZxPg6G0mKnBnx3BUmO8xajTMzRIh JVbakLRdxcxw== X-IronPort-AV: E=McAfee;i="6000,8403,9739"; a="159459154" X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="159459154" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 10 Sep 2020 03:54:40 -0700 IronPort-SDR: 6lQCFRgiZbizOl2qMz20I6mWGJjl28N2f9P80ruZ6BSjXM4gMMHF6teKC3RGDuwyDjVVRXzelb Q+s4qoPh774g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,412,1592895600"; d="scan'208";a="334140109" Received: from jacob-builder.jf.intel.com ([10.7.199.155]) by orsmga008.jf.intel.com with ESMTP; 10 Sep 2020 03:54:40 -0700 From: Liu Yi L To: qemu-devel@nongnu.org, alex.williamson@redhat.com, peterx@redhat.com, jasowang@redhat.com Cc: mst@redhat.com, pbonzini@redhat.com, eric.auger@redhat.com, david@gibson.dropbear.id.au, jean-philippe@linaro.org, kevin.tian@intel.com, yi.l.liu@intel.com, jun.j.tian@intel.com, yi.y.sun@intel.com, hao.wu@intel.com, kvm@vger.kernel.org, Jacob Pan , Yi Sun , Richard Henderson , Eduardo Habkost Subject: [RFC 
v10 25/25] intel_iommu: modify x-scalable-mode to be string option Date: Thu, 10 Sep 2020 03:56:38 -0700 Message-Id: <1599735398-6829-26-git-send-email-yi.l.liu@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> References: <1599735398-6829-1-git-send-email-yi.l.liu@intel.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Intel VT-d 3.0 introduces scalable mode along with a number of capabilities related to scalable mode translation, so many combinations are possible. This vIOMMU implementation simplifies the choice for users by offering typical combinations, selected with the "x-scalable-mode" option. The usage is as below: "-device intel-iommu,x-scalable-mode=["legacy"|"modern"|"off"]" - "legacy": gives support for SL page table - "modern": gives support for FL page table, pasid, virtual command - "off": no scalable mode support - if not configured, scalable mode is not supported; if configured with any other value, an error is reported Note: this patch is supposed to be merged once the whole vSVA patch series has been merged. Cc: Kevin Tian Cc: Jacob Pan Cc: Peter Xu Cc: Yi Sun Cc: Paolo Bonzini Cc: Richard Henderson Cc: Eduardo Habkost Reviewed-by: Peter Xu Signed-off-by: Liu Yi L Signed-off-by: Yi Sun --- rfcv5 (v2) -> rfcv6: *) reports want_nested to VFIO; *) assert iommu_set/unset_iommu_context() if vIOMMU is not scalable modern.
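The option handling described in the commit message above can be sketched as a small standalone function. This is only an illustration of the mapping from the option string to the two flags; the `ScalableModeConfig` struct and the `parse_scalable_mode()` helper are hypothetical names used for this example and do not exist in QEMU, where the equivalent logic lives in vtd_decide_config().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the two flags derived from the option. */
typedef struct {
    bool scalable_mode;    /* any scalable mode support at all */
    bool scalable_modern;  /* FL page table, pasid, virtual command */
} ScalableModeConfig;

/*
 * Hypothetical helper: map the "x-scalable-mode" string to the flags.
 * A NULL string (option not given) and "off" both disable scalable mode.
 * Returns false for any other unrecognised value, which the caller
 * would report as a configuration error.
 */
static bool parse_scalable_mode(const char *str, ScalableModeConfig *cfg)
{
    cfg->scalable_mode = false;
    cfg->scalable_modern = false;

    if (str == NULL || strcmp(str, "off") == 0) {
        return true;
    }
    if (strcmp(str, "legacy") == 0) {
        cfg->scalable_mode = true;
        return true;
    }
    if (strcmp(str, "modern") == 0) {
        cfg->scalable_mode = true;
        cfg->scalable_modern = true;
        return true;
    }
    return false;
}
```

Keeping two booleans rather than a single enum mirrors the fields this patch adds to IntelIOMMUState, so existing checks of scalable_mode keep working while new code tests scalable_modern.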
--- hw/i386/intel_iommu.c | 39 +++++++++++++++++++++++++++++++++++---- hw/i386/intel_iommu_internal.h | 3 +++ include/hw/i386/intel_iommu.h | 2 ++ 3 files changed, 40 insertions(+), 4 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 2010c33..9781a18 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -4050,7 +4050,7 @@ static Property vtd_properties[] = { DEFINE_PROP_UINT8("aw-bits", IntelIOMMUState, aw_bits, VTD_HOST_ADDRESS_WIDTH), DEFINE_PROP_BOOL("caching-mode", IntelIOMMUState, caching_mode, FALSE), - DEFINE_PROP_BOOL("x-scalable-mode", IntelIOMMUState, scalable_mode, FALSE), + DEFINE_PROP_STRING("x-scalable-mode", IntelIOMMUState, scalable_mode_str), DEFINE_PROP_BOOL("dma-drain", IntelIOMMUState, dma_drain, true), DEFINE_PROP_END_OF_LIST(), }; @@ -4419,6 +4419,7 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn) static int vtd_dev_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn, IOMMUAttr attr, void *data) { + IntelIOMMUState *s = opaque; int ret = 0; assert(0 <= devfn && devfn < PCI_DEVFN_MAX); @@ -4428,8 +4429,7 @@ static int vtd_dev_get_iommu_attr(PCIBus *bus, void *opaque, int32_t devfn, { bool *pdata = data; - /* return false until vSVA is ready */ - *pdata = false; + *pdata = s->scalable_modern ? 
true : false; break; } default: @@ -4523,6 +4523,8 @@ static int vtd_dev_set_iommu_context(PCIBus *bus, void *opaque, VTDHostIOMMUContext *vtd_dev_icx; assert(0 <= devfn && devfn < PCI_DEVFN_MAX); + /* only modern scalable supports set_iommu_context */ + assert(s->scalable_modern); vtd_bus = vtd_find_add_bus(s, bus); @@ -4557,6 +4559,8 @@ static void vtd_dev_unset_iommu_context(PCIBus *bus, void *opaque, int devfn) VTDHostIOMMUContext *vtd_dev_icx; assert(0 <= devfn && devfn < PCI_DEVFN_MAX); + /* only modern scalable supports unset_iommu_context */ + assert(s->scalable_modern); vtd_bus = vtd_find_add_bus(s, bus); @@ -4784,8 +4788,13 @@ static void vtd_init(IntelIOMMUState *s) } /* TODO: read cap/ecap from host to decide which cap to be exposed. */ - if (s->scalable_mode) { + if (s->scalable_mode && !s->scalable_modern) { s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_SLTS; + } else if (s->scalable_mode && s->scalable_modern) { + s->ecap |= VTD_ECAP_SMTS | VTD_ECAP_SRS | VTD_ECAP_PASID | + VTD_ECAP_FLTS | VTD_ECAP_PSS(VTD_PASID_SS) | + VTD_ECAP_VCS; + s->vccap |= VTD_VCCAP_PAS; } if (!s->cap_finalized) { @@ -4926,6 +4935,28 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp) return false; } + if (s->scalable_mode_str && + (strcmp(s->scalable_mode_str, "off") && + strcmp(s->scalable_mode_str, "modern") && + strcmp(s->scalable_mode_str, "legacy"))) { + error_setg(errp, "Invalid x-scalable-mode config, " + "please use \"modern\", \"legacy\" or \"off\""); + return false; + } + + if (s->scalable_mode_str && + !strcmp(s->scalable_mode_str, "legacy")) { + s->scalable_mode = true; + s->scalable_modern = false; + } else if (s->scalable_mode_str && + !strcmp(s->scalable_mode_str, "modern")) { + s->scalable_mode = true; + s->scalable_modern = true; + } else { + s->scalable_mode = false; + s->scalable_modern = false; + } + return true; } diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h index 9b4fc67..afb4c6a 100644 ---
a/hw/i386/intel_iommu_internal.h +++ b/hw/i386/intel_iommu_internal.h @@ -197,7 +197,9 @@ #define VTD_ECAP_MHMV (15ULL << 20) #define VTD_ECAP_SRS (1ULL << 31) #define VTD_ECAP_SMTS (1ULL << 43) +#define VTD_ECAP_VCS (1ULL << 44) #define VTD_ECAP_SLTS (1ULL << 46) +#define VTD_ECAP_FLTS (1ULL << 47) /* 1st level related caps */ #define VTD_CAP_FL1GP (1ULL << 56) @@ -209,6 +211,7 @@ #define VTD_ECAP_PSS(val) (((val) & 0x1fULL) << 35) #define VTD_ECAP_PASID (1ULL << 40) +#define VTD_PASID_SS (19) #define VTD_GET_PSS(val) (((val) >> 35) & 0x1f) #define VTD_ECAP_PSS_MASK (0x1fULL << 35) diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 1aab882..fd64364 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -263,6 +263,8 @@ struct IntelIOMMUState { bool caching_mode; /* RO - is cap CM enabled? */ bool scalable_mode; /* RO - is Scalable Mode supported? */ + char *scalable_mode_str; /* RO - admin's Scalable Mode config */ + bool scalable_modern; /* RO - is modern SM supported? */ dma_addr_t root; /* Current root table pointer */ bool root_scalable; /* Type of root table (scalable or not) */