From patchwork Wed Dec 21 04:21:17 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 9482483 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 39AC560772 for ; Wed, 21 Dec 2016 04:21:54 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17BA927F82 for ; Wed, 21 Dec 2016 04:21:54 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EFA8D281A7; Wed, 21 Dec 2016 04:21:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3DB0127F82 for ; Wed, 21 Dec 2016 04:21:52 +0000 (UTC) Received: from localhost ([::1]:54814 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cJYPW-000614-SX for patchwork-qemu-devel@patchwork.kernel.org; Tue, 20 Dec 2016 23:21:50 -0500 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55545) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cJYPG-00060t-GR for qemu-devel@nongnu.org; Tue, 20 Dec 2016 23:21:35 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cJYPB-0001sn-Dz for qemu-devel@nongnu.org; Tue, 20 Dec 2016 23:21:34 -0500 Received: from mx1.redhat.com ([209.132.183.28]:46488) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cJYPB-0001rt-5e for qemu-devel@nongnu.org; Tue, 20 Dec 2016 23:21:29 -0500 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2F4A361BB6; Wed, 21 Dec 2016 04:21:28 +0000 (UTC) Received: from pxdev.xzpeter.org (vpn1-7-26.pek2.redhat.com [10.72.7.26]) by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id uBL4LMIK026682; Tue, 20 Dec 2016 23:21:22 -0500 From: Peter Xu To: qemu-devel@nongnu.org Date: Wed, 21 Dec 2016 12:21:17 +0800 Message-Id: <1482294077-29807-1-git-send-email-peterx@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.39]); Wed, 21 Dec 2016 04:21:28 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2] intel_iommu: allow dynamic switch of IOMMU region X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com, jan.kiszka@siemens.com, jasowang@redhat.com, peterx@redhat.com, alex.williamson@redhat.com, bd.aviv@gmail.com, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org Sender: "Qemu-devel" X-Virus-Scanned: ClamAV using ClamSMTP This is preparation work to finally enabled dynamic switching ON/OFF for VT-d protection. The old VT-d codes is using static IOMMU region, and that won't satisfy vfio-pci device listeners. Let me explain. vfio-pci devices depend on the memory region listener and IOMMU replay mechanism to make sure the device mapping is coherent with the guest even if there are domain switches. And there are two kinds of domain switches: (1) switch from domain A -> B (2) switch from domain A -> no domain (e.g., turn DMAR off) Case (1) is handled by the context entry invalidation handling by the VT-d replay logic. What the replay function should do here is to replay the existing page mappings in domain B. However for case (2), we don't want to replay any domain mappings - we just need the default GPA->HPA mappings (the address_space_memory mapping). And this patch helps on case (2) to build up the mapping automatically by leveraging the vfio-pci memory listeners. Another important thing that this patch does is to seperate IR (Interrupt Remapping) from DMAR (DMA Remapping). IR region should not depend on the DMAR region (like before this patch). It should be a standalone region, and it should be able to be activated without DMAR (which is a common behavior of Linux kernel - by default it enables IR while disabled DMAR). Signed-off-by: Peter Xu --- v2 - fix issues reported by patchew - switch domain by enable/disable memory regions [David] - provide vtd_switch_address_space{_all}() - provide a better comment on the memory regions test done: with intel_iommu device, boot vm with/without "intel_iommu=on" parameter. --- hw/i386/intel_iommu.c | 79 ++++++++++++++++++++++++++++++++++++++++--- hw/i386/trace-events | 3 ++ include/hw/i386/intel_iommu.h | 2 ++ 3 files changed, 80 insertions(+), 4 deletions(-) diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c index 5f3e351..5d90df0 100644 --- a/hw/i386/intel_iommu.c +++ b/hw/i386/intel_iommu.c @@ -35,6 +35,7 @@ #include "sysemu/kvm.h" #include "hw/i386/apic_internal.h" #include "kvm_i386.h" +#include "trace.h" /*#define DEBUG_INTEL_IOMMU*/ #ifdef DEBUG_INTEL_IOMMU @@ -1179,9 +1180,49 @@ static void vtd_handle_gcmd_sirtp(IntelIOMMUState *s) vtd_set_clear_mask_long(s, DMAR_GSTS_REG, 0, VTD_GSTS_IRTPS); } +static void vtd_switch_address_space(VTDAddressSpace *as, bool iommu_enabled) +{ + assert(as); + + trace_vtd_switch_address_space(pci_bus_num(as->bus), + VTD_PCI_SLOT(as->devfn), + VTD_PCI_FUNC(as->devfn), + iommu_enabled); + + /* Turn off first then on the other */ + if (iommu_enabled) { + memory_region_set_enabled(&as->sys_alias, false); + memory_region_set_enabled(&as->iommu, true); + } else { + memory_region_set_enabled(&as->iommu, false); + memory_region_set_enabled(&as->sys_alias, true); + } +} + +static void vtd_switch_address_space_all(IntelIOMMUState *s, bool enabled) +{ + GHashTableIter iter; + VTDBus *vtd_bus; + int i; + + g_hash_table_iter_init(&iter, s->vtd_as_by_busptr); + while (g_hash_table_iter_next (&iter, NULL, (void **)&vtd_bus)) { + for (i = 0; i < X86_IOMMU_PCI_DEVFN_MAX; i++) { + if (!vtd_bus->dev_as[i]) { + continue; + } + vtd_switch_address_space(vtd_bus->dev_as[i], enabled); + } + } +} + /* Handle Translation Enable/Disable */ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) { + if (s->dmar_enabled == en) { + return; + } + VTD_DPRINTF(CSR, "Translation Enable %s", (en ? "on" : "off")); if (en) { @@ -1196,6 +1237,8 @@ static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en) /* Ok - report back to driver */ vtd_set_clear_mask_long(s, DMAR_GSTS_REG, VTD_GSTS_TES, 0); } + + vtd_switch_address_space_all(s, en); } /* Handle Interrupt Remap Enable/Disable */ @@ -2343,15 +2386,43 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus, int devfn) vtd_dev_as->devfn = (uint8_t)devfn; vtd_dev_as->iommu_state = s; vtd_dev_as->context_cache_entry.context_cache_gen = 0; + + /* + * Memory region relationships looks like (Address range shows + * only lower 32 bits to make it short in length...): + * + * |-----------------+-------------------+----------| + * | Name | Address range | Priority | + * |-----------------+-------------------+----------+ + * | vtd_root | 00000000-ffffffff | 0 | + * | intel_iommu | 00000000-ffffffff | 1 | + * | vtd_sys_alias | 00000000-ffffffff | 1 | + * | intel_iommu_ir | fee00000-feefffff | 64 | + * |-----------------+-------------------+----------| + * + * We enable/disable DMAR by switching enablement for + * vtd_sys_alias and intel_iommu regions. IR region is always + * enabled. + */ memory_region_init_iommu(&vtd_dev_as->iommu, OBJECT(s), &s->iommu_ops, "intel_iommu", UINT64_MAX); + memory_region_init_alias(&vtd_dev_as->sys_alias, OBJECT(s), + "vtd_sys_alias", get_system_memory(), + 0, memory_region_size(get_system_memory())); memory_region_init_io(&vtd_dev_as->iommu_ir, OBJECT(s), &vtd_mem_ir_ops, s, "intel_iommu_ir", VTD_INTERRUPT_ADDR_SIZE); - memory_region_add_subregion(&vtd_dev_as->iommu, VTD_INTERRUPT_ADDR_FIRST, - &vtd_dev_as->iommu_ir); - address_space_init(&vtd_dev_as->as, - &vtd_dev_as->iommu, "intel_iommu"); + memory_region_init(&vtd_dev_as->root, OBJECT(s), + "vtd_root", UINT64_MAX); + memory_region_add_subregion_overlap(&vtd_dev_as->root, + VTD_INTERRUPT_ADDR_FIRST, + &vtd_dev_as->iommu_ir, 64); + address_space_init(&vtd_dev_as->as, &vtd_dev_as->root, "intel_iommu"); + memory_region_add_subregion_overlap(&vtd_dev_as->root, 0, + &vtd_dev_as->sys_alias, 1); + memory_region_add_subregion_overlap(&vtd_dev_as->root, 0, + &vtd_dev_as->iommu, 1); + vtd_switch_address_space(vtd_dev_as, s->dmar_enabled); } return vtd_dev_as; } diff --git a/hw/i386/trace-events b/hw/i386/trace-events index d2b4973..aee93bb 100644 --- a/hw/i386/trace-events +++ b/hw/i386/trace-events @@ -10,6 +10,9 @@ xen_pv_mmio_write(uint64_t addr) "WARNING: write to Xen PV Device MMIO space (ad # hw/i386/x86-iommu.c x86_iommu_iec_notify(bool global, uint32_t index, uint32_t mask) "Notify IEC invalidation: global=%d index=%" PRIu32 " mask=%" PRIu32 +# hw/i386/intel_iommu.c +vtd_switch_address_space(uint8_t bus, uint8_t slot, uint8_t fn, bool on) "Device %02x:%02x.%x switching address space (iommu enabled=%d)" + # hw/i386/amd_iommu.c amdvi_evntlog_fail(uint64_t addr, uint32_t head) "error: fail to write at addr 0x%"PRIx64" + offset 0x%"PRIx32 amdvi_cache_update(uint16_t domid, uint8_t bus, uint8_t slot, uint8_t func, uint64_t gpa, uint64_t txaddr) " update iotlb domid 0x%"PRIx16" devid: %02x:%02x.%x gpa 0x%"PRIx64" hpa 0x%"PRIx64 diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h index 405c9d1..85c1b9b 100644 --- a/include/hw/i386/intel_iommu.h +++ b/include/hw/i386/intel_iommu.h @@ -83,6 +83,8 @@ struct VTDAddressSpace { uint8_t devfn; AddressSpace as; MemoryRegion iommu; + MemoryRegion root; + MemoryRegion sys_alias; MemoryRegion iommu_ir; /* Interrupt region: 0xfeeXXXXX */ IntelIOMMUState *iommu_state; VTDContextCacheEntry context_cache_entry;