From patchwork Mon Sep 23 12:24:51 2019
X-Patchwork-Submitter: Baolu Lu
X-Patchwork-Id: 11156691
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel, David Woodhouse, Alex Williamson
Cc: ashok.raj@intel.com, sanjay.k.kumar@intel.com,
    jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, yi.l.liu@intel.com,
    yi.y.sun@intel.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Lu Baolu, Yi Sun
Subject: [RFC PATCH 1/4] iommu/vt-d: Move domain_flush_cache helper into header
Date: Mon, 23 Sep 2019 20:24:51 +0800
Message-Id: <20190923122454.9888-2-baolu.lu@linux.intel.com>
In-Reply-To: <20190923122454.9888-1-baolu.lu@linux.intel.com>
References: <20190923122454.9888-1-baolu.lu@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

Move the domain_flush_cache() helper into the header file so that it
can be used by other source files as well.
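For context: iommu_coherency is clear when the IOMMU cannot snoop the
CPU caches during a page-table walk, so updated entries must be
clflushed before the hardware can observe them. A minimal sketch of the
call pattern this move enables from other source files (the caller
below is hypothetical, for illustration only):

    /* Hypothetical caller in another VT-d source file. */
    #include <linux/intel-iommu.h>

    static void set_pte_and_sync(struct dmar_domain *domain,
                                 struct dma_pte *pte, u64 pteval)
    {
            pte->val = pteval;
            /* clflush on non-coherent IOMMUs; a no-op otherwise. */
            domain_flush_cache(domain, pte, sizeof(*pte));
    }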
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 7 -------
 include/linux/intel-iommu.h | 7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5aa68a094efd..9cfe8098d993 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -828,13 +828,6 @@ static struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn)
 	return iommu;
 }
 
-static void domain_flush_cache(struct dmar_domain *domain,
-			       void *addr, int size)
-{
-	if (!domain->iommu_coherency)
-		clflush_cache_range(addr, size);
-}
-
 static int device_context_mapped(struct intel_iommu *iommu, u8 bus, u8 devfn)
 {
 	struct context_entry *context;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..3ee694d4f361 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -629,6 +629,13 @@ static inline int first_pte_in_page(struct dma_pte *pte)
 	return !((unsigned long)pte & ~VTD_PAGE_MASK);
 }
 
+static inline void
+domain_flush_cache(struct dmar_domain *domain, void *addr, int size)
+{
+	if (!domain->iommu_coherency)
+		clflush_cache_range(addr, size);
+}
+
 extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
 extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
From patchwork Mon Sep 23 12:24:52 2019
X-Patchwork-Submitter: Baolu Lu
X-Patchwork-Id: 11156697
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel, David Woodhouse, Alex Williamson
Cc: ashok.raj@intel.com, sanjay.k.kumar@intel.com,
    jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, yi.l.liu@intel.com,
    yi.y.sun@intel.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Lu Baolu, Yi Sun
Subject: [RFC PATCH 2/4] iommu/vt-d: Add first level page table interfaces
Date: Mon, 23 Sep 2019 20:24:52 +0800
Message-Id: <20190923122454.9888-3-baolu.lu@linux.intel.com>
In-Reply-To: <20190923122454.9888-1-baolu.lu@linux.intel.com>
References: <20190923122454.9888-1-baolu.lu@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

This adds functions to manipulate first level page tables, which can
be used by a scalable mode capable IOMMU unit.

intel_mmmap_range(domain, addr, end, phys_addr, prot)
 - Map an iova range of [addr, end) to the physical memory starting at
   @phys_addr with the @prot permissions.

intel_mmunmap_range(domain, addr, end)
 - Tear down the map of an iova range [addr, end). A page list is
   returned, to be freed after iotlb flushing.
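A minimal sketch of the intended calling pattern (hypothetical caller
placed in intel-iommu.c, where the existing dma_free_pagelist() helper
lives; error handling and the IOTLB flush itself are elided to
comments):

    /* Map 2 MiB of contiguous memory at IOVA 'iova', then tear it down. */
    static int demo_first_level_map(struct dmar_domain *domain,
                                    unsigned long iova, phys_addr_t phys)
    {
            struct page *freelist;
            int ret;

            ret = intel_mmmap_range(domain, iova, iova + SZ_2M, phys,
                                    DMA_PTE_READ | DMA_PTE_WRITE);
            if (ret)
                    return ret;

            /* ... device DMA through [iova, iova + SZ_2M) ... */

            freelist = intel_mmunmap_range(domain, iova, iova + SZ_2M);
            /* Flush the IOTLB for this range before freeing the pages. */
            dma_free_pagelist(freelist);
            return 0;
    }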
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/Makefile             |   2 +-
 drivers/iommu/intel-pgtable.c      | 342 +++++++++++++++++++++++++++++
 include/linux/intel-iommu.h        |  24 +-
 include/trace/events/intel_iommu.h |  60 +++++
 4 files changed, 426 insertions(+), 2 deletions(-)
 create mode 100644 drivers/iommu/intel-pgtable.c

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 4f405f926e73..dc550e14cc58 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -17,7 +17,7 @@ obj-$(CONFIG_ARM_SMMU) += arm-smmu.o arm-smmu-impl.o
 obj-$(CONFIG_ARM_SMMU_V3) += arm-smmu-v3.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += intel-iommu.o intel-pasid.o
-obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o
+obj-$(CONFIG_INTEL_IOMMU) += intel-trace.o intel-pgtable.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += intel-iommu-debugfs.o
 obj-$(CONFIG_INTEL_IOMMU_SVM) += intel-svm.o
 obj-$(CONFIG_IPMMU_VMSA) += ipmmu-vmsa.o
diff --git a/drivers/iommu/intel-pgtable.c b/drivers/iommu/intel-pgtable.c
new file mode 100644
index 000000000000..8e95978cd381
--- /dev/null
+++ b/drivers/iommu/intel-pgtable.c
@@ -0,0 +1,342 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * intel-pgtable.c - Intel IOMMU page table manipulation library
+ *
+ * Copyright (C) 2019 Intel Corporation
+ *
+ * Author: Lu Baolu
+ */
+
+#define pr_fmt(fmt)	"DMAR: " fmt
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#ifdef CONFIG_X86
+/*
+ * mmmap: Map a range of IO virtual address to physical addresses.
+ */
+#define pgtable_populate(domain, nm)					\
+do {									\
+	void *__new = alloc_pgtable_page(domain->nid);			\
+	if (!__new)							\
+		return -ENOMEM;						\
+	smp_wmb();							\
+	spin_lock(&(domain)->page_table_lock);				\
+	if (nm ## _present(*nm)) {					\
+		free_pgtable_page(__new);				\
+	} else {							\
+		set_##nm(nm, __##nm(__pa(__new) | _PAGE_TABLE));	\
+		domain_flush_cache(domain, nm, sizeof(nm##_t));		\
+	}								\
+	spin_unlock(&(domain)->page_table_lock);			\
+} while(0);
+
+static int
+mmmap_pte_range(struct dmar_domain *domain, pmd_t *pmd, unsigned long addr,
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+	pte_t *pte, *first_pte;
+	u64 pfn;
+
+	pfn = phys_addr >> PAGE_SHIFT;
+	if (unlikely(pmd_none(*pmd)))
+		pgtable_populate(domain, pmd);
+
+	first_pte = pte = pte_offset_kernel(pmd, addr);
+
+	do {
+		set_pte(pte, pfn_pte(pfn, prot));
+		pfn++;
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
+
+	return 0;
+}
+
+static int
+mmmap_pmd_range(struct dmar_domain *domain, pud_t *pud, unsigned long addr,
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+	unsigned long next;
+	pmd_t *pmd;
+
+	if (unlikely(pud_none(*pud)))
+		pgtable_populate(domain, pud);
+	pmd = pmd_offset(pud, addr);
+
+	phys_addr -= addr;
+	do {
+		next = pmd_addr_end(addr, end);
+		if (mmmap_pte_range(domain, pmd, addr, next,
+				    phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (pmd++, addr = next, addr != end);
+
+	return 0;
+}
+
+static int
+mmmap_pud_range(struct dmar_domain *domain, p4d_t *p4d, unsigned long addr,
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+	unsigned long next;
+	pud_t *pud;
+
+	if (unlikely(p4d_none(*p4d)))
+		pgtable_populate(domain, p4d);
+
+	pud = pud_offset(p4d, addr);
+
+	phys_addr -= addr;
+	do {
+		next = pud_addr_end(addr, end);
+		if (mmmap_pmd_range(domain, pud, addr, next,
+				    phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (pud++, addr = next, addr != end);
+
+	return 0;
+}
+
+static int
+mmmap_p4d_range(struct dmar_domain *domain, pgd_t *pgd, unsigned long addr,
+		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
+{
+	unsigned long next;
+	p4d_t *p4d;
+
+	if (cpu_feature_enabled(X86_FEATURE_LA57) && unlikely(pgd_none(*pgd)))
+		pgtable_populate(domain, pgd);
+
+	p4d = p4d_offset(pgd, addr);
+
+	phys_addr -= addr;
+	do {
+		next = p4d_addr_end(addr, end);
+		if (mmmap_pud_range(domain, p4d, addr, next,
+				    phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (p4d++, addr = next, addr != end);
+
+	return 0;
+}
+
+int intel_mmmap_range(struct dmar_domain *domain, unsigned long addr,
+		      unsigned long end, phys_addr_t phys_addr, int dma_prot)
+{
+	unsigned long next;
+	pgprot_t prot;
+	pgd_t *pgd;
+
+	trace_domain_mm_map(domain, addr, end, phys_addr);
+
+	/*
+	 * There is no PAGE_KERNEL_WO for a pte entry, so let's use RW
+	 * for a pte that requires write operation.
+	 */
+	prot = dma_prot & DMA_PTE_WRITE ? PAGE_KERNEL : PAGE_KERNEL_RO;
+	BUG_ON(addr >= end);
+
+	phys_addr -= addr;
+	pgd = pgd_offset_pgd(domain->pgd, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (mmmap_p4d_range(domain, pgd, addr, next,
+				    phys_addr + addr, prot))
+			return -ENOMEM;
+	} while (pgd++, addr = next, addr != end);
+
+	return 0;
+}
+
+/*
+ * mmunmap: Unmap an existing mapping between a range of IO virtual address
+ * and physical addresses.
+ */
+static struct page *
+mmunmap_pte_range(struct dmar_domain *domain, pmd_t *pmd,
+		  unsigned long addr, unsigned long end,
+		  struct page *freelist, bool reclaim)
+{
+	int i;
+	unsigned long start;
+	pte_t *pte, *first_pte;
+
+	start = addr;
+	pte = pte_offset_kernel(pmd, addr);
+	first_pte = pte;
+	do {
+		set_pte(pte, __pte(0));
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	domain_flush_cache(domain, first_pte, (void *)pte - (void *)first_pte);
+
+	/* Add page to free list if all entries are empty. */
+	if (reclaim) {
+		struct page *pte_page;
+
+		pte = (pte_t *)pmd_page_vaddr(*pmd);
+		for (i = 0; i < PTRS_PER_PTE; i++)
+			if (!pte || !pte_none(pte[i]))
+				goto pte_out;
+
+		pte_page = pmd_page(*pmd);
+		pte_page->freelist = freelist;
+		freelist = pte_page;
+		pmd_clear(pmd);
+		domain_flush_cache(domain, pmd, sizeof(pmd_t));
+	}
+
+pte_out:
+	return freelist;
+}
+
+static struct page *
+mmunmap_pmd_range(struct dmar_domain *domain, pud_t *pud,
+		  unsigned long addr, unsigned long end,
+		  struct page *freelist, bool reclaim)
+{
+	int i;
+	pmd_t *pmd;
+	unsigned long start, next;
+
+	start = addr;
+	pmd = pmd_offset(pud, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		if (pmd_none_or_clear_bad(pmd))
+			continue;
+		freelist = mmunmap_pte_range(domain, pmd, addr, next,
+					     freelist, reclaim);
+	} while (pmd++, addr = next, addr != end);
+
+	/* Add page to free list if all entries are empty. */
+	if (reclaim) {
+		struct page *pmd_page;
+
+		pmd = (pmd_t *)pud_page_vaddr(*pud);
+		for (i = 0; i < PTRS_PER_PMD; i++)
+			if (!pmd || !pmd_none(pmd[i]))
+				goto pmd_out;
+
+		pmd_page = pud_page(*pud);
+		pmd_page->freelist = freelist;
+		freelist = pmd_page;
+		pud_clear(pud);
+		domain_flush_cache(domain, pud, sizeof(pud_t));
+	}
+
+pmd_out:
+	return freelist;
+}
+
+static struct page *
+mmunmap_pud_range(struct dmar_domain *domain, p4d_t *p4d,
+		  unsigned long addr, unsigned long end,
+		  struct page *freelist, bool reclaim)
+{
+	int i;
+	pud_t *pud;
+	unsigned long start, next;
+
+	start = addr;
+	pud = pud_offset(p4d, addr);
+	do {
+		next = pud_addr_end(addr, end);
+		if (pud_none_or_clear_bad(pud))
+			continue;
+		freelist = mmunmap_pmd_range(domain, pud, addr, next,
+					     freelist, reclaim);
+	} while (pud++, addr = next, addr != end);
+
+	/* Add page to free list if all entries are empty. */
+	if (reclaim) {
+		struct page *pud_page;
+
+		pud = (pud_t *)p4d_page_vaddr(*p4d);
+		for (i = 0; i < PTRS_PER_PUD; i++)
+			if (!pud || !pud_none(pud[i]))
+				goto pud_out;
+
+		pud_page = p4d_page(*p4d);
+		pud_page->freelist = freelist;
+		freelist = pud_page;
+		p4d_clear(p4d);
+		domain_flush_cache(domain, p4d, sizeof(p4d_t));
+	}
+
+pud_out:
+	return freelist;
+}
+static struct page *
+mmunmap_p4d_range(struct dmar_domain *domain, pgd_t *pgd,
+		  unsigned long addr, unsigned long end,
+		  struct page *freelist, bool reclaim)
+{
+	p4d_t *p4d;
+	unsigned long start, next;
+
+	start = addr;
+	p4d = p4d_offset(pgd, addr);
+	do {
+		next = p4d_addr_end(addr, end);
+		if (p4d_none_or_clear_bad(p4d))
+			continue;
+		freelist = mmunmap_pud_range(domain, p4d, addr, next,
+					     freelist, reclaim);
+	} while (p4d++, addr = next, addr != end);
+
+	/* Add page to free list if all entries are empty. */
+	if (cpu_feature_enabled(X86_FEATURE_LA57) && reclaim) {
+		struct page *p4d_page;
+		int i;
+
+		p4d = (p4d_t *)pgd_page_vaddr(*pgd);
+		for (i = 0; i < PTRS_PER_P4D; i++)
+			if (!p4d || !p4d_none(p4d[i]))
+				goto p4d_out;
+
+		p4d_page = pgd_page(*pgd);
+		p4d_page->freelist = freelist;
+		freelist = p4d_page;
+		pgd_clear(pgd);
+		domain_flush_cache(domain, pgd, sizeof(pgd_t));
+	}
+
+p4d_out:
+	return freelist;
+}
+
+struct page *
+intel_mmunmap_range(struct dmar_domain *domain,
+		    unsigned long addr, unsigned long end)
+{
+	pgd_t *pgd;
+	unsigned long next;
+	struct page *freelist = NULL;
+
+	trace_domain_mm_unmap(domain, addr, end);
+
+	BUG_ON(addr >= end);
+	pgd = pgd_offset_pgd(domain->pgd, addr);
+	do {
+		next = pgd_addr_end(addr, end);
+		if (pgd_none_or_clear_bad(pgd))
+			continue;
+		freelist = mmunmap_p4d_range(domain, pgd, addr, next,
+					     freelist, !addr);
+	} while (pgd++, addr = next, addr != end);
+
+	return freelist;
+}
+#endif /* CONFIG_X86 */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3ee694d4f361..044a91fa5431 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -489,7 +489,8 @@ struct dmar_domain {
 	struct list_head auxd;		/* link to device's auxiliary list */
 	struct iova_domain iovad;	/* iova's that belong to this domain */
 
-	struct dma_pte	*pgd;		/* virtual address */
+	void		*pgd;		/* virtual address */
+	spinlock_t	page_table_lock; /* Protects page tables */
 	int		gaw;		/* max guest address width */
 
 	/* adjusted guest address width, 0 is level 2 30-bit */
@@ -662,6 +663,27 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info,
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 
+#ifdef CONFIG_X86
+int intel_mmmap_range(struct dmar_domain *domain, unsigned long addr,
+		      unsigned long end, phys_addr_t phys_addr, int dma_prot);
+struct page *intel_mmunmap_range(struct dmar_domain *domain,
+				 unsigned long addr, unsigned long end);
+#else
+static inline int
+intel_mmmap_range(struct dmar_domain *domain, unsigned long addr,
+		  unsigned long end, phys_addr_t phys_addr, int dma_prot)
+{
+	return -ENODEV;
+}
+
+static inline struct page *
+intel_mmunmap_range(struct dmar_domain *domain,
+		    unsigned long addr, unsigned long end)
+{
+	return NULL;
+}
+#endif
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 int intel_svm_init(struct intel_iommu *iommu);
 extern int intel_svm_enable_prq(struct intel_iommu *iommu);
diff --git a/include/trace/events/intel_iommu.h b/include/trace/events/intel_iommu.h
index 54e61d456cdf..e8c95290fd13 100644
--- a/include/trace/events/intel_iommu.h
+++ b/include/trace/events/intel_iommu.h
@@ -99,6 +99,66 @@ DEFINE_EVENT(dma_unmap, bounce_unmap_single,
 	TP_ARGS(dev, dev_addr, size)
 );
 
+DECLARE_EVENT_CLASS(domain_map,
+	TP_PROTO(struct dmar_domain *domain, unsigned long addr,
+		 unsigned long end, phys_addr_t phys_addr),
+
+	TP_ARGS(domain, addr, end, phys_addr),
+
+	TP_STRUCT__entry(
+		__field(struct dmar_domain *, domain)
+		__field(unsigned long, addr)
+		__field(unsigned long, end)
+		__field(phys_addr_t, phys_addr)
+	),
+
+	TP_fast_assign(
+		__entry->domain = domain;
+		__entry->addr = addr;
+		__entry->end = end;
+		__entry->phys_addr = phys_addr;
+	),
+
+	TP_printk("domain=%p addr=0x%lx end=0x%lx phys_addr=0x%llx",
+		  __entry->domain, __entry->addr, __entry->end,
+		  (unsigned long long)__entry->phys_addr)
+);
+
+DEFINE_EVENT(domain_map, domain_mm_map,
+	TP_PROTO(struct dmar_domain *domain, unsigned long addr,
+		 unsigned long end, phys_addr_t phys_addr),
+
+	TP_ARGS(domain, addr, end, phys_addr)
+);
+
+DECLARE_EVENT_CLASS(domain_unmap,
+	TP_PROTO(struct dmar_domain *domain, unsigned long addr,
+		 unsigned long end),
+
+	TP_ARGS(domain, addr, end),
+
+	TP_STRUCT__entry(
+		__field(struct dmar_domain *, domain)
+		__field(unsigned long, addr)
+		__field(unsigned long, end)
+	),
+
+	TP_fast_assign(
+		__entry->domain = domain;
+		__entry->addr = addr;
+		__entry->end = end;
+	),
+
+	TP_printk("domain=%p addr=0x%lx end=0x%lx",
+		  __entry->domain, __entry->addr, __entry->end)
+);
+
+DEFINE_EVENT(domain_unmap, domain_mm_unmap,
+	TP_PROTO(struct dmar_domain *domain, unsigned long addr,
+		 unsigned long end),
+
+	TP_ARGS(domain, addr, end)
+);
 #endif /* _TRACE_INTEL_IOMMU_H */
 
 /* This part must be outside protection */

From patchwork Mon Sep 23 12:24:53 2019
X-Patchwork-Submitter: Baolu Lu
X-Patchwork-Id: 11156693
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel, David Woodhouse, Alex Williamson
Cc: ashok.raj@intel.com, sanjay.k.kumar@intel.com,
    jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, yi.l.liu@intel.com,
    yi.y.sun@intel.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Lu Baolu, Yi Sun
Subject: [RFC PATCH 3/4] iommu/vt-d: Map/unmap domain with mmmap/mmunmap
Date: Mon, 23 Sep 2019 20:24:53 +0800
Message-Id: <20190923122454.9888-4-baolu.lu@linux.intel.com>
In-Reply-To: <20190923122454.9888-1-baolu.lu@linux.intel.com>
References: <20190923122454.9888-1-baolu.lu@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

If a dmar domain has the DOMAIN_FLAG_FIRST_LEVEL_TRANS bit set in its
flags, the IOMMU will use the first level page table for translation.
Hence, addresses must be mapped and unmapped in the first level page
table.
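The glue below converts between the pfn-based legacy path and the
address-based first level helpers; the interval arithmetic is worth
spelling out (sketch; VTD_PAGE_SHIFT is 12):

    /* An inclusive pfn range [start_pfn, last_pfn] becomes the
     * half-open IOVA range [start, end): */
    unsigned long start = dma_pfn_to_addr(start_pfn);   /* pfn << 12 */
    unsigned long end = dma_pfn_to_addr(last_pfn + 1);
    /* e.g. pfns 0x10..0x1f cover addresses [0x10000, 0x20000). */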
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Yi Sun <yi.y.sun@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 94 ++++++++++++++++++++++++++++++++-----
 1 file changed, 82 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 9cfe8098d993..103480016010 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -168,6 +168,11 @@ static inline unsigned long virt_to_dma_pfn(void *p)
 	return page_to_dma_pfn(virt_to_page(p));
 }
 
+static inline unsigned long dma_pfn_to_addr(unsigned long pfn)
+{
+	return pfn << VTD_PAGE_SHIFT;
+}
+
 /* global iommu list, set NULL for ignored DMAR units */
 static struct intel_iommu **g_iommus;
 
@@ -307,6 +312,9 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN		BIT(1)
 
+/* Domain uses first level translation for DMA remapping. */
+#define DOMAIN_FLAG_FIRST_LEVEL_TRANS		BIT(2)
+
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
 		if (domain->iommu_refcnt[idx])
@@ -552,6 +560,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
 	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline int domain_type_is_flt(struct dmar_domain *domain)
+{
+	return domain->flags & DOMAIN_FLAG_FIRST_LEVEL_TRANS;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
 				       unsigned long pfn)
 {
@@ -1147,8 +1160,15 @@ static struct page *domain_unmap(struct dmar_domain *domain,
 	BUG_ON(start_pfn > last_pfn);
 
 	/* we don't need lock here; nobody else touches the iova range */
-	freelist = dma_pte_clear_level(domain, agaw_to_level(domain->agaw),
-				       domain->pgd, 0, start_pfn, last_pfn, NULL);
+	if (domain_type_is_flt(domain))
+		freelist = intel_mmunmap_range(domain,
+					       dma_pfn_to_addr(start_pfn),
+					       dma_pfn_to_addr(last_pfn + 1));
+	else
+		freelist = dma_pte_clear_level(domain,
+					       agaw_to_level(domain->agaw),
+					       domain->pgd, 0, start_pfn,
+					       last_pfn, NULL);
 
 	/* free pgd */
 	if (start_pfn == 0 && last_pfn == DOMAIN_MAX_PFN(domain->gaw)) {
@@ -2213,9 +2233,10 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
 	return level;
 }
 
-static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
-			    struct scatterlist *sg, unsigned long phys_pfn,
-			    unsigned long nr_pages, int prot)
+static int
+__domain_mapping_dma(struct dmar_domain *domain, unsigned long iov_pfn,
+		     struct scatterlist *sg, unsigned long phys_pfn,
+		     unsigned long nr_pages, int prot)
 {
 	struct dma_pte *first_pte = NULL, *pte = NULL;
 	phys_addr_t uninitialized_var(pteval);
@@ -2223,13 +2244,6 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	unsigned int largepage_lvl = 0;
 	unsigned long lvl_pages = 0;
 
-	BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
-
-	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
-		return -EINVAL;
-
-	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
-
 	if (!sg) {
 		sg_res = nr_pages;
 		pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | prot;
@@ -2328,6 +2342,62 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 	return 0;
 }
 
+static int
+__domain_mapping_mm(struct dmar_domain *domain, unsigned long iov_pfn,
+		    struct scatterlist *sg, unsigned long phys_pfn,
+		    unsigned long nr_pages, int prot)
+{
+	int ret = 0;
+
+	if (!sg)
+		return intel_mmmap_range(domain, dma_pfn_to_addr(iov_pfn),
+					 dma_pfn_to_addr(iov_pfn + nr_pages),
+					 dma_pfn_to_addr(phys_pfn), prot);
+
+	while (nr_pages > 0) {
+		unsigned long sg_pages, phys;
+		unsigned long pgoff = sg->offset & ~PAGE_MASK;
+
+		sg_pages = aligned_nrpages(sg->offset, sg->length);
+		phys = sg_phys(sg) - pgoff;
+
+		ret = intel_mmmap_range(domain, dma_pfn_to_addr(iov_pfn),
+					dma_pfn_to_addr(iov_pfn + sg_pages),
+					phys, prot);
+		if (ret)
+			break;
+
+		sg->dma_address = ((dma_addr_t)dma_pfn_to_addr(iov_pfn)) + pgoff;
+		sg->dma_length = sg->length;
+
+		nr_pages -= sg_pages;
+		iov_pfn += sg_pages;
+		sg = sg_next(sg);
+	}
+
+	return ret;
+}
+
+static int
+__domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+		 struct scatterlist *sg, unsigned long phys_pfn,
+		 unsigned long nr_pages, int prot)
+{
+	BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
+
+	if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
+		return -EINVAL;
+
+	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
+
+	if (domain_type_is_flt(domain))
+		return __domain_mapping_mm(domain, iov_pfn, sg,
+					   phys_pfn, nr_pages, prot);
+	else
+		return __domain_mapping_dma(domain, iov_pfn, sg,
+					    phys_pfn, nr_pages, prot);
+}
+
 static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 			  struct scatterlist *sg, unsigned long phys_pfn,
 			  unsigned long nr_pages, int prot)

From patchwork Mon Sep 23 12:24:54 2019
X-Patchwork-Submitter: Baolu Lu
X-Patchwork-Id: 11156695
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel, David Woodhouse, Alex Williamson
Cc: ashok.raj@intel.com, sanjay.k.kumar@intel.com,
    jacob.jun.pan@linux.intel.com, kevin.tian@intel.com, yi.l.liu@intel.com,
    yi.y.sun@intel.com, iommu@lists.linux-foundation.org, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, Lu Baolu, Yi Sun
Subject: [RFC PATCH 4/4] iommu/vt-d: Identify domains using first level page table
Date: Mon, 23 Sep 2019 20:24:54 +0800
Message-Id: <20190923122454.9888-5-baolu.lu@linux.intel.com>
In-Reply-To: <20190923122454.9888-1-baolu.lu@linux.intel.com>
References: <20190923122454.9888-1-baolu.lu@linux.intel.com>
X-Mailing-List: kvm@vger.kernel.org

This checks whether a domain should use the first level page table for
map/unmap and, if so, attaches the domain to the device in first level
translation mode.
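One thing to note in the capability check added below: the early return
inside the loop would leave the RCU read lock held. A balanced variant
of the same policy (sketch):

    static bool first_level_by_default(void)
    {
            struct dmar_drhd_unit *drhd;
            struct intel_iommu *iommu;
            bool ret = true;

            rcu_read_lock();
            for_each_active_iommu(iommu, drhd) {
                    if (!sm_supported(iommu) ||
                        !ecap_flts(iommu->ecap) ||
                        !cap_caching_mode(iommu->cap)) {
                            ret = false;
                            break;
                    }
            }
            rcu_read_unlock();

            return ret;
    }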
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Liu Yi L <yi.l.liu@intel.com>
Cc: Yi Sun <yi.y.sun@intel.com>
Cc: Sanjay Kumar <sanjay.k.kumar@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 41 ++++++++++++++++++++++++++++++++++---
 1 file changed, 38 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 103480016010..d539e6a6c3dd 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1722,6 +1722,26 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }
 
+/*
+ * Check and return whether first level is used by default for
+ * DMA translation.
+ */
+static bool first_level_by_default(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd)
+		if (!sm_supported(iommu) ||
+		    !ecap_flts(iommu->ecap) ||
+		    !cap_caching_mode(iommu->cap))
+			return false;
+	rcu_read_unlock();
+
+	return true;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
 	struct dmar_domain *domain;
@@ -1736,6 +1756,9 @@ static struct dmar_domain *alloc_domain(int flags)
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);
 
+	if (first_level_by_default())
+		domain->flags |= DOMAIN_FLAG_FIRST_LEVEL_TRANS;
+
 	return domain;
 }
 
@@ -2625,6 +2648,11 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 		if (hw_pass_through && domain_type_is_si(domain))
 			ret = intel_pasid_setup_pass_through(iommu, domain,
 					dev, PASID_RID2PASID);
+		else if (domain_type_is_flt(domain))
+			ret = intel_pasid_setup_first_level(iommu, dev,
+					domain->pgd, PASID_RID2PASID,
+					domain->iommu_did[iommu->seq_id],
+					PASID_FLAG_SUPERVISOR_MODE);
 		else
 			ret = intel_pasid_setup_second_level(iommu, domain,
 					dev, PASID_RID2PASID);
@@ -5349,8 +5377,14 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 		goto attach_failed;
 
 	/* Setup the PASID entry for mediated devices: */
-	ret = intel_pasid_setup_second_level(iommu, domain, dev,
-					     domain->default_pasid);
+	if (domain_type_is_flt(domain))
+		ret = intel_pasid_setup_first_level(iommu, dev,
+				domain->pgd, domain->default_pasid,
+				domain->iommu_did[iommu->seq_id],
+				PASID_FLAG_SUPERVISOR_MODE);
+	else
+		ret = intel_pasid_setup_second_level(iommu, domain, dev,
+						     domain->default_pasid);
 	if (ret)
 		goto table_failed;
 	spin_unlock(&iommu->lock);
@@ -5583,7 +5617,8 @@ static phys_addr_t intel_iommu_iova_to_phys(struct iommu_domain *domain,
 	int level = 0;
 	u64 phys = 0;
 
-	if (dmar_domain->flags & DOMAIN_FLAG_LOSE_CHILDREN)
+	if ((dmar_domain->flags & DOMAIN_FLAG_LOSE_CHILDREN) ||
+	    (dmar_domain->flags & DOMAIN_FLAG_FIRST_LEVEL_TRANS))
 		return 0;
 
 	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &level);
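With this change, intel_iommu_iova_to_phys() simply reports 0 for first
level domains. Because first level tables share the CPU page-table
format, a later lookup could reuse the x86 page-table walker; a
hypothetical sketch (helper name invented, 4-KiB mappings only):

    static phys_addr_t flt_iova_to_phys(struct dmar_domain *domain,
                                        unsigned long iova)
    {
            unsigned int level;
            pte_t *pte;

            pte = lookup_address_in_pgd((pgd_t *)domain->pgd, iova, &level);
            if (!pte || !pte_present(*pte) || level != PG_LEVEL_4K)
                    return 0;

            return ((phys_addr_t)pte_pfn(*pte) << PAGE_SHIFT) |
                   (iova & ~PAGE_MASK);
    }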