Message ID | 20191211021219.8997-5-baolu.lu@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | Use 1st-level for IOVA translation |
Hi Allen,

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Lu Baolu
> Sent: Wednesday, December 11, 2019 10:12 AM
> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>
> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
> first level and second level. The IOMMU driver can choose one from both for IOVA
> translation according to the use case. This sets up the pasid entry if a domain is
> selected to use the first-level page table for iova translation.
>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>  include/linux/intel-iommu.h | 10 ++++----
>  2 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
> 2b5a47584baf..83a7abf0c4f0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain
> *domain)
>  	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
>
> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
> +
>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>  				       unsigned long pfn)
>  {
> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
> *domain, unsigned long iov_pfn,
>  		return -EINVAL;
>
>  	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> +	if (domain_use_first_level(domain))
> +		prot |= DMA_FL_PTE_PRESENT;

For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
should be cleared when FLPT is used for IOVA.

Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is set, it
means snoop required, then "XD" bit is "0". If bit 11 of prot is "0", it means
this domain is not snooping, so you may want to set "XD" bit as "1". With
such enhancement, I think IOVA over FLPT would have as less difference
with IOVA over SLPT.

Regards,
Yi Liu
Hi Liu Yi,

Thanks for reviewing my patch.

On 12/13/19 5:23 PM, Liu, Yi L wrote:
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
>> Of Lu Baolu
>> Sent: Wednesday, December 11, 2019 10:12 AM
>> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>>
>> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
>> first level and second level. The IOMMU driver can choose one from both for IOVA
>> translation according to the use case. This sets up the pasid entry if a domain is
>> selected to use the first-level page table for iova translation.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>  drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>>  include/linux/intel-iommu.h | 10 ++++----
>>  2 files changed, 52 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index
>> 2b5a47584baf..83a7abf0c4f0 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain
>> *domain)
>>  	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
>>
>> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
>> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
>> +
>>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>>  				       unsigned long pfn)
>>  {
>> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
>> *domain, unsigned long iov_pfn,
>>  		return -EINVAL;
>>
>>  	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
>> +	if (domain_use_first_level(domain))
>> +		prot |= DMA_FL_PTE_PRESENT;
>
> For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> should be cleared when FLPT is used for IOVA.

SNP (bit 11) is only for second level. This bit is ignored for first
level page table walk. We should clear this bit for first level anyway.

>
> Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is set, it
> means snoop required, then "XD" bit is "0". If bit 11 of prot is "0", it means
> this domain is not snooping, so you may want to set "XD" bit as "1". With
> such enhancement, I think IOVA over FLPT would have as less difference
> with IOVA over SLPT.

XD (bit 63) is only for the first level, and SNP (bit 11) is only for
second level, right? I think we need to always set XD bit for IOVA over
FL case. thoughts?

Best regards,
baolu
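The point about bit 11 can be checked with a small standalone sketch. The macro values below are copied from the header hunk of this patch; the two helper functions are hypothetical and exist only for illustration. The first mirrors the two statements the v3 patch has in `__domain_mapping()`, the second shows one possible way to also drop SNP for first-level domains. It is not the change that was eventually merged.

```c
#include <stdint.h>

/* Values copied from the include/linux/intel-iommu.h hunk of this patch. */
#define DMA_PTE_READ        (1)
#define DMA_PTE_WRITE       (2)
#define DMA_PTE_SNP         (1 << 11)
#define DMA_FL_PTE_PRESENT  (1)

/* Hypothetical helper: mirrors the prot handling in the v3 patch as posted. */
static inline uint64_t prot_as_posted(uint64_t prot, int first_level)
{
	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
	if (first_level)
		prot |= DMA_FL_PTE_PRESENT;
	return prot;
}

/* Hypothetical variant: additionally clear SNP when the first level is used. */
static inline uint64_t prot_clear_snp(uint64_t prot, int first_level)
{
	prot = prot_as_posted(prot, first_level);
	if (first_level)
		prot &= ~(uint64_t)DMA_PTE_SNP;
	return prot;
}
```

With a caller passing read, write and SNP, the posted logic keeps bit 11 set for a first-level domain, while the variant returns only the present and write bits.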
Hi Baolu,

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Saturday, December 14, 2019 11:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>
> Hi Liu Yi,
>
> Thanks for reviewing my patch.
>
> On 12/13/19 5:23 PM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> >> Behalf Of Lu Baolu
> >> Sent: Wednesday, December 11, 2019 10:12 AM
> >> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over
> >> first level
> >>
> >> Intel VT-d in scalable mode supports two types of page tables for IOVA
> translation:
> >> first level and second level. The IOMMU driver can choose one from
> >> both for IOVA translation according to the use case. This sets up the
> >> pasid entry if a domain is selected to use the first-level page table for iova
> translation.
> >>
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >> ---
> >>  drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
> >>  include/linux/intel-iommu.h | 10 ++++----
> >>  2 files changed, 52 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-iommu.c
> >> b/drivers/iommu/intel-iommu.c index
> >> 2b5a47584baf..83a7abf0c4f0 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct
> >> dmar_domain
> >> *domain)
> >>  	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;  }
> >>
> >> +static inline bool domain_use_first_level(struct dmar_domain *domain) {
> >> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; }
> >> +
> >>  static inline int domain_pfn_supported(struct dmar_domain *domain,
> >>  				       unsigned long pfn)
> >>  {
> >> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain
> >> *domain, unsigned long iov_pfn,
> >>  		return -EINVAL;
> >>
> >>  	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> >> +	if (domain_use_first_level(domain))
> >> +		prot |= DMA_FL_PTE_PRESENT;
> >
> > For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> > should be cleared when FLPT is used for IOVA.
>
> SNP (bit 11) is only for second level. This bit is ignored for first level page table walk.
> We should clear this bit for first level anyway.

I think this is what I meant above? This patch somehow misses the
operation to clear the bit 11.

> >
> > Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is
> > set, it means snoop required, then "XD" bit is "0". If bit 11 of prot
> > is "0", it means this domain is not snooping, so you may want to set
> > "XD" bit as "1". With such enhancement, I think IOVA over FLPT would
> > have as less difference with IOVA over SLPT.
>
> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
> think we need to always set XD bit for IOVA over FL case. thoughts?

Oops, I made a mistake here. Please forget SNP bit, there is no way to control SNP
with first level page table. :-)

Actually, it is execute (bit 1) of second level page table which I wanted to say.
If software sets R/W/X permission to an IOVA, with IOVA over second level
page table, it will set bit 1. However, if IOVA is over first level page table, it
may need to clear XD bit. This is what I want to say here. If IOVA doesn't allow
execute permission, it's ok to always set XD bit for IOVA over FL case. But I
would like to do it just as what we did for R/W permission. R/W permission
relies on the permission configured by the page map caller. right?

Regards,
Yi Liu
Hi Yi,

On 12/15/19 5:37 PM, Liu, Yi L wrote:
>> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
>> think we need to always set XD bit for IOVA over FL case. thoughts?
> Oops, I made a mistake here. Please forget SNP bit, there is no way to control SNP
> with first level page table.:-)
>
> Actually, it is execute (bit 1) of second level page table which I wanted to say.
> If software sets R/W/X permission to an IOVA, with IOVA over second level
> page table, it will set bit 1. However, if IOVA is over first level page table, it
> may need to clear XD bit. This is what I want to say here. If IOVA doesn't allow
> execute permission, it's ok to always set XD bit for IOVA over FL case. But I
> would like to do it just as what we did for R/W permission. R/W permission
> relies on the permission configured by the page map caller. right?

Got your point.

Current driver always cleared X (bit 2) in the second level page table.
So we will always set XD bit (bit 63) in the first level page table.
If we decide to use the X permission, we need a separate patch, right?

Best regards,
baolu
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, December 17, 2019 10:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>
> Hi Yi,
>
> On 12/15/19 5:37 PM, Liu, Yi L wrote:
> >> XD (bit 63) is only for the first level, and SNP (bit 11) is only for
> >> second level, right? I think we need to always set XD bit for IOVA over FL case.
> thoughts?
> > Oops, I made a mistake here. Please forget SNP bit, there is no way to
> > control SNP with first level page table.:-)
> >
> > Actually, it is execute (bit 1) of second level page table which I wanted to say.
> > If software sets R/W/X permission to an IOVA, with IOVA over second
> > level page table, it will set bit 1. However, if IOVA is over first
> > level page table, it may need to clear XD bit. This is what I want to
> > say here. If IOVA doesn't allow execute permission, it's ok to always
> > set XD bit for IOVA over FL case. But I would like to do it just as
> > what we did for R/W permission. R/W permission relies on the permission
> configured by the page map caller. right?
>
> Got your point.
>
> Current driver always cleared X (bit 2) in the second level page table.
> So we will always set XD bit (bit 63) in the first level page table.

Yes, I also noticed X (bit 2) is not used in the intel-iommu driver. So I know
why you set XD for the IOVA over FL case. But it's a little bit weird to hard
code it. That's why I suggested relying on the page map caller's permission
input.

> If we decide to use the X permission, we need a separate patch, right?

Sure, it would be a separate patch since the current code doesn't apply X
permission.

Regards,
Yi Liu
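The two options discussed above (always setting XD versus deriving it from the caller's permission) can be contrasted in a minimal standalone sketch. Everything here is an assumption made for illustration: the `PERM_*` flags, the `FL_PTE_*` names and both helper functions are hypothetical and are not taken from the driver. Only the bit positions follow the thread: present is bit 0 (the value of `DMA_FL_PTE_PRESENT` in the patch), write is bit 1 (the value of `DMA_PTE_WRITE`), and XD is bit 63.

```c
#include <stdint.h>

/* Hypothetical caller-side permission flags, invented for this sketch. */
#define PERM_READ   (1u << 0)
#define PERM_WRITE  (1u << 1)
#define PERM_EXEC   (1u << 2)

/* Hypothetical names; bit positions as discussed in the thread. */
#define FL_PTE_PRESENT  (1ULL << 0)
#define FL_PTE_WRITE    (1ULL << 1)
#define FL_PTE_XD       (1ULL << 63)

/* Option 1: XD is hard coded, matching a driver that never grants execute. */
static inline uint64_t fl_pte_bits_hardcoded(unsigned int perm)
{
	uint64_t bits = FL_PTE_PRESENT | FL_PTE_XD;

	if (perm & PERM_WRITE)
		bits |= FL_PTE_WRITE;
	return bits;
}

/* Option 2: XD follows the caller, the same way R/W already does. */
static inline uint64_t fl_pte_bits_from_caller(unsigned int perm)
{
	uint64_t bits = FL_PTE_PRESENT;

	if (perm & PERM_WRITE)
		bits |= FL_PTE_WRITE;
	if (!(perm & PERM_EXEC))
		bits |= FL_PTE_XD;	/* execute not requested: disable it */
	return bits;
}
```

The two helpers only differ when the caller asks for execute permission, which is why the thread treats the second option as material for a separate patch.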
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2b5a47584baf..83a7abf0c4f0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
 	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline bool domain_use_first_level(struct dmar_domain *domain)
+{
+	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
 				       unsigned long pfn)
 {
@@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		return -EINVAL;
 
 	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
+	if (domain_use_first_level(domain))
+		prot |= DMA_FL_PTE_PRESENT;
 
 	if (!sg) {
 		sg_res = nr_pages;
@@ -2515,6 +2522,36 @@ dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 	return NULL;
 }
 
+static int domain_setup_first_level(struct intel_iommu *iommu,
+				    struct dmar_domain *domain,
+				    struct device *dev,
+				    int pasid)
+{
+	int flags = PASID_FLAG_SUPERVISOR_MODE;
+	struct dma_pte *pgd = domain->pgd;
+	int agaw, level;
+
+	/*
+	 * Skip top levels of page tables for iommu which has
+	 * less agaw than default. Unnecessary for PT mode.
+	 */
+	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd))
+			return -ENOMEM;
+	}
+
+	level = agaw_to_level(agaw);
+	if (level != 4 && level != 5)
+		return -EINVAL;
+
+	flags |= (level == 5) ? PASID_FLAG_FL5LP : 0;
+
+	return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
+					     domain->iommu_did[iommu->seq_id],
+					     flags);
+}
+
 static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 						    int bus, int devfn,
 						    struct device *dev,
@@ -2614,6 +2651,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 		if (hw_pass_through && domain_type_is_si(domain))
 			ret = intel_pasid_setup_pass_through(iommu, domain,
 					dev, PASID_RID2PASID);
+		else if (domain_use_first_level(domain))
+			ret = domain_setup_first_level(iommu, domain, dev,
+					PASID_RID2PASID);
 		else
 			ret = intel_pasid_setup_second_level(iommu, domain,
 					dev, PASID_RID2PASID);
@@ -5369,8 +5409,12 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 		goto attach_failed;
 
 	/* Setup the PASID entry for mediated devices: */
-	ret = intel_pasid_setup_second_level(iommu, domain, dev,
-					     domain->default_pasid);
+	if (domain_use_first_level(domain))
+		ret = domain_setup_first_level(iommu, domain, dev,
+					       domain->default_pasid);
+	else
+		ret = intel_pasid_setup_second_level(iommu, domain, dev,
+						     domain->default_pasid);
 	if (ret)
 		goto table_failed;
 	spin_unlock(&iommu->lock);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aaece25c055f..66b525bad434 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -34,10 +34,12 @@
 #define VTD_STRIDE_SHIFT        (9)
 #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
 
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-#define DMA_PTE_LARGE_PAGE (1 << 7)
-#define DMA_PTE_SNP (1 << 11)
+#define DMA_PTE_READ		(1)
+#define DMA_PTE_WRITE		(2)
+#define DMA_PTE_LARGE_PAGE	(1 << 7)
+#define DMA_PTE_SNP		(1 << 11)
+
+#define DMA_FL_PTE_PRESENT	(1)
 
 #define CONTEXT_TT_MULTI_LEVEL	0
 #define CONTEXT_TT_DEV_IOTLB	1
Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 10 ++++----
 2 files changed, 52 insertions(+), 6 deletions(-)
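The level check and flag selection in `domain_setup_first_level()` can be exercised outside the kernel with a minimal sketch. The assumptions are stated here and repeated in the comments: `sketch_agaw_to_level()` assumes the driver's `agaw_to_level()` computes `agaw + 2`, and the two `SKETCH_FLAG_*` values are placeholders, not the real `PASID_FLAG_*` definitions. The page-table walk that skips top levels is left out because it needs kernel memory helpers.

```c
#include <errno.h>

/* Placeholder values, NOT the kernel's PASID_FLAG_* definitions. */
#define SKETCH_FLAG_SUPERVISOR_MODE  (1 << 0)
#define SKETCH_FLAG_FL5LP            (1 << 1)

/* Assumption: stands in for agaw_to_level(), taken here to be agaw + 2. */
static inline int sketch_agaw_to_level(int agaw)
{
	return agaw + 2;
}

/*
 * Returns the flag word for a first-level PASID entry, or -EINVAL when
 * the resulting paging depth is neither 4-level nor 5-level, mirroring
 * the check in domain_setup_first_level().
 */
static inline int sketch_first_level_flags(int agaw)
{
	int flags = SKETCH_FLAG_SUPERVISOR_MODE;
	int level = sketch_agaw_to_level(agaw);

	if (level != 4 && level != 5)
		return -EINVAL;

	if (level == 5)
		flags |= SKETCH_FLAG_FL5LP;

	return flags;
}
```

Under these assumptions an agaw of 2 selects 4-level paging, an agaw of 3 adds the 5-level flag, and any other value is rejected.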