[v3,4/6] iommu/vt-d: Setup pasid entries for iova over first level

Message ID 20191211021219.8997-5-baolu.lu@linux.intel.com (mailing list archive)
State New, archived
Series Use 1st-level for IOVA translation

Commit Message

Baolu Lu Dec. 11, 2019, 2:12 a.m. UTC
Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose either of them for IOVA translation, depending on the use
case. This patch sets up the PASID entry if a domain is selected to use
the first-level page table for IOVA translation.

Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
 include/linux/intel-iommu.h | 10 ++++----
 2 files changed, 52 insertions(+), 6 deletions(-)
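
As a rough illustration of the choice described in the commit message, the
minimal sketch below models the order in which the patched
dmar_insert_one_dev_info() picks a PASID-entry setup path. The enum, the
function name choose_pasid_setup() and the numeric flag values are assumptions
made for this example; only the flag names and the if / else-if / else order
come from the patch.

```c
#include <stdbool.h>

/* Hypothetical flag values, chosen for illustration only. */
#define DOMAIN_FLAG_STATIC_IDENTITY	(1 << 0)
#define DOMAIN_FLAG_USE_FIRST_LEVEL	(1 << 1)

/* Hypothetical names for the three setup paths used by the patch. */
enum pasid_setup {
	PASID_SETUP_PASS_THROUGH,	/* intel_pasid_setup_pass_through() */
	PASID_SETUP_FIRST_LEVEL,	/* domain_setup_first_level() */
	PASID_SETUP_SECOND_LEVEL,	/* intel_pasid_setup_second_level() */
};

/*
 * Same precedence as the patched code: hardware pass-through for the
 * static identity domain wins, then first level, then second level.
 */
static enum pasid_setup choose_pasid_setup(int domain_flags,
					   bool hw_pass_through)
{
	if (hw_pass_through && (domain_flags & DOMAIN_FLAG_STATIC_IDENTITY))
		return PASID_SETUP_PASS_THROUGH;
	if (domain_flags & DOMAIN_FLAG_USE_FIRST_LEVEL)
		return PASID_SETUP_FIRST_LEVEL;
	return PASID_SETUP_SECOND_LEVEL;
}
```

Note that in this ordering a static identity domain with hardware pass-through
never reaches the first-level branch, even if the first-level flag is also set.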

Comments

Yi Liu Dec. 13, 2019, 9:23 a.m. UTC | #1
Hi Allen,

> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
> Of Lu Baolu
> Sent: Wednesday, December 11, 2019 10:12 AM
> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
> first level and second level. The IOMMU driver can choose one from both for IOVA
> translation according to the use case. This sets up the pasid entry if a domain is
> selected to use the first-level page table for iova translation.
> 
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>  include/linux/intel-iommu.h | 10 ++++----
>  2 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 2b5a47584baf..83a7abf0c4f0 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
>  	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
>  }
> 
> +static inline bool domain_use_first_level(struct dmar_domain *domain)
> +{
> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
> +}
> +
>  static inline int domain_pfn_supported(struct dmar_domain *domain,
>  				       unsigned long pfn)
>  {
> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>  		return -EINVAL;
> 
>  	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> +	if (domain_use_first_level(domain))
> +		prot |= DMA_FL_PTE_PRESENT;

For the DMA_PTE_SNP bit, I think some work is needed. Bit 11 of prot
should be cleared when FLPT is used for IOVA.

Also, we need to set bit 63 "XD" properly. E.g., if bit 11 of prot is set, it
means snoop is required, so the "XD" bit is "0". If bit 11 of prot is "0", it
means this domain is not snooping, so you may want to set the "XD" bit to "1".
With such an enhancement, I think IOVA over FLPT would differ as little as
possible from IOVA over SLPT.

Regards,
Yi Liu

Baolu Lu Dec. 14, 2019, 3:03 a.m. UTC | #2
Hi Liu Yi,

Thanks for reviewing my patch.

On 12/13/19 5:23 PM, Liu, Yi L wrote:
>> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On Behalf
>> Of Lu Baolu
>> Sent: Wednesday, December 11, 2019 10:12 AM
>> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
>>
>> Intel VT-d in scalable mode supports two types of page tables for IOVA translation:
>> first level and second level. The IOMMU driver can choose one from both for IOVA
>> translation according to the use case. This sets up the pasid entry if a domain is
>> selected to use the first-level page table for iova translation.
>>
>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
>> ---
>>   drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
>>   include/linux/intel-iommu.h | 10 ++++----
>>   2 files changed, 52 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 2b5a47584baf..83a7abf0c4f0 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
>>   	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
>>   }
>>
>> +static inline bool domain_use_first_level(struct dmar_domain *domain)
>> +{
>> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
>> +}
>> +
>>   static inline int domain_pfn_supported(struct dmar_domain *domain,
>>   				       unsigned long pfn)
>>   {
>> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
>>   		return -EINVAL;
>>
>>   	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
>> +	if (domain_use_first_level(domain))
>> +		prot |= DMA_FL_PTE_PRESENT;
> 
> For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> should be cleared when FLPT is used for IOVA.

SNP (bit 11) is only for second level. This bit is ignored for first
level page table walk. We should clear this bit for first level anyway.

> 
> Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is set, it
> means snoop required, then "XD" bit is "0". If bit 11 of prot is "0", it means
> this domain is not snooping, so you may want to set "XD" bit as "1". With
> such enhancement, I think IOVA over FLPT would have as less difference
> with IOVA over SLPT.

XD (bit 63) is only for the first level, and SNP (bit 11) is only for
second level, right? I think we need to always set XD bit for IOVA over
FL case. thoughts?

Best regards,
baolu

Yi Liu Dec. 15, 2019, 9:37 a.m. UTC | #3
Hi Baolu,

> From: Lu Baolu [mailto:baolu.lu@linux.intel.com]
> Sent: Saturday, December 14, 2019 11:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Hi Liu Yi,
> 
> Thanks for reviewing my patch.
> 
> On 12/13/19 5:23 PM, Liu, Yi L wrote:
> >> From: kvm-owner@vger.kernel.org [mailto:kvm-owner@vger.kernel.org] On
> >> Behalf Of Lu Baolu
> >> Sent: Wednesday, December 11, 2019 10:12 AM
> >> Subject: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over
> >> first level
> >>
> >> Intel VT-d in scalable mode supports two types of page tables for IOVA
> >> translation: first level and second level. The IOMMU driver can choose
> >> one from both for IOVA translation according to the use case. This sets
> >> up the pasid entry if a domain is selected to use the first-level page
> >> table for iova translation.
> >>
> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> >> ---
> >>   drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++++--
> >>   include/linux/intel-iommu.h | 10 ++++----
> >>   2 files changed, 52 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> >> index 2b5a47584baf..83a7abf0c4f0 100644
> >> --- a/drivers/iommu/intel-iommu.c
> >> +++ b/drivers/iommu/intel-iommu.c
> >> @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain)
> >>   	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
> >>   }
> >>
> >> +static inline bool domain_use_first_level(struct dmar_domain *domain)
> >> +{
> >> +	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
> >> +}
> >> +
> >>   static inline int domain_pfn_supported(struct dmar_domain *domain,
> >>   				       unsigned long pfn)
> >>   {
> >> @@ -2288,6 +2293,8 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
> >>   		return -EINVAL;
> >>
> >>   	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
> >> +	if (domain_use_first_level(domain))
> >> +		prot |= DMA_FL_PTE_PRESENT;
> >
> > For DMA_PTE_SNP bit, I think there needs some work. The bit 11 of prot
> > should be cleared when FLPT is used for IOVA.
> 
> SNP (bit 11) is only for second level. This bit is ignored for first level page table walk.
> We should clear this bit for first level anyway.

I think this is what I meant above? This patch somehow misses the operation
to clear bit 11.

> >
> > Also, we need to set bit 63 "XD" properly. e.g. If bit 11 of prot is
> > set, it means snoop required, then "XD" bit is "0". If bit 11 of prot
> > is "0", it means this domain is not snooping, so you may want to set
> > "XD" bit as "1". With such enhancement, I think IOVA over FLPT would
> > have as less difference with IOVA over SLPT.
> 
> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
> think we need to always set XD bit for IOVA over FL case. thoughts?

Oops, I made a mistake here. Please forget the SNP bit; there is no way to
control SNP with the first level page table. :-)

Actually, it is execute (bit 1) of second level page table which I wanted to
talk about. If software sets R/W/X permission on an IOVA, with IOVA over second
level page table, it will set bit 1. However, if IOVA is over first level page
table, it may need to clear the XD bit. This is what I want to say here. If
IOVA doesn't allow execute permission, it's OK to always set the XD bit for the
IOVA over FL case. But I would like to handle it the same way as R/W
permission, which relies on the permission configured by the page map caller.
Right?

Regards,
Yi Liu

Baolu Lu Dec. 17, 2019, 2:03 a.m. UTC | #4
Hi Yi,

On 12/15/19 5:37 PM, Liu, Yi L wrote:
>> XD (bit 63) is only for the first level, and SNP (bit 11) is only for second level, right? I
>> think we need to always set XD bit for IOVA over FL case. thoughts?
> Oops, I made a mistake here. Please forget SNP bit, there is no way to control SNP
> with first level page table.:-)
> 
> Actually, it is execute (bit 1) of second level page table which I wanted to say.
> If software sets R/W/X permission to an IOVA, with IOVA over second level
> page table, it will set bit 1. However, if IOVA is over first level page table, it
> may need to clear XD bit. This is what I want to say here. If IOVA doesn’t allow
> execute permission, it's ok to always set XD bit for IOVA over FL case. But I
> would like to do it just as what we did for R/W permission. R/W permission
> relies on the permission configured by the page map caller. right?

Got your point.

Current driver always clears X (bit 2) in the second level page table.
So we will always set XD bit (bit 63) in the first level page table.
If we decide to use the X permission, we need a separate patch, right?

Best regards,
baolu

Yi Liu Dec. 17, 2019, 2:33 a.m. UTC | #5
> From: Lu Baolu <baolu.lu@linux.intel.com>
> Sent: Tuesday, December 17, 2019 10:04 AM
> To: Liu, Yi L <yi.l.liu@intel.com>; Joerg Roedel <joro@8bytes.org>; David
> Woodhouse <dwmw2@infradead.org>; Alex Williamson
> <alex.williamson@redhat.com>
> Subject: Re: [PATCH v3 4/6] iommu/vt-d: Setup pasid entries for iova over first level
> 
> Hi Yi,
> 
> On 12/15/19 5:37 PM, Liu, Yi L wrote:
> >> XD (bit 63) is only for the first level, and SNP (bit 11) is only for
> >> second level, right? I think we need to always set XD bit for IOVA over
> >> FL case. thoughts?
> > Oops, I made a mistake here. Please forget SNP bit, there is no way to
> > control SNP with first level page table.:-)
> >
> > Actually, it is execute (bit 1) of second level page table which I wanted to say.
> > If software sets R/W/X permission to an IOVA, with IOVA over second
> > level page table, it will set bit 1. However, if IOVA is over first
> > level page table, it may need to clear XD bit. This is what I want to
> > say here. If IOVA doesn’t allow execute permission, it's ok to always
> > set XD bit for IOVA over FL case. But I would like to do it just as
> > what we did for R/W permission. R/W permission relies on the permission
> > configured by the page map caller. right?
> 
> Got your point.
> 
> Current driver always clears X (bit 2) in the second level page table.
> So we will always set XD bit (bit 63) in the first level page table.

Yes, I also noticed X (bit 2) is not used in the intel-iommu driver, so I
understand why you set XD for the IOVA over FL case. But it's a little bit
weird to hard-code it. That's why I suggested relying on the page map
caller's permission input.

> If we decide to use the X permission, we need a separate patch, right?

sure, it would be a separate patch since current code doesn’t apply
X permission.

Regards,
Yi Liu
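
The outcome of the review thread above can be summarised in a small sketch:
for IOVA over the first level, SNP (bit 11) is dropped because it only exists
in second-level entries, and XD (bit 63) is always set because the driver never
grants execute permission. This is an assumption-laden illustration, not the
code of this patch: the posted patch only ORs in DMA_FL_PTE_PRESENT, and the
names sketch_prot_bits() and DMA_FL_PTE_XD are introduced here for the example.
The bit positions are the ones quoted in the thread and in the header diff.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_PTE_READ		(1ULL << 0)
#define DMA_PTE_WRITE		(1ULL << 1)
#define DMA_PTE_SNP		(1ULL << 11)	/* second level only */
#define DMA_FL_PTE_PRESENT	(1ULL << 0)	/* first level only */
#define DMA_FL_PTE_XD		(1ULL << 63)	/* name assumed; first level only */

/* Compute the permission bits for a leaf entry, as discussed in the thread. */
static uint64_t sketch_prot_bits(uint64_t prot, bool first_level)
{
	/* Same mask as __domain_mapping(): keep only R, W and SNP. */
	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;

	if (first_level) {
		prot &= ~DMA_PTE_SNP;		/* ignored by first-level walks */
		prot |= DMA_FL_PTE_PRESENT;	/* what the patch adds */
		prot |= DMA_FL_PTE_XD;		/* no execute permission */
	}

	return prot;
}
```

If execute permission were ever honoured, the XD bit would instead have to
follow the permission passed by the page map caller, which both reviewers
agree belongs in a separate patch.
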
Patch

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2b5a47584baf..83a7abf0c4f0 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -571,6 +571,11 @@  static inline int domain_type_is_si(struct dmar_domain *domain)
 	return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline bool domain_use_first_level(struct dmar_domain *domain)
+{
+	return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
 				       unsigned long pfn)
 {
@@ -2288,6 +2293,8 @@  static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
 		return -EINVAL;
 
 	prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
+	if (domain_use_first_level(domain))
+		prot |= DMA_FL_PTE_PRESENT;
 
 	if (!sg) {
 		sg_res = nr_pages;
@@ -2515,6 +2522,36 @@  dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 	return NULL;
 }
 
+static int domain_setup_first_level(struct intel_iommu *iommu,
+				    struct dmar_domain *domain,
+				    struct device *dev,
+				    int pasid)
+{
+	int flags = PASID_FLAG_SUPERVISOR_MODE;
+	struct dma_pte *pgd = domain->pgd;
+	int agaw, level;
+
+	/*
+	 * Skip top levels of page tables for iommu which has
+	 * less agaw than default. Unnecessary for PT mode.
+	 */
+	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+		pgd = phys_to_virt(dma_pte_addr(pgd));
+		if (!dma_pte_present(pgd))
+			return -ENOMEM;
+	}
+
+	level = agaw_to_level(agaw);
+	if (level != 4 && level != 5)
+		return -EINVAL;
+
+	flags |= (level == 5) ? PASID_FLAG_FL5LP : 0;
+
+	return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
+					     domain->iommu_did[iommu->seq_id],
+					     flags);
+}
+
 static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 						    int bus, int devfn,
 						    struct device *dev,
@@ -2614,6 +2651,9 @@  static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
 		if (hw_pass_through && domain_type_is_si(domain))
 			ret = intel_pasid_setup_pass_through(iommu, domain,
 					dev, PASID_RID2PASID);
+		else if (domain_use_first_level(domain))
+			ret = domain_setup_first_level(iommu, domain, dev,
+					PASID_RID2PASID);
 		else
 			ret = intel_pasid_setup_second_level(iommu, domain,
 					dev, PASID_RID2PASID);
@@ -5369,8 +5409,12 @@  static int aux_domain_add_dev(struct dmar_domain *domain,
 		goto attach_failed;
 
 	/* Setup the PASID entry for mediated devices: */
-	ret = intel_pasid_setup_second_level(iommu, domain, dev,
-					     domain->default_pasid);
+	if (domain_use_first_level(domain))
+		ret = domain_setup_first_level(iommu, domain, dev,
+					       domain->default_pasid);
+	else
+		ret = intel_pasid_setup_second_level(iommu, domain, dev,
+						     domain->default_pasid);
 	if (ret)
 		goto table_failed;
 	spin_unlock(&iommu->lock);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index aaece25c055f..66b525bad434 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -34,10 +34,12 @@ 
 #define VTD_STRIDE_SHIFT        (9)
 #define VTD_STRIDE_MASK         (((u64)-1) << VTD_STRIDE_SHIFT)
 
-#define DMA_PTE_READ (1)
-#define DMA_PTE_WRITE (2)
-#define DMA_PTE_LARGE_PAGE (1 << 7)
-#define DMA_PTE_SNP (1 << 11)
+#define DMA_PTE_READ		(1)
+#define DMA_PTE_WRITE		(2)
+#define DMA_PTE_LARGE_PAGE	(1 << 7)
+#define DMA_PTE_SNP		(1 << 11)
+
+#define DMA_FL_PTE_PRESENT	(1)
 
 #define CONTEXT_TT_MULTI_LEVEL	0
 #define CONTEXT_TT_DEV_IOTLB	1
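
To make the level check in domain_setup_first_level() easier to follow, here is
a minimal standalone sketch of how the flags are derived from the adjusted
guest address width. It assumes that agaw_to_level() returns agaw + 2, as the
driver's existing helper is believed to do, and it uses made-up numeric values
for the two PASID flags; first_level_flags() is a hypothetical name that does
not appear in the patch.

```c
#include <errno.h>

/* Hypothetical values; the real definitions live in the driver's headers. */
#define PASID_FLAG_SUPERVISOR_MODE	(1 << 0)
#define PASID_FLAG_FL5LP		(1 << 1)

/* Assumed to match the driver helper: agaw 2 -> 4-level, agaw 3 -> 5-level. */
static int agaw_to_level(int agaw)
{
	return agaw + 2;
}

/*
 * Return the PASID setup flags for a first-level table of the given agaw,
 * or -EINVAL when the table is neither 4-level nor 5-level.
 */
static int first_level_flags(int agaw)
{
	int level = agaw_to_level(agaw);
	int flags = PASID_FLAG_SUPERVISOR_MODE;

	if (level != 4 && level != 5)
		return -EINVAL;

	if (level == 5)
		flags |= PASID_FLAG_FL5LP;

	return flags;
}
```

Under these assumptions a 3-level table (agaw 1) is rejected, which is why the
patch first walks down from domain->agaw to iommu->agaw before checking the
level.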