
[RFC/PATCH,2/7] iommu-api: Add map_range/unmap_range functions

Message ID 1404147116-4598-3-git-send-email-ohaugan@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Olav Haugan June 30, 2014, 4:51 p.m. UTC
Mapping and unmapping are more often than not in the critical path.
map_range and unmap_range allow SMMU driver implementations to optimize
the process of mapping and unmapping buffers into the SMMU page tables.
Instead of mapping one physical address, doing a TLB operation
(expensive), mapping, doing a TLB operation, mapping, doing a TLB
operation, and so on, the driver can map a scatter-gather list of
physically contiguous pages into one virtually contiguous region and
then do a single TLB operation at the end.

Additionally, the mapping operation would be faster in general since
clients do not have to keep calling the map API over and over again for
each physically contiguous chunk of memory that needs to be mapped to a
virtually contiguous region.

Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
---
 drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
 include/linux/iommu.h | 24 ++++++++++++++++++++++++
 2 files changed, 48 insertions(+)
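
For illustration, here is a rough sketch of the intended usage,
contrasting today's per-chunk loop with a single map_range call. The
helper names are hypothetical, and the meaning of the len argument
(total bytes versus number of entries) is left to the driver
implementation in this RFC:

#include <linux/iommu.h>
#include <linux/scatterlist.h>

/* Today: one iommu_map() per physically contiguous chunk, each of which
 * may trigger TLB maintenance in the driver.  Cleanup of a partial
 * mapping on error is omitted for brevity. */
static int map_buffer_chunks(struct iommu_domain *domain, unsigned long iova,
			     struct scatterlist *sg, unsigned int nents,
			     int prot)
{
	struct scatterlist *s;
	unsigned int i;
	int ret;

	for_each_sg(sg, s, nents, i) {
		ret = iommu_map(domain, iova, sg_phys(s), s->length, prot);
		if (ret)
			return ret;
		iova += s->length;
	}

	return 0;
}

/* With this patch: hand the whole scatterlist to the driver, which can
 * defer the TLB maintenance until the entire range has been mapped. */
static int map_buffer_range(struct iommu_domain *domain, unsigned int iova,
			    struct scatterlist *sg, unsigned int len, int prot)
{
	return iommu_map_range(domain, iova, sg, len, prot);
}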

Comments

Thierry Reding June 30, 2014, 7:42 p.m. UTC | #1
On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
[...]
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;

Should we perhaps make this mandatory? For drivers that don't provide it
we could implement a generic helper that wraps iommu_{map,unmap}().
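
Something along these lines, perhaps - a rough sketch of a generic
fallback built only on the existing iommu_map()/iommu_unmap() calls.
The helper name is made up, and it assumes the last argument counts sg
entries, which the patch itself does not define:

/* Hypothetical fallback for drivers without a ->map_range() callback:
 * walk the scatterlist and map each chunk with the existing iommu_map(). */
static int iommu_map_range_fallback(struct iommu_domain *domain,
				    unsigned int iova,
				    struct scatterlist *sg,
				    unsigned int nents, int prot)
{
	struct scatterlist *s;
	unsigned int mapped = 0;
	unsigned int i;
	int ret;

	for_each_sg(sg, s, nents, i) {
		ret = iommu_map(domain, iova + mapped, sg_phys(s),
				s->length, prot);
		if (ret) {
			/* Undo the partial mapping before returning. */
			iommu_unmap(domain, iova, mapped);
			return ret;
		}
		mapped += s->length;
	}

	return 0;
}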

> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);
> +
> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Could these be renamed iommu_{map,unmap}_sg() instead to make it more
obvious what exactly they map? And perhaps this could take an sg_table
instead, which already provides a count and is a very common structure
used in drivers (and the DMA mapping API).

Thierry
Will Deacon July 1, 2014, 9:33 a.m. UTC | #2
Hi Olav,

On Mon, Jun 30, 2014 at 05:51:51PM +0100, Olav Haugan wrote:
> Mapping and unmapping are more often than not in the critical path.
> map_range and unmap_range allows SMMU driver implementations to optimize
> the process of mapping and unmapping buffers into the SMMU page tables.
> Instead of mapping one physical address, do TLB operation (expensive),
> mapping, do TLB operation, mapping, do TLB operation the driver can map
> a scatter-gatherlist of physically contiguous pages into one virtual
> address space and then at the end do one TLB operation.
> 
> Additionally, the mapping operation would be faster in general since
> clients does not have to keep calling map API over and over again for
> each physically contiguous chunk of memory that needs to be mapped to a
> virtually contiguous region.

I like the idea of this, although it does mean that drivers implementing the
range mapping functions need more featureful page-table manipulation code
than currently required.

For example, iommu_map uses iommu_pgsize to guarantee that mappings are
created in blocks of the largest supported page size. This can be used to
simplify iterating in the SMMU driver (although the ARM SMMU driver doesn't
yet make use of this; I think Varun will add it when he adds support for
sections).

Given that we're really trying to kill the TLBI here, why not implement
something like iommu_unmap_nosync (unmap without DSB; TLBI) and iommu_sync
(DSB; TLBI) instead? If we guarantee that ranges must be unmapped before
being remapped, then there shouldn't be a TLBI on the map path anyway.
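
A minimal sketch of how that split could look at the API level; both
callbacks (->unmap_nosync and ->sync) are hypothetical and do not exist
in this patch:

/* Unmap without TLB maintenance; the caller issues one explicit sync
 * once it has finished tearing down all the ranges it cares about. */
size_t iommu_unmap_nosync(struct iommu_domain *domain, unsigned long iova,
			  size_t size)
{
	if (unlikely(domain->ops->unmap_nosync == NULL))
		return iommu_unmap(domain, iova, size);	/* falls back, syncs */

	return domain->ops->unmap_nosync(domain, iova, size);
}

void iommu_sync(struct iommu_domain *domain)
{
	if (domain->ops->sync)
		domain->ops->sync(domain);	/* e.g. DSB; TLBI on the ARM SMMU */
}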

Will
Varun Sethi July 1, 2014, 9:58 a.m. UTC | #3
> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> bounces@lists.linux-foundation.org] On Behalf Of Will Deacon
> Sent: Tuesday, July 01, 2014 3:04 PM
> To: Olav Haugan
> Cc: linux-arm-msm@vger.kernel.org; iommu@lists.linux-foundation.org;
> thierry.reding@gmail.com; vgandhi@codeaurora.org; linux-arm-
> kernel@lists.infradead.org
> Subject: Re: [RFC/PATCH 2/7] iommu-api: Add map_range/unmap_range
> functions
> 
> Hi Olav,
> 
> On Mon, Jun 30, 2014 at 05:51:51PM +0100, Olav Haugan wrote:
> > Mapping and unmapping are more often than not in the critical path.
> > map_range and unmap_range allows SMMU driver implementations to
> > optimize the process of mapping and unmapping buffers into the SMMU
> page tables.
> > Instead of mapping one physical address, do TLB operation (expensive),
> > mapping, do TLB operation, mapping, do TLB operation the driver can
> > map a scatter-gatherlist of physically contiguous pages into one
> > virtual address space and then at the end do one TLB operation.
> >
> > Additionally, the mapping operation would be faster in general since
> > clients does not have to keep calling map API over and over again for
> > each physically contiguous chunk of memory that needs to be mapped to
> > a virtually contiguous region.
> 
> I like the idea of this, although it does mean that drivers implementing
> the range mapping functions need more featureful page-table manipulation
> code than currently required.
> 
> For example, iommu_map uses iommu_pgsize to guarantee that mappings are
> created in blocks of the largest support page size. This can be used to
> simplify iterating in the SMMU driver (although the ARM SMMU driver
> doesn't yet make use of this, I think Varun would add this when he adds
> support for sections).
Yes, this would be supported.

-Varun
Hiroshi DOYU July 4, 2014, 4:29 a.m. UTC | #4
Hi Olav,

Olav Haugan <ohaugan@codeaurora.org> writes:

> Mapping and unmapping are more often than not in the critical path.
> map_range and unmap_range allows SMMU driver implementations to optimize
> the process of mapping and unmapping buffers into the SMMU page tables.
> Instead of mapping one physical address, do TLB operation (expensive),
> mapping, do TLB operation, mapping, do TLB operation the driver can map
> a scatter-gatherlist of physically contiguous pages into one virtual
> address space and then at the end do one TLB operation.
>
> Additionally, the mapping operation would be faster in general since
> clients does not have to keep calling map API over and over again for
> each physically contiguous chunk of memory that needs to be mapped to a
> virtually contiguous region.
>
> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
> ---
>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index e5555fc..f2a6b80 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>  
>  
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);

We have a similar one internally, named "iommu_map_sg()", which is
called from the DMA API.

> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Can the existing iommu_unmap() do the same?
Olav Haugan July 8, 2014, 9:53 p.m. UTC | #5
Hi Hiroshi,

On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
> Hi Olav,
> 
> Olav Haugan <ohaugan@codeaurora.org> writes:
> 
>> Mapping and unmapping are more often than not in the critical path.
>> map_range and unmap_range allows SMMU driver implementations to optimize
>> the process of mapping and unmapping buffers into the SMMU page tables.
>> Instead of mapping one physical address, do TLB operation (expensive),
>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>> a scatter-gatherlist of physically contiguous pages into one virtual
>> address space and then at the end do one TLB operation.
>>
>> Additionally, the mapping operation would be faster in general since
>> clients does not have to keep calling map API over and over again for
>> each physically contiguous chunk of memory that needs to be mapped to a
>> virtually contiguous region.
>>
>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>> ---
>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>  2 files changed, 48 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index e5555fc..f2a6b80 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>  
>>  
>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>> +		    struct scatterlist *sg, unsigned int len, int prot)
>> +{
>> +	if (unlikely(domain->ops->map_range == NULL))
>> +		return -ENODEV;
>> +
>> +	BUG_ON(iova & (~PAGE_MASK));
>> +
>> +	return domain->ops->map_range(domain, iova, sg, len, prot);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_map_range);
> 
> We have the similar one internally, which is named, "iommu_map_sg()",
> called from DMA API.

Great, so this new API will be useful to more people!

>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>> +		      unsigned int len)
>> +{
>> +	if (unlikely(domain->ops->unmap_range == NULL))
>> +		return -ENODEV;
>> +
>> +	BUG_ON(iova & (~PAGE_MASK));
>> +
>> +	return domain->ops->unmap_range(domain, iova, len);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
> 
> Can the existing iommu_unmap() do the same?

I believe iommu_unmap() behaves a bit differently because it will keep
on calling domain->ops->unmap() until everything is unmapped instead of
letting the iommu implementation take care of unmapping everything in
one call.
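
Roughly, the difference is the following (a simplified sketch; the real
iommu_unmap() also picks chunk sizes via iommu_pgsize(), which is
omitted here):

/* Simplified view of today's iommu_unmap(): the core loops over the
 * range, so the driver may do TLB maintenance on every iteration. */
static size_t unmap_loop(struct iommu_domain *domain, unsigned long iova,
			 size_t size)
{
	size_t unmapped = 0;

	while (unmapped < size) {
		size_t ret = domain->ops->unmap(domain, iova + unmapped,
						size - unmapped);
		if (!ret)
			break;
		unmapped += ret;
	}

	return unmapped;
}

/* Proposed unmap_range: delegate the whole range in one call, so the
 * driver can tear everything down and flush the TLB once at the end. */
static int unmap_delegate(struct iommu_domain *domain, unsigned int iova,
			  unsigned int len)
{
	return domain->ops->unmap_range(domain, iova, len);
}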

I am abandoning the patch series since our driver was not accepted.
However, if there are no objections I will resubmit this patch (PATCH
2/7) as an independent patch to add this new map_range API.

Thanks,

Olav Haugan
Rob Clark July 8, 2014, 11:49 p.m. UTC | #6
On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> Hi Hiroshi,
>
> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>> Hi Olav,
>>
>> Olav Haugan <ohaugan@codeaurora.org> writes:
>>
>>> Mapping and unmapping are more often than not in the critical path.
>>> map_range and unmap_range allows SMMU driver implementations to optimize
>>> the process of mapping and unmapping buffers into the SMMU page tables.
>>> Instead of mapping one physical address, do TLB operation (expensive),
>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>>> a scatter-gatherlist of physically contiguous pages into one virtual
>>> address space and then at the end do one TLB operation.
>>>
>>> Additionally, the mapping operation would be faster in general since
>>> clients does not have to keep calling map API over and over again for
>>> each physically contiguous chunk of memory that needs to be mapped to a
>>> virtually contiguous region.
>>>
>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>>> ---
>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>>  2 files changed, 48 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index e5555fc..f2a6b80 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>>
>>>
>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>>> +                struct scatterlist *sg, unsigned int len, int prot)
>>> +{
>>> +    if (unlikely(domain->ops->map_range == NULL))
>>> +            return -ENODEV;
>>> +
>>> +    BUG_ON(iova & (~PAGE_MASK));
>>> +
>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>>
>> We have the similar one internally, which is named, "iommu_map_sg()",
>> called from DMA API.
>
> Great, so this new API will be useful to more people!
>
>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>>> +                  unsigned int len)
>>> +{
>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>>> +            return -ENODEV;
>>> +
>>> +    BUG_ON(iova & (~PAGE_MASK));
>>> +
>>> +    return domain->ops->unmap_range(domain, iova, len);
>>> +}
>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>>
>> Can the existing iommu_unmap() do the same?
>
> I believe iommu_unmap() behaves a bit differently because it will keep
> on calling domain->ops->unmap() until everything is unmapped instead of
> letting the iommu implementation take care of unmapping everything in
> one call.
>
> I am abandoning the patch series since our driver was not accepted.
> However, if there are no objections I will resubmit this patch (PATCH
> 2/7) as an independent patch to add this new map_range API.

+1 for map_range().. I've seen that, for gpu workloads at least, the
downstream map_range() API is quite beneficial.  It was worth at least
a few fps in xonotic.

And, possibly getting off the subject a bit, but I was wondering about
the possibility of going one step further and batching up mapping
and/or unmapping multiple buffers (ranges) at once.  I have a pretty
convenient sync point in drm/msm to flush out multiple mappings before
kicking gpu.

BR,
-R

> Thanks,
>
> Olav Haugan
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
> --
> To unsubscribe from this list: send the line "unsubscribe linux-arm-msm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Olav Haugan July 10, 2014, 12:03 a.m. UTC | #7
On 7/8/2014 4:49 PM, Rob Clark wrote:
> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>> Hi Hiroshi,
>>
>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>>> Hi Olav,
>>>
>>> Olav Haugan <ohaugan@codeaurora.org> writes:
>>>
>>>> Mapping and unmapping are more often than not in the critical path.
>>>> map_range and unmap_range allows SMMU driver implementations to optimize
>>>> the process of mapping and unmapping buffers into the SMMU page tables.
>>>> Instead of mapping one physical address, do TLB operation (expensive),
>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>>>> a scatter-gatherlist of physically contiguous pages into one virtual
>>>> address space and then at the end do one TLB operation.
>>>>
>>>> Additionally, the mapping operation would be faster in general since
>>>> clients does not have to keep calling map API over and over again for
>>>> each physically contiguous chunk of memory that needs to be mapped to a
>>>> virtually contiguous region.
>>>>
>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>>>> ---
>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>>>  2 files changed, 48 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>> index e5555fc..f2a6b80 100644
>>>> --- a/drivers/iommu/iommu.c
>>>> +++ b/drivers/iommu/iommu.c
>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>>>
>>>>
>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>>>> +                struct scatterlist *sg, unsigned int len, int prot)
>>>> +{
>>>> +    if (unlikely(domain->ops->map_range == NULL))
>>>> +            return -ENODEV;
>>>> +
>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>> +
>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>>>
>>> We have the similar one internally, which is named, "iommu_map_sg()",
>>> called from DMA API.
>>
>> Great, so this new API will be useful to more people!
>>
>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>>>> +                  unsigned int len)
>>>> +{
>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>>>> +            return -ENODEV;
>>>> +
>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>> +
>>>> +    return domain->ops->unmap_range(domain, iova, len);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>>>
>>> Can the existing iommu_unmap() do the same?
>>
>> I believe iommu_unmap() behaves a bit differently because it will keep
>> on calling domain->ops->unmap() until everything is unmapped instead of
>> letting the iommu implementation take care of unmapping everything in
>> one call.
>>
>> I am abandoning the patch series since our driver was not accepted.
>> However, if there are no objections I will resubmit this patch (PATCH
>> 2/7) as an independent patch to add this new map_range API.
> 
> +1 for map_range().. I've seen for gpu workloads, at least, it is the
> downstream map_range() API is quite beneficial.   It was worth at
> least a few fps in xonotic.
> 
> And, possibly getting off the subject a bit, but I was wondering about
> the possibility of going one step further and batching up mapping
> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
> convenient sync point in drm/msm to flush out multiple mappings before
> kicking gpu.

I think you should be able to do that with this API already - at least
the mapping part since we are passing in a sg list (this could be a
chained sglist).
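
For the single-iova case that could look roughly like this (a sketch
using the standard scatterlist chaining helpers; it assumes sg_a was
allocated with one spare entry for the chain link and that the
architecture supports sg chaining):

#include <linux/scatterlist.h>

/* Chain two scatterlists so one iommu_map_range() call walks both.
 * This only helps when the buffers should end up back-to-back at a
 * single iova; batching distinct iovas is discussed below. */
static int map_two_buffers(struct iommu_domain *domain, unsigned int iova,
			   struct scatterlist *sg_a, unsigned int nents_a,
			   struct scatterlist *sg_b,
			   unsigned int total_len, int prot)
{
	/* The extra entry at the end of sg_a becomes the chain link. */
	sg_chain(sg_a, nents_a + 1, sg_b);

	return iommu_map_range(domain, iova, sg_a, total_len, prot);
}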

Thanks,

Olav
Rob Clark July 10, 2014, 12:40 a.m. UTC | #8
On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> On 7/8/2014 4:49 PM, Rob Clark wrote:
>> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>>> Hi Hiroshi,
>>>
>>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>>>> Hi Olav,
>>>>
>>>> Olav Haugan <ohaugan@codeaurora.org> writes:
>>>>
>>>>> Mapping and unmapping are more often than not in the critical path.
>>>>> map_range and unmap_range allows SMMU driver implementations to optimize
>>>>> the process of mapping and unmapping buffers into the SMMU page tables.
>>>>> Instead of mapping one physical address, do TLB operation (expensive),
>>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>>>>> a scatter-gatherlist of physically contiguous pages into one virtual
>>>>> address space and then at the end do one TLB operation.
>>>>>
>>>>> Additionally, the mapping operation would be faster in general since
>>>>> clients does not have to keep calling map API over and over again for
>>>>> each physically contiguous chunk of memory that needs to be mapped to a
>>>>> virtually contiguous region.
>>>>>
>>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>>>>> ---
>>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>>>>  2 files changed, 48 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>> index e5555fc..f2a6b80 100644
>>>>> --- a/drivers/iommu/iommu.c
>>>>> +++ b/drivers/iommu/iommu.c
>>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>>>>
>>>>>
>>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>>>>> +                struct scatterlist *sg, unsigned int len, int prot)
>>>>> +{
>>>>> +    if (unlikely(domain->ops->map_range == NULL))
>>>>> +            return -ENODEV;
>>>>> +
>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>> +
>>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>>>>
>>>> We have the similar one internally, which is named, "iommu_map_sg()",
>>>> called from DMA API.
>>>
>>> Great, so this new API will be useful to more people!
>>>
>>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>>>>> +                  unsigned int len)
>>>>> +{
>>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>>>>> +            return -ENODEV;
>>>>> +
>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>> +
>>>>> +    return domain->ops->unmap_range(domain, iova, len);
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>>>>
>>>> Can the existing iommu_unmap() do the same?
>>>
>>> I believe iommu_unmap() behaves a bit differently because it will keep
>>> on calling domain->ops->unmap() until everything is unmapped instead of
>>> letting the iommu implementation take care of unmapping everything in
>>> one call.
>>>
>>> I am abandoning the patch series since our driver was not accepted.
>>> However, if there are no objections I will resubmit this patch (PATCH
>>> 2/7) as an independent patch to add this new map_range API.
>>
>> +1 for map_range().. I've seen for gpu workloads, at least, it is the
>> downstream map_range() API is quite beneficial.   It was worth at
>> least a few fps in xonotic.
>>
>> And, possibly getting off the subject a bit, but I was wondering about
>> the possibility of going one step further and batching up mapping
>> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
>> convenient sync point in drm/msm to flush out multiple mappings before
>> kicking gpu.
>
> I think you should be able to do that with this API already - at least
> the mapping part since we are passing in a sg list (this could be a
> chained sglist).

What I mean by batching up is mapping and unmapping multiple sglists
each at different iovas with minimal cpu cache and iommu tlb flushes..

Ideally we'd let the IOMMU driver be clever and build out all 2nd
level tables before inserting into first level tables (to minimize cpu
cache flushing).. also, there is probably a reasonable chance that
we'd be mapping a new buffer into an existing location, so there might be
some potential to reuse existing 2nd level tables (and save a tiny bit
of free/alloc).  I've not thought too much about how that would look
in code.. might be kinda, umm, fun..

But at an API level, we should be able to do a bunch of
map/unmap_range's with one flush.

Maybe it could look like a sequence of iommu_{map,unmap}_range()
followed by iommu_flush()?
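
In driver terms, something like the following sketch; iommu_flush() is
purely hypothetical here, and deferring the TLB maintenance would have
to be something the caller asks for explicitly:

/* Hypothetical batched update before kicking the GPU: unmap the old
 * range, map the new one, then do a single TLB invalidation. */
static int batch_remap_example(struct iommu_domain *domain,
			       unsigned int iova_old, unsigned int len_old,
			       unsigned int iova_new, struct scatterlist *sg,
			       unsigned int len_new, int prot)
{
	int ret;

	ret = iommu_unmap_range(domain, iova_old, len_old);
	if (ret)
		return ret;

	ret = iommu_map_range(domain, iova_new, sg, len_new, prot);
	if (ret)
		return ret;

	/* One TLB invalidation for everything above (hypothetical API). */
	iommu_flush(domain);

	return 0;
}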

BR,
-R

> Thanks,
>
> Olav
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
Thierry Reding July 10, 2014, 7:10 a.m. UTC | #9
On Wed, Jul 09, 2014 at 08:40:21PM -0400, Rob Clark wrote:
> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> > On 7/8/2014 4:49 PM, Rob Clark wrote:
> >> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> >>> Hi Hiroshi,
> >>>
> >>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
> >>>> Hi Olav,
> >>>>
> >>>> Olav Haugan <ohaugan@codeaurora.org> writes:
> >>>>
> >>>>> Mapping and unmapping are more often than not in the critical path.
> >>>>> map_range and unmap_range allows SMMU driver implementations to optimize
> >>>>> the process of mapping and unmapping buffers into the SMMU page tables.
> >>>>> Instead of mapping one physical address, do TLB operation (expensive),
> >>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
> >>>>> a scatter-gatherlist of physically contiguous pages into one virtual
> >>>>> address space and then at the end do one TLB operation.
> >>>>>
> >>>>> Additionally, the mapping operation would be faster in general since
> >>>>> clients does not have to keep calling map API over and over again for
> >>>>> each physically contiguous chunk of memory that needs to be mapped to a
> >>>>> virtually contiguous region.
> >>>>>
> >>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
> >>>>> ---
> >>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
> >>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
> >>>>>  2 files changed, 48 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >>>>> index e5555fc..f2a6b80 100644
> >>>>> --- a/drivers/iommu/iommu.c
> >>>>> +++ b/drivers/iommu/iommu.c
> >>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> >>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
> >>>>>
> >>>>>
> >>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> >>>>> +                struct scatterlist *sg, unsigned int len, int prot)
> >>>>> +{
> >>>>> +    if (unlikely(domain->ops->map_range == NULL))
> >>>>> +            return -ENODEV;
> >>>>> +
> >>>>> +    BUG_ON(iova & (~PAGE_MASK));
> >>>>> +
> >>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
> >>>>> +}
> >>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
> >>>>
> >>>> We have the similar one internally, which is named, "iommu_map_sg()",
> >>>> called from DMA API.
> >>>
> >>> Great, so this new API will be useful to more people!
> >>>
> >>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> >>>>> +                  unsigned int len)
> >>>>> +{
> >>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
> >>>>> +            return -ENODEV;
> >>>>> +
> >>>>> +    BUG_ON(iova & (~PAGE_MASK));
> >>>>> +
> >>>>> +    return domain->ops->unmap_range(domain, iova, len);
> >>>>> +}
> >>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
> >>>>
> >>>> Can the existing iommu_unmap() do the same?
> >>>
> >>> I believe iommu_unmap() behaves a bit differently because it will keep
> >>> on calling domain->ops->unmap() until everything is unmapped instead of
> >>> letting the iommu implementation take care of unmapping everything in
> >>> one call.
> >>>
> >>> I am abandoning the patch series since our driver was not accepted.
> >>> However, if there are no objections I will resubmit this patch (PATCH
> >>> 2/7) as an independent patch to add this new map_range API.
> >>
> >> +1 for map_range().. I've seen for gpu workloads, at least, it is the
> >> downstream map_range() API is quite beneficial.   It was worth at
> >> least a few fps in xonotic.
> >>
> >> And, possibly getting off the subject a bit, but I was wondering about
> >> the possibility of going one step further and batching up mapping
> >> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
> >> convenient sync point in drm/msm to flush out multiple mappings before
> >> kicking gpu.
> >
> > I think you should be able to do that with this API already - at least
> > the mapping part since we are passing in a sg list (this could be a
> > chained sglist).
> 
> What I mean by batching up is mapping and unmapping multiple sglists
> each at different iova's with minmal cpu cache and iommu tlb flushes..
> 
> Ideally we'd let the IOMMU driver be clever and build out all 2nd
> level tables before inserting into first level tables (to minimize cpu
> cache flushing).. also, there is probably a reasonable chance that
> we'd be mapping a new buffer into existing location, so there might be
> some potential to reuse existing 2nd level tables (and save a tiny bit
> of free/alloc).  I've not thought too much about how that would look
> in code.. might be kinda, umm, fun..
> 
> But at an API level, we should be able to do a bunch of
> map/unmap_range's with one flush.
> 
> Maybe it could look like a sequence of iommu_{map,unmap}_range()
> followed by iommu_flush()?

Doesn't that mean that the IOMMU driver would have to keep track of all
mappings until it sees an iommu_flush()? That sounds like it could be a
lot of work and complicated code.

Thierry
Rob Clark July 10, 2014, 11:15 a.m. UTC | #10
On Thu, Jul 10, 2014 at 3:10 AM, Thierry Reding
<thierry.reding@gmail.com> wrote:
> On Wed, Jul 09, 2014 at 08:40:21PM -0400, Rob Clark wrote:
>> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>> > On 7/8/2014 4:49 PM, Rob Clark wrote:
>> >> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>> >>> Hi Hiroshi,
>> >>>
>> >>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>> >>>> Hi Olav,
>> >>>>
>> >>>> Olav Haugan <ohaugan@codeaurora.org> writes:
>> >>>>
>> >>>>> Mapping and unmapping are more often than not in the critical path.
>> >>>>> map_range and unmap_range allows SMMU driver implementations to optimize
>> >>>>> the process of mapping and unmapping buffers into the SMMU page tables.
>> >>>>> Instead of mapping one physical address, do TLB operation (expensive),
>> >>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>> >>>>> a scatter-gatherlist of physically contiguous pages into one virtual
>> >>>>> address space and then at the end do one TLB operation.
>> >>>>>
>> >>>>> Additionally, the mapping operation would be faster in general since
>> >>>>> clients does not have to keep calling map API over and over again for
>> >>>>> each physically contiguous chunk of memory that needs to be mapped to a
>> >>>>> virtually contiguous region.
>> >>>>>
>> >>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>> >>>>> ---
>> >>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>> >>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>> >>>>>  2 files changed, 48 insertions(+)
>> >>>>>
>> >>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> >>>>> index e5555fc..f2a6b80 100644
>> >>>>> --- a/drivers/iommu/iommu.c
>> >>>>> +++ b/drivers/iommu/iommu.c
>> >>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>> >>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>> >>>>>
>> >>>>>
>> >>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>> >>>>> +                struct scatterlist *sg, unsigned int len, int prot)
>> >>>>> +{
>> >>>>> +    if (unlikely(domain->ops->map_range == NULL))
>> >>>>> +            return -ENODEV;
>> >>>>> +
>> >>>>> +    BUG_ON(iova & (~PAGE_MASK));
>> >>>>> +
>> >>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>> >>>>> +}
>> >>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>> >>>>
>> >>>> We have the similar one internally, which is named, "iommu_map_sg()",
>> >>>> called from DMA API.
>> >>>
>> >>> Great, so this new API will be useful to more people!
>> >>>
>> >>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>> >>>>> +                  unsigned int len)
>> >>>>> +{
>> >>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>> >>>>> +            return -ENODEV;
>> >>>>> +
>> >>>>> +    BUG_ON(iova & (~PAGE_MASK));
>> >>>>> +
>> >>>>> +    return domain->ops->unmap_range(domain, iova, len);
>> >>>>> +}
>> >>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>> >>>>
>> >>>> Can the existing iommu_unmap() do the same?
>> >>>
>> >>> I believe iommu_unmap() behaves a bit differently because it will keep
>> >>> on calling domain->ops->unmap() until everything is unmapped instead of
>> >>> letting the iommu implementation take care of unmapping everything in
>> >>> one call.
>> >>>
>> >>> I am abandoning the patch series since our driver was not accepted.
>> >>> However, if there are no objections I will resubmit this patch (PATCH
>> >>> 2/7) as an independent patch to add this new map_range API.
>> >>
>> >> +1 for map_range().. I've seen for gpu workloads, at least, it is the
>> >> downstream map_range() API is quite beneficial.   It was worth at
>> >> least a few fps in xonotic.
>> >>
>> >> And, possibly getting off the subject a bit, but I was wondering about
>> >> the possibility of going one step further and batching up mapping
>> >> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
>> >> convenient sync point in drm/msm to flush out multiple mappings before
>> >> kicking gpu.
>> >
>> > I think you should be able to do that with this API already - at least
>> > the mapping part since we are passing in a sg list (this could be a
>> > chained sglist).
>>
>> What I mean by batching up is mapping and unmapping multiple sglists
>> each at different iova's with minmal cpu cache and iommu tlb flushes..
>>
>> Ideally we'd let the IOMMU driver be clever and build out all 2nd
>> level tables before inserting into first level tables (to minimize cpu
>> cache flushing).. also, there is probably a reasonable chance that
>> we'd be mapping a new buffer into existing location, so there might be
>> some potential to reuse existing 2nd level tables (and save a tiny bit
>> of free/alloc).  I've not thought too much about how that would look
>> in code.. might be kinda, umm, fun..
>>
>> But at an API level, we should be able to do a bunch of
>> map/unmap_range's with one flush.
>>
>> Maybe it could look like a sequence of iommu_{map,unmap}_range()
>> followed by iommu_flush()?
>
> Doesn't that mean that the IOMMU driver would have to keep track of all
> mappings until it sees an iommu_flush()? That sounds like it could be a
> lot of work and complicated code.

Well, it depends on how elaborate you want to get.  If you don't want to
be too fancy, it may just be a matter of not doing the TLB flush until
iommu_flush().  If you want to get fancy and minimize cpu flushes too,
then the iommu driver would have to do some more tracking to build up a
transaction internally.  I'm not really sure how you avoid that.

I'm not quite sure how frequent it would be that separate buffers
touch the same 2nd level table, so it might be sufficient to treat it
like N map_range and unmap_range's followed by one TLB flush.  I
would, I think, need to implement a prototype or at least instrument
the iommu driver somehow to generate some statistics.

I've nearly got qcom-iommu-v0 working here on top of upstream + small
set of patches.. but once that is a bit more complete, experimenting
with some of this will be on my TODO list to see what amount of
crazy/complicated brings worthwhile performance benefits.

BR,
-R

> Thierry
Olav Haugan July 10, 2014, 10:43 p.m. UTC | #11
On 7/9/2014 5:40 PM, Rob Clark wrote:
> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>> On 7/8/2014 4:49 PM, Rob Clark wrote:
>>> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>>>> Hi Hiroshi,
>>>>
>>>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>>>>> Hi Olav,
>>>>>
>>>>> Olav Haugan <ohaugan@codeaurora.org> writes:
>>>>>
>>>>>> Mapping and unmapping are more often than not in the critical path.
>>>>>> map_range and unmap_range allows SMMU driver implementations to optimize
>>>>>> the process of mapping and unmapping buffers into the SMMU page tables.
>>>>>> Instead of mapping one physical address, do TLB operation (expensive),
>>>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>>>>>> a scatter-gatherlist of physically contiguous pages into one virtual
>>>>>> address space and then at the end do one TLB operation.
>>>>>>
>>>>>> Additionally, the mapping operation would be faster in general since
>>>>>> clients does not have to keep calling map API over and over again for
>>>>>> each physically contiguous chunk of memory that needs to be mapped to a
>>>>>> virtually contiguous region.
>>>>>>
>>>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>>>>>> ---
>>>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>>>>>  2 files changed, 48 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>>> index e5555fc..f2a6b80 100644
>>>>>> --- a/drivers/iommu/iommu.c
>>>>>> +++ b/drivers/iommu/iommu.c
>>>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>>>>>
>>>>>>
>>>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>>>>>> +                struct scatterlist *sg, unsigned int len, int prot)
>>>>>> +{
>>>>>> +    if (unlikely(domain->ops->map_range == NULL))
>>>>>> +            return -ENODEV;
>>>>>> +
>>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>>> +
>>>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>>>>>
>>>>> We have the similar one internally, which is named, "iommu_map_sg()",
>>>>> called from DMA API.
>>>>
>>>> Great, so this new API will be useful to more people!
>>>>
>>>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>>>>>> +                  unsigned int len)
>>>>>> +{
>>>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>>>>>> +            return -ENODEV;
>>>>>> +
>>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>>> +
>>>>>> +    return domain->ops->unmap_range(domain, iova, len);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>>>>>
>>>>> Can the existing iommu_unmap() do the same?
>>>>
>>>> I believe iommu_unmap() behaves a bit differently because it will keep
>>>> on calling domain->ops->unmap() until everything is unmapped instead of
>>>> letting the iommu implementation take care of unmapping everything in
>>>> one call.
>>>>
>>>> I am abandoning the patch series since our driver was not accepted.
>>>> However, if there are no objections I will resubmit this patch (PATCH
>>>> 2/7) as an independent patch to add this new map_range API.
>>>
>>> +1 for map_range().. I've seen for gpu workloads, at least, it is the
>>> downstream map_range() API is quite beneficial.   It was worth at
>>> least a few fps in xonotic.
>>>
>>> And, possibly getting off the subject a bit, but I was wondering about
>>> the possibility of going one step further and batching up mapping
>>> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
>>> convenient sync point in drm/msm to flush out multiple mappings before
>>> kicking gpu.
>>
>> I think you should be able to do that with this API already - at least
>> the mapping part since we are passing in a sg list (this could be a
>> chained sglist).
> 
> What I mean by batching up is mapping and unmapping multiple sglists
> each at different iova's with minmal cpu cache and iommu tlb flushes..
>
> Ideally we'd let the IOMMU driver be clever and build out all 2nd
> level tables before inserting into first level tables (to minimize cpu
> cache flushing).. also, there is probably a reasonable chance that
> we'd be mapping a new buffer into existing location, so there might be
> some potential to reuse existing 2nd level tables (and save a tiny bit
> of free/alloc).  I've not thought too much about how that would look
> in code.. might be kinda, umm, fun..
> 
> But at an API level, we should be able to do a bunch of
> map/unmap_range's with one flush.
> 
> Maybe it could look like a sequence of iommu_{map,unmap}_range()
> followed by iommu_flush()?
> 

So we could add another argument ("options") to the range API that
allows you to indicate whether you want to invalidate the TLB or not.
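
A sketch of what that could look like; the IOMMU_RANGE_DEFER_TLBI flag
and the extra parameter are purely illustrative and not part of the
posted patch:

/* Hypothetical option flag: defer TLB invalidation so the caller can
 * batch several range operations and flush once afterwards. */
#define IOMMU_RANGE_DEFER_TLBI	(1 << 0)

int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
		    struct scatterlist *sg, unsigned int len, int prot,
		    unsigned int options)
{
	if (unlikely(domain->ops->map_range == NULL))
		return -ENODEV;

	BUG_ON(iova & (~PAGE_MASK));

	return domain->ops->map_range(domain, iova, sg, len, prot, options);
}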

Thanks,

Olav
Rob Clark July 10, 2014, 11:42 p.m. UTC | #12
On Thu, Jul 10, 2014 at 6:43 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
> On 7/9/2014 5:40 PM, Rob Clark wrote:
>> On Wed, Jul 9, 2014 at 8:03 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>>> On 7/8/2014 4:49 PM, Rob Clark wrote:
>>>> On Tue, Jul 8, 2014 at 5:53 PM, Olav Haugan <ohaugan@codeaurora.org> wrote:
>>>>> Hi Hiroshi,
>>>>>
>>>>> On 7/3/2014 9:29 PM, Hiroshi Doyu wrote:
>>>>>> Hi Olav,
>>>>>>
>>>>>> Olav Haugan <ohaugan@codeaurora.org> writes:
>>>>>>
>>>>>>> Mapping and unmapping are more often than not in the critical path.
>>>>>>> map_range and unmap_range allows SMMU driver implementations to optimize
>>>>>>> the process of mapping and unmapping buffers into the SMMU page tables.
>>>>>>> Instead of mapping one physical address, do TLB operation (expensive),
>>>>>>> mapping, do TLB operation, mapping, do TLB operation the driver can map
>>>>>>> a scatter-gatherlist of physically contiguous pages into one virtual
>>>>>>> address space and then at the end do one TLB operation.
>>>>>>>
>>>>>>> Additionally, the mapping operation would be faster in general since
>>>>>>> clients does not have to keep calling map API over and over again for
>>>>>>> each physically contiguous chunk of memory that needs to be mapped to a
>>>>>>> virtually contiguous region.
>>>>>>>
>>>>>>> Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
>>>>>>> ---
>>>>>>>  drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
>>>>>>>  include/linux/iommu.h | 24 ++++++++++++++++++++++++
>>>>>>>  2 files changed, 48 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>>>>>> index e5555fc..f2a6b80 100644
>>>>>>> --- a/drivers/iommu/iommu.c
>>>>>>> +++ b/drivers/iommu/iommu.c
>>>>>>> @@ -898,6 +898,30 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>>>>>>  EXPORT_SYMBOL_GPL(iommu_unmap);
>>>>>>>
>>>>>>>
>>>>>>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>>>>>>> +                struct scatterlist *sg, unsigned int len, int prot)
>>>>>>> +{
>>>>>>> +    if (unlikely(domain->ops->map_range == NULL))
>>>>>>> +            return -ENODEV;
>>>>>>> +
>>>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>>>> +
>>>>>>> +    return domain->ops->map_range(domain, iova, sg, len, prot);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>>>>>>
>>>>>> We have the similar one internally, which is named, "iommu_map_sg()",
>>>>>> called from DMA API.
>>>>>
>>>>> Great, so this new API will be useful to more people!
>>>>>
>>>>>>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>>>>>>> +                  unsigned int len)
>>>>>>> +{
>>>>>>> +    if (unlikely(domain->ops->unmap_range == NULL))
>>>>>>> +            return -ENODEV;
>>>>>>> +
>>>>>>> +    BUG_ON(iova & (~PAGE_MASK));
>>>>>>> +
>>>>>>> +    return domain->ops->unmap_range(domain, iova, len);
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
>>>>>>
>>>>>> Can the existing iommu_unmap() do the same?
>>>>>
>>>>> I believe iommu_unmap() behaves a bit differently because it will keep
>>>>> on calling domain->ops->unmap() until everything is unmapped instead of
>>>>> letting the iommu implementation take care of unmapping everything in
>>>>> one call.
>>>>>
>>>>> I am abandoning the patch series since our driver was not accepted.
>>>>> However, if there are no objections I will resubmit this patch (PATCH
>>>>> 2/7) as an independent patch to add this new map_range API.
>>>>
>>>> +1 for map_range().. I've seen for gpu workloads, at least, it is the
>>>> downstream map_range() API is quite beneficial.   It was worth at
>>>> least a few fps in xonotic.
>>>>
>>>> And, possibly getting off the subject a bit, but I was wondering about
>>>> the possibility of going one step further and batching up mapping
>>>> and/or unmapping multiple buffers (ranges) at once.  I have a pretty
>>>> convenient sync point in drm/msm to flush out multiple mappings before
>>>> kicking gpu.
>>>
>>> I think you should be able to do that with this API already - at least
>>> the mapping part since we are passing in a sg list (this could be a
>>> chained sglist).
>>
>> What I mean by batching up is mapping and unmapping multiple sglists
>> each at different iova's with minmal cpu cache and iommu tlb flushes..
>>
>> Ideally we'd let the IOMMU driver be clever and build out all 2nd
>> level tables before inserting into first level tables (to minimize cpu
>> cache flushing).. also, there is probably a reasonable chance that
>> we'd be mapping a new buffer into existing location, so there might be
>> some potential to reuse existing 2nd level tables (and save a tiny bit
>> of free/alloc).  I've not thought too much about how that would look
>> in code.. might be kinda, umm, fun..
>>
>> But at an API level, we should be able to do a bunch of
>> map/unmap_range's with one flush.
>>
>> Maybe it could look like a sequence of iommu_{map,unmap}_range()
>> followed by iommu_flush()?
>>
>
> So we could add another argument ("options") in the range api that
> allows you to indicate whether you want to invalidate TLB or not.

sounds reasonable.. I'm pretty sure we want explicit-flush to be an
opt-in behaviour.

BR,
-R

> Thanks,
>
> Olav
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
Joerg Roedel July 11, 2014, 10:20 a.m. UTC | #13
On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
> +		    struct scatterlist *sg, unsigned int len, int prot)
> +{
> +	if (unlikely(domain->ops->map_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->map_range(domain, iova, sg, len, prot);
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_range);
> +
> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
> +		      unsigned int len)
> +{
> +	if (unlikely(domain->ops->unmap_range == NULL))
> +		return -ENODEV;
> +
> +	BUG_ON(iova & (~PAGE_MASK));
> +
> +	return domain->ops->unmap_range(domain, iova, len);
> +}
> +EXPORT_SYMBOL_GPL(iommu_unmap_range);

Before introducing these new API functions there should be a fall-back
for IOMMU drivers that do (not yet) implement the map_range and
unmap_range call-backs.

The last thing we want is this kind of functional partitioning between
different IOMMU drivers.


	Joerg


Olav Haugan July 15, 2014, 1:13 a.m. UTC | #14
On 7/11/2014 3:20 AM, Joerg Roedel wrote:
> On Mon, Jun 30, 2014 at 09:51:51AM -0700, Olav Haugan wrote:
>> +int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
>> +		    struct scatterlist *sg, unsigned int len, int prot)
>> +{
>> +	if (unlikely(domain->ops->map_range == NULL))
>> +		return -ENODEV;
>> +
>> +	BUG_ON(iova & (~PAGE_MASK));
>> +
>> +	return domain->ops->map_range(domain, iova, sg, len, prot);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_map_range);
>> +
>> +int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
>> +		      unsigned int len)
>> +{
>> +	if (unlikely(domain->ops->unmap_range == NULL))
>> +		return -ENODEV;
>> +
>> +	BUG_ON(iova & (~PAGE_MASK));
>> +
>> +	return domain->ops->unmap_range(domain, iova, len);
>> +}
>> +EXPORT_SYMBOL_GPL(iommu_unmap_range);
> 
> Before introducing these new API functions there should be a fall-back
> for IOMMU drivers that do (not yet) implement the map_range and
> unmap_range call-backs.
> 
> The last thing we want is this kind of functional partitioning between
> different IOMMU drivers.

Yes, I can definitely add a fallback instead of returning -ENODEV.


Thanks,

Olav

Patch

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e5555fc..f2a6b80 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -898,6 +898,30 @@  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
 
+int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot)
+{
+	if (unlikely(domain->ops->map_range == NULL))
+		return -ENODEV;
+
+	BUG_ON(iova & (~PAGE_MASK));
+
+	return domain->ops->map_range(domain, iova, sg, len, prot);
+}
+EXPORT_SYMBOL_GPL(iommu_map_range);
+
+int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
+		      unsigned int len)
+{
+	if (unlikely(domain->ops->unmap_range == NULL))
+		return -ENODEV;
+
+	BUG_ON(iova & (~PAGE_MASK));
+
+	return domain->ops->unmap_range(domain, iova, len);
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_range);
+
 int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
 			       phys_addr_t paddr, u64 size, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b96a5b2..63dca6d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -22,6 +22,7 @@ 
 #include <linux/errno.h>
 #include <linux/err.h>
 #include <linux/types.h>
+#include <linux/scatterlist.h>
 #include <trace/events/iommu.h>
 
 #define IOMMU_READ	(1 << 0)
@@ -93,6 +94,8 @@  enum iommu_attr {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @map_range: map a scatter-gather list of physically contiguous memory chunks to an iommu domain
+ * @unmap_range: unmap a scatter-gather list of physically contiguous memory chunks from an iommu domain
  * @iova_to_phys: translate iova to physical address
  * @domain_has_cap: domain capabilities query
  * @add_device: add device to iommu grouping
@@ -110,6 +113,10 @@  struct iommu_ops {
 		   phys_addr_t paddr, size_t size, int prot);
 	size_t (*unmap)(struct iommu_domain *domain, unsigned long iova,
 		     size_t size);
+	int (*map_range)(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot);
+	int (*unmap_range)(struct iommu_domain *domain, unsigned int iova,
+		      unsigned int len);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*domain_has_cap)(struct iommu_domain *domain,
 			      unsigned long cap);
@@ -153,6 +160,10 @@  extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 		       size_t size);
+extern int iommu_map_range(struct iommu_domain *domain, unsigned int iova,
+		    struct scatterlist *sg, unsigned int len, int prot);
+extern int iommu_unmap_range(struct iommu_domain *domain, unsigned int iova,
+		      unsigned int len);
 extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova);
 extern int iommu_domain_has_cap(struct iommu_domain *domain,
 				unsigned long cap);
@@ -280,6 +291,19 @@  static inline int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return -ENODEV;
 }
 
+static inline int iommu_map_range(struct iommu_domain *domain,
+				  unsigned int iova, struct scatterlist *sg,
+				  unsigned int len, int prot)
+{
+	return -ENODEV;
+}
+
+static inline int iommu_unmap_range(struct iommu_domain *domain,
+				    unsigned int iova, unsigned int len)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_domain_window_enable(struct iommu_domain *domain,
 					     u32 wnd_nr, phys_addr_t paddr,
 					     u64 size, int prot)
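
For completeness, a rough sketch of how an SMMU driver might wire up the
new callbacks; the msm_iommu_* names are hypothetical placeholders, not
code from this series:

/* Hypothetical driver hook-up for the new range callbacks. */
static int msm_iommu_map_range(struct iommu_domain *domain, unsigned int iova,
			       struct scatterlist *sg, unsigned int len,
			       int prot)
{
	/* Build all the page-table entries for the scatterlist first,
	 * then perform a single TLB invalidation for the whole range. */
	return 0;
}

static int msm_iommu_unmap_range(struct iommu_domain *domain,
				 unsigned int iova, unsigned int len)
{
	/* Tear down the whole range, then one TLB invalidation. */
	return 0;
}

static struct iommu_ops msm_iommu_ops = {
	/* existing callbacks (.map, .unmap, ...) omitted */
	.map_range	= msm_iommu_map_range,
	.unmap_range	= msm_iommu_unmap_range,
};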