[v2,0/3] iommu: Avoid DMA ops domain refcount contention

Message ID: cover.1536764440.git.robin.murphy@arm.com

Message

Robin Murphy Sept. 12, 2018, 3:24 p.m. UTC
John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing iommu-dma, but didn't have any way to justify at the
time. However, the reports of 10-25% better networking performance on v1
suggest that it's very worthwhile (and far more significant than I ever
would have guessed).

As far as merging goes, I don't mind at all whether this goes via IOMMU,
or via dma-mapping provided Joerg's happy to ack it.

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
 drivers/iommu/iommu.c       |  9 +++++++++
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)
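
The contention described in the cover letter can be pictured with a small userspace model. This is only a sketch under stated assumptions, not code from the patches: every struct and function name below is hypothetical, and C11 atomics stand in for the kernel's reference counting. It assumes, as the cover letter implies, that the domain used by the DMA ops stays attached to the device for as long as those ops are in use, so a lookup that skips the get/put pair is safe there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct domain { int id; };

struct group {
	atomic_int refcount;           /* shared by every CPU doing DMA for this device */
	struct domain *default_domain; /* assumed fixed while DMA ops are in use */
};

struct device { struct group *group; };

/* Counts atomic refcount updates so the per-lookup cost is visible. */
static atomic_long refcount_ops;

static struct group *group_get(struct device *dev)
{
	atomic_fetch_add(&dev->group->refcount, 1);
	atomic_fetch_add(&refcount_ops, 1);
	return dev->group;
}

static void group_put(struct group *grp)
{
	int old = atomic_fetch_sub(&grp->refcount, 1);

	assert(old > 0); /* dropping a reference that was never taken is a bug */
	atomic_fetch_add(&refcount_ops, 1);
}

/*
 * Refcounted lookup: two atomic updates to one shared counter on every
 * map/unmap call. With many CPUs this cache line bounces between them.
 */
static struct domain *get_domain_refcounted(struct device *dev)
{
	struct group *grp = group_get(dev);
	struct domain *dom = grp->default_domain;

	group_put(grp);
	return dom;
}

/*
 * Fast lookup: a plain pointer read with no shared writes. Only valid
 * under the assumption that the domain cannot go away underneath us.
 */
static struct domain *get_dma_domain_fast(struct device *dev)
{
	return dev->group->default_domain;
}
```

Both lookups return the same domain; the difference is that the refcounted one performs two atomic read-modify-write operations per call on memory shared by all CPUs, while the fast one performs none. That per-call difference in a hot path is the kind of cost the series aims to remove.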

Comments

Will Deacon Sept. 14, 2018, 12:48 p.m. UTC | #1
Hi Robin,

On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
> 
> These are a couple of quick patches based on the idea I had back when
> first implementing iommu-dma, but didn't have any way to justify at the
> time. However, the reports of 10-25% better networking performance on v1
> suggest that it's very worthwhile (and far more significant than I ever
> would have guessed).
> 
> As far as merging goes, I don't mind at all whether this goes via IOMMU,
> or via dma-mapping provided Joerg's happy to ack it.

I think it makes most sense for Joerg to take this series via his tree.

Anyway, I've been running them on my TX2 box and things are happy enough,
so:

Tested-by: Will Deacon <will.deacon@arm.com>

Will
John Garry Sept. 17, 2018, 11:20 a.m. UTC | #2
On 14/09/2018 13:48, Will Deacon wrote:
> Hi Robin,
>

Hi Robin,

I just spoke with Dongdong and we will test this version also so that we 
may provide a "Tested-by" tag.

Thanks,
John

Christoph Hellwig Sept. 17, 2018, 1:33 p.m. UTC | #3
On Fri, Sep 14, 2018 at 01:48:59PM +0100, Will Deacon wrote:
> > As far as merging goes, I don't mind at all whether this goes via IOMMU,
> > or via dma-mapping provided Joerg's happy to ack it.
> 
> I think it makes most sense for Joerg to take this series via his tree.

FYI, I have WIP patches to move the arm dma-iommu wrappers to common
code:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-maybe-coherent

which will require some synchronization of the involved trees.  In the
end it is iommu code, so the actual iommu patches should probably
go into the iommu tree after all, but it might have to pull in the
dma-mapping branch for that.  Or we just punt it until the next merge
window.  Not sure how fast Tom needs the common dma-iommu code for
the x86 AMD iommu conversion.
Tom Murphy Sept. 18, 2018, 1:28 p.m. UTC | #4
> Not sure how fast Tom needs the common dma-iommu code for the x86 AMD iommu conversion.

I am currently busy working on something else and won't be able to
do or test the x86 AMD iommu conversion anytime soon, so I don't need
the common dma-iommu code in the near term.

John Garry Sept. 20, 2018, 12:51 p.m. UTC | #5
On 17/09/2018 12:20, John Garry wrote:
> On 14/09/2018 13:48, Will Deacon wrote:
>> Hi Robin,
>>
>
> Hi Robin,
>
> I just spoke with Dongdong and we will test this version also so that we
> may provide a "Tested-by" tag.
>

I tested this, so for the series:
Tested-by: John Garry <john.garry@huawei.com>

Thanks,
John

Joerg Roedel Sept. 25, 2018, 8:24 a.m. UTC | #6
On Wed, Sep 12, 2018 at 04:24:11PM +0100, Robin Murphy wrote:
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

Applied, thanks Robin.