mbox series

[0/3] iommu: Avoid DMA ops domain refcount contention

Message ID cover.1534250425.git.robin.murphy@arm.com (mailing list archive)
Headers show
Series iommu: Avoid DMA ops domain refcount contention | expand

Message

Robin Murphy Aug. 14, 2018, 1:04 p.m. UTC
John raised the issue[1] that we have some unnecessary refcount contention
in the DMA ops path which shows scalability problems now that we have more
real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
sidestepping this by stashing domain references in archdata, but since
that's not very nice for architecture-agnostic code, I think it's time to
look at a generic API-level solution.

These are a couple of quick patches based on the idea I had back when
first implementing this lot, but didn't have any way to justify at the
time. The third patch can be ignored for the sake of API discussion, but
is included for completeness. 

Robin.


[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html

Robin Murphy (3):
  iommu: Add fast hook for getting DMA domains
  iommu/dma: Use fast DMA domain lookup
  arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops

 arch/arm64/mm/dma-mapping.c | 10 +++++-----
 drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
 drivers/iommu/iommu.c       |  9 +++++++++
 include/linux/iommu.h       |  1 +
 4 files changed, 27 insertions(+), 16 deletions(-)

Comments

John Garry Aug. 14, 2018, 1:38 p.m. UTC | #1
On 14/08/2018 14:04, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
>
> These are a couple of quick patches based on the idea I had back when
> first implementing this lot, but didn't have any way to justify at the
> time. The third patch can be ignored for the sake of API discussion, but
> is included for completeness.
>
> Robin.
>
>
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html
>
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
>
>  arch/arm64/mm/dma-mapping.c | 10 +++++-----
>  drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
>  drivers/iommu/iommu.c       |  9 +++++++++
>  include/linux/iommu.h       |  1 +
>  4 files changed, 27 insertions(+), 16 deletions(-)
>

Thanks Robin. So we'll test this now. I anticipate a decent improvement 
across the board since we are now simply dereferencing the device's 
iommu group directly.

Cheers,
John
Christoph Hellwig Aug. 17, 2018, 7:24 a.m. UTC | #2
I plan to make the arm iommu dma ops generic and move them to
drivers/iommu for the 4.20 merge window.  Because of that it would
be great to create a stable branch or even pull this in through
the dma-mapping tree entirely.
Will Deacon Aug. 17, 2018, 9:03 a.m. UTC | #3
On Fri, Aug 17, 2018 at 12:24:15AM -0700, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.  Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.

In which case, for the arm64 bits:

Acked-by: Will Deacon <will.deacon@arm.com>

Will
Robin Murphy Aug. 17, 2018, 11:30 a.m. UTC | #4
On 17/08/18 08:24, Christoph Hellwig wrote:
> I plan to make the arm iommu dma ops generic and move them to
> drivers/iommu for the 4.20 merge window.

You mean 32-bit arm? The only place that code should move to is 
/dev/null ;) - the plan has always been to convert it to use groups and 
default domains so it can just plug iommu-dma in. The tricky bit is 
either weaning the users of the private arm_iommu_*() API onto generic 
IOMMU API usage, or at least implementing transitional wrappers in a way 
that doesn't break anything.

>  Because of that it would
> be great to create a stable branch or even pull this in through
> the dma-mapping tree entirely.

FWIW I might still need to tweak the first patch depending on how we 
choose to resolve the interaction between DMA ops and unmanaged 
domains[1] - if we want to add fallback paths to iommu-dma instead of 
swizzling device ops then the assumptions for this hook change and it 
will need to reference group->domain rather than group->default_domain.

Robin.


[1] 
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg230275.html
Christoph Hellwig Aug. 17, 2018, 12:01 p.m. UTC | #5
On Fri, Aug 17, 2018 at 12:30:31PM +0100, Robin Murphy wrote:
> On 17/08/18 08:24, Christoph Hellwig wrote:
> > I plan to make the arm iommu dma ops generic and move them to
> > drivers/iommu for the 4.20 merge window.
> 
> You mean 32-bit arm?

Sorry, I meant the arm64 wrappers for dma-iommu of course.

> The only place that code should move to is /dev/null ;)
> - the plan has always been to convert it to use groups and default domains
> so it can just plug iommu-dma in. The tricky bit is either weaning the users
> of the private arm_iommu_*() API onto generic IOMMU API usage, or at least
> implementing transitional wrappers in a way that doesn't break anything.

Agreed on the arm32 iommu code.
John Garry Aug. 17, 2018, 1:03 p.m. UTC | #6
On 14/08/2018 14:04, Robin Murphy wrote:
> John raised the issue[1] that we have some unnecessary refcount contention
> in the DMA ops path which shows scalability problems now that we have more
> real high-performance hardware using iommu-dma. The x86 IOMMU drivers are
> sidestepping this by stashing domain references in archdata, but since
> that's not very nice for architecture-agnostic code, I think it's time to
> look at a generic API-level solution.
>
> These are a couple of quick patches based on the idea I had back when
> first implementing this lot, but didn't have any way to justify at the
> time. The third patch can be ignored for the sake of API discussion, but
> is included for completeness.
>
> Robin.
>

Some results:

PCIe NIC iperf test (128 processes, small packets):
Without patchset:
289232.00 rxpck/s

With patchset:
367283 rxpck/s

JFYI, with Leizhen's non-strict mode patchset + Robin's patchset:
1215539 rxpck/s

Leizhen can share non-strict mode results in his own patchset however.

We did also try the storage controller fio test with a lot of SAS SSD 
disks (24 disks, 24 fio processes) for Robin's patchset only, but did 
not see a significant change.

Thanks to Dongdong + chenxiang for testing.

Let me know if you require more info.

Thanks again,
John


>
> [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html
>
> Robin Murphy (3):
>   iommu: Add fast hook for getting DMA domains
>   iommu/dma: Use fast DMA domain lookup
>   arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops
>
>  arch/arm64/mm/dma-mapping.c | 10 +++++-----
>  drivers/iommu/dma-iommu.c   | 23 ++++++++++++-----------
>  drivers/iommu/iommu.c       |  9 +++++++++
>  include/linux/iommu.h       |  1 +
>  4 files changed, 27 insertions(+), 16 deletions(-)
>