Message ID | cover.1534250425.git.robin.murphy@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | iommu: Avoid DMA ops domain refcount contention | expand |
On 14/08/2018 14:04, Robin Murphy wrote: > John raised the issue[1] that we have some unnecessary refcount contention > in the DMA ops path which shows scalability problems now that we have more > real high-performance hardware using iommu-dma. The x86 IOMMU drivers are > sidestepping this by stashing domain references in archdata, but since > that's not very nice for architecture-agnostic code, I think it's time to > look at a generic API-level solution. > > These are a couple of quick patches based on the idea I had back when > first implementing this lot, but didn't have any way to justify at the > time. The third patch can be ignored for the sake of API discussion, but > is included for completeness. > > Robin. > > > [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html > > Robin Murphy (3): > iommu: Add fast hook for getting DMA domains > iommu/dma: Use fast DMA domain lookup > arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops > > arch/arm64/mm/dma-mapping.c | 10 +++++----- > drivers/iommu/dma-iommu.c | 23 ++++++++++++----------- > drivers/iommu/iommu.c | 9 +++++++++ > include/linux/iommu.h | 1 + > 4 files changed, 27 insertions(+), 16 deletions(-) > Thanks Robin. So we'll test this now. I anticipate a decent improvement across the board since we are now simply dereferencing the device's iommu group directly. Cheers, John
I plan to make the arm iommu dma ops generic and move them to drivers/iommu for the 4.20 merge window. Because of that it would be great to create a stable branch or even pull this in through the dma-mapping tree entirely.
On Fri, Aug 17, 2018 at 12:24:15AM -0700, Christoph Hellwig wrote: > I plan to make the arm iommu dma ops generic and move them to > drivers/iommu for the 4.20 merge window. Because of that it would > be great to create a stable branch or even pull this in through > the dma-mapping tree entirely. In which case, for the arm64 bits: Acked-by: Will Deacon <will.deacon@arm.com> Will
On 17/08/18 08:24, Christoph Hellwig wrote: > I plan to make the arm iommu dma ops generic and move them to > drivers/iommu for the 4.20 merge window. You mean 32-bit arm? The only place that code should move to is /dev/null ;) - the plan has always been to convert it to use groups and default domains so it can just plug iommu-dma in. The tricky bit is either weaning the users of the private arm_iommu_*() API onto generic IOMMU API usage, or at least implementing transitional wrappers in a way that doesn't break anything. > Because of that it would > be great to create a stable branch or even pull this in through > the dma-mapping tree entirely. FWIW I might still need to tweak the first patch depending on how we choose to resolve the interaction between DMA ops and unmanaged domains[1] - if we want to add fallback paths to iommu-dma instead of swizzling device ops then the assumptions for this hook change and it will need to reference group->domain rather than group->default_domain. Robin. [1] https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg230275.html
On Fri, Aug 17, 2018 at 12:30:31PM +0100, Robin Murphy wrote: > On 17/08/18 08:24, Christoph Hellwig wrote: > > I plan to make the arm iommu dma ops generic and move them to > > drivers/iommu for the 4.20 merge window. > > You mean 32-bit arm? Sorry, I meant the arm64 wrappers for dma-iommu of course. > The only place that code should move to is /dev/null ;) > - the plan has always been to convert it to use groups and default domains > so it can just plug iommu-dma in. The tricky bit is either weaning the users > of the private arm_iommu_*() API onto generic IOMMU API usage, or at least > implementing transitional wrappers in a way that doesn't break anything. Agreed on the arm32 iommu code.
On 14/08/2018 14:04, Robin Murphy wrote: > John raised the issue[1] that we have some unnecessary refcount contention > in the DMA ops path which shows scalability problems now that we have more > real high-performance hardware using iommu-dma. The x86 IOMMU drivers are > sidestepping this by stashing domain references in archdata, but since > that's not very nice for architecture-agnostic code, I think it's time to > look at a generic API-level solution. > > These are a couple of quick patches based on the idea I had back when > first implementing this lot, but didn't have any way to justify at the > time. The third patch can be ignored for the sake of API discussion, but > is included for completeness. > > Robin. > Some results: PCIe NIC iperf test (128 processes, small packets): Without patchset: 289232.00 rxpck/s With patchset: 367283 rxpck/s JFYI, with Leizhen's non-strict mode patchset + Robin's patchset: 1215539 rxpck/s Leizhen can share non-strict mode results in his own patchset however. We did also try the storage controller fio test with a lot of SAS SSD disks (24 disks, 24 fio processes) for Robin's patchset only, but did not see a significant change. Thanks to Dongdong + chenxiang for testing. Let me know if you require more info. Thanks again, John > > [1] https://lists.linuxfoundation.org/pipermail/iommu/2018-August/029303.html > > Robin Murphy (3): > iommu: Add fast hook for getting DMA domains > iommu/dma: Use fast DMA domain lookup > arm64/dma-mapping: Mildly optimise non-coherent IOMMU ops > > arch/arm64/mm/dma-mapping.c | 10 +++++----- > drivers/iommu/dma-iommu.c | 23 ++++++++++++----------- > drivers/iommu/iommu.c | 9 +++++++++ > include/linux/iommu.h | 1 + > 4 files changed, 27 insertions(+), 16 deletions(-) >