Message ID | 20190506185207.31069-1-tmurphy@arista.com (mailing list archive) |
---|---|
Series | iommu/amd: Convert the AMD iommu driver to the dma-iommu api |
Hi Tom,

On Mon, May 06, 2019 at 07:52:02PM +0100, Tom Murphy wrote:
> Convert the AMD iommu driver to the dma-iommu api. Remove the iova
> handling and reserve region code from the AMD iommu driver.

Thank you for your work on this! I appreciate that much, but I am not
sure we are ready to make that move for the AMD and Intel IOMMU drivers
yet.

My main concern right now is that these changes will add a per-page
table lock into the fast-path for dma-mapping operations. There has been
much work in the past to remove all locking from these code-paths and
make it scalable on x86.

The dma-ops implementations in the x86 IOMMU drivers have the benefit
that they can call their page-table manipulation functions directly and
without locks, because they can make the necessary assumptions. The
IOMMU-API mapping/unmapping path can't make these assumptions because it
is also used for non-DMA-API use-cases.

So before we can move the AMD and Intel drivers to the generic DMA-API
implementation we need to solve this problem to not introduce new
scalability regressions.

Regards,

Joerg

On Mon, Jun 3, 2019 at 11:52 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> Hi Tom,
>
> On Mon, May 06, 2019 at 07:52:02PM +0100, Tom Murphy wrote:
> > Convert the AMD iommu driver to the dma-iommu api. Remove the iova
> > handling and reserve region code from the AMD iommu driver.
>
> Thank you for your work on this! I appreciate that much, but I am not
> sure we are ready to make that move for the AMD and Intel IOMMU drivers
> yet.
>
> My main concern right now is that these changes will add a per-page
> table lock into the fast-path for dma-mapping operations. There has been
> much work in the past to remove all locking from these code-paths and
> make it scalable on x86.

Where is the locking introduced? Intel doesn't use a lock in its
iommu_map function:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/intel-iommu.c#L5302
because it cleverly uses cmpxchg64 to avoid using locks:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/intel-iommu.c#L900

And the locking in AMD's iommu_map function can be removed (and I have
removed it in my patch set) because it does the same thing as Intel:
https://github.com/torvalds/linux/blob/f2c7c76c5d0a443053e94adb9f0918fa2fb85c3a/drivers/iommu/amd_iommu.c#L1486

Is there something I'm missing?

>
> The dma-ops implementations in the x86 IOMMU drivers have the benefit
> that they can call their page-table manipulation functions directly and
> without locks, because they can make the necessary assumptions. The
> IOMMU-API mapping/unmapping path can't make these assumptions because it
> is also used for non-DMA-API use-cases.
>
> So before we can move the AMD and Intel drivers to the generic DMA-API
> implementation we need to solve this problem to not introduce new
> scalability regressions.
>
> Regards,
>
> Joerg
>
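
For readers unfamiliar with the technique Tom refers to, the idea of populating a page table with a compare-and-exchange instead of a lock can be sketched in userspace C11. This is only an illustration, not the kernel code: the two-level layout, the names `level1`, `level2`, `get_level2` and `map_page`, and the use of `<stdatomic.h>` in place of the kernel's `cmpxchg64` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES 512 /* entries per level; illustrative value */

/* Hypothetical two-level table. A zero entry means "not present". */
struct level2 { _Atomic uint64_t pte[ENTRIES]; };
struct level1 { _Atomic(struct level2 *) next[ENTRIES]; };

/*
 * Return the second-level table for slot idx, allocating it on demand.
 * No lock is taken: racing callers each allocate a candidate table, one
 * compare-and-exchange succeeds, and the losers free their copy.
 * (Assumes zero-filled memory is a valid initial state for the atomics,
 * which holds on mainstream platforms.)
 */
static struct level2 *get_level2(struct level1 *top, unsigned idx)
{
    assert(idx < ENTRIES);

    struct level2 *cur = atomic_load(&top->next[idx]);
    if (cur)
        return cur;

    struct level2 *fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;

    struct level2 *expected = NULL;
    if (atomic_compare_exchange_strong(&top->next[idx], &expected, fresh))
        return fresh;  /* this caller installed the table */

    free(fresh);       /* another caller won the race */
    return expected;   /* holds the pointer that was installed first */
}

/* Map one page; returns 0 on success, -1 if already mapped or out of memory. */
static int map_page(struct level1 *top, unsigned i1, unsigned i2, uint64_t phys)
{
    assert(i2 < ENTRIES);

    struct level2 *l2 = get_level2(top, i1);
    if (!l2)
        return -1;

    uint64_t expected = 0;
    /* Bit 0 stands in for a "present" flag in this sketch. */
    return atomic_compare_exchange_strong(&l2->pte[i2], &expected, phys | 1)
               ? 0 : -1;
}
```

The sketch covers only the populate path. It says nothing about unmapping or freeing intermediate tables while other CPUs may still be walking them, which is part of what the generic IOMMU-API path has to account for.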