Message ID: 20210627143405.77298-1-sven@svenpeter.dev (mailing list archive)
Series: Apple M1 DART IOMMU driver
On 2021-06-27 15:34, Sven Peter wrote:
[...]
> In the long term, I'd like to extend the dma-iommu framework itself to
> support iommu pagesizes with a larger granule than the CPU pagesize if that is
> something you agree with.

BTW this isn't something we can fully support in general. IOMMU API
users may expect this to work:

	iommu_map(domain, iova, page_to_phys(p1), PAGE_SIZE, prot);
	iommu_map(domain, iova + PAGE_SIZE, page_to_phys(p2), PAGE_SIZE, prot);

Although they do in principle have visibility of pgsize_bitmap, I still
doubt anyone is really prepared for CPU-page-aligned mappings to fail.
Even at the DMA API level you could hide *some* of it (at the cost of
effectively only having 1/4 of the usable address space), but there are
still cases like where v4l2 has a hard requirement that a page-aligned
scatterlist can be mapped into a contiguous region of DMA addresses.

> This would be important to later support the thunderbolt DARTs since I would be
> very uncomfortable to have these running in (software or hardware) bypass mode.

Funnily enough that's the one case that would be relatively workable,
since untrusted devices are currently subject to bounce-buffering of
the entire DMA request, so it doesn't matter so much how the bounce
buffer itself is mapped. Even with the possible future optimisation of
only bouncing the non-page-aligned start and end parts of a buffer I
think it still works (the physical alignment just has to be considered
in terms of the IOMMU granule).

Robin.
On Wed, Jul 14, 2021 at 8:21 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2021-06-27 15:34, Sven Peter wrote:
> [...]
> > In the long term, I'd like to extend the dma-iommu framework itself to
> > support iommu pagesizes with a larger granule than the CPU pagesize if that is
> > something you agree with.
>
> BTW this isn't something we can fully support in general. IOMMU API
> users may expect this to work:
>
> 	iommu_map(domain, iova, page_to_phys(p1), PAGE_SIZE, prot);
> 	iommu_map(domain, iova + PAGE_SIZE, page_to_phys(p2), PAGE_SIZE, prot);
>
> Although they do in principle have visibility of pgsize_bitmap, I still
> doubt anyone is really prepared for CPU-page-aligned mappings to fail.
> Even at the DMA API level you could hide *some* of it (at the cost of
> effectively only having 1/4 of the usable address space), but there are
> still cases like where v4l2 has a hard requirement that a page-aligned
> scatterlist can be mapped into a contiguous region of DMA addresses.

I think that was the same conclusion we had earlier: the dma-mapping
interfaces should be possible for large iotlb pages, but any driver
directly using the IOMMU API, such as VFIO, would not work.

The question is how we can best allow one but not the other.

       Arnd
On Wed, Jul 14, 2021 at 10:51:34PM +0200, Arnd Bergmann wrote:
> The question is how we can best allow one but not the other.
By only allowing the allocation of domains of type IDENTITY and DMA,
and failing to allocate UNMANAGED domains.
Regards,
Joerg
On Wed, Jul 14, 2021 at 07:19:50PM +0100, Robin Murphy wrote:
> Even at the DMA API level you could hide *some* of it (at the cost of
> effectively only having 1/4 of the usable address space), but there are
> still cases like where v4l2 has a hard requirement that a page-aligned
> scatterlist can be mapped into a contiguous region of DMA addresses.

Where does v4l2 make that broken assumption? Plenty of dma mapping
implementations including dma-direct do not support that. Drivers need
to call dma_get_merge_boundary() to check for that kind of behavior.
On 2021-07-16 07:24, Christoph Hellwig wrote:
> On Wed, Jul 14, 2021 at 07:19:50PM +0100, Robin Murphy wrote:
>> Even at the DMA API level you could hide *some* of it (at the cost of
>> effectively only having 1/4 of the usable address space), but there are
>> still cases like where v4l2 has a hard requirement that a page-aligned
>> scatterlist can be mapped into a contiguous region of DMA addresses.
>
> Where does v4l2 make that broken assumption? Plenty of dma mapping
> implementations including dma-direct do not support that.

See vb2_dc_get_contiguous_size() and its callers. I still remember
spending an entire work day on writing one email at the culmination of
this discussion:

https://lore.kernel.org/linux-iommu/56409B6D.5090903@arm.com/

809eac54cdd6 was framed as an efficiency improvement because it
technically was one (and something I had wanted to implement anyway),
but it was also very much to save myself from any further email debates
or customer calls about "regressing" code ported from 32-bit
platforms...

Robin.