Message ID | 20220629011638.21783-1-schmitzmic@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Converting m68k WD33C93 drivers to DMA API | expand |
On Wed, Jun 29, 2022 at 3:16 AM Michael Schmitz <schmitzmic@gmail.com> wrote: > > V1 of a patch series to convert m68k Amiga WD33C93 drivers to the > DMA API. > > This series was precipitated by Arnd removing CONFIG_VIRT_TO_BUS. The > m68k WD33C93 still used virt_to_bus to convert virtual addresses to > physical addresses suitable for the DMA engines (note m68k does not > have an IOMMU and uses a direct mapping for DMA addresses). > > Arnd suggested to use dma_map_single() to set up dma mappings instead > of open-coding much the same in every driver dma_setup() function. > > It appears that m68k (MMU, except for coldfire) will set up pages for > DMA transfers as non-cacheable, thus obviating the need for explicit > cache management. To clarify, the dma-mapping API has two ways of dealing with this: - the streaming API (dma_map/unmap_...) uses explicit cache flushes - the coherent API (dma_alloc_coherent etc) uses page protections to prevent caching. These three drivers use the streaming API because they operate on data passed in from the outside, so the non-cacheable protection bits are not used here. > DMA setup on a3000 host adapters can be simplified to skip bounce > buffer use (assuming SCSI buffers passed to the driver are cache> Thanks, and Cheers, > > Michael > > line aligned; a safe bet except for maybe sg.c input). > > On gvp11 and a2091 host adapters, only the lowest 16 MB of physical > memory can be directy addressed by DMA, and bounce buffers from that > space must still be used (possibly allocated from chip RAM using the > custom allocator) if buffers are located in the higher memory regions. > > The m68k VME mvme147 driver has no DMA addressing or alignment > restrictions and can be converted in the same way as the Amiga a3000 > one, but will require conversion to a platform device driver first. Right, it seems that the old hack of passing a NULL device pointer no longer works, and that is probably for the better. If you want an easy way out, the driver can just call platform_device_register_full() to create its own device with a dma_mask set up, and use that device for the DMA API, but not actually match the device to a driver. > Only compile tested so far, and hardware testing might be hard to do. > I'd appreciate someone giving this a thorough review. Looks all good to me. Arnd
Hi Arnd, Am 29.06.2022 um 18:19 schrieb Arnd Bergmann: > On Wed, Jun 29, 2022 at 3:16 AM Michael Schmitz <schmitzmic@gmail.com> wrote: >> >> V1 of a patch series to convert m68k Amiga WD33C93 drivers to the >> DMA API. >> >> This series was precipitated by Arnd removing CONFIG_VIRT_TO_BUS. The >> m68k WD33C93 still used virt_to_bus to convert virtual addresses to >> physical addresses suitable for the DMA engines (note m68k does not >> have an IOMMU and uses a direct mapping for DMA addresses). >> >> Arnd suggested to use dma_map_single() to set up dma mappings instead >> of open-coding much the same in every driver dma_setup() function. >> >> It appears that m68k (MMU, except for coldfire) will set up pages for >> DMA transfers as non-cacheable, thus obviating the need for explicit >> cache management. > > To clarify, the dma-mapping API has two ways of dealing with this: > > - the streaming API (dma_map/unmap_...) uses explicit cache flushes > > - the coherent API (dma_alloc_coherent etc) uses page protections > to prevent caching. > > These three drivers use the streaming API because they operate on > data passed in from the outside, so the non-cacheable protection bits > are not used here. I had feared you'd say something along these lines ... Now that throws up a possible problem for the gvp11 driver: due to the need to first map an allocated chunk, then possibly discard that and try another allocation strategy, copying of data to the bounce buffer is deferred until after the final mapping has been successful. This means for writes, we have done the cache flushing before we have actually written any data to the buffer. I don't think it is safe to omit the explicit cache flush for writes in this case. >> DMA setup on a3000 host adapters can be simplified to skip bounce >> buffer use (assuming SCSI buffers passed to the driver are cache> Thanks, and Cheers, >> >> Michael >> > >> line aligned; a safe bet except for maybe sg.c input). >> >> On gvp11 and a2091 host adapters, only the lowest 16 MB of physical >> memory can be directy addressed by DMA, and bounce buffers from that >> space must still be used (possibly allocated from chip RAM using the >> custom allocator) if buffers are located in the higher memory regions. >> >> The m68k VME mvme147 driver has no DMA addressing or alignment >> restrictions and can be converted in the same way as the Amiga a3000 >> one, but will require conversion to a platform device driver first. > > Right, it seems that the old hack of passing a NULL device pointer > no longer works, and that is probably for the better. > > If you want an easy way out, the driver can just call > platform_device_register_full() to create its own device with a > dma_mask set up, and use that device for the DMA API, but > not actually match the device to a driver. I'll leave it to Geert to decide whether he prefers setting up a platform device in arch/m68k/mvme147/config.c or use the shortcut. I've used platform_device_register_simple() in a few other drivers so don't mind that much. Cheers, Michael > >> Only compile tested so far, and hardware testing might be hard to do. >> I'd appreciate someone giving this a thorough review. > > Looks all good to me. > > Arnd >
On Wed, Jun 29, 2022 at 9:36 AM Michael Schmitz <schmitzmic@gmail.com> wrote: > Am 29.06.2022 um 18:19 schrieb Arnd Bergmann: > > On Wed, Jun 29, 2022 at 3:16 AM Michael Schmitz <schmitzmic@gmail.com> wrote: > >> > >> V1 of a patch series to convert m68k Amiga WD33C93 drivers to the > >> DMA API. > >> > >> This series was precipitated by Arnd removing CONFIG_VIRT_TO_BUS. The > >> m68k WD33C93 still used virt_to_bus to convert virtual addresses to > >> physical addresses suitable for the DMA engines (note m68k does not > >> have an IOMMU and uses a direct mapping for DMA addresses). > >> > >> Arnd suggested to use dma_map_single() to set up dma mappings instead > >> of open-coding much the same in every driver dma_setup() function. > >> > >> It appears that m68k (MMU, except for coldfire) will set up pages for > >> DMA transfers as non-cacheable, thus obviating the need for explicit > >> cache management. > > > > To clarify, the dma-mapping API has two ways of dealing with this: > > > > - the streaming API (dma_map/unmap_...) uses explicit cache flushes > > > > - the coherent API (dma_alloc_coherent etc) uses page protections > > to prevent caching. > > > > These three drivers use the streaming API because they operate on > > data passed in from the outside, so the non-cacheable protection bits > > are not used here. > > I had feared you'd say something along these lines ... > > Now that throws up a possible problem for the gvp11 driver: due to the > need to first map an allocated chunk, then possibly discard that and try > another allocation strategy, copying of data to the bounce buffer is > deferred until after the final mapping has been successful. This means > for writes, we have done the cache flushing before we have actually > written any data to the buffer. > > I don't think it is safe to omit the explicit cache flush for writes in > this case. I think it's fine as long as you do things in the correct order: the copy into the bounce buffer has to be done before the dma_map_single() here, and conversely, the copy out of the bounce buffer must happen after the dma_unmap_single(). Regarding the amiga_chip_alloc(), I don't know what this means for caching. If chip memory is cache-coherent (either uncached or by snooping), then there should not be any dma_map()/dma_unmap() for that case, but instead the amiga_chip_alloc() function should return both the pointer and the dma_addr_t token. Arnd
Hi Arnd, On Wed, Jun 29, 2022 at 10:21 AM Arnd Bergmann <arnd@kernel.org> wrote: > Regarding the amiga_chip_alloc(), I don't know what this means > for caching. If chip memory is cache-coherent (either uncached > or by snooping), then there should not be any > dma_map()/dma_unmap() for that case, but instead the > amiga_chip_alloc() function should return both the pointer > and the dma_addr_t token. Chip RAM is mapped uncached. Amifb remaps it using ioremap_wt() to get a write-through frame buffer. Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds
On Wed, Jun 29, 2022 at 5:48 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote: > > On Wed, Jun 29, 2022 at 10:21 AM Arnd Bergmann <arnd@kernel.org> wrote: > > Regarding the amiga_chip_alloc(), I don't know what this means > > for caching. If chip memory is cache-coherent (either uncached > > or by snooping), then there should not be any > > dma_map()/dma_unmap() for that case, but instead the > > amiga_chip_alloc() function should return both the pointer > > and the dma_addr_t token. > > Chip RAM is mapped uncached. > Amifb remaps it using ioremap_wt() to get a write-through frame buffer. Ok, so in this case the driver should skip the dma_map_single() cache management and instead keep converting the chipram address to a bus address directly. While the driver does an extra cache flush on the uncached address today, it's clearly not needed and there is probably a performance benefit to not doing it. For simplicity, the normal bounce buffer could similarly use dma_alloc_coherent(), which will also result in an uncached buffer. If I read this correctly, this will also ensure that the buffer is within the DMA mask, so if ZONE_DMA is larger than the mask, it just returns NULL and you have to fall back to the chipram page, rather than checking the address and freeing the buffer. Arnd
Hi Arnd, On 29/06/22 20:20, Arnd Bergmann wrote: > >>> To clarify, the dma-mapping API has two ways of dealing with this: >>> >>> - the streaming API (dma_map/unmap_...) uses explicit cache flushes >>> >>> - the coherent API (dma_alloc_coherent etc) uses page protections >>> to prevent caching. >>> >>> These three drivers use the streaming API because they operate on >>> data passed in from the outside, so the non-cacheable protection bits >>> are not used here. >> I had feared you'd say something along these lines ... >> >> Now that throws up a possible problem for the gvp11 driver: due to the >> need to first map an allocated chunk, then possibly discard that and try >> another allocation strategy, copying of data to the bounce buffer is >> deferred until after the final mapping has been successful. This means >> for writes, we have done the cache flushing before we have actually >> written any data to the buffer. >> >> I don't think it is safe to omit the explicit cache flush for writes in >> this case. > I think it's fine as long as you do things in the correct order: the > copy into the bounce buffer has to be done before the > dma_map_single() here, and conversely, the copy out of the > bounce buffer must happen after the dma_unmap_single(). Ah - I had missed the latter (due to dma_setup previously doing all cache management, and I had expected dma_map_single to do the same, i.e. invalidate the affected cache lines at time of mapping). Will fix. The former is possible to do, but may incur an extra memcpy on gvp11 (filling a bounce buffer that we may then discard because dma_map_single() returns a DMA handle outside our DMA range). The driver does switch to only allocating bounce buffers from chip RAM once a kmalloc allocation failed to yield DMA-able RAM, so the performance impact ought to be minimal. > Regarding the amiga_chip_alloc(), I don't know what this means > for caching. If chip memory is cache-coherent (either uncached > or by snooping), then there should not be any > dma_map()/dma_unmap() for that case, but instead the > amiga_chip_alloc() function should return both the pointer > and the dma_addr_t token. amiga_chip_alloc() is used in many places around the kernel, I'd rather not change all that (or more precisely. I'd rather Geert does change amiga_chip_alloc() if need be). I'll drop use of dma_map_single() on chip RAM and will rely on that having been mapped non-cacheable. Cheers, Michael > > Arnd
Hi Arnd, Am 30.06.2022 um 04:44 schrieb Arnd Bergmann: > For simplicity, the normal bounce buffer could similarly use > dma_alloc_coherent(), which will also result in an uncached > buffer. If I read this correctly, this will also ensure that the buffer No sure we can rule out calling dma_setup() in interrupt context. > is within the DMA mask, so if ZONE_DMA is larger than the mask, > it just returns NULL and you have to fall back to the chipram > page, rather than checking the address and freeing the buffer. Still need to check what ZONE_DMA is set to on Amiga. Cheers, Michael > > Arnd >