Message ID | 20171130173428.8666-1-l.stach@pengutronix.de (mailing list archive) |
---|---|
State | New, archived |
Quoting Lucas Stach (2017-11-30 17:34:28)
> Dma-bufs should already be device coherent, as they are only pulled in the
> CPU domain via the begin/end cpu_access calls. As we cache the mapping set
> up by dma_map_sg a CPU sync at this point will not actually guarantee proper
> coherency on non-coherent architectures, so we can as well stop pretending.

That matches my understanding of the dma-buf API: device coherent, with
explicit CPU coherency managed by ioctl.

> This is an important performance fix for architectures which need explicit
> cache synchronization and userspace doing lots of dma-buf imports.
> Improves Weston on Etnaviv performance 5x, where before this patch > 90%
> of Weston CPU time was spent synchronizing caches for buffers which are
> already device coherent.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Sent an equivalent patch through i915's CI, which didn't show any
problems.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
On Thu, Nov 30, 2017 at 08:56:42PM +0000, Chris Wilson wrote:
> Quoting Lucas Stach (2017-11-30 17:34:28)
> > Dma-bufs should already be device coherent, as they are only pulled in the
> > CPU domain via the begin/end cpu_access calls. As we cache the mapping set
> > up by dma_map_sg a CPU sync at this point will not actually guarantee proper
> > coherency on non-coherent architectures, so we can as well stop pretending.
>
> That matches my understanding of the dma-buf API, device coherent with
> explicit cpu coherency managed by ioctl.
>
> > This is an important performance fix for architectures which need explicit
> > cache synchronization and userspace doing lots of dma-buf imports.
> > Improves Weston on Etnaviv performance 5x, where before this patch > 90%
> > of Weston CPU time was spent synchronizing caches for buffers which are
> > already device coherent.
> >
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
>
> Sent an equivalent patch through i915's CI, which didn't show any
> problems.
>
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Pushed to drm-misc-next, thx for patch&review.
-Daniel
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 8de93a226c24..9a17725b0f7a 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -218,8 +218,9 @@ static void drm_gem_map_detach(struct dma_buf *dma_buf,
 	sgt = prime_attach->sgt;
 	if (sgt) {
 		if (prime_attach->dir != DMA_NONE)
-			dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents,
-				     prime_attach->dir);
+			dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents,
+					   prime_attach->dir,
+					   DMA_ATTR_SKIP_CPU_SYNC);
 
 		sg_free_table(sgt);
 	}
@@ -277,7 +278,8 @@ static struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment *attach,
 	sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
 
 	if (!IS_ERR(sgt)) {
-		if (!dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir)) {
+		if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+				      DMA_ATTR_SKIP_CPU_SYNC)) {
 			sg_free_table(sgt);
 			kfree(sgt);
 			sgt = ERR_PTR(-ENOMEM);
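With the map/unmap paths passing DMA_ATTR_SKIP_CPU_SYNC, cache maintenance on non-coherent architectures is confined to the exporter's begin/end cpu_access hooks. A hedged sketch of what such a pair can look like, using the kernel's dma_sync_sg_for_cpu()/dma_sync_sg_for_device() streaming-DMA helpers; `struct example_obj` and its fields are hypothetical and this is not code the patch above touches:

	/* Illustrative exporter-side hooks: CPU cache maintenance now
	 * happens here, not at attach/map time. */
	static int example_begin_cpu_access(struct dma_buf *dmabuf,
					    enum dma_data_direction dir)
	{
		struct example_obj *obj = dmabuf->priv;	/* hypothetical */

		/* Make device writes visible to the CPU. */
		dma_sync_sg_for_cpu(obj->dev, obj->sgt->sgl, obj->sgt->nents, dir);
		return 0;
	}

	static int example_end_cpu_access(struct dma_buf *dmabuf,
					  enum dma_data_direction dir)
	{
		struct example_obj *obj = dmabuf->priv;

		/* Flush CPU writes back out for the device. */
		dma_sync_sg_for_device(obj->dev, obj->sgt->sgl, obj->sgt->nents, dir);
		return 0;
	}

This is the split the commit message relies on: syncs are paid only when userspace actually asks for CPU access, not on every import.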
Dma-bufs should already be device coherent, as they are only pulled in the
CPU domain via the begin/end cpu_access calls. As we cache the mapping set
up by dma_map_sg a CPU sync at this point will not actually guarantee proper
coherency on non-coherent architectures, so we can as well stop pretending.

This is an important performance fix for architectures which need explicit
cache synchronization and userspace doing lots of dma-buf imports.
Improves Weston on Etnaviv performance 5x, where before this patch > 90%
of Weston CPU time was spent synchronizing caches for buffers which are
already device coherent.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/drm_prime.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)