
drm/prime: skip CPU sync in map/unmap dma_buf

Message ID 20171130173428.8666-1-l.stach@pengutronix.de (mailing list archive)
State New, archived

Commit Message

Lucas Stach Nov. 30, 2017, 5:34 p.m. UTC
Dma-bufs should already be device coherent, as they are only pulled into the
CPU domain via the begin/end cpu_access calls. As we cache the mapping set
up by dma_map_sg, a CPU sync at this point will not actually guarantee proper
coherency on non-coherent architectures, so we might as well stop pretending.

This is an important performance fix for architectures which need explicit
cache synchronization and for userspace doing lots of dma-buf imports.
It improves Weston performance on Etnaviv by 5x: before this patch, more
than 90% of Weston's CPU time was spent synchronizing caches for buffers
which are already device coherent.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/drm_prime.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Chris Wilson Nov. 30, 2017, 8:56 p.m. UTC | #1
Quoting Lucas Stach (2017-11-30 17:34:28)
> Dma-bufs should already be device coherent, as they are only pulled into the
> CPU domain via the begin/end cpu_access calls. As we cache the mapping set
> up by dma_map_sg, a CPU sync at this point will not actually guarantee proper
> coherency on non-coherent architectures, so we might as well stop pretending.

That matches my understanding of the dma-buf API: device coherent, with
explicit CPU coherency managed by ioctl.

> This is an important performance fix for architectures which need explicit
> cache synchronization and for userspace doing lots of dma-buf imports.
> It improves Weston performance on Etnaviv by 5x: before this patch, more
> than 90% of Weston's CPU time was spent synchronizing caches for buffers
> which are already device coherent.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

Sent an equivalent patch through i915's CI, which didn't show any
problems.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Daniel Vetter Dec. 1, 2017, 7:23 a.m. UTC | #2
On Thu, Nov 30, 2017 at 08:56:42PM +0000, Chris Wilson wrote:
> Quoting Lucas Stach (2017-11-30 17:34:28)
> > Dma-bufs should already be device coherent, as they are only pulled into the
> > CPU domain via the begin/end cpu_access calls. As we cache the mapping set
> > up by dma_map_sg, a CPU sync at this point will not actually guarantee proper
> > coherency on non-coherent architectures, so we might as well stop pretending.
> 
> That matches my understanding of the dma-buf API, device coherent with
> explicit cpu coherency managed by ioctl.
> 
> > This is an important performance fix for architectures which need explicit
> > cache synchronization and for userspace doing lots of dma-buf imports.
> > It improves Weston performance on Etnaviv by 5x: before this patch, more
> > than 90% of Weston's CPU time was spent synchronizing caches for buffers
> > which are already device coherent.
> > 
> > Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> 
> Sent an equivalent patch through i915's CI, which didn't show any
> problems.
> 
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Pushed to drm-misc-next, thx for patch&review.
-Daniel

Patch

diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 8de93a226c24..9a17725b0f7a 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -218,8 +218,9 @@ static void drm_gem_map_detach(struct dma_buf *dma_buf,
 	sgt = prime_attach->sgt;
 	if (sgt) {
 		if (prime_attach->dir != DMA_NONE)
-			dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents,
-					prime_attach->dir);
+			dma_unmap_sg_attrs(attach->dev, sgt->sgl, sgt->nents,
+					   prime_attach->dir,
+					   DMA_ATTR_SKIP_CPU_SYNC);
 		sg_free_table(sgt);
 	}
 
@@ -277,7 +278,8 @@ static struct sg_table *drm_gem_map_dma_buf(struct dma_buf_attachment *attach,
 	sgt = obj->dev->driver->gem_prime_get_sg_table(obj);
 
 	if (!IS_ERR(sgt)) {
-		if (!dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir)) {
+		if (!dma_map_sg_attrs(attach->dev, sgt->sgl, sgt->nents, dir,
+				      DMA_ATTR_SKIP_CPU_SYNC)) {
 			sg_free_table(sgt);
 			kfree(sgt);
 			sgt = ERR_PTR(-ENOMEM);
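With the map/unmap sync dropped, cache maintenance on non-coherent
architectures is expected to happen only in the exporter's begin/end
cpu_access hooks. A sketch of what that might look like (kernel-space code,
not runnable here; "struct my_buffer" and its fields are hypothetical, while
the dma_sync_sg_* calls are real DMA API functions):

```c
/* Hedged sketch: where the CPU sync belongs after this patch. */
static int my_begin_cpu_access(struct dma_buf *dma_buf,
			       enum dma_data_direction dir)
{
	struct my_buffer *buf = dma_buf->priv;	/* hypothetical exporter state */

	/* Pull the buffer into the CPU domain: perform the cache
	 * maintenance for the cached dma_map_sg mapping. */
	dma_sync_sg_for_cpu(buf->dev, buf->sgt->sgl, buf->sgt->nents, dir);
	return 0;
}

static int my_end_cpu_access(struct dma_buf *dma_buf,
			     enum dma_data_direction dir)
{
	struct my_buffer *buf = dma_buf->priv;

	/* Hand the buffer back to the device domain. */
	dma_sync_sg_for_device(buf->dev, buf->sgt->sgl, buf->sgt->nents, dir);
	return 0;
}
```

This keeps the cost of cache maintenance off the import path entirely, paying
it only when userspace actually requests CPU access.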