drm/i915: Treat foreign dma-buf imports as uncached
diff mbox

Message ID 1439237806-6971-1-git-send-email-chris@chris-wilson.co.uk
State New
Headers show

Commit Message

Chris Wilson Aug. 10, 2015, 8:16 p.m. UTC
If the set of pages is being imported from another device, we cannot
assume that it is fully coherent with the CPU cache, so mark it as such.
However, if the source is the shared memory vgem allocator, we could
treat the buffer as being cached (so long as all parties agree in the
case the same buffer is shared between multiple devices) but as of
today, vgem cannot export dma-bufs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_dmabuf.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Daniel Vetter Aug. 11, 2015, 10:12 a.m. UTC | #1
On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote:
> If the set of pages is being imported from another device, we cannot
> assume that it is fully coherent with the CPU cache, so mark it as such.
> However, if the source is the shared memory vgem allocator, we could
> treat the buffer as being cached (so long as all parties agree in the
> case the same buffer is shared between multiple devices) but as of
> today, vgem cannot export dma-bufs.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Yay for x86 dma api to refuse acknowleding that incoherent devices exist
(since this should be done in the dma-ap/sg stuff imo for dma-bufs).

But since x86 assumes that everything is coherent shouldn't we do the
inverse and use snooping/cacheable always?

That should be correct for everything modern at least, and for old agp
crap (do we care even, is sharing possible there?) snooping should only
result in a bit more overhead.

Or where exactly does this blow up?
-Daniel


> ---
>  drivers/gpu/drm/i915/i915_gem_dmabuf.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> index 7748d57adffd..da5e4fb02350 100644
> --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
> @@ -265,8 +265,22 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
>  	i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops);
>  	obj->base.import_attach = attach;
>  
> +	/* This bo is imported from a foreign HW device with which we cannot
> +	 * assume any coherency. Note that if this dma-buf was imported from
> +	 * a simple page allocator, like vgem, then we could treat this as
> +	 * cacheable (so long as all parties agree on that convention - i.e.
> +	 * if the same bo was shared with nouveau/radeon they also treat it
> +	 * as CPU cacheeable so that coherency is maintained between all
> +	 * parties).
> +	 */
> +	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
> +	if (ret)
> +		goto fail_obj;
> +
>  	return &obj->base;
>  
> +fail_obj:
> +	drm_gem_object_unreference(&obj->base);
>  fail_detach:
>  	dma_buf_detach(dma_buf, attach);
>  	dma_buf_put(dma_buf);
> -- 
> 2.5.0
>
Chris Wilson Aug. 11, 2015, 10:24 a.m. UTC | #2
On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote:
> On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote:
> > If the set of pages is being imported from another device, we cannot
> > assume that it is fully coherent with the CPU cache, so mark it as such.
> > However, if the source is the shared memory vgem allocator, we could
> > treat the buffer as being cached (so long as all parties agree in the
> > case the same buffer is shared between multiple devices) but as of
> > today, vgem cannot export dma-bufs.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Yay for x86 dma api to refuse acknowleding that incoherent devices exist
> (since this should be done in the dma-ap/sg stuff imo for dma-bufs).
> 
> But since x86 assumes that everything is coherent shouldn't we do the
> inverse and use snooping/cacheable always?

Actually, the convention for PRIME is that transfer bos are uncached
because we have to be sure that any write makes it into the foriegn
pages, that applies to both GPU and CPU writes, precisely because the
target is *incoherent*.

> That should be correct for everything modern at least, and for old agp
> crap (do we care even, is sharing possible there?) snooping should only
> result in a bit more overhead.
> 
> Or where exactly does this blow up?

Consider a kms_crc-esque test using a radeon/nouveau bo imported into
i915 and accessed using i915 ioctls (with the CRC testing on the radeon
scanout).
-Chris
Daniel Vetter Aug. 11, 2015, 12:58 p.m. UTC | #3
On Tue, Aug 11, 2015 at 11:24:58AM +0100, Chris Wilson wrote:
> On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote:
> > On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote:
> > > If the set of pages is being imported from another device, we cannot
> > > assume that it is fully coherent with the CPU cache, so mark it as such.
> > > However, if the source is the shared memory vgem allocator, we could
> > > treat the buffer as being cached (so long as all parties agree in the
> > > case the same buffer is shared between multiple devices) but as of
> > > today, vgem cannot export dma-bufs.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Yay for x86 dma api to refuse acknowleding that incoherent devices exist
> > (since this should be done in the dma-ap/sg stuff imo for dma-bufs).
> > 
> > But since x86 assumes that everything is coherent shouldn't we do the
> > inverse and use snooping/cacheable always?
> 
> Actually, the convention for PRIME is that transfer bos are uncached
> because we have to be sure that any write makes it into the foriegn
> pages, that applies to both GPU and CPU writes, precisely because the
> target is *incoherent*.
> 
> > That should be correct for everything modern at least, and for old agp
> > crap (do we care even, is sharing possible there?) snooping should only
> > result in a bit more overhead.
> > 
> > Or where exactly does this blow up?
> 
> Consider a kms_crc-esque test using a radeon/nouveau bo imported into
> i915 and accessed using i915 ioctls (with the CRC testing on the radeon
> scanout).

Scanout on radeon must be in vram afaik, and that's a place i915 can't get
at. I think that even holds true for the integrated radones (they just use
stolen for vram then).

The other way round also doesn't work I think since i915 can't change the
caching policy for radoen/nouveau access if radeon/nouveau want to write
directly to main memory. Otoh I think pcie transactions just snoop the
cache and never put anything in there.

I still think that everything we import and don't have a clue about should
probably be treated as snooped conceptually. But practically who knows
what's going on ...
-Daniel
Chris Wilson Aug. 11, 2015, 1:05 p.m. UTC | #4
On Tue, Aug 11, 2015 at 02:58:59PM +0200, Daniel Vetter wrote:
> On Tue, Aug 11, 2015 at 11:24:58AM +0100, Chris Wilson wrote:
> > On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote:
> > > On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote:
> > > > If the set of pages is being imported from another device, we cannot
> > > > assume that it is fully coherent with the CPU cache, so mark it as such.
> > > > However, if the source is the shared memory vgem allocator, we could
> > > > treat the buffer as being cached (so long as all parties agree in the
> > > > case the same buffer is shared between multiple devices) but as of
> > > > today, vgem cannot export dma-bufs.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > 
> > > Yay for x86 dma api to refuse acknowleding that incoherent devices exist
> > > (since this should be done in the dma-ap/sg stuff imo for dma-bufs).
> > > 
> > > But since x86 assumes that everything is coherent shouldn't we do the
> > > inverse and use snooping/cacheable always?
> > 
> > Actually, the convention for PRIME is that transfer bos are uncached
> > because we have to be sure that any write makes it into the foriegn
> > pages, that applies to both GPU and CPU writes, precisely because the
> > target is *incoherent*.
> > 
> > > That should be correct for everything modern at least, and for old agp
> > > crap (do we care even, is sharing possible there?) snooping should only
> > > result in a bit more overhead.
> > > 
> > > Or where exactly does this blow up?
> > 
> > Consider a kms_crc-esque test using a radeon/nouveau bo imported into
> > i915 and accessed using i915 ioctls (with the CRC testing on the radeon
> > scanout).
> 
> Scanout on radeon must be in vram afaik, and that's a place i915 can't get
> at. I think that even holds true for the integrated radones (they just use
> stolen for vram then).
> 
> The other way round also doesn't work I think since i915 can't change the
> caching policy for radoen/nouveau access if radeon/nouveau want to write
> directly to main memory. Otoh I think pcie transactions just snoop the
> cache and never put anything in there.
> 
> I still think that everything we import and don't have a clue about should
> probably be treated as snooped conceptually. But practically who knows
> what's going on ...

I'd say setting as uncached is safer. But at the moment, the policy is
just whatever the platform default is and that leads to interesting
platform specific behaviour.

At the very least we do want snooping if we ever do get a shmem dma-buf
source.
-Chris
Shuang He Aug. 14, 2015, 9:48 a.m. UTC | #5
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 7140
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
ILK                                  302/302              302/302
SNB                                  315/315              315/315
IVB                                  336/336              336/336
BYT                                  283/283              283/283
HSW                                  378/378              378/378
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
Note: You need to pay more attention to line start with '*'

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 7748d57adffd..da5e4fb02350 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -265,8 +265,22 @@  struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev,
 	i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops);
 	obj->base.import_attach = attach;
 
+	/* This bo is imported from a foreign HW device with which we cannot
+	 * assume any coherency. Note that if this dma-buf was imported from
+	 * a simple page allocator, like vgem, then we could treat this as
+	 * cacheable (so long as all parties agree on that convention - i.e.
+	 * if the same bo was shared with nouveau/radeon they also treat it
+	 * as CPU cacheeable so that coherency is maintained between all
+	 * parties).
+	 */
+	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
+	if (ret)
+		goto fail_obj;
+
 	return &obj->base;
 
+fail_obj:
+	drm_gem_object_unreference(&obj->base);
 fail_detach:
 	dma_buf_detach(dma_buf, attach);
 	dma_buf_put(dma_buf);