Message ID | 1439237806-6971-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote: > If the set of pages is being imported from another device, we cannot > assume that it is fully coherent with the CPU cache, so mark it as such. > However, if the source is the shared memory vgem allocator, we could > treat the buffer as being cached (so long as all parties agree in the > case the same buffer is shared between multiple devices) but as of > today, vgem cannot export dma-bufs. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Yay for x86 dma api to refuse acknowleding that incoherent devices exist (since this should be done in the dma-ap/sg stuff imo for dma-bufs). But since x86 assumes that everything is coherent shouldn't we do the inverse and use snooping/cacheable always? That should be correct for everything modern at least, and for old agp crap (do we care even, is sharing possible there?) snooping should only result in a bit more overhead. Or where exactly does this blow up? -Daniel > --- > drivers/gpu/drm/i915/i915_gem_dmabuf.c | 14 ++++++++++++++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c > index 7748d57adffd..da5e4fb02350 100644 > --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c > +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c > @@ -265,8 +265,22 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, > i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops); > obj->base.import_attach = attach; > > + /* This bo is imported from a foreign HW device with which we cannot > + * assume any coherency. Note that if this dma-buf was imported from > + * a simple page allocator, like vgem, then we could treat this as > + * cacheable (so long as all parties agree on that convention - i.e. > + * if the same bo was shared with nouveau/radeon they also treat it > + * as CPU cacheeable so that coherency is maintained between all > + * parties). > + */ > + ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE); > + if (ret) > + goto fail_obj; > + > return &obj->base; > > +fail_obj: > + drm_gem_object_unreference(&obj->base); > fail_detach: > dma_buf_detach(dma_buf, attach); > dma_buf_put(dma_buf); > -- > 2.5.0 >
On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote: > On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote: > > If the set of pages is being imported from another device, we cannot > > assume that it is fully coherent with the CPU cache, so mark it as such. > > However, if the source is the shared memory vgem allocator, we could > > treat the buffer as being cached (so long as all parties agree in the > > case the same buffer is shared between multiple devices) but as of > > today, vgem cannot export dma-bufs. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > Yay for x86 dma api to refuse acknowleding that incoherent devices exist > (since this should be done in the dma-ap/sg stuff imo for dma-bufs). > > But since x86 assumes that everything is coherent shouldn't we do the > inverse and use snooping/cacheable always? Actually, the convention for PRIME is that transfer bos are uncached because we have to be sure that any write makes it into the foriegn pages, that applies to both GPU and CPU writes, precisely because the target is *incoherent*. > That should be correct for everything modern at least, and for old agp > crap (do we care even, is sharing possible there?) snooping should only > result in a bit more overhead. > > Or where exactly does this blow up? Consider a kms_crc-esque test using a radeon/nouveau bo imported into i915 and accessed using i915 ioctls (with the CRC testing on the radeon scanout). -Chris
On Tue, Aug 11, 2015 at 11:24:58AM +0100, Chris Wilson wrote: > On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote: > > On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote: > > > If the set of pages is being imported from another device, we cannot > > > assume that it is fully coherent with the CPU cache, so mark it as such. > > > However, if the source is the shared memory vgem allocator, we could > > > treat the buffer as being cached (so long as all parties agree in the > > > case the same buffer is shared between multiple devices) but as of > > > today, vgem cannot export dma-bufs. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > > > Yay for x86 dma api to refuse acknowleding that incoherent devices exist > > (since this should be done in the dma-ap/sg stuff imo for dma-bufs). > > > > But since x86 assumes that everything is coherent shouldn't we do the > > inverse and use snooping/cacheable always? > > Actually, the convention for PRIME is that transfer bos are uncached > because we have to be sure that any write makes it into the foriegn > pages, that applies to both GPU and CPU writes, precisely because the > target is *incoherent*. > > > That should be correct for everything modern at least, and for old agp > > crap (do we care even, is sharing possible there?) snooping should only > > result in a bit more overhead. > > > > Or where exactly does this blow up? > > Consider a kms_crc-esque test using a radeon/nouveau bo imported into > i915 and accessed using i915 ioctls (with the CRC testing on the radeon > scanout). Scanout on radeon must be in vram afaik, and that's a place i915 can't get at. I think that even holds true for the integrated radones (they just use stolen for vram then). The other way round also doesn't work I think since i915 can't change the caching policy for radoen/nouveau access if radeon/nouveau want to write directly to main memory. Otoh I think pcie transactions just snoop the cache and never put anything in there. I still think that everything we import and don't have a clue about should probably be treated as snooped conceptually. But practically who knows what's going on ... -Daniel
On Tue, Aug 11, 2015 at 02:58:59PM +0200, Daniel Vetter wrote: > On Tue, Aug 11, 2015 at 11:24:58AM +0100, Chris Wilson wrote: > > On Tue, Aug 11, 2015 at 12:12:08PM +0200, Daniel Vetter wrote: > > > On Mon, Aug 10, 2015 at 09:16:46PM +0100, Chris Wilson wrote: > > > > If the set of pages is being imported from another device, we cannot > > > > assume that it is fully coherent with the CPU cache, so mark it as such. > > > > However, if the source is the shared memory vgem allocator, we could > > > > treat the buffer as being cached (so long as all parties agree in the > > > > case the same buffer is shared between multiple devices) but as of > > > > today, vgem cannot export dma-bufs. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > > > > > Yay for x86 dma api to refuse acknowleding that incoherent devices exist > > > (since this should be done in the dma-ap/sg stuff imo for dma-bufs). > > > > > > But since x86 assumes that everything is coherent shouldn't we do the > > > inverse and use snooping/cacheable always? > > > > Actually, the convention for PRIME is that transfer bos are uncached > > because we have to be sure that any write makes it into the foriegn > > pages, that applies to both GPU and CPU writes, precisely because the > > target is *incoherent*. > > > > > That should be correct for everything modern at least, and for old agp > > > crap (do we care even, is sharing possible there?) snooping should only > > > result in a bit more overhead. > > > > > > Or where exactly does this blow up? > > > > Consider a kms_crc-esque test using a radeon/nouveau bo imported into > > i915 and accessed using i915 ioctls (with the CRC testing on the radeon > > scanout). > > Scanout on radeon must be in vram afaik, and that's a place i915 can't get > at. I think that even holds true for the integrated radones (they just use > stolen for vram then). > > The other way round also doesn't work I think since i915 can't change the > caching policy for radoen/nouveau access if radeon/nouveau want to write > directly to main memory. Otoh I think pcie transactions just snoop the > cache and never put anything in there. > > I still think that everything we import and don't have a clue about should > probably be treated as snooped conceptually. But practically who knows > what's going on ... I'd say setting as uncached is safer. But at the moment, the policy is just whatever the platform default is and that leads to interesting platform specific behaviour. At the very least we do want snooping if we ever do get a shmem dma-buf source. -Chris
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 7140
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
ILK 302/302 302/302
SNB 315/315 315/315
IVB 336/336 336/336
BYT 283/283 283/283
HSW 378/378 378/378
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
Note: You need to pay more attention to line start with '*'
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c index 7748d57adffd..da5e4fb02350 100644 --- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c @@ -265,8 +265,22 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, i915_gem_object_init(obj, &i915_gem_object_dmabuf_ops); obj->base.import_attach = attach; + /* This bo is imported from a foreign HW device with which we cannot + * assume any coherency. Note that if this dma-buf was imported from + * a simple page allocator, like vgem, then we could treat this as + * cacheable (so long as all parties agree on that convention - i.e. + * if the same bo was shared with nouveau/radeon they also treat it + * as CPU cacheeable so that coherency is maintained between all + * parties). + */ + ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE); + if (ret) + goto fail_obj; + return &obj->base; +fail_obj: + drm_gem_object_unreference(&obj->base); fail_detach: dma_buf_detach(dma_buf, attach); dma_buf_put(dma_buf);
If the set of pages is being imported from another device, we cannot assume that it is fully coherent with the CPU cache, so mark it as such. However, if the source is the shared memory vgem allocator, we could treat the buffer as being cached (so long as all parties agree in the case the same buffer is shared between multiple devices) but as of today, vgem cannot export dma-bufs. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> --- drivers/gpu/drm/i915/i915_gem_dmabuf.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)