Message ID | 1362162777-21626-1-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
---|---
State | New, archived
On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> Currently all scanout buffers must be uncached because the
> display controller doesn't snoop the LLC. SNB introduced another
> method to guarantee coherency for the display controller. It's
> called the GFDT or graphics data type.
>
> Pages that have the GFDT bit enabled in their PTEs get flushed
> all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
> issued with the "synchronize GFDT" bit set.
>
> So rather than making all scanout buffers uncached, set the GFDT
> bit in their PTEs, and modify the ring flush functions to enable
> the "synchronize GFDT" bit.
>
> On HSW the GFDT bit was removed from the PTE, and it's only present in
> surface state, so we can't really set it from the kernel. Also the docs
> state that the hardware isn't actually guaranteed to respect the GFDT
> bit. So it looks like GFDT isn't all that useful on HSW.
>
> So far I've tried this very quickly on an IVB machine, and
> it seems to be working as advertised. No idea if it does any
> good though.
>
> TODO:
> - make sure there are no missing flushes (CPU access doesn't
>   respect GFDT after all).
> - measure it and see if there's some real benefit
> - maybe we can track whether "synchronize GFDT" needs to be
>   issued, and skip it when possible. needs some numbers to
>   determine if it's worthwhile.
>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Iirc when I've tried this out a while back it regressed a few benchmarks.
Chris & me suspected cache thrashing, but hard to tell without proper
instrumentation support. Chris played around with a few tricks to mark
other giant bos as uncacheable, but he couldn't find any improved
workloads.

In short I think this needs to come with decent amounts of numbers
attached, like the TODO says ;-)

The other thing was that I didn't manage to get things to work properly,
leaving some random cacheline dirt on the screen. Looking at your code, you
add the gfdt flush to every ring_flush, whereas I've tried to be clever
and only flushed after batches rendering to the frontbuffer. So probably a
bug in my code, or a flush on a given ring doesn't flush out caches for
one of the other engines.

And finally a bikeshed: I'd vote for a slightly more generic
unsigned long cache_flags instead of bool gfdt.

Cheers, Daniel

> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  18 +++--
>  drivers/gpu/drm/i915/i915_gem.c            | 109 ++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_context.c    |   3 +-
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +--
>  drivers/gpu/drm/i915/i915_gem_gtt.c        |  38 ++++++----
>  drivers/gpu/drm/i915/i915_reg.h            |   2 +
>  drivers/gpu/drm/i915/intel_display.c       |   4 +-
>  drivers/gpu/drm/i915/intel_overlay.c       |   2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |  25 +++++++
>  9 files changed, 167 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e95337c..537b344 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -408,7 +408,8 @@ struct i915_gtt {
>  	void (*gtt_insert_entries)(struct drm_device *dev,
>  				   struct sg_table *st,
>  				   unsigned int pg_start,
> -				   enum i915_cache_level cache_level);
> +				   enum i915_cache_level cache_level,
> +				   bool gfdt);
>  };
>  #define gtt_total_entries(gtt) ((gtt).total >> PAGE_SHIFT)
>
> @@ -429,7 +430,8 @@ struct i915_hw_ppgtt {
>  	void (*insert_entries)(struct i915_hw_ppgtt *ppgtt,
>  			       struct sg_table *st,
>  			       unsigned int pg_start,
> -			       enum i915_cache_level cache_level);
> +			       enum i915_cache_level cache_level,
> +			       bool gfdt);
>  	void (*cleanup)(struct i915_hw_ppgtt *ppgtt);
>  };
>
> @@ -1169,6 +1171,7 @@ struct drm_i915_gem_object {
>  	unsigned int fenced_gpu_access:1;
>
>  	unsigned int cache_level:2;
> +	unsigned int gfdt:1;
>
>  	unsigned int has_aliasing_ppgtt_mapping:1;
>  	unsigned int has_global_gtt_mapping:1;
> @@ -1310,6 +1313,10 @@ struct drm_i915_file_private {
>  #define HAS_LLC(dev) (INTEL_INFO(dev)->has_llc)
>  #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)->need_gfx_hws)
>
> +/* Only SNB and IVB have GFDT in PTEs */
> +#define HAS_GFDT(dev) (HAS_LLC(dev) && !IS_HASWELL(dev) && \
> +		       (IS_GEN6(dev) || IS_GEN7(dev)))
> +
>  #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6)
>  #define HAS_ALIASING_PPGTT(dev) (INTEL_INFO(dev)->gen >=6 && !IS_VALLEYVIEW(dev))
>
> @@ -1639,7 +1646,8 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj,
>  int __must_check
>  i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write);
>  int __must_check
> -i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> +i915_gem_object_pin_to_display_plane(struct drm_device *dev,
> +				     struct drm_i915_gem_object *obj,
>  				     u32 alignment,
>  				     struct intel_ring_buffer *pipelined);
>  int i915_gem_attach_phys_object(struct drm_device *dev,
> @@ -1681,14 +1689,14 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev);
>  void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt,
>  			    struct drm_i915_gem_object *obj,
> -			    enum i915_cache_level cache_level);
> +			    enum i915_cache_level cache_level, bool gfdt);
>  void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt,
>  			      struct drm_i915_gem_object *obj);
>
>  void i915_gem_restore_gtt_mappings(struct drm_device *dev);
>  int __must_check i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj);
>  void i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj,
> -			      enum i915_cache_level cache_level);
> +			      enum i915_cache_level cache_level, bool gfdt);
>  void i915_gem_gtt_unbind_object(struct drm_i915_gem_object *obj);
>  void i915_gem_gtt_finish_object(struct drm_i915_gem_object *obj);
>  void i915_gem_init_global_gtt(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 8413ffc..c635430 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3035,8 +3035,13 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj)
>  	 * and flushes when moving into and out of the RENDER domain, correct
>  	 * snooping behaviour occurs naturally as the result of our domain
>  	 * tracking.
> +	 *
> +	 * Also flush all pinned objects with GFDT enabled. Such objects
> +	 * may be used for scanout, and the CPU doesn't know anything
> +	 * about GFDT.
>  	 */
> -	if (obj->cache_level != I915_CACHE_NONE)
> +	if (obj->cache_level != I915_CACHE_NONE &&
> +	    !(obj->gfdt && obj->pin_count))
>  		return;
>
>  	trace_i915_gem_object_clflush(obj);
> @@ -3154,6 +3159,11 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  	struct drm_device *dev = obj->base.dev;
>  	drm_i915_private_t *dev_priv = dev->dev_private;
>  	int ret;
> +	bool gfdt = obj->gfdt;
> +
> +	/* Clear GFDT when moving to uncached */
> +	if (cache_level == I915_CACHE_NONE)
> +		gfdt = false;
>
>  	if (obj->cache_level == cache_level)
>  		return 0;
> @@ -3187,10 +3197,10 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		}
>
>  		if (obj->has_global_gtt_mapping)
> -			i915_gem_gtt_bind_object(obj, cache_level);
> +			i915_gem_gtt_bind_object(obj, cache_level, gfdt);
>  		if (obj->has_aliasing_ppgtt_mapping)
>  			i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
> -					       obj, cache_level);
> +					       obj, cache_level, gfdt);
>
>  		obj->gtt_space->color = cache_level;
>  	}
> @@ -3219,6 +3229,65 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  	}
>
>  	obj->cache_level = cache_level;
> +	obj->gfdt = gfdt;
> +	i915_gem_verify_gtt(dev);
> +	return 0;
> +}
> +
> +static int i915_gem_object_set_gfdt(struct drm_i915_gem_object *obj,
> +				    bool gfdt)
> +{
> +	struct drm_device *dev = obj->base.dev;
> +	drm_i915_private_t *dev_priv = dev->dev_private;
> +	int ret;
> +
> +	if (!HAS_GFDT(dev))
> +		return -ENODEV;
> +
> +	if (obj->gfdt == gfdt)
> +		return 0;
> +
> +	/* no point in setting GFDT on uncached object */
> +	if (obj->cache_level == I915_CACHE_NONE)
> +		return -EINVAL;
> +
> +	if (obj->gtt_space) {
> +		ret = i915_gem_object_finish_gpu(obj);
> +		if (ret)
> +			return ret;
> +
> +		i915_gem_object_finish_gtt(obj);
> +
> +		if (obj->has_global_gtt_mapping)
> +			i915_gem_gtt_bind_object(obj, obj->cache_level, gfdt);
> +		if (obj->has_aliasing_ppgtt_mapping)
> +			i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
> +					       obj, obj->cache_level, gfdt);
> +	}
> +
> +	if (gfdt) {
> +		u32 old_read_domains, old_write_domain;
> +
> +		/*
> +		 * Since we're just now enabling GFDT there's most
> +		 * likely dirty data in the LLC. It must be flushed
> +		 * to memory the old fashined way.
> +		 */
> +		WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU);
> +		WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU);
> +
> +		old_read_domains = obj->base.read_domains;
> +		old_write_domain = obj->base.write_domain;
> +
> +		obj->base.read_domains = I915_GEM_DOMAIN_CPU;
> +		obj->base.write_domain = I915_GEM_DOMAIN_CPU;
> +
> +		trace_i915_gem_object_change_domain(obj,
> +						    old_read_domains,
> +						    old_write_domain);
> +	}
> +
> +	obj->gfdt = gfdt;
>  	i915_gem_verify_gtt(dev);
>  	return 0;
>  }
> @@ -3291,7 +3360,8 @@ unlock:
>   * any flushes to be pipelined (for pageflips).
>   */
>  int
> -i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> +i915_gem_object_pin_to_display_plane(struct drm_device *dev,
> +				     struct drm_i915_gem_object *obj,
>  				     u32 alignment,
>  				     struct intel_ring_buffer *pipelined)
>  {
> @@ -3304,18 +3374,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>  			return ret;
>  	}
>
> -	/* The display engine is not coherent with the LLC cache on gen6. As
> -	 * a result, we make sure that the pinning that is about to occur is
> -	 * done with uncached PTEs. This is lowest common denominator for all
> -	 * chipsets.
> -	 *
> -	 * However for gen6+, we could do better by using the GFDT bit instead
> -	 * of uncaching, which would allow us to flush all the LLC-cached data
> -	 * with that bit in the PTE to main memory with just one PIPE_CONTROL.
> +	/*
> +	 * Try to set the GFDT bit instead of uncaching. This allow us to flush
> +	 * all the LLC-cached data with that bit in the PTE to main memory with
> +	 * just one PIPE_CONTROL.
>  	 */
> -	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
> -	if (ret)
> -		return ret;
> +	ret = i915_gem_object_set_gfdt(obj, true);
> +	if (ret) {
> +		/*
> +		 * The display engine is not coherent with the LLC cache on gen6. As
> +		 * a result, we make sure that the pinning that is about to occur is
> +		 * done with uncached PTEs. This is lowest common denominator for all
> +		 * chipsets.
> +		 */
> +		ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE);
> +		if (ret)
> +			return ret;
> +	}
>
>  	/* As the user may map the buffer once pinned in the display plane
>  	 * (e.g. libkms for the bootup splash), we have to ensure that we
> @@ -3499,11 +3574,11 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj,
>  			return ret;
>
>  		if (!dev_priv->mm.aliasing_ppgtt)
> -			i915_gem_gtt_bind_object(obj, obj->cache_level);
> +			i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt);
>  	}
>
>  	if (!obj->has_global_gtt_mapping && map_and_fenceable)
> -		i915_gem_gtt_bind_object(obj, obj->cache_level);
> +		i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt);
>
>  	obj->pin_count++;
>  	obj->pin_mappable |= map_and_fenceable;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 21177d9..7244c0c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -382,7 +382,8 @@ static int do_switch(struct i915_hw_context *to)
>  	}
>
>  	if (!to->obj->has_global_gtt_mapping)
> -		i915_gem_gtt_bind_object(to->obj, to->obj->cache_level);
> +		i915_gem_gtt_bind_object(to->obj, to->obj->cache_level,
> +					 to->obj->gfdt);
>
>  	if (!to->is_initialized || is_default_context(to))
>  		hw_flags |= MI_RESTORE_INHIBIT;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 2f2daeb..28a3c42 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -197,7 +197,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
>  		    reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION &&
>  		    !target_i915_obj->has_global_gtt_mapping)) {
>  		i915_gem_gtt_bind_object(target_i915_obj,
> -					 target_i915_obj->cache_level);
> +					 target_i915_obj->cache_level,
> +					 target_i915_obj->gfdt);
>  	}
>
>  	/* Validate that the target is in a valid r/w GPU domain */
> @@ -432,7 +433,7 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
>  	/* Ensure ppgtt mapping exists if needed */
>  	if (dev_priv->mm.aliasing_ppgtt && !obj->has_aliasing_ppgtt_mapping) {
>  		i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt,
> -				       obj, obj->cache_level);
> +				       obj, obj->cache_level, obj->gfdt);
>
>  		obj->has_aliasing_ppgtt_mapping = 1;
>  	}
> @@ -449,7 +450,7 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
>
>  	if (entry->flags & EXEC_OBJECT_NEEDS_GTT &&
>  	    !obj->has_global_gtt_mapping)
> -		i915_gem_gtt_bind_object(obj, obj->cache_level);
> +		i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt);
>
>  	return 0;
>  }
> @@ -1011,7 +1012,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>  	 * hsw should have this fixed, but let's be paranoid and do it
>  	 * unconditionally for now. */
>  	if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping)
> -		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level);
> +		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level,
> +					 batch_obj->gfdt);
>
>  	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects);
>  	if (ret)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 926a1e2..3c7e48b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -42,15 +42,20 @@ typedef uint32_t gtt_pte_t;
>  #define HSW_PTE_UNCACHED		(0)
>  #define GEN6_PTE_CACHE_LLC		(2 << 1)
>  #define GEN6_PTE_CACHE_LLC_MLC		(3 << 1)
> +#define GEN6_PTE_GFDT			(1 << 3)
>  #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
>
>  static inline gtt_pte_t gen6_pte_encode(struct drm_device *dev,
>  					dma_addr_t addr,
> -					enum i915_cache_level level)
> +					enum i915_cache_level level,
> +					bool gfdt)
>  {
>  	gtt_pte_t pte = GEN6_PTE_VALID;
>  	pte |= GEN6_PTE_ADDR_ENCODE(addr);
>
> +	if (gfdt && HAS_GFDT(dev))
> +		pte |= GEN6_PTE_GFDT;
> +
>  	switch (level) {
>  	case I915_CACHE_LLC_MLC:
>  		/* Haswell doesn't set L3 this way */
> @@ -89,7 +94,7 @@ static void gen6_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
>  	scratch_pte = gen6_pte_encode(ppgtt->dev,
>  				      ppgtt->scratch_page_dma_addr,
> -				      I915_CACHE_LLC);
> +				      I915_CACHE_LLC, false);
>
>  	while (num_entries) {
>  		last_pte = first_pte + num_entries;
> @@ -112,7 +117,8 @@ static void gen6_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt,
>  static void gen6_ppgtt_insert_entries(struct i915_hw_ppgtt *ppgtt,
>  				      struct sg_table *pages,
>  				      unsigned first_entry,
> -				      enum i915_cache_level cache_level)
> +				      enum i915_cache_level cache_level,
> +				      bool gfdt)
>  {
>  	gtt_pte_t *pt_vaddr;
>  	unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES;
> @@ -133,7 +139,7 @@ static void gen6_ppgtt_insert_entries(struct i915_hw_ppgtt *ppgtt,
>  		for (j = first_pte; j < I915_PPGTT_PT_ENTRIES; j++) {
>  			page_addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
>  			pt_vaddr[j] = gen6_pte_encode(ppgtt->dev, page_addr,
> -						      cache_level);
> +						      cache_level, gfdt);
>
>  			/* grab the next page */
>  			if (++m == segment_len) {
> @@ -279,11 +285,12 @@ void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev)
>
>  void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt,
>  			    struct drm_i915_gem_object *obj,
> -			    enum i915_cache_level cache_level)
> +			    enum i915_cache_level cache_level,
> +			    bool gfdt)
>  {
>  	ppgtt->insert_entries(ppgtt, obj->pages,
>  			      obj->gtt_space->start >> PAGE_SHIFT,
> -			      cache_level);
> +			      cache_level, gfdt);
>  }
>
>  void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt,
> @@ -401,7 +408,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
>
>  	list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) {
>  		i915_gem_clflush_object(obj);
> -		i915_gem_gtt_bind_object(obj, obj->cache_level);
> +		i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt);
>  	}
>
>  	i915_gem_chipset_flush(dev);
> @@ -429,7 +436,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
>  static void gen6_ggtt_insert_entries(struct drm_device *dev,
>  				     struct sg_table *st,
>  				     unsigned int first_entry,
> -				     enum i915_cache_level level)
> +				     enum i915_cache_level level,
> +				     bool gfdt)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct scatterlist *sg = st->sgl;
> @@ -443,7 +451,7 @@ static void gen6_ggtt_insert_entries(struct drm_device *dev,
>  		len = sg_dma_len(sg) >> PAGE_SHIFT;
>  		for (m = 0; m < len; m++) {
>  			addr = sg_dma_address(sg) + (m << PAGE_SHIFT);
> -			iowrite32(gen6_pte_encode(dev, addr, level),
> +			iowrite32(gen6_pte_encode(dev, addr, level, gfdt),
>  				  &gtt_entries[i]);
>  			i++;
>  		}
> @@ -457,7 +465,7 @@ static void gen6_ggtt_insert_entries(struct drm_device *dev,
>  	if (i != 0)
>  		WARN_ON(readl(&gtt_entries[i-1])
> -			!= gen6_pte_encode(dev, addr, level));
> +			!= gen6_pte_encode(dev, addr, level, gfdt));
>
>  	/* This next bit makes the above posting read even more important. We
>  	 * want to flush the TLBs only after we're certain all the PTE updates
> @@ -483,7 +491,7 @@ static void gen6_ggtt_clear_range(struct drm_device *dev,
>  		num_entries = max_entries;
>
>  	scratch_pte = gen6_pte_encode(dev, dev_priv->gtt.scratch_page_dma,
> -				      I915_CACHE_LLC);
> +				      I915_CACHE_LLC, false);
>  	for (i = 0; i < num_entries; i++)
>  		iowrite32(scratch_pte, &gtt_base[i]);
>  	readl(gtt_base);
> @@ -493,7 +501,8 @@ static void gen6_ggtt_clear_range(struct drm_device *dev,
>  static void i915_ggtt_insert_entries(struct drm_device *dev,
>  				     struct sg_table *st,
>  				     unsigned int pg_start,
> -				     enum i915_cache_level cache_level)
> +				     enum i915_cache_level cache_level,
> +				     bool gfdt)
>  {
>  	unsigned int flags = (cache_level == I915_CACHE_NONE) ?
>  		AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY;
> @@ -511,14 +520,15 @@ static void i915_ggtt_clear_range(struct drm_device *dev,
>
>
>  void i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj,
> -			      enum i915_cache_level cache_level)
> +			      enum i915_cache_level cache_level,
> +			      bool gfdt)
>  {
>  	struct drm_device *dev = obj->base.dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>
>  	dev_priv->gtt.gtt_insert_entries(dev, obj->pages,
>  					 obj->gtt_space->start >> PAGE_SHIFT,
> -					 cache_level);
> +					 cache_level, gfdt);
>
>  	obj->has_global_gtt_mapping = 1;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index cd226c2..6aa29d5 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -242,6 +242,7 @@
>  #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
>  #define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
>  #define   MI_INVALIDATE_TLB		(1<<18)
> +#define   MI_SYNCHRONIZE_GFDT		(1<<17)
>  #define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
>  #define   MI_INVALIDATE_BSD		(1<<7)
>  #define   MI_FLUSH_DW_USE_GTT		(1<<2)
> @@ -311,6 +312,7 @@
>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB		(1<<24) /* gen7+ */
>  #define   PIPE_CONTROL_CS_STALL			(1<<20)
>  #define   PIPE_CONTROL_TLB_INVALIDATE		(1<<18)
> +#define   PIPE_CONTROL_SYNCHRONIZE_GFDT		(1<<17)
>  #define   PIPE_CONTROL_QW_WRITE			(1<<14)
>  #define   PIPE_CONTROL_DEPTH_STALL		(1<<13)
>  #define   PIPE_CONTROL_WRITE_FLUSH		(1<<12)
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index d912306..fc99d55 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -1968,7 +1968,7 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
>  	}
>
>  	dev_priv->mm.interruptible = false;
> -	ret = i915_gem_object_pin_to_display_plane(obj, alignment, pipelined);
> +	ret = i915_gem_object_pin_to_display_plane(dev, obj, alignment, pipelined);
>  	if (ret)
>  		goto err_interruptible;
>
> @@ -6340,7 +6340,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc,
>  		goto fail_locked;
>  	}
>
> -	ret = i915_gem_object_pin_to_display_plane(obj, 0, NULL);
> +	ret = i915_gem_object_pin_to_display_plane(dev, obj, 0, NULL);
>  	if (ret) {
>  		DRM_ERROR("failed to move cursor bo into the GTT\n");
>  		goto fail_locked;
> diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
> index 67a2501..b4110a2 100644
> --- a/drivers/gpu/drm/i915/intel_overlay.c
> +++ b/drivers/gpu/drm/i915/intel_overlay.c
> @@ -695,7 +695,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
>  	if (ret != 0)
>  		return ret;
>
> -	ret = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL);
> +	ret = i915_gem_object_pin_to_display_plane(dev, new_bo, 0, NULL);
>  	if (ret != 0)
>  		return ret;
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1d5d613..8406247 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -234,6 +234,12 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring,
>  		 * when the render cache is indeed flushed.
>  		 */
>  		flags |= PIPE_CONTROL_CS_STALL;
> +
> +		/* Flush GFDT out to memory */
> +		if (HAS_GFDT(ring->dev)) {
> +			flags |= PIPE_CONTROL_QW_WRITE;
> +			flags |= PIPE_CONTROL_SYNCHRONIZE_GFDT;
> +		}
>  	}
>  	if (invalidate_domains) {
>  		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> @@ -306,6 +312,13 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring,
>  	if (flush_domains) {
>  		flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH;
>  		flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH;
> +
> +		/* Flush GFDT out to memory */
> +		if (HAS_GFDT(ring->dev)) {
> +			flags |= PIPE_CONTROL_QW_WRITE;
> +			flags |= PIPE_CONTROL_GLOBAL_GTT_IVB;
> +			flags |= PIPE_CONTROL_SYNCHRONIZE_GFDT;
> +		}
>  	}
>  	if (invalidate_domains) {
>  		flags |= PIPE_CONTROL_TLB_INVALIDATE;
> @@ -1557,6 +1570,12 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
>  		return ret;
>
>  	cmd = MI_FLUSH_DW;
> +
> +	/* Flush GFDT out to memory */
> +	if (flush & I915_GEM_GPU_DOMAINS && HAS_GFDT(ring->dev))
> +		cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW |
> +			MI_SYNCHRONIZE_GFDT;
> +
>  	/*
>  	 * Bspec vol 1c.5 - video engine command streamer:
>  	 * "If ENABLED, all TLBs will be invalidated once the flush
> @@ -1629,6 +1648,12 @@ static int blt_ring_flush(struct intel_ring_buffer *ring,
>  		return ret;
>
>  	cmd = MI_FLUSH_DW;
> +
> +	/* Flush GFDT out to memory */
> +	if (flush & I915_GEM_DOMAIN_RENDER && HAS_GFDT(ring->dev))
> +		cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW |
> +			MI_SYNCHRONIZE_GFDT;
> +
>  	/*
>  	 * Bspec vol 1c.3 - blitter engine command streamer:
>  	 * "If ENABLED, all TLBs will be invalidated once the flush
> --
> 1.7.12.4
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
> The other thing was that I didn't manage to get things to work properly,
> leaving some random cacheline dirt on the screen. Looking at your code, you
> add the gfdt flush to every ring_flush, whereas I've tried to be clever
> and only flushed after batches rendering to the frontbuffer. So probably a
> bug in my code, or a flush on a given ring doesn't flush out caches for
> one of the other engines.

Right, we have a formalized interface for flushes to scanout rather than
always flushing GFDT.
-Chris
On Sun, Mar 03, 2013 at 05:39:09PM +0000, Chris Wilson wrote:
> On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
> > The other thing was that I didn't manage to get things to work properly,
> > leaving some random cacheline dirt on the screen. Looking at your code, you
> > add the gfdt flush to every ring_flush, whereas I've tried to be clever
> > and only flushed after batches rendering to the frontbuffer. So probably a
> > bug in my code, or a flush on a given ring doesn't flush out caches for
> > one of the other engines.
>
> Right, we have a formalized interface for flushes to scanout rather than
> always flushing GFDT.

We do? Where's that?
On Mon, Mar 04, 2013 at 03:51:11PM +0200, Ville Syrjälä wrote:
> On Sun, Mar 03, 2013 at 05:39:09PM +0000, Chris Wilson wrote:
> > On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
> > > The other thing was that I didn't manage to get things to work properly,
> > > leaving some random cacheline dirt on the screen. Looking at your code, you
> > > add the gfdt flush to every ring_flush, whereas I've tried to be clever
> > > and only flushed after batches rendering to the frontbuffer. So probably a
> > > bug in my code, or a flush on a given ring doesn't flush out caches for
> > > one of the other engines.
> >
> > Right, we have a formalized interface for flushes to scanout rather than
> > always flushing GFDT.
>
> We do? Where's that?

A side-effect of the API for busy_ioctl() is that all caches for the bo are
flushed and any outstanding requests queued in order for the bo to become
unbusy in the near future. This semantic was required for GL query objects
to eventually complete without further interaction. It also solved the
problem of having to flush the scanout periodically on gen4+ (and on gen3
if we enable the optimisation bit) and is so enshrined into lore.
-Chris
On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
> On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> >
> > Currently all scanout buffers must be uncached because the
> > display controller doesn't snoop the LLC. SNB introduced another
> > method to guarantee coherency for the display controller. It's
> > called the GFDT or graphics data type.
> >
> > Pages that have the GFDT bit enabled in their PTEs get flushed
> > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
> > issued with the "synchronize GFDT" bit set.
> >
> > So rather than making all scanout buffers uncached, set the GFDT
> > bit in their PTEs, and modify the ring flush functions to enable
> > the "synchronize GFDT" bit.
> >
> > On HSW the GFDT bit was removed from the PTE, and it's only present in
> > surface state, so we can't really set it from the kernel. Also the docs
> > state that the hardware isn't actually guaranteed to respect the GFDT
> > bit. So it looks like GFDT isn't all that useful on HSW.
> >
> > So far I've tried this very quickly on an IVB machine, and
> > it seems to be working as advertised. No idea if it does any
> > good though.
> >
> > TODO:
> > - make sure there are no missing flushes (CPU access doesn't
> >   respect GFDT after all).
> > - measure it and see if there's some real benefit
> > - maybe we can track whether "synchronize GFDT" needs to be
> >   issued, and skip it when possible. needs some numbers to
> >   determine if it's worthwhile.
> >
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> Iirc when I've tried this out a while back it regressed a few benchmarks.
> Chris & me suspected cache thrashing, but hard to tell without proper
> instrumentation support. Chris played around with a few tricks to mark
> other giant bos as uncacheable, but he couldn't find any improved
> workloads.

I see. I didn't realize this was tried already. Not that I really planned
to implement this in the first place. I was just studying the code and
figured I'd learn better by trying to change things a bit ;) But if
there's interest I can of course try to improve it further.

> In short I think this needs to come with decent amounts of numbers
> attached, like the TODO says ;-)
>
> The other thing was that I didn't manage to get things to work properly,
> leaving some random cacheline dirt on the screen. Looking at your code, you
> add the gfdt flush to every ring_flush, whereas I've tried to be clever
> and only flushed after batches rendering to the frontbuffer. So probably a
> bug in my code, or a flush on a given ring doesn't flush out caches for
> one of the other engines.

I had a bug in my first attempt where I forgot the initial clflush. That
left some random crap on the screen until I rendered the next frame w/
GFDT already enabled. Other than that I didn't see any corruption except
when I intentionally left out the flushes. But as stated my code does too
many flushes probably.
in the LLC. It must be flushed + * to memory the old fashined way. + */ + WARN_ON(obj->base.write_domain & ~I915_GEM_DOMAIN_CPU); + WARN_ON(obj->base.read_domains & ~I915_GEM_DOMAIN_CPU); + + old_read_domains = obj->base.read_domains; + old_write_domain = obj->base.write_domain; + + obj->base.read_domains = I915_GEM_DOMAIN_CPU; + obj->base.write_domain = I915_GEM_DOMAIN_CPU; + + trace_i915_gem_object_change_domain(obj, + old_read_domains, + old_write_domain); + } + + obj->gfdt = gfdt; i915_gem_verify_gtt(dev); return 0; } @@ -3291,7 +3360,8 @@ unlock: * any flushes to be pipelined (for pageflips). */ int -i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, +i915_gem_object_pin_to_display_plane(struct drm_device *dev, + struct drm_i915_gem_object *obj, u32 alignment, struct intel_ring_buffer *pipelined) { @@ -3304,18 +3374,23 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, return ret; } - /* The display engine is not coherent with the LLC cache on gen6. As - * a result, we make sure that the pinning that is about to occur is - * done with uncached PTEs. This is lowest common denominator for all - * chipsets. - * - * However for gen6+, we could do better by using the GFDT bit instead - * of uncaching, which would allow us to flush all the LLC-cached data - * with that bit in the PTE to main memory with just one PIPE_CONTROL. + /* + * Try to set the GFDT bit instead of uncaching. This allow us to flush + * all the LLC-cached data with that bit in the PTE to main memory with + * just one PIPE_CONTROL. */ - ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE); - if (ret) - return ret; + ret = i915_gem_object_set_gfdt(obj, true); + if (ret) { + /* + * The display engine is not coherent with the LLC cache on gen6. As + * a result, we make sure that the pinning that is about to occur is + * done with uncached PTEs. This is lowest common denominator for all + * chipsets. 
+ */ + ret = i915_gem_object_set_cache_level(obj, I915_CACHE_NONE); + if (ret) + return ret; + } /* As the user may map the buffer once pinned in the display plane * (e.g. libkms for the bootup splash), we have to ensure that we @@ -3499,11 +3574,11 @@ i915_gem_object_pin(struct drm_i915_gem_object *obj, return ret; if (!dev_priv->mm.aliasing_ppgtt) - i915_gem_gtt_bind_object(obj, obj->cache_level); + i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt); } if (!obj->has_global_gtt_mapping && map_and_fenceable) - i915_gem_gtt_bind_object(obj, obj->cache_level); + i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt); obj->pin_count++; obj->pin_mappable |= map_and_fenceable; diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 21177d9..7244c0c 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -382,7 +382,8 @@ static int do_switch(struct i915_hw_context *to) } if (!to->obj->has_global_gtt_mapping) - i915_gem_gtt_bind_object(to->obj, to->obj->cache_level); + i915_gem_gtt_bind_object(to->obj, to->obj->cache_level, + to->obj->gfdt); if (!to->is_initialized || is_default_context(to)) hw_flags |= MI_RESTORE_INHIBIT; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 2f2daeb..28a3c42 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -197,7 +197,8 @@ i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj, reloc->write_domain == I915_GEM_DOMAIN_INSTRUCTION && !target_i915_obj->has_global_gtt_mapping)) { i915_gem_gtt_bind_object(target_i915_obj, - target_i915_obj->cache_level); + target_i915_obj->cache_level, + target_i915_obj->gfdt); } /* Validate that the target is in a valid r/w GPU domain */ @@ -432,7 +433,7 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj, /* Ensure ppgtt mapping exists if needed */ if 
(dev_priv->mm.aliasing_ppgtt && !obj->has_aliasing_ppgtt_mapping) { i915_ppgtt_bind_object(dev_priv->mm.aliasing_ppgtt, - obj, obj->cache_level); + obj, obj->cache_level, obj->gfdt); obj->has_aliasing_ppgtt_mapping = 1; } @@ -449,7 +450,7 @@ i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj, if (entry->flags & EXEC_OBJECT_NEEDS_GTT && !obj->has_global_gtt_mapping) - i915_gem_gtt_bind_object(obj, obj->cache_level); + i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt); return 0; } @@ -1011,7 +1012,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data, * hsw should have this fixed, but let's be paranoid and do it * unconditionally for now. */ if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping) - i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level); + i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level, + batch_obj->gfdt); ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects); if (ret) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 926a1e2..3c7e48b 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -42,15 +42,20 @@ typedef uint32_t gtt_pte_t; #define HSW_PTE_UNCACHED (0) #define GEN6_PTE_CACHE_LLC (2 << 1) #define GEN6_PTE_CACHE_LLC_MLC (3 << 1) +#define GEN6_PTE_GFDT (1 << 3) #define GEN6_PTE_ADDR_ENCODE(addr) GEN6_GTT_ADDR_ENCODE(addr) static inline gtt_pte_t gen6_pte_encode(struct drm_device *dev, dma_addr_t addr, - enum i915_cache_level level) + enum i915_cache_level level, + bool gfdt) { gtt_pte_t pte = GEN6_PTE_VALID; pte |= GEN6_PTE_ADDR_ENCODE(addr); + if (gfdt && HAS_GFDT(dev)) + pte |= GEN6_PTE_GFDT; + switch (level) { case I915_CACHE_LLC_MLC: /* Haswell doesn't set L3 this way */ @@ -89,7 +94,7 @@ static void gen6_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt, scratch_pte = gen6_pte_encode(ppgtt->dev, ppgtt->scratch_page_dma_addr, - I915_CACHE_LLC); + I915_CACHE_LLC, false); while (num_entries) { 
last_pte = first_pte + num_entries; @@ -112,7 +117,8 @@ static void gen6_ppgtt_clear_range(struct i915_hw_ppgtt *ppgtt, static void gen6_ppgtt_insert_entries(struct i915_hw_ppgtt *ppgtt, struct sg_table *pages, unsigned first_entry, - enum i915_cache_level cache_level) + enum i915_cache_level cache_level, + bool gfdt) { gtt_pte_t *pt_vaddr; unsigned act_pd = first_entry / I915_PPGTT_PT_ENTRIES; @@ -133,7 +139,7 @@ static void gen6_ppgtt_insert_entries(struct i915_hw_ppgtt *ppgtt, for (j = first_pte; j < I915_PPGTT_PT_ENTRIES; j++) { page_addr = sg_dma_address(sg) + (m << PAGE_SHIFT); pt_vaddr[j] = gen6_pte_encode(ppgtt->dev, page_addr, - cache_level); + cache_level, gfdt); /* grab the next page */ if (++m == segment_len) { @@ -279,11 +285,12 @@ void i915_gem_cleanup_aliasing_ppgtt(struct drm_device *dev) void i915_ppgtt_bind_object(struct i915_hw_ppgtt *ppgtt, struct drm_i915_gem_object *obj, - enum i915_cache_level cache_level) + enum i915_cache_level cache_level, + bool gfdt) { ppgtt->insert_entries(ppgtt, obj->pages, obj->gtt_space->start >> PAGE_SHIFT, - cache_level); + cache_level, gfdt); } void i915_ppgtt_unbind_object(struct i915_hw_ppgtt *ppgtt, @@ -401,7 +408,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev) list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list) { i915_gem_clflush_object(obj); - i915_gem_gtt_bind_object(obj, obj->cache_level); + i915_gem_gtt_bind_object(obj, obj->cache_level, obj->gfdt); } i915_gem_chipset_flush(dev); @@ -429,7 +436,8 @@ int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj) static void gen6_ggtt_insert_entries(struct drm_device *dev, struct sg_table *st, unsigned int first_entry, - enum i915_cache_level level) + enum i915_cache_level level, + bool gfdt) { struct drm_i915_private *dev_priv = dev->dev_private; struct scatterlist *sg = st->sgl; @@ -443,7 +451,7 @@ static void gen6_ggtt_insert_entries(struct drm_device *dev, len = sg_dma_len(sg) >> PAGE_SHIFT; for (m = 0; m < len; m++) { addr = 
sg_dma_address(sg) + (m << PAGE_SHIFT); - iowrite32(gen6_pte_encode(dev, addr, level), + iowrite32(gen6_pte_encode(dev, addr, level, gfdt), >t_entries[i]); i++; } @@ -457,7 +465,7 @@ static void gen6_ggtt_insert_entries(struct drm_device *dev, */ if (i != 0) WARN_ON(readl(>t_entries[i-1]) - != gen6_pte_encode(dev, addr, level)); + != gen6_pte_encode(dev, addr, level, gfdt)); /* This next bit makes the above posting read even more important. We * want to flush the TLBs only after we're certain all the PTE updates @@ -483,7 +491,7 @@ static void gen6_ggtt_clear_range(struct drm_device *dev, num_entries = max_entries; scratch_pte = gen6_pte_encode(dev, dev_priv->gtt.scratch_page_dma, - I915_CACHE_LLC); + I915_CACHE_LLC, false); for (i = 0; i < num_entries; i++) iowrite32(scratch_pte, >t_base[i]); readl(gtt_base); @@ -493,7 +501,8 @@ static void gen6_ggtt_clear_range(struct drm_device *dev, static void i915_ggtt_insert_entries(struct drm_device *dev, struct sg_table *st, unsigned int pg_start, - enum i915_cache_level cache_level) + enum i915_cache_level cache_level, + bool gfdt) { unsigned int flags = (cache_level == I915_CACHE_NONE) ? 
AGP_USER_MEMORY : AGP_USER_CACHED_MEMORY; @@ -511,14 +520,15 @@ static void i915_ggtt_clear_range(struct drm_device *dev, void i915_gem_gtt_bind_object(struct drm_i915_gem_object *obj, - enum i915_cache_level cache_level) + enum i915_cache_level cache_level, + bool gfdt) { struct drm_device *dev = obj->base.dev; struct drm_i915_private *dev_priv = dev->dev_private; dev_priv->gtt.gtt_insert_entries(dev, obj->pages, obj->gtt_space->start >> PAGE_SHIFT, - cache_level); + cache_level, gfdt); obj->has_global_gtt_mapping = 1; } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index cd226c2..6aa29d5 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -242,6 +242,7 @@ #define MI_FLUSH_DW MI_INSTR(0x26, 1) /* for GEN6 */ #define MI_FLUSH_DW_STORE_INDEX (1<<21) #define MI_INVALIDATE_TLB (1<<18) +#define MI_SYNCHRONIZE_GFDT (1<<17) #define MI_FLUSH_DW_OP_STOREDW (1<<14) #define MI_INVALIDATE_BSD (1<<7) #define MI_FLUSH_DW_USE_GTT (1<<2) @@ -311,6 +312,7 @@ #define PIPE_CONTROL_GLOBAL_GTT_IVB (1<<24) /* gen7+ */ #define PIPE_CONTROL_CS_STALL (1<<20) #define PIPE_CONTROL_TLB_INVALIDATE (1<<18) +#define PIPE_CONTROL_SYNCHRONIZE_GFDT (1<<17) #define PIPE_CONTROL_QW_WRITE (1<<14) #define PIPE_CONTROL_DEPTH_STALL (1<<13) #define PIPE_CONTROL_WRITE_FLUSH (1<<12) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index d912306..fc99d55 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -1968,7 +1968,7 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev, } dev_priv->mm.interruptible = false; - ret = i915_gem_object_pin_to_display_plane(obj, alignment, pipelined); + ret = i915_gem_object_pin_to_display_plane(dev, obj, alignment, pipelined); if (ret) goto err_interruptible; @@ -6340,7 +6340,7 @@ static int intel_crtc_cursor_set(struct drm_crtc *crtc, goto fail_locked; } - ret = i915_gem_object_pin_to_display_plane(obj, 0, NULL); + ret = 
i915_gem_object_pin_to_display_plane(dev, obj, 0, NULL); if (ret) { DRM_ERROR("failed to move cursor bo into the GTT\n"); goto fail_locked; diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 67a2501..b4110a2 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -695,7 +695,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, if (ret != 0) return ret; - ret = i915_gem_object_pin_to_display_plane(new_bo, 0, NULL); + ret = i915_gem_object_pin_to_display_plane(dev, new_bo, 0, NULL); if (ret != 0) return ret; diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 1d5d613..8406247 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -234,6 +234,12 @@ gen6_render_ring_flush(struct intel_ring_buffer *ring, * when the render cache is indeed flushed. */ flags |= PIPE_CONTROL_CS_STALL; + + /* Flush GFDT out to memory */ + if (HAS_GFDT(ring->dev)) { + flags |= PIPE_CONTROL_QW_WRITE; + flags |= PIPE_CONTROL_SYNCHRONIZE_GFDT; + } } if (invalidate_domains) { flags |= PIPE_CONTROL_TLB_INVALIDATE; @@ -306,6 +312,13 @@ gen7_render_ring_flush(struct intel_ring_buffer *ring, if (flush_domains) { flags |= PIPE_CONTROL_RENDER_TARGET_CACHE_FLUSH; flags |= PIPE_CONTROL_DEPTH_CACHE_FLUSH; + + /* Flush GFDT out to memory */ + if (HAS_GFDT(ring->dev)) { + flags |= PIPE_CONTROL_QW_WRITE; + flags |= PIPE_CONTROL_GLOBAL_GTT_IVB; + flags |= PIPE_CONTROL_SYNCHRONIZE_GFDT; + } } if (invalidate_domains) { flags |= PIPE_CONTROL_TLB_INVALIDATE; @@ -1557,6 +1570,12 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring, return ret; cmd = MI_FLUSH_DW; + + /* Flush GFDT out to memory */ + if (flush & I915_GEM_GPU_DOMAINS && HAS_GFDT(ring->dev)) + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW | + MI_SYNCHRONIZE_GFDT; + /* * Bspec vol 1c.5 - video engine command streamer: * "If ENABLED, all 
TLBs will be invalidated once the flush @@ -1629,6 +1648,12 @@ static int blt_ring_flush(struct intel_ring_buffer *ring, return ret; cmd = MI_FLUSH_DW; + + /* Flush GFDT out to memory */ + if (flush & I915_GEM_DOMAIN_RENDER && HAS_GFDT(ring->dev)) + cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW | + MI_SYNCHRONIZE_GFDT; + /* * Bspec vol 1c.3 - blitter engine command streamer: * "If ENABLED, all TLBs will be invalidated once the flush