diff mbox

drm/i915: clflush on pin_to_display after pwrite to UC bo in LLC

Message ID 20150811142827.GD3701@nuc-i3427.alporthouse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Aug. 11, 2015, 2:28 p.m. UTC
On Tue, Aug 11, 2015 at 04:59:14PM +0300, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Currently we don't clflush on pin_to_display if the bo is already
> UC/WT and is not in the CPU write domain. This causes problems with
> pwrite since pwrite doesn't change the write domain, and it avoids
> clflushing on UC/WT buffers on LLC platforms unless the buffer is
> currently being scanned out.
> 
> Fix the problem by marking the cache dirty and adjusting
> i915_gem_object_set_cache_level() to clflush when the cache is dirty
> even if the cache_level doesn't change.
> 
> My last attempt [1] at fixing this via write domain frobbing was shot
> down, but now with the cache_dirty flag we can do things in a nicer way.
> 
> [1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
> Testcase: igt/kms_pwrite_crc
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 73293b4..73eff2e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1005,12 +1005,15 @@ out:
>  		if (!needs_clflush_after &&
>  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>  			if (i915_gem_clflush_object(obj, obj->pin_display))
> -				i915_gem_chipset_flush(dev);
> +				needs_clflush_after = true;
>  		}
>  	}
>  
>  	if (needs_clflush_after)
>  		i915_gem_chipset_flush(dev);
> +	else if (obj->cache_level == I915_CACHE_NONE ||
> +		 obj->cache_level == I915_CACHE_WT)
> +		obj->cache_dirty = true;
>  
>  	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
>  	return ret;

Ok, this seems reasonable.

> @@ -3639,10 +3642,10 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  {
>  	struct drm_device *dev = obj->base.dev;
>  	struct i915_vma *vma, *next;
> -	int ret;
> +	int ret = 0;
>  
>  	if (obj->cache_level == cache_level)
> -		return 0;
> +		goto out;
>  
>  	if (i915_gem_obj_is_pinned(obj)) {
>  		DRM_DEBUG("can not change the cache level of pinned objects\n");
> @@ -3687,6 +3690,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>  		vma->node.color = cache_level;
>  	obj->cache_level = cache_level;
>  
> +out:
>  	if (obj->cache_dirty &&
>  	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
>  	    cpu_write_needs_clflush(obj)) {

This we can do better (and I know I am guilty of the original sin). It
just feels a little too loose that we expect pin-to-display-plane should
always call set-cache-level (yes, it has to, but it feels like we are
putting pin-to-display-plane specifics into set-cache-level).

I think this chunk is much more understandable if we did:


Maybe even

/* Whilst the object was away from the scanout we may have been eliding the coherent
 * writes into the CPU cache. However, the moment it has to be read by the display
 * engine, those writes into the CPU cache become inaccessible and so we have to 
 * clflush them out to main memory. We track elided flushes with obj->cache_dirty
 * and hope they are rare.
 */


The other user of set-cache-level (set_caching_ioctl) should not care
about the clflush side-effect and the clflush elision should work just
fine instead.

Either way,
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
but I'd prefer a v2 with the comments :)
-Chris

Comments

Ville Syrjälä Aug. 11, 2015, 2:56 p.m. UTC | #1
On Tue, Aug 11, 2015 at 03:28:27PM +0100, Chris Wilson wrote:
> On Tue, Aug 11, 2015 at 04:59:14PM +0300, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Currently we don't clflush on pin_to_display if the bo is already
> > UC/WT and is not in the CPU write domain. This causes problems with
> > pwrite since pwrite doesn't change the write domain, and it avoids
> > clflushing on UC/WT buffers on LLC platforms unless the buffer is
> > currently being scanned out.
> > 
> > Fix the problem by marking the cache dirty and adjusting
> > i915_gem_object_set_cache_level() to clflush when the cache is dirty
> > even if the cache_level doesn't change.
> > 
> > My last attempt [1] at fixing this via write domain frobbing was shot
> > down, but now with the cache_dirty flag we can do things in a nicer way.
> > 
> > [1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html
> > 
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
> > Testcase: igt/kms_pwrite_crc
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 73293b4..73eff2e 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1005,12 +1005,15 @@ out:
> >  		if (!needs_clflush_after &&
> >  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> >  			if (i915_gem_clflush_object(obj, obj->pin_display))
> > -				i915_gem_chipset_flush(dev);
> > +				needs_clflush_after = true;
> >  		}
> >  	}
> >  
> >  	if (needs_clflush_after)
> >  		i915_gem_chipset_flush(dev);
> > +	else if (obj->cache_level == I915_CACHE_NONE ||
> > +		 obj->cache_level == I915_CACHE_WT)
> > +		obj->cache_dirty = true;
> >  
> >  	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
> >  	return ret;
> 
> Ok, this seems reasonable.
> 
> > @@ -3639,10 +3642,10 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  {
> >  	struct drm_device *dev = obj->base.dev;
> >  	struct i915_vma *vma, *next;
> > -	int ret;
> > +	int ret = 0;
> >  
> >  	if (obj->cache_level == cache_level)
> > -		return 0;
> > +		goto out;
> >  
> >  	if (i915_gem_obj_is_pinned(obj)) {
> >  		DRM_DEBUG("can not change the cache level of pinned objects\n");
> > @@ -3687,6 +3690,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >  		vma->node.color = cache_level;
> >  	obj->cache_level = cache_level;
> >  
> > +out:
> >  	if (obj->cache_dirty &&
> >  	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> >  	    cpu_write_needs_clflush(obj)) {
> 
> This we can do better (and I know I am guilty of the original sin). It
> just feels a little too loose that we expect pin-to-display-plane should
> always call set-cache-level (yes, it has to, but it feels like we are
> putting pin-to-display-plane specifics into set-cache-level).
> 
> I think this chunk is much more understandable if we did:
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5e7fcf77e57a..fc6bcc19cca1 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3656,13 +3656,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>                 vma->node.color = cache_level;
>         obj->cache_level = cache_level;
>  
> -       if (obj->cache_dirty &&
> -           obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> -           cpu_write_needs_clflush(obj)) {
> -               if (i915_gem_clflush_object(obj, true))
> -                       i915_gem_chipset_flush(obj->base.dev);
> -       }
> -
>         return 0;
>  }
>  
> @@ -3795,6 +3788,10 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
>         WARN_ON(obj->pin_display > vma->pin_count);
>  
>         i915_gem_object_flush_cpu_write_domain(obj);
> +       if (obj->cache_dirty) {
> +               if (i915_gem_clflush_object(obj, true))
> +                       i915_gem_chipset_flush(obj->base.dev);
> +       }
>  
>         /* It should now be out of any other write domains, and we can update
>          * the domain values for our changes.
> 
> Maybe even
> 
> /* Whilst the object was away from the scanout we may have been eliding the coherent
>  * writes into the CPU cache. However, the moment it has to be read by the display
>  * engine, those writes into the CPU cache become inaccessible and so we have to 
>  * clflush them out to main memory. We track elided flushes with obj->cache_dirty
>  * and hope they are rare.
>  */
> 
> 
> The other user of set-cache-level (set_caching_ioctl) should not care
> about the clflush side-effect and the clflush elision should work just
> fine instead.

Hmm. So what would happen on !LLC if we start with a cached bo, then pwrite it
and afterwards make it uncached?

> 
> Either way,
> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
> but I'd prefer a v2 with the comments :)
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Ville Syrjälä Aug. 11, 2015, 3:10 p.m. UTC | #2
On Tue, Aug 11, 2015 at 05:56:28PM +0300, Ville Syrjälä wrote:
> On Tue, Aug 11, 2015 at 03:28:27PM +0100, Chris Wilson wrote:
> > On Tue, Aug 11, 2015 at 04:59:14PM +0300, ville.syrjala@linux.intel.com wrote:
> > > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > 
> > > Currently we don't clflush on pin_to_display if the bo is already
> > > UC/WT and is not in the CPU write domain. This causes problems with
> > > pwrite since pwrite doesn't change the write domain, and it avoids
> > > clflushing on UC/WT buffers on LLC platforms unless the buffer is
> > > currently being scanned out.
> > > 
> > > Fix the problem by marking the cache dirty and adjusting
> > > i915_gem_object_set_cache_level() to clflush when the cache is dirty
> > > even if the cache_level doesn't change.
> > > 
> > > My last attempt [1] at fixing this via write domain frobbing was shot
> > > down, but now with the cache_dirty flag we can do things in a nicer way.
> > > 
> > > [1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
> > > Testcase: igt/kms_pwrite_crc
> > > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem.c | 10 +++++++---
> > >  1 file changed, 7 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > > index 73293b4..73eff2e 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > > @@ -1005,12 +1005,15 @@ out:
> > >  		if (!needs_clflush_after &&
> > >  		    obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
> > >  			if (i915_gem_clflush_object(obj, obj->pin_display))
> > > -				i915_gem_chipset_flush(dev);
> > > +				needs_clflush_after = true;
> > >  		}
> > >  	}
> > >  
> > >  	if (needs_clflush_after)
> > >  		i915_gem_chipset_flush(dev);
> > > +	else if (obj->cache_level == I915_CACHE_NONE ||
> > > +		 obj->cache_level == I915_CACHE_WT)
> > > +		obj->cache_dirty = true;
> > >  
> > >  	intel_fb_obj_flush(obj, false, ORIGIN_CPU);
> > >  	return ret;
> > 
> > Ok, this seems reasonable.
> > 
> > > @@ -3639,10 +3642,10 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > >  {
> > >  	struct drm_device *dev = obj->base.dev;
> > >  	struct i915_vma *vma, *next;
> > > -	int ret;
> > > +	int ret = 0;
> > >  
> > >  	if (obj->cache_level == cache_level)
> > > -		return 0;
> > > +		goto out;
> > >  
> > >  	if (i915_gem_obj_is_pinned(obj)) {
> > >  		DRM_DEBUG("can not change the cache level of pinned objects\n");
> > > @@ -3687,6 +3690,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> > >  		vma->node.color = cache_level;
> > >  	obj->cache_level = cache_level;
> > >  
> > > +out:
> > >  	if (obj->cache_dirty &&
> > >  	    obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > >  	    cpu_write_needs_clflush(obj)) {
> > 
> > This we can do better (and I know I am guilty of the original sin). It
> > just feels a little too loose that we expect pin-to-display-plane should
> > always call set-cache-level (yes, it has to, but it feels like we are
> > putting pin-to-display-plane specifics into set-cache-level).
> > 
> > I think this chunk is much more understandable if we did:
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 5e7fcf77e57a..fc6bcc19cca1 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3656,13 +3656,6 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
> >                 vma->node.color = cache_level;
> >         obj->cache_level = cache_level;
> >  
> > -       if (obj->cache_dirty &&
> > -           obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
> > -           cpu_write_needs_clflush(obj)) {
> > -               if (i915_gem_clflush_object(obj, true))
> > -                       i915_gem_chipset_flush(obj->base.dev);
> > -       }
> > -
> >         return 0;
> >  }
> >  
> > @@ -3795,6 +3788,10 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >         WARN_ON(obj->pin_display > vma->pin_count);
> >  
> >         i915_gem_object_flush_cpu_write_domain(obj);
> > +       if (obj->cache_dirty) {
> > +               if (i915_gem_clflush_object(obj, true))
> > +                       i915_gem_chipset_flush(obj->base.dev);
> > +       }
> >  
> >         /* It should now be out of any other write domains, and we can update
> >          * the domain values for our changes.
> > 
> > Maybe even
> > 
> > /* Whilst the object was away from the scanout we may have been eliding the coherent
> >  * writes into the CPU cache. However, the moment it has to be read by the display
> >  * engine, those writes into the CPU cache become inaccessible and so we have to 
> >  * clflush them out to main memory. We track elided flushes with obj->cache_dirty
> >  * and hope they are rare.
> >  */
> > 
> > 
> > The other user of set-cache-level (set_caching_ioctl) should not care
> > about the clflush side-effect and the clflush elision should work just
> > fine instead.
> 
> Hmm. So what would happen on !LLC if we start with a cached bo, then pwrite it
> and afterwards make it uncached?

In fact that would still fail even with my patch, and wouldn't work with current
upstream code either. To fix that I'd need to drop the I915_CACHE_NONE/WT checks
from pwrite in my patch and always set cache_dirty=true when it didn't
clflush. Doing that would seem perfectly reasonable to me.
Chris Wilson Aug. 11, 2015, 3:19 p.m. UTC | #3
On Tue, Aug 11, 2015 at 06:10:29PM +0300, Ville Syrjälä wrote:
> On Tue, Aug 11, 2015 at 05:56:28PM +0300, Ville Syrjälä wrote:
> > Hmm. So what would happen on !LLC if we start with a cached bo, then pwrite it
> > and afterwards make it uncached?
> 
> In fact that would still fail even with my patch, and wouldn't work with current
> upstream code either. To fix that I'd need to drop the I915_CACHE_NONE/WT checks
> from pwrite in my patch and always set cache_dirty=true when it didn't
> clflush. Doing that would seem perfectly reasonable to me.

Yes. I agree. Marking obj->cache_dirty whenever we dirty the cpu cache
and clear it upon clflush seems sane. It will only take effect on
transition to the display plane, so not going to impact us except when
correctness is required. So maybe we should call it obj->display_dirty?
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5e7fcf77e57a..fc6bcc19cca1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3656,13 +3656,6 @@  int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
                vma->node.color = cache_level;
        obj->cache_level = cache_level;
 
-       if (obj->cache_dirty &&
-           obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-           cpu_write_needs_clflush(obj)) {
-               if (i915_gem_clflush_object(obj, true))
-                       i915_gem_chipset_flush(obj->base.dev);
-       }
-
        return 0;
 }
 
@@ -3795,6 +3788,10 @@  i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
        WARN_ON(obj->pin_display > vma->pin_count);
 
        i915_gem_object_flush_cpu_write_domain(obj);
+       if (obj->cache_dirty) {
+               if (i915_gem_clflush_object(obj, true))
+                       i915_gem_chipset_flush(obj->base.dev);
+       }
 
        /* It should now be out of any other write domains, and we can update
         * the domain values for our changes.