drm/i915: Unconditionally flush residual writes before batches
diff mbox

Message ID 1342081782-11626-1-git-send-email-chris@chris-wilson.co.uk
State New, archived
Headers show

Commit Message

Chris Wilson July 12, 2012, 8:29 a.m. UTC
During batch buffer emission we flag that the next request should
generate a flush. The intention is that we queue a request following the
batch buffer in order to provide the breadcrumb, and so always flush all
caches after every batch, killing the flushing list. However, X
carefully schedules a signal to arrive just as we are checking whether
the ring has enough space to enqueue the flush and so we abort that
addition with an ERESTARTSYS. The next batch then clears the write_domain,
and we end up with an object with no write_domain on the ring's
gpu_write_list. Havoc WARNs and BUGs ensue.

The gpu_write_list is doomed as it is merely a remnant  of the flushing
list, so the easiest fix in the meantime is to clear all residual
members from the gpu_write_list prior to updating the write_domain on
the next batch.

Fixes regression from

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_lis

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc:  Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Daniel Vetter July 12, 2012, 9 a.m. UTC | #1
On Thu, Jul 12, 2012 at 09:29:42AM +0100, Chris Wilson wrote:
> During batch buffer emission we flag that the next request should
> generate a flush. The intention is that we queue a request following the
> batch buffer in order to provide the breadcrumb, and so always flush all
> caches after every batch, killing the flushing list. However, X
> carefully schedules a signal to arrive just as we are checking whether
> the ring has enough space to enqueue the flush and so we abort that
> addition with an ERESTARTSYS. The next batch then clears the write_domain,
> and we end up with an object with no write_domain on the ring's
> gpu_write_list. Havoc WARNs and BUGs ensue.
> 
> The gpu_write_list is doomed as it is merely a remnant  of the flushing
> list, so the easiest fix in the meantime is to clear all residual
> members from the gpu_write_list prior to updating the write_domain on
> the next batch.
> 
> Fixes regression from
> 
> commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Jun 13 20:45:19 2012 +0200
> 
>     drm/i915: disable flushing_list/gpu_write_lis
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc:  Daniel Vetter <daniel.vetter@ffwll.ch>

Thinking about this some more, this could actually be a regression from

commit de2b998552c1534e87bfbc51ec5734b02bc89020
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

Only with that patch does intel_ring_begin return -ERESTARTSYS.

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 88e2e11..4c521df 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
>  			return ret;
>  	}
>  
> -	/* Unconditionally invalidate gpu caches. */
> -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> +	/* Unconditionally invalidate gpu caches and flush residual writes. */

Hm, can we have a big FIXME here that we should change this back once the
gpu_write_list is properly burried?
-Daniel

> +	ret = i915_gem_flush_ring(ring,
> +				  I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
>  	if (ret)
>  		return ret;
>  
> -- 
> 1.7.10.4
>
Chris Wilson July 12, 2012, 10:10 a.m. UTC | #2
On Thu, 12 Jul 2012 11:00:17 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jul 12, 2012 at 09:29:42AM +0100, Chris Wilson wrote:
> > During batch buffer emission we flag that the next request should
> > generate a flush. The intention is that we queue a request following the
> > batch buffer in order to provide the breadcrumb, and so always flush all
> > caches after every batch, killing the flushing list. However, X
> > carefully schedules a signal to arrive just as we are checking whether
> > the ring has enough space to enqueue the flush and so we abort that
> > addition with an ERESTARTSYS. The next batch then clears the write_domain,
> > and we end up with an object with no write_domain on the ring's
> > gpu_write_list. Havoc WARNs and BUGs ensue.
> > 
> > The gpu_write_list is doomed as it is merely a remnant  of the flushing
> > list, so the easiest fix in the meantime is to clear all residual
> > members from the gpu_write_list prior to updating the write_domain on
> > the next batch.
> > 
> > Fixes regression from
> > 
> > commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Date:   Wed Jun 13 20:45:19 2012 +0200
> > 
> >     drm/i915: disable flushing_list/gpu_write_lis
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc:  Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Thinking about this some more, this could actually be a regression from
> 
> commit de2b998552c1534e87bfbc51ec5734b02bc89020
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Jul 4 22:52:50 2012 +0200
> 
>     drm/i915: don't return a spurious -EIO from intel_ring_begin
> 
> Only with that patch does intel_ring_begin return -ERESTARTSYS.

Ok, that would better fit in with the discovery timeline, and indeed it
was a side-effect of the "not yet audited all the paths...". How true
you were. :)
 
> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 88e2e11..4c521df 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
> >  			return ret;
> >  	}
> >  
> > -	/* Unconditionally invalidate gpu caches. */
> > -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> > +	/* Unconditionally invalidate gpu caches and flush residual writes. */
> 
> Hm, can we have a big FIXME here that we should change this back once the
> gpu_write_list is properly burried?

I fully expect i915_gem_flush_ring() to die in a fire once the flushing
list removal is complete. There will only be two callsites, both in
i915_gem_execbuffer.c; one to invalidate and one to flush. So we might
as well call them thus. If you want to tag an XXX in there, feel free,
I was just trying to keep inside 80 cols. ;-)
-Chris

Patch
diff mbox

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 88e2e11..4c521df 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -885,8 +885,9 @@  i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 			return ret;
 	}
 
-	/* Unconditionally invalidate gpu caches. */
-	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
+	/* Unconditionally invalidate gpu caches and flush residual writes. */
+	ret = i915_gem_flush_ring(ring,
+				  I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;