diff mbox

[05/13] drm/i915: Insert a flush between batches if the breadcrumb was dropped

Message ID 1342185256-16024-6-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson July 13, 2012, 1:14 p.m. UTC
If we drop the breadcrumb request after a batch due to a signal for
example we aim to fix it up at the next opportunity. In this case we
emit a second batchbuffer with no waits upon the first and so no
opportunity to insert the missing request, so we need to emit the
missing flush for coherency. (Note that that invalidating the render
cache is the same as flushing it, so there should have been no
observable corruption.)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Daniel Vetter July 13, 2012, 3:46 p.m. UTC | #1
On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote:
> If we drop the breadcrumb request after a batch due to a signal for
> example we aim to fix it up at the next opportunity. In this case we
> emit a second batchbuffer with no waits upon the first and so no
> opportunity to insert the missing request, so we need to emit the
> missing flush for coherency. (Note that that invalidating the render
> cache is the same as flushing it, so there should have been no
> observable corruption.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Imo still too meager commit message ;-) As I've said in the previous mail,
I'd like some mention of the two commits that made this disaster possible
(put the blame on me where it is due). And I think some more in-detail
walk-thru of how things blow up would be great. And the Bugzilla link for
the QA bugreport.

Also, I still don't understand why this patch here isn't enough to fix up
the fallout. So if you can enlighten me where/why stuff blows up even with
this I'd highly appreciate. Not just because not understanding bugs makes
me queasy, but also to have a clear picture of what I'd need to send to
Dave it this -next cycle misses 3.6.

Meanwhile I'll try to hit this with some i-g-t tests, maybe that gives me
better understanding of what's going on.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 50e83e5..c08229f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -885,11 +885,16 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
>  			return ret;
>  	}
>  
> -	/* Unconditionally invalidate gpu caches. */
> -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> +	/* Unconditionally invalidate gpu caches and ensure that we do flush
> +	 * any residual writes from the previous batch.
> +	 */
> +	ret = i915_gem_flush_ring(ring,
> +				  I915_GEM_GPU_DOMAINS,
> +				  ring->gpu_caches_dirty ? I915_GEM_GPU_DOMAINS : 0);
>  	if (ret)
>  		return ret;
>  
> +	ring->gpu_caches_dirty = false;
>  	return 0;
>  }
>  
> -- 
> 1.7.10.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson July 14, 2012, 10:24 a.m. UTC | #2
On Fri, 13 Jul 2012 17:46:20 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote:
> > If we drop the breadcrumb request after a batch due to a signal for
> > example we aim to fix it up at the next opportunity. In this case we
> > emit a second batchbuffer with no waits upon the first and so no
> > opportunity to insert the missing request, so we need to emit the
> > missing flush for coherency. (Note that that invalidating the render
> > cache is the same as flushing it, so there should have been no
> > observable corruption.)
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Imo still too meager commit message ;-) As I've said in the previous mail,
> I'd like some mention of the two commits that made this disaster possible
> (put the blame on me where it is due). And I think some more in-detail
> walk-thru of how things blow up would be great. And the Bugzilla link for
> the QA bugreport.

Sure, in the patch I thought I was sending I had an extra paragraph:

    As a side effect this will also paper over issues such as
      https://bugs.freedesktop.org/show_bug.cgi?id=52040
    whereby we clear the write_domain on objects on the defunct
    gpu_write_list.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=52040

> Also, I still don't understand why this patch here isn't enough to fix up
> the fallout. So if you can enlighten me where/why stuff blows up even with
> this I'd highly appreciate. Not just because not understanding bugs makes
> me queasy, but also to have a clear picture of what I'd need to send to
> Dave it this -next cycle misses 3.6.

The remaining fallout is that we still end up using the flushing-list,
as revealed by *adding* a WARN.

To end up in that situation we must retire an object with a write-domain
still set. But how can this be possible if we always clear the
write_list prior to the request/retirment?

I thought I had it, being sneaky with the use of INSTRUCTION write
domain for pipe-control. However, looks like I'm going to have to
reproduce with some more debugging.
-Chris
Daniel Vetter July 14, 2012, 1:39 p.m. UTC | #3
On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote:
> If we drop the breadcrumb request after a batch due to a signal for
> example we aim to fix it up at the next opportunity. In this case we
> emit a second batchbuffer with no waits upon the first and so no
> opportunity to insert the missing request, so we need to emit the
> missing flush for coherency. (Note that that invalidating the render
> cache is the same as flushing it, so there should have been no
> observable corruption.)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Ok, now that I've got some more clue about how this all blows up, I've
merged this patch here (with a rather decently pimped commit message).
Thanks for digging into this & feeding me the lacking clue.

Cheers, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 50e83e5..c08229f 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -885,11 +885,16 @@  i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 			return ret;
 	}
 
-	/* Unconditionally invalidate gpu caches. */
-	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
+	/* Unconditionally invalidate gpu caches and ensure that we do flush
+	 * any residual writes from the previous batch.
+	 */
+	ret = i915_gem_flush_ring(ring,
+				  I915_GEM_GPU_DOMAINS,
+				  ring->gpu_caches_dirty ? I915_GEM_GPU_DOMAINS : 0);
 	if (ret)
 		return ret;
 
+	ring->gpu_caches_dirty = false;
 	return 0;
 }