Message ID | 1342185256-16024-6-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote: > If we drop the breadcrumb request after a batch due to a signal for > example we aim to fix it up at the next opportunity. In this case we > emit a second batchbuffer with no waits upon the first and so no > opportunity to insert the missing request, so we need to emit the > missing flush for coherency. (Note that that invalidating the render > cache is the same as flushing it, so there should have been no > observable corruption.) > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Imo still too meager commit message ;-) As I've said in the previous mail, I'd like some mention of the two commits that made this disaster possible (put the blame on me where it is due). And I think some more in-detail walk-thru of how things blow up would be great. And the Bugzilla link for the QA bugreport. Also, I still don't understand why this patch here isn't enough to fix up the fallout. So if you can enlighten me where/why stuff blows up even with this I'd highly appreciate. Not just because not understanding bugs makes me queasy, but also to have a clear picture of what I'd need to send to Dave it this -next cycle misses 3.6. Meanwhile I'll try to hit this with some i-g-t tests, maybe that gives me better understanding of what's going on. Thanks, Daniel > --- > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > index 50e83e5..c08229f 100644 > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > @@ -885,11 +885,16 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring, > return ret; > } > > - /* Unconditionally invalidate gpu caches. */ > - ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0); > + /* Unconditionally invalidate gpu caches and ensure that we do flush > + * any residual writes from the previous batch. > + */ > + ret = i915_gem_flush_ring(ring, > + I915_GEM_GPU_DOMAINS, > + ring->gpu_caches_dirty ? I915_GEM_GPU_DOMAINS : 0); > if (ret) > return ret; > > + ring->gpu_caches_dirty = false; > return 0; > } > > -- > 1.7.10.4 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, 13 Jul 2012 17:46:20 +0200, Daniel Vetter <daniel@ffwll.ch> wrote: > On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote: > > If we drop the breadcrumb request after a batch due to a signal for > > example we aim to fix it up at the next opportunity. In this case we > > emit a second batchbuffer with no waits upon the first and so no > > opportunity to insert the missing request, so we need to emit the > > missing flush for coherency. (Note that that invalidating the render > > cache is the same as flushing it, so there should have been no > > observable corruption.) > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Imo still too meager commit message ;-) As I've said in the previous mail, > I'd like some mention of the two commits that made this disaster possible > (put the blame on me where it is due). And I think some more in-detail > walk-thru of how things blow up would be great. And the Bugzilla link for > the QA bugreport. Sure, in the patch I thought I was sending I had an extra paragraph: As a side effect this will also paper over issues such as https://bugs.freedesktop.org/show_bug.cgi?id=52040 whereby we clear the write_domain on objects on the defunct gpu_write_list. References: https://bugs.freedesktop.org/show_bug.cgi?id=52040 > Also, I still don't understand why this patch here isn't enough to fix up > the fallout. So if you can enlighten me where/why stuff blows up even with > this I'd highly appreciate. Not just because not understanding bugs makes > me queasy, but also to have a clear picture of what I'd need to send to > Dave it this -next cycle misses 3.6. The remaining fallout is that we still end up using the flushing-list, as revealed by *adding* a WARN. To end up in that situation we must retire an object with a write-domain still set. But how can this be possible if we always clear the write_list prior to the request/retirment? I thought I had it, being sneaky with the use of INSTRUCTION write domain for pipe-control. However, looks like I'm going to have to reproduce with some more debugging. -Chris
On Fri, Jul 13, 2012 at 02:14:08PM +0100, Chris Wilson wrote: > If we drop the breadcrumb request after a batch due to a signal for > example we aim to fix it up at the next opportunity. In this case we > emit a second batchbuffer with no waits upon the first and so no > opportunity to insert the missing request, so we need to emit the > missing flush for coherency. (Note that that invalidating the render > cache is the same as flushing it, so there should have been no > observable corruption.) > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Ok, now that I've got some more clue about how this all blows up, I've merged this patch here (with a rather decently pimped commit message). Thanks for digging into this & feeding me the lacking clue. Cheers, Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index 50e83e5..c08229f 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -885,11 +885,16 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring, return ret; } - /* Unconditionally invalidate gpu caches. */ - ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0); + /* Unconditionally invalidate gpu caches and ensure that we do flush + * any residual writes from the previous batch. + */ + ret = i915_gem_flush_ring(ring, + I915_GEM_GPU_DOMAINS, + ring->gpu_caches_dirty ? I915_GEM_GPU_DOMAINS : 0); if (ret) return ret; + ring->gpu_caches_dirty = false; return 0; }
If we drop the breadcrumb request after a batch due to a signal for example we aim to fix it up at the next opportunity. In this case we emit a second batchbuffer with no waits upon the first and so no opportunity to insert the missing request, so we need to emit the missing flush for coherency. (Note that that invalidating the render cache is the same as flushing it, so there should have been no observable corruption.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)