Message ID: 1342081782-11626-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
On Thu, Jul 12, 2012 at 09:29:42AM +0100, Chris Wilson wrote:
> During batch buffer emission we flag that the next request should
> generate a flush. The intention is that we queue a request following the
> batch buffer in order to provide the breadcrumb, and so always flush all
> caches after every batch, killing the flushing list. However, X
> carefully schedules a signal to arrive just as we are checking whether
> the ring has enough space to enqueue the flush and so we abort that
> addition with an ERESTARTSYS. The next batch then clears the write_domain,
> and we end up with an object with no write_domain on the ring's
> gpu_write_list. Havoc WARNs and BUGs ensue.
>
> The gpu_write_list is doomed as it is merely a remnant of the flushing
> list, so the easiest fix in the meantime is to clear all residual
> members from the gpu_write_list prior to updating the write_domain on
> the next batch.
>
> Fixes regression from
>
> commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Jun 13 20:45:19 2012 +0200
>
>     drm/i915: disable flushing_list/gpu_write_list
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Thinking about this some more, this could actually be a regression from

commit de2b998552c1534e87bfbc51ec5734b02bc89020
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jul 4 22:52:50 2012 +0200

    drm/i915: don't return a spurious -EIO from intel_ring_begin

Only with that patch does intel_ring_begin return -ERESTARTSYS.

> ---
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 88e2e11..4c521df 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
> 		return ret;
> 	}
>
> -	/* Unconditionally invalidate gpu caches. */
> -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> +	/* Unconditionally invalidate gpu caches and flush residual writes. */

Hm, can we have a big FIXME here that we should change this back once the
gpu_write_list is properly buried?
-Daniel

> +	ret = i915_gem_flush_ring(ring,
> +				  I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
> 	if (ret)
> 		return ret;
>
> --
> 1.7.10.4
>
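For readers following the -ERESTARTSYS detail above: after de2b998552c1,
intel_ring_begin() propagates the error from its interruptible wait for
ring space instead of masking it, so a signal arriving mid-wait (the one
X schedules) surfaces to the execbuffer path. The fragment below is only
an illustrative sketch of that shape; struct example_ring,
example_ring_space() and the surrounding loop are invented for the
example and are not the i915 source.

/*
 * Illustrative sketch only -- not the i915 code. It shows the general
 * shape of an interruptible wait for ring space: a pending signal makes
 * the wait bail out with -ERESTARTSYS, which (after de2b998552c1) is
 * returned to the caller of intel_ring_begin() rather than being
 * converted into another error.
 */
static int example_wait_for_ring_space(struct example_ring *ring, int bytes)
{
	while (example_ring_space(ring) < bytes) {
		if (signal_pending(current))
			return -ERESTARTSYS;	/* aborted by the signal */

		/* ... otherwise sleep until the GPU retires some requests ... */
	}

	return 0;
}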
On Thu, 12 Jul 2012 11:00:17 +0200, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Jul 12, 2012 at 09:29:42AM +0100, Chris Wilson wrote:
> > During batch buffer emission we flag that the next request should
> > generate a flush. The intention is that we queue a request following the
> > batch buffer in order to provide the breadcrumb, and so always flush all
> > caches after every batch, killing the flushing list. However, X
> > carefully schedules a signal to arrive just as we are checking whether
> > the ring has enough space to enqueue the flush and so we abort that
> > addition with an ERESTARTSYS. The next batch then clears the write_domain,
> > and we end up with an object with no write_domain on the ring's
> > gpu_write_list. Havoc WARNs and BUGs ensue.
> >
> > The gpu_write_list is doomed as it is merely a remnant of the flushing
> > list, so the easiest fix in the meantime is to clear all residual
> > members from the gpu_write_list prior to updating the write_domain on
> > the next batch.
> >
> > Fixes regression from
> >
> > commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Date:   Wed Jun 13 20:45:19 2012 +0200
> >
> >     drm/i915: disable flushing_list/gpu_write_list
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Thinking about this some more, this could actually be a regression from
>
> commit de2b998552c1534e87bfbc51ec5734b02bc89020
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Jul 4 22:52:50 2012 +0200
>
>     drm/i915: don't return a spurious -EIO from intel_ring_begin
>
> Only with that patch does intel_ring_begin return -ERESTARTSYS.

Ok, that would better fit in with the discovery timeline, and indeed it
was a side-effect of the "not yet audited all the paths...". How true you
were. :)

> > ---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > index 88e2e11..4c521df 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > @@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
> > 		return ret;
> > 	}
> >
> > -	/* Unconditionally invalidate gpu caches. */
> > -	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
> > +	/* Unconditionally invalidate gpu caches and flush residual writes. */
>
> Hm, can we have a big FIXME here that we should change this back once the
> gpu_write_list is properly buried?

I fully expect i915_gem_flush_ring() to die in a fire once the flushing
list removal is complete. There will only be two callsites, both in
i915_gem_execbuffer.c; one to invalidate and one to flush. So we might as
well call them thus. If you want to tag an XXX in there, feel free, I was
just trying to keep inside 80 cols. ;-)
-Chris
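As an aside on the "one to invalidate and one to flush" remark: the sketch
below is hypothetical (the wrapper names are made up and do not exist in
the tree); it only illustrates what the two surviving callsites in
i915_gem_execbuffer.c could look like, reusing the three-argument
i915_gem_flush_ring() call visible in the diff.

/*
 * Hypothetical wrappers, not code from the tree: an illustration of the
 * two purpose-named callsites discussed above.
 */
static int execbuf_invalidate_gpu_caches(struct intel_ring_buffer *ring)
{
	/* before the batch: invalidate all GPU caches and, while the
	 * gpu_write_list still exists, flush any residual writes too */
	return i915_gem_flush_ring(ring,
				   I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
}

static int execbuf_flush_gpu_caches(struct intel_ring_buffer *ring)
{
	/* after the batch: flush all GPU write caches before the
	 * breadcrumb request is emitted */
	return i915_gem_flush_ring(ring, 0, I915_GEM_GPU_DOMAINS);
}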
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 88e2e11..4c521df 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -885,8 +885,9 @@ i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
 		return ret;
 	}
 
-	/* Unconditionally invalidate gpu caches. */
-	ret = i915_gem_flush_ring(ring, I915_GEM_GPU_DOMAINS, 0);
+	/* Unconditionally invalidate gpu caches and flush residual writes. */
+	ret = i915_gem_flush_ring(ring,
+				  I915_GEM_GPU_DOMAINS, I915_GEM_GPU_DOMAINS);
 	if (ret)
 		return ret;
During batch buffer emission we flag that the next request should
generate a flush. The intention is that we queue a request following the
batch buffer in order to provide the breadcrumb, and so always flush all
caches after every batch, killing the flushing list. However, X
carefully schedules a signal to arrive just as we are checking whether
the ring has enough space to enqueue the flush and so we abort that
addition with an ERESTARTSYS. The next batch then clears the write_domain,
and we end up with an object with no write_domain on the ring's
gpu_write_list. Havoc WARNs and BUGs ensue.

The gpu_write_list is doomed as it is merely a remnant of the flushing
list, so the easiest fix in the meantime is to clear all residual
members from the gpu_write_list prior to updating the write_domain on
the next batch.

Fixes regression from

commit cc889e0f6ce6a63c62db17d702ecfed86d58083f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Jun 13 20:45:19 2012 +0200

    drm/i915: disable flushing_list/gpu_write_list

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
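To make the failure sequence above concrete, here is a small self-contained
userspace model (all names are hypothetical, none of this is driver code):
one object stands in for a buffer left on the ring's gpu_write_list when
the breadcrumb flush is aborted with -ERESTARTSYS, and the "fixed" path
clears the residual entry before the next batch rewrites write_domain,
which is what passing I915_GEM_GPU_DOMAINS as the flush argument achieves
in the real patch.

/*
 * Toy userspace model of the bug -- hypothetical names, no real driver
 * code. A "batch" marks its object as having pending GPU writes by
 * putting it on a gpu_write_list with a nonzero write_domain. If the
 * follow-up flush is aborted (the -ERESTARTSYS case) and the next batch
 * then rewrites write_domain without emptying the list, we end up with a
 * listed object whose write_domain is 0 -- the state that trips the
 * WARN/BUG. Flushing residual writes first avoids it.
 */
#include <stdbool.h>
#include <stdio.h>

#define GPU_DOMAIN_RENDER 0x2	/* stand-in for a GPU write-domain bit */

struct toy_obj {
	unsigned int write_domain;
	bool on_gpu_write_list;
};

/* Batch emission: the object now has pending GPU writes. */
static void emit_batch(struct toy_obj *obj)
{
	obj->write_domain = GPU_DOMAIN_RENDER;
	obj->on_gpu_write_list = true;
}

/* The flush that should retire the pending write; may be interrupted. */
static int queue_flush(struct toy_obj *obj, bool signal_arrives)
{
	if (signal_arrives)
		return -1;	/* models -ERESTARTSYS: nothing retired */
	obj->write_domain = 0;
	obj->on_gpu_write_list = false;
	return 0;
}

/* What the fix does: clear residual members before the next batch. */
static void flush_residual_writes(struct toy_obj *obj)
{
	if (obj->on_gpu_write_list && obj->write_domain) {
		obj->write_domain = 0;
		obj->on_gpu_write_list = false;
	}
}

/* The next batch recomputes domains, clobbering write_domain. */
static void next_batch_moves_to_gpu(struct toy_obj *obj)
{
	obj->write_domain = 0;	/* e.g. the object is only read now */
}

static void check(const struct toy_obj *obj, const char *what)
{
	if (obj->on_gpu_write_list && obj->write_domain == 0)
		printf("%s: BROKEN (listed with no write_domain)\n", what);
	else
		printf("%s: ok\n", what);
}

int main(void)
{
	struct toy_obj a = {0}, b = {0};

	/* Buggy sequence: flush aborted, next batch clears write_domain. */
	emit_batch(&a);
	(void)queue_flush(&a, true);	/* X's signal lands here */
	next_batch_moves_to_gpu(&a);
	check(&a, "without residual flush");

	/* Fixed sequence: residual writes flushed before the next batch. */
	emit_batch(&b);
	(void)queue_flush(&b, true);
	flush_residual_writes(&b);
	next_batch_moves_to_gpu(&b);
	check(&b, "with residual flush");

	return 0;
}

Built with any C compiler, the first check prints the broken state (an
object still listed with no write_domain) and the second prints ok.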