Message ID | 20180814171857.24673-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Clear stop-engine for a pardoned reset | expand |
Quoting Patchwork (2018-08-14 18:48:08) > == Series Details == > > Series: drm/i915: Clear stop-engine for a pardoned reset > URL : https://patchwork.freedesktop.org/series/48202/ > State : success > > == Summary == > > = CI Bug Log - changes from CI_DRM_4666 -> Patchwork_9943 = > > == Summary - SUCCESS == > > No regressions found. > > External URL: https://patchwork.freedesktop.org/api/1.0/series/48202/revisions/1/mbox/ > > == Known issues == > > Here are the changes found in Patchwork_9943 that come from known issues: > > === IGT changes === > > ==== Issues hit ==== > > igt@drv_selftest@live_hangcheck: > {fi-cfl-8109u}: PASS -> DMESG-FAIL (fdo#106560) A different bug at last! That's a missed CS interrupt of some description instead. -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear > it on the per-engine exit path as we know that either we skipped the > reset and so need the cancellation, or the reset was successful and the > cancellation is a no-op, or there was an error and we will follow up > with a full-reset or wedging (both of which will stop the engines again > as required). > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > drivers/gpu/drm/i915/i915_drv.c | 1 + > drivers/gpu/drm/i915/intel_engine_cs.c | 10 ++++++++++ > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + > 3 files changed, 12 insertions(+) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 9dce55182c3a..41111f2a9c39 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg) > goto out; > > out: > + intel_engine_cancel_stop_cs(engine); > i915_gem_reset_finish_engine(engine); Should we just lift the whole stop/start dance into gem_reset_prepare|finish_engine()s? -Mika > return ret; > } > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c > index 99d5a24219c1..8628567d8f6e 100644 > --- a/drivers/gpu/drm/i915/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c > @@ -788,6 +788,16 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine) > return err; > } > > +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine) > +{ > + struct drm_i915_private *dev_priv = engine->i915; > + > + GEM_TRACE("%s\n", engine->name); > + > + I915_WRITE_FW(RING_MI_MODE(engine->mmio_base), > + _MASKED_BIT_DISABLE(STOP_RING)); > +} > + > const char *i915_cache_level_str(struct drm_i915_private *i915, int type) > { > switch (type) { > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h > index 9090885d57de..3f6920dd7880 100644 > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h > @@ -906,6 +906,7 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine); > int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine); > > int intel_engine_stop_cs(struct intel_engine_cs *engine); > +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine); > > u64 intel_engine_get_active_head(const struct intel_engine_cs *engine); > u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine); > -- > 2.18.0
Quoting Mika Kuoppala (2018-08-15 09:52:18) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted > > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear > > it on the per-engine exit path as we know that either we skipped the > > reset and so need the cancellation, or the reset was successful and the > > cancellation is a no-op, or there was an error and we will follow up > > with a full-reset or wedging (both of which will stop the engines again > > as required). > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > --- > > drivers/gpu/drm/i915/i915_drv.c | 1 + > > drivers/gpu/drm/i915/intel_engine_cs.c | 10 ++++++++++ > > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + > > 3 files changed, 12 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > > index 9dce55182c3a..41111f2a9c39 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg) > > goto out; > > > > out: > > + intel_engine_cancel_stop_cs(engine); > > i915_gem_reset_finish_engine(engine); > > Should we just lift the whole stop/start dance into > gem_reset_prepare|finish_engine()s? No, because it is also used by wedging where we do want the asymmetry. Been there, done that, have the gem_eio bruises. -Chris
Chris Wilson <chris@chris-wilson.co.uk> writes: > Quoting Mika Kuoppala (2018-08-15 09:52:18) >> Chris Wilson <chris@chris-wilson.co.uk> writes: >> >> > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted >> > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear >> > it on the per-engine exit path as we know that either we skipped the >> > reset and so need the cancellation, or the reset was successful and the >> > cancellation is a no-op, or there was an error and we will follow up >> > with a full-reset or wedging (both of which will stop the engines again >> > as required). >> > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> >> > --- >> > drivers/gpu/drm/i915/i915_drv.c | 1 + >> > drivers/gpu/drm/i915/intel_engine_cs.c | 10 ++++++++++ >> > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + >> > 3 files changed, 12 insertions(+) >> > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c >> > index 9dce55182c3a..41111f2a9c39 100644 >> > --- a/drivers/gpu/drm/i915/i915_drv.c >> > +++ b/drivers/gpu/drm/i915/i915_drv.c >> > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg) >> > goto out; >> > >> > out: >> > + intel_engine_cancel_stop_cs(engine); >> > i915_gem_reset_finish_engine(engine); >> >> Should we just lift the whole stop/start dance into >> gem_reset_prepare|finish_engine()s? > > No, because it is also used by wedging where we do want the asymmetry. > Been there, done that, have the gem_eio bruises. On wedge the submission path is blocked but yeah, gpu can be in any state and enabling the engine at that point is asking for trouble. Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > -Chris
Quoting Mika Kuoppala (2018-08-15 10:08:09) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > Quoting Mika Kuoppala (2018-08-15 09:52:18) > >> Chris Wilson <chris@chris-wilson.co.uk> writes: > >> > >> > If we pardon a per-engine reset, we may leave the STOP_RING bit asserted > >> > in RING_MI_MODE resulting in the engine hanging. Unconditionally clear > >> > it on the per-engine exit path as we know that either we skipped the > >> > reset and so need the cancellation, or the reset was successful and the > >> > cancellation is a no-op, or there was an error and we will follow up > >> > with a full-reset or wedging (both of which will stop the engines again > >> > as required). > >> > > >> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> > >> > --- > >> > drivers/gpu/drm/i915/i915_drv.c | 1 + > >> > drivers/gpu/drm/i915/intel_engine_cs.c | 10 ++++++++++ > >> > drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + > >> > 3 files changed, 12 insertions(+) > >> > > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > >> > index 9dce55182c3a..41111f2a9c39 100644 > >> > --- a/drivers/gpu/drm/i915/i915_drv.c > >> > +++ b/drivers/gpu/drm/i915/i915_drv.c > >> > @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg) > >> > goto out; > >> > > >> > out: > >> > + intel_engine_cancel_stop_cs(engine); > >> > i915_gem_reset_finish_engine(engine); > >> > >> Should we just lift the whole stop/start dance into > >> gem_reset_prepare|finish_engine()s? > > > > No, because it is also used by wedging where we do want the asymmetry. > > Been there, done that, have the gem_eio bruises. > > On wedge the submission path is blocked but yeah, > gpu can be in any state and enabling the engine at that > point is asking for trouble. > > Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Grabbed, pushed, and scarpered. Although this should fix the silly STOP_RING hangs that have been plaguing us for the last few months, there still seems to be more fun to be had here. /o\ -Chris
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 9dce55182c3a..41111f2a9c39 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -2079,6 +2079,7 @@ int i915_reset_engine(struct intel_engine_cs *engine, const char *msg) goto out; out: + intel_engine_cancel_stop_cs(engine); i915_gem_reset_finish_engine(engine); return ret; } diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c index 99d5a24219c1..8628567d8f6e 100644 --- a/drivers/gpu/drm/i915/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/intel_engine_cs.c @@ -788,6 +788,16 @@ int intel_engine_stop_cs(struct intel_engine_cs *engine) return err; } +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine) +{ + struct drm_i915_private *dev_priv = engine->i915; + + GEM_TRACE("%s\n", engine->name); + + I915_WRITE_FW(RING_MI_MODE(engine->mmio_base), + _MASKED_BIT_DISABLE(STOP_RING)); +} + const char *i915_cache_level_str(struct drm_i915_private *i915, int type) { switch (type) { diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h index 9090885d57de..3f6920dd7880 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.h +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h @@ -906,6 +906,7 @@ int intel_init_blt_ring_buffer(struct intel_engine_cs *engine); int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine); int intel_engine_stop_cs(struct intel_engine_cs *engine); +void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine); u64 intel_engine_get_active_head(const struct intel_engine_cs *engine); u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
If we pardon a per-engine reset, we may leave the STOP_RING bit asserted in RING_MI_MODE resulting in the engine hanging. Unconditionally clear it on the per-engine exit path as we know that either we skipped the reset and so need the cancellation, or the reset was successful and the cancellation is a no-op, or there was an error and we will follow up with a full-reset or wedging (both of which will stop the engines again as required). Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/intel_engine_cs.c | 10 ++++++++++ drivers/gpu/drm/i915/intel_ringbuffer.h | 1 + 3 files changed, 12 insertions(+)