Message ID | 1396023498-8635-1-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Mar 28, 2014 at 06:18:18PM +0200, Mika Kuoppala wrote: > Piglit runner and QA are both looking at the dmesg for > DRM_ERRORs with test cases. Add a flag to control those > when we they are expected from related test cases. > > Also add flag to control if contexts should be banned > that introduced the hang. Hangcheck is timer based and > preventing bans by adding sleeps to testcases makes > testing slower. > > v2: intel_ring_stopped(), readable comment (Chris) > v3: keep compatibility (Daniel) Do we really want to keep backwards compatibility with the igt testsuite that currently fails through debugfs? > References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> However, it is not a battle worth fighting, so either way, Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
On Fri, Mar 28, 2014 at 05:04:25PM +0000, Chris Wilson wrote: > On Fri, Mar 28, 2014 at 06:18:18PM +0200, Mika Kuoppala wrote: > > Piglit runner and QA are both looking at the dmesg for > > DRM_ERRORs with test cases. Add a flag to control those > > when we they are expected from related test cases. > > > > Also add flag to control if contexts should be banned > > that introduced the hang. Hangcheck is timer based and > > preventing bans by adding sleeps to testcases makes > > testing slower. > > > > v2: intel_ring_stopped(), readable comment (Chris) > > v3: keep compatibility (Daniel) > > Do we really want to keep backwards compatibility with the igt testsuite > that currently fails through debugfs? I'm leaning towards keeping things working if it's not too much fuzz. Yeah, currently igt is a bit botched, but with this negative flag you can still run older igt on newer kernels and the other way round, withing limits of things simply being broken. > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 > > Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> > > However, it is not a battle worth fighting, so either way, > Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Sorry that I've missed this, please ping me next time around there's a patch ready to be picked up. Merged to dinq, thanks. Please push the relevant igt patches, last time I've looked the interfaces and docs and all that looked excellent. Cheers, Daniel
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c697204..7b80a55 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1102,8 +1102,12 @@ struct i915_gpu_error { */ wait_queue_head_t reset_queue; - /* For gpu hang simulation. */ - unsigned int stop_rings; + /* Userspace knobs for gpu hang simulation; + * combines both a ring mask, and extra flags + */ + u32 stop_rings; +#define I915_STOP_RING_ALLOW_BAN (1 << 31) +#define I915_STOP_RING_ALLOW_WARN (1 << 30) /* For missed irq/seqno simulation. */ unsigned int test_irq_rings; @@ -2149,6 +2153,18 @@ static inline u32 i915_reset_count(struct i915_gpu_error *error) return ((atomic_read(&error->reset_counter) & ~I915_WEDGED) + 1) / 2; } +static inline bool i915_stop_ring_allow_ban(struct drm_i915_private *dev_priv) +{ + return dev_priv->gpu_error.stop_rings == 0 || + dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_BAN; +} + +static inline bool i915_stop_ring_allow_warn(struct drm_i915_private *dev_priv) +{ + return dev_priv->gpu_error.stop_rings == 0 || + dev_priv->gpu_error.stop_rings & I915_STOP_RING_ALLOW_WARN; +} + void i915_gem_reset(struct drm_device *dev); bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, bool force); int __must_check i915_gem_object_finish_gpu(struct drm_i915_gem_object *obj); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 33bbaa0..4dde5b1 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2277,8 +2277,9 @@ static bool i915_context_is_banned(struct drm_i915_private *dev_priv, if (!i915_gem_context_is_default(ctx)) { DRM_DEBUG("context hanging too fast, banning!\n"); return true; - } else if (dev_priv->gpu_error.stop_rings == 0) { - DRM_ERROR("gpu hanging too fast, banning!\n"); + } else if (i915_stop_ring_allow_ban(dev_priv)) { + if (i915_stop_ring_allow_warn(dev_priv)) + DRM_ERROR("gpu hanging too fast, banning!\n"); return true; } } diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 913b8ab..9c0c292 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -41,12 +41,16 @@ static inline int ring_space(struct intel_ring_buffer *ring) return space; } -void __intel_ring_advance(struct intel_ring_buffer *ring) +static bool intel_ring_stopped(struct intel_ring_buffer *ring) { struct drm_i915_private *dev_priv = ring->dev->dev_private; + return dev_priv->gpu_error.stop_rings & intel_ring_flag(ring); +} +void __intel_ring_advance(struct intel_ring_buffer *ring) +{ ring->tail &= ring->size - 1; - if (dev_priv->gpu_error.stop_rings & intel_ring_flag(ring)) + if (intel_ring_stopped(ring)) return; ring->write_tail(ring, ring->tail); }
Piglit runner and QA are both looking at the dmesg for DRM_ERRORs with test cases. Add a flag to control those when we they are expected from related test cases. Also add flag to control if contexts should be banned that introduced the hang. Hangcheck is timer based and preventing bans by adding sleeps to testcases makes testing slower. v2: intel_ring_stopped(), readable comment (Chris) v3: keep compatibility (Daniel) References: https://bugs.freedesktop.org/show_bug.cgi?id=75876 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/i915_drv.h | 20 ++++++++++++++++++-- drivers/gpu/drm/i915/i915_gem.c | 5 +++-- drivers/gpu/drm/i915/intel_ringbuffer.c | 8 ++++++-- 3 files changed, 27 insertions(+), 6 deletions(-)