[2/2] drm/i915: Dump the engine state before declaring wedged from wait_for_engines()

Message ID 20171211194135.27095-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Dec. 11, 2017, 7:41 p.m. UTC
If wait_for_engines() fails and we resort to declaring the HW wedged,
dump the engine state for debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Comments

Joonas Lahtinen Dec. 12, 2017, 1:40 p.m. UTC | #1
On Mon, 2017-12-11 at 19:41 +0000, Chris Wilson wrote:
> If wait_for_engines() fails and we resort to declaring the HW wedged,
> dump the engine state for debugging.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

Chris Wilson Dec. 12, 2017, 9:40 p.m. UTC | #2
Quoting Joonas Lahtinen (2017-12-12 13:40:25)
> On Mon, 2017-12-11 at 19:41 +0000, Chris Wilson wrote:
> > If wait_for_engines() fails and we resort to declaring the HW wedged,
> > dump the engine state for debugging.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> 
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Ta for the review. I've worked through to the immediate cause of the
problem, so if you would like to review

drm/i915: Don't check #active_requests from i915_gem_wait_for_idle()
drm/i915: Mark up potential allocation paths within i915_sw_fence as might_sleep
drm/i915: Allow fence allocations to fail
drm/i915: Ratelimit request allocation under oom

and

igt/gem_shrink: Exercise allocations in the middle of execbuf under oom-pressure

next, that would be grand. A fine piece of cheese, Gromit.

I'm still puzzling over how such a simple piece of code managed to get
into so much trouble in the first place. I suppose it was able to fill 3
rings with a few tens of thousands of requests each, which is definitely
more than enough to run into oom on that machine. Ok, not such a mystery
after all.
-Chris
Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 694f0551a66e..9e957b213fdb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3534,7 +3534,18 @@  static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
 static int wait_for_engines(struct drm_i915_private *i915)
 {
 	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
-		DRM_ERROR("Failed to idle engines, declaring wedged!\n");
+		dev_err(i915->drm.dev,
+			"Failed to idle engines, declaring wedged!\n");
+		if (drm_debug & DRM_UT_DRIVER) {
+			struct drm_printer p = drm_debug_printer(__func__);
+			struct intel_engine_cs *engine;
+			enum intel_engine_id id;
+
+			for_each_engine(engine, i915, id)
+				intel_engine_dump(engine, &p,
+						  "%s", engine->name);
+		}
+
 		i915_gem_set_wedged(i915);
 		return -EIO;
 	}
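
For readers without the i915 tree to hand, the control flow of the hunk above can be approximated in plain userspace C. This is only a rough sketch under stated assumptions: struct fake_device, struct fake_engine, poll_for_idle() and wait_for_engines_sketch() are hypothetical stand-ins invented for illustration and are not i915 or DRM APIs. The debug field plays the role of the (drm_debug & DRM_UT_DRIVER) check, and a bounded poll count stands in for the wait_for() timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_ENGINES 3

/* Hypothetical stand-ins for illustration; none of these are i915 types. */
struct fake_engine {
	const char *name;
	bool idle;
};

struct fake_device {
	struct fake_engine engines[MAX_ENGINES];
	unsigned int n_engines;
	bool debug;         /* stands in for (drm_debug & DRM_UT_DRIVER) */
	bool wedged;        /* set once the device is declared wedged */
	unsigned int dumps; /* how many engines were dumped, for inspection */
};

static bool engines_are_idle(const struct fake_device *dev)
{
	for (unsigned int i = 0; i < dev->n_engines; i++)
		if (!dev->engines[i].idle)
			return false;
	return true;
}

/* Check the condition at least once and at most max_polls times (if > 0). */
static int poll_for_idle(const struct fake_device *dev, unsigned int max_polls)
{
	do {
		if (engines_are_idle(dev))
			return 0;
	} while (max_polls-- > 1);

	return -ETIMEDOUT;
}

/* Mirrors the patched flow: report, optionally dump each engine, then wedge. */
static int wait_for_engines_sketch(struct fake_device *dev,
				   unsigned int max_polls)
{
	assert(dev->n_engines <= MAX_ENGINES);

	if (poll_for_idle(dev, max_polls)) {
		fprintf(stderr, "Failed to idle engines, declaring wedged!\n");

		/* The per-engine dump is only emitted when debugging is on. */
		if (dev->debug) {
			for (unsigned int i = 0; i < dev->n_engines; i++) {
				fprintf(stderr, "%s: idle=%d\n",
					dev->engines[i].name,
					dev->engines[i].idle);
				dev->dumps++;
			}
		}

		dev->wedged = true;
		return -EIO;
	}

	return 0;
}
```

As in the patch, the error message is unconditional while the per-engine dump is gated on the debug setting, so a production log only gains one line unless driver debugging has been enabled. The dump is taken before the device is marked wedged, so the state it records is the state that caused the failure.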