Message ID | 20171211194135.27095-2-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On Mon, 2017-12-11 at 19:41 +0000, Chris Wilson wrote:
> If wait_for_engines() fails and we resort to declaring the HW wedged,
> dump the engine state for debugging.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Quoting Joonas Lahtinen (2017-12-12 13:40:25)
> On Mon, 2017-12-11 at 19:41 +0000, Chris Wilson wrote:
> > If wait_for_engines() fails and we resort to declaring the HW wedged,
> > dump the engine state for debugging.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>
> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Ta for the review. I've worked through to the immediate cause of the
problem, so if you would like to review

  drm/i915: Don't check #active_requests from i915_gem_wait_for_idle()
  drm/i915: Mark up potential allocation paths within i915_sw_fence as might_sleep
  drm/i915: Allow fence allocations to fail
  drm/i915: Ratelimit request allocation under oom

and

  igt/gem_shrink: Exercise allocations in the middle of execbuf under oom-pressure

next, that would be grand. A fine piece of cheese, Gromit.

I'm still puzzling how such a simple piece of code managed to get into
so much trouble in the first place. I suppose it was able to fill 3 rings
with a few 10k requests each, which is definitely more than enough to
run into oom on that machine. Ok, not such a mystery after all.
-Chris
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 694f0551a66e..9e957b213fdb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3534,7 +3534,18 @@ static int wait_for_timeline(struct i915_gem_timeline *tl, unsigned int flags)
 static int wait_for_engines(struct drm_i915_private *i915)
 {
 	if (wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT)) {
-		DRM_ERROR("Failed to idle engines, declaring wedged!\n");
+		dev_err(i915->drm.dev,
+			"Failed to idle engines, declaring wedged!\n");
+		if (drm_debug & DRM_UT_DRIVER) {
+			struct drm_printer p = drm_debug_printer(__func__);
+			struct intel_engine_cs *engine;
+			enum intel_engine_id id;
+
+			for_each_engine(engine, i915, id)
+				intel_engine_dump(engine, &p,
+						  "%s", engine->name);
+		}
+
 		i915_gem_set_wedged(i915);
 		return -EIO;
 	}
If wait_for_engines() fails and we resort to declaring the HW wedged,
dump the engine state for debugging.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)