Message ID | 20191128102546.3857140-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/selftests: Try to show where the pulse went | expand |
Chris Wilson <chris@chris-wilson.co.uk> writes: > We have a case of a mysteriously absent pulse, so dump the engine > details to see if we can find out what happened to it. > > References: https://bugs.freedesktop.org/show_bug.cgi?id=112405 > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 ++ > drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++ > 2 files changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 8f6e353caa66..df3369c3f330 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine, > drm_printf(m, "*** WEDGED ***\n"); > > drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count)); > + drm_printf(m, "\tBarriers?: %s\n", > + yesno(!llist_empty(&engine->barrier_tasks))); A random thought arise when reading this and it was that should we do a memory barrier before dump. But there would be no pairing and it could actually hide something more sinister. It is important to know if and where and why did we drop our stethoscope. Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > > rcu_read_lock(); > rq = READ_ONCE(engine->heartbeat.systole); > diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c > index f665a0e23c61..0b1148cf3f61 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c > +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c > @@ -91,6 +91,11 @@ static int __live_idle_pulse(struct intel_engine_cs *engine, > GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); > > if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) { > + struct drm_printer m = drm_err_printer("pulse"); > + > + pr_err("%s: no heartbeat pulse?\n", engine->name); > + intel_engine_dump(engine, &m, "%s", engine->name); > + > err = -ETIME; > goto out; > } > -- > 2.24.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Mika Kuoppala (2019-11-28 11:27:30) > Chris Wilson <chris@chris-wilson.co.uk> writes: > > > We have a case of a mysteriously absent pulse, so dump the engine > > details to see if we can find out what happened to it. > > > > References: https://bugs.freedesktop.org/show_bug.cgi?id=112405 > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 ++ > > drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++ > > 2 files changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > index 8f6e353caa66..df3369c3f330 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine, > > drm_printf(m, "*** WEDGED ***\n"); > > > > drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count)); > > + drm_printf(m, "\tBarriers?: %s\n", > > + yesno(!llist_empty(&engine->barrier_tasks))); > > A random thought arise when reading this and it was that > should we do a memory barrier before dump. But there > would be no pairing and it could actually hide something > more sinister. Yeah, exactly who do we want to serialise with -- and we will always need a snapshot just in case we screw that up :) It's debug; as async as we can without oopsing, and keep on staring until it starts to make sense. -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 8f6e353caa66..df3369c3f330 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "*** WEDGED ***\n"); drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count)); + drm_printf(m, "\tBarriers?: %s\n", + yesno(!llist_empty(&engine->barrier_tasks))); rcu_read_lock(); rq = READ_ONCE(engine->heartbeat.systole); diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c index f665a0e23c61..0b1148cf3f61 100644 --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c @@ -91,6 +91,11 @@ static int __live_idle_pulse(struct intel_engine_cs *engine, GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) { + struct drm_printer m = drm_err_printer("pulse"); + + pr_err("%s: no heartbeat pulse?\n", engine->name); + intel_engine_dump(engine, &m, "%s", engine->name); + err = -ETIME; goto out; }
We have a case of a mysteriously absent pulse, so dump the engine details to see if we can find out what happened to it. References: https://bugs.freedesktop.org/show_bug.cgi?id=112405 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 ++ drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++ 2 files changed, 7 insertions(+)