drm/i915/selftests: Try to show where the pulse went
diff mbox series

Message ID 20191128102546.3857140-1-chris@chris-wilson.co.uk
State New
Headers show
Series
  • drm/i915/selftests: Try to show where the pulse went
Related show

Commit Message

Chris Wilson Nov. 28, 2019, 10:25 a.m. UTC
We have a case of a mysteriously absent pulse, so dump the engine
details to see if we can find out what happened to it.

References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
 drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
 2 files changed, 7 insertions(+)

Comments

Mika Kuoppala Nov. 28, 2019, 11:27 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We have a case of a mysteriously absent pulse, so dump the engine
> details to see if we can find out what happened to it.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
>  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 8f6e353caa66..df3369c3f330 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  		drm_printf(m, "*** WEDGED ***\n");
>  
>  	drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
> +	drm_printf(m, "\tBarriers?: %s\n",
> +		   yesno(!llist_empty(&engine->barrier_tasks)));

A random thought arise when reading this and it was that
should we do a memory barrier before dump. But there
would be no pairing and it could actually hide something
more sinister.

It is important to know if and where and why did we drop
our stethoscope.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>  
>  	rcu_read_lock();
>  	rq = READ_ONCE(engine->heartbeat.systole);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index f665a0e23c61..0b1148cf3f61 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -91,6 +91,11 @@ static int __live_idle_pulse(struct intel_engine_cs *engine,
>  	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
>  
>  	if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) {
> +		struct drm_printer m = drm_err_printer("pulse");
> +
> +		pr_err("%s: no heartbeat pulse?\n", engine->name);
> +		intel_engine_dump(engine, &m, "%s", engine->name);
> +
>  		err = -ETIME;
>  		goto out;
>  	}
> -- 
> 2.24.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Nov. 28, 2019, 11:31 a.m. UTC | #2
Quoting Mika Kuoppala (2019-11-28 11:27:30)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > We have a case of a mysteriously absent pulse, so dump the engine
> > details to see if we can find out what happened to it.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
> >  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 8f6e353caa66..df3369c3f330 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >               drm_printf(m, "*** WEDGED ***\n");
> >  
> >       drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
> > +     drm_printf(m, "\tBarriers?: %s\n",
> > +                yesno(!llist_empty(&engine->barrier_tasks)));
> 
> A random thought arise when reading this and it was that
> should we do a memory barrier before dump. But there
> would be no pairing and it could actually hide something
> more sinister.

Yeah, exactly who do we want to serialise with -- and we will always
need a snapshot just in case we screw that up :)

It's debug; as async as we can without oopsing, and keep on staring
until it starts to make sense.
-Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 8f6e353caa66..df3369c3f330 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1479,6 +1479,8 @@  void intel_engine_dump(struct intel_engine_cs *engine,
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
+	drm_printf(m, "\tBarriers?: %s\n",
+		   yesno(!llist_empty(&engine->barrier_tasks)));
 
 	rcu_read_lock();
 	rq = READ_ONCE(engine->heartbeat.systole);
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index f665a0e23c61..0b1148cf3f61 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -91,6 +91,11 @@  static int __live_idle_pulse(struct intel_engine_cs *engine,
 	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
 
 	if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) {
+		struct drm_printer m = drm_err_printer("pulse");
+
+		pr_err("%s: no heartbeat pulse?\n", engine->name);
+		intel_engine_dump(engine, &m, "%s", engine->name);
+
 		err = -ETIME;
 		goto out;
 	}