diff mbox series

drm/i915/selftests: Try to show where the pulse went

Message ID 20191128102546.3857140-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915/selftests: Try to show where the pulse went | expand

Commit Message

Chris Wilson Nov. 28, 2019, 10:25 a.m. UTC
We have a case of a mysteriously absent pulse, so dump the engine
details to see if we can find out what happened to it.

References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
 drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
 2 files changed, 7 insertions(+)

Comments

Mika Kuoppala Nov. 28, 2019, 11:27 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> We have a case of a mysteriously absent pulse, so dump the engine
> details to see if we can find out what happened to it.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
>  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
>  2 files changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 8f6e353caa66..df3369c3f330 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>  		drm_printf(m, "*** WEDGED ***\n");
>  
>  	drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
> +	drm_printf(m, "\tBarriers?: %s\n",
> +		   yesno(!llist_empty(&engine->barrier_tasks)));

A random thought arise when reading this and it was that
should we do a memory barrier before dump. But there
would be no pairing and it could actually hide something
more sinister.

It is important to know if and where and why did we drop
our stethoscope.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
>  
>  	rcu_read_lock();
>  	rq = READ_ONCE(engine->heartbeat.systole);
> diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> index f665a0e23c61..0b1148cf3f61 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
> @@ -91,6 +91,11 @@ static int __live_idle_pulse(struct intel_engine_cs *engine,
>  	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
>  
>  	if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) {
> +		struct drm_printer m = drm_err_printer("pulse");
> +
> +		pr_err("%s: no heartbeat pulse?\n", engine->name);
> +		intel_engine_dump(engine, &m, "%s", engine->name);
> +
>  		err = -ETIME;
>  		goto out;
>  	}
> -- 
> 2.24.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Nov. 28, 2019, 11:31 a.m. UTC | #2
Quoting Mika Kuoppala (2019-11-28 11:27:30)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > We have a case of a mysteriously absent pulse, so dump the engine
> > details to see if we can find out what happened to it.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=112405
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c           | 2 ++
> >  drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c | 5 +++++
> >  2 files changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index 8f6e353caa66..df3369c3f330 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -1479,6 +1479,8 @@ void intel_engine_dump(struct intel_engine_cs *engine,
> >               drm_printf(m, "*** WEDGED ***\n");
> >  
> >       drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
> > +     drm_printf(m, "\tBarriers?: %s\n",
> > +                yesno(!llist_empty(&engine->barrier_tasks)));
> 
> A random thought arise when reading this and it was that
> should we do a memory barrier before dump. But there
> would be no pairing and it could actually hide something
> more sinister.

Yeah, exactly who do we want to serialise with -- and we will always
need a snapshot just in case we screw that up :)

It's debug; as async as we can without oopsing, and keep on staring
until it starts to make sense.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 8f6e353caa66..df3369c3f330 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1479,6 +1479,8 @@  void intel_engine_dump(struct intel_engine_cs *engine,
 		drm_printf(m, "*** WEDGED ***\n");
 
 	drm_printf(m, "\tAwake? %d\n", atomic_read(&engine->wakeref.count));
+	drm_printf(m, "\tBarriers?: %s\n",
+		   yesno(!llist_empty(&engine->barrier_tasks)));
 
 	rcu_read_lock();
 	rq = READ_ONCE(engine->heartbeat.systole);
diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index f665a0e23c61..0b1148cf3f61 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -91,6 +91,11 @@  static int __live_idle_pulse(struct intel_engine_cs *engine,
 	GEM_BUG_ON(!llist_empty(&engine->barrier_tasks));
 
 	if (intel_gt_retire_requests_timeout(engine->gt, HZ / 5)) {
+		struct drm_printer m = drm_err_printer("pulse");
+
+		pr_err("%s: no heartbeat pulse?\n", engine->name);
+		intel_engine_dump(engine, &m, "%s", engine->name);
+
 		err = -ETIME;
 		goto out;
 	}