[01/28] drm/i915: Wait for a moment before forcibly resetting the device

Message ID 20190128010245.20148-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Jan. 28, 2019, 1:02 a.m. UTC
During igt, we ask to reset the device if any requests are still
outstanding at the end of a test, as this quickly kills off any
erroneous hanging request streams that may escape a test. However, since
it may take the device a few milliseconds to flush itself after the end
of a normal test, *cough* guc *cough*, we may accidentally tell the
device to reset itself after it idles. If we wait a moment, our usual
I915_IDLE_ENGINES_TIMEOUT of 200ms (seems a bit high, but still better
than umpteen hangchecks!), we can differentiate better between a stuck
engine and a healthy one, and so avoid prematurely forcing the reset and
any extra complications that may entail.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Mika Kuoppala Jan. 28, 2019, 9:24 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> During igt, we ask to reset the device if any requests are still
> outstanding at the end of a test, as this quickly kills off any
> erroneous hanging request streams that may escape a test. However, since
> it may take the device a few milliseconds to flush itself after the end
> of a normal test, *cough* guc *cough*, we may accidentally tell the
> device to reset itself after it idles. If we wait a moment, our usual
> I915_IDLE_ENGINES_TIMEOUT of 200ms (seems a bit high, but still better
> than umpteen hangchecks!), we can differentiate better between a stuck
> engine and a healthy one, and so avoid prematurely forcing the reset and
> any extra complications that may entail.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 3b995f9fdc06..e46de507fea2 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -4051,7 +4051,8 @@ i915_drop_caches_set(void *data, u64 val)
>  		  val, val & DROP_ALL);
>  	wakeref = intel_runtime_pm_get(i915);
>  
> -	if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
> +	if (val & DROP_RESET_ACTIVE &&
> +	    wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
>  		i915_gem_set_wedged(i915);

Some of the complications have been welcome. But it is still
better to exercise them explicitly in tests rather than
relying on indirect test harness stress.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>


>  
>  	/* No need to check and wait for gpu resets, only libdrm auto-restarts
> -- 
> 2.20.1
Chris Wilson Jan. 28, 2019, 9:38 a.m. UTC | #2
Quoting Mika Kuoppala (2019-01-28 09:24:12)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > During igt, we ask to reset the device if any requests are still
> > outstanding at the end of a test, as this quickly kills off any
> > erroneous hanging request streams that may escape a test. However, since
> > it may take the device a few milliseconds to flush itself after the end
> > of a normal test, *cough* guc *cough*, we may accidentally tell the
> > device to reset itself after it idles. If we wait a moment, our usual
> > I915_IDLE_ENGINES_TIMEOUT of 200ms (seems a bit high, but still better
> > than umpteen hangchecks!), we can differentiate better between a stuck
> > engine and a healthy one, and so avoid prematurely forcing the reset and
> > any extra complications that may entail.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 3b995f9fdc06..e46de507fea2 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -4051,7 +4051,8 @@ i915_drop_caches_set(void *data, u64 val)
> >                 val, val & DROP_ALL);
> >       wakeref = intel_runtime_pm_get(i915);
> >  
> > -     if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
> > +     if (val & DROP_RESET_ACTIVE &&
> > +         wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
> >               i915_gem_set_wedged(i915);
> 
> Some of the complications have been welcome. But it is still
> better to exercise them explicitly in tests rather than
> relying on indirect test harness stress.
> 
> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

Pushed to mask potential problems in *-guc BAT.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3b995f9fdc06..e46de507fea2 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -4051,7 +4051,8 @@  i915_drop_caches_set(void *data, u64 val)
 		  val, val & DROP_ALL);
 	wakeref = intel_runtime_pm_get(i915);
 
-	if (val & DROP_RESET_ACTIVE && !intel_engines_are_idle(i915))
+	if (val & DROP_RESET_ACTIVE &&
+	    wait_for(intel_engines_are_idle(i915), I915_IDLE_ENGINES_TIMEOUT))
 		i915_gem_set_wedged(i915);
 
 	/* No need to check and wait for gpu resets, only libdrm auto-restarts
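
The behavioural change hinges on the return value of wait_for(): in i915 it
evaluates to 0 once the condition becomes true and to -ETIMEDOUT if the
timeout expires first, so the new condition only wedges the device when the
engines are still busy after the grace period. The following is a minimal
userspace sketch of that polling pattern, not the kernel macro itself. All
names in it (fake_device, fake_engines_are_idle, wait_for_idle,
should_force_reset) are hypothetical stand-ins invented for illustration, and
the 1ms poll interval is an assumption.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for device state; not part of i915. */
struct fake_device {
	/* Busy polls remaining before the device reports idle; < 0 means stuck. */
	int polls_until_idle;
};

/* Hypothetical analogue of intel_engines_are_idle(). */
static bool fake_engines_are_idle(struct fake_device *dev)
{
	if (dev->polls_until_idle < 0)
		return false;
	if (dev->polls_until_idle == 0)
		return true;
	dev->polls_until_idle--;
	return false;
}

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll until the device is idle or timeout_ms elapses.
 * Returns 0 on idle and -ETIMEDOUT otherwise, mirroring the convention
 * the patch relies on.
 */
static int wait_for_idle(struct fake_device *dev, int timeout_ms)
{
	struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1ms, assumed */
	long long deadline;

	assert(timeout_ms >= 0);
	deadline = now_ms() + timeout_ms;

	for (;;) {
		/* Sample the clock first so the condition gets one final check. */
		bool expired = now_ms() >= deadline;

		if (fake_engines_are_idle(dev))
			return 0;
		if (expired)
			return -ETIMEDOUT;
		nanosleep(&pause, NULL);
	}
}

/* Same shape as the patched condition: reset only if the wait timed out. */
static bool should_force_reset(struct fake_device *dev, int timeout_ms)
{
	return wait_for_idle(dev, timeout_ms) != 0;
}
```

With a 200ms budget, as with I915_IDLE_ENGINES_TIMEOUT in the commit message,
a device that needs a few more polls to drain is left alone, while one that
never idles is still reported for reset once the budget is spent.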