[13/15] drm/i915: Emit a user level message when resetting the GPU (or engine)

Message ID 20170717091141.23102-13-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson July 17, 2017, 9:11 a.m. UTC
Although a banned context will be told to -EIO off if they try to submit
more requests, we have a discrepancy between whole device resets and
per-engine resets where we report the GPU reset but not the engine
resets. This leaves a bit of mystery as to why the context was banned,
and also reduces awareness overall of when a GPU (engine) reset occurs
with its possible side-effects.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

Comments

Michel Thierry July 18, 2017, 12:22 a.m. UTC | #1
On 17/07/17 02:11, Chris Wilson wrote:
> Although a banned context will be told to -EIO off if they try to submit
> more requests, we have a discrepancy between whole device resets and
> per-engine resets where we report the GPU reset but not the engine
> resets. This leaves a bit of mystery as to why the context was banned,
> and also reduces awareness overall of when a GPU (engine) reset occurs
> with its possible side-effects.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
>   1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index bc121a46ed9a..4b62fd012877 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1865,9 +1865,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
>   	if (!i915_gem_unset_wedged(dev_priv))
>   		goto wakeup;
>   
> +	dev_notice(dev_priv->drm.dev,
> +		   "Resetting chip after gpu hang\n");
>   	error->reset_count++;
>   
> -	pr_notice("drm/i915: Resetting chip after gpu hang\n");
>   	disable_irq(dev_priv->drm.irq);
>   	ret = i915_gem_reset_prepare(dev_priv);
>   	if (ret) {
> @@ -1945,7 +1946,9 @@ int i915_reset_engine(struct intel_engine_cs *engine)
>   
>   	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
>   
> -	DRM_DEBUG_DRIVER("resetting %s\n", engine->name);
> +	dev_notice(engine->i915->drm.dev,
> +		   "Resetting %s after gpu hang\n", engine->name);
> +	error->reset_engine_count[engine->id]++;
>   

This will increment both the engine-reset-count and gpu-reset count in 
the unlikely case that engine-reset gets promoted to full reset.

Not a problem per-se, but I wanted to point it out (plus it makes both 
functions symmetric).

>   	active_request = i915_gem_reset_prepare_engine(engine);
>   	if (IS_ERR(active_request)) {
> @@ -1978,7 +1981,6 @@ int i915_reset_engine(struct intel_engine_cs *engine)
>   	if (ret)
>   		goto out;
>   
> -	error->reset_engine_count[engine->id]++;
>   out:
>   	i915_gem_reset_finish_engine(engine);
>   	return ret;
> 

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

Chris Wilson July 20, 2017, 12:52 p.m. UTC | #2
Quoting Michel Thierry (2017-07-18 01:22:28)
> On 17/07/17 02:11, Chris Wilson wrote:
> > Although a banned context will be told to -EIO off if they try to submit
> > more requests, we have a discrepancy between whole device resets and
> > per-engine resets where we report the GPU reset but not the engine
> > resets. This leaves a bit of mystery as to why the context was banned,
> > and also reduces awareness overall of when a GPU (engine) reset occurs
> > with its possible side-effects.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Michel Thierry <michel.thierry@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c | 8 +++++---
> >   1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index bc121a46ed9a..4b62fd012877 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -1865,9 +1865,10 @@ void i915_reset(struct drm_i915_private *dev_priv)
> >       if (!i915_gem_unset_wedged(dev_priv))
> >               goto wakeup;
> >   
> > +     dev_notice(dev_priv->drm.dev,
> > +                "Resetting chip after gpu hang\n");
> >       error->reset_count++;
> >   
> > -     pr_notice("drm/i915: Resetting chip after gpu hang\n");
> >       disable_irq(dev_priv->drm.irq);
> >       ret = i915_gem_reset_prepare(dev_priv);
> >       if (ret) {
> > @@ -1945,7 +1946,9 @@ int i915_reset_engine(struct intel_engine_cs *engine)
> >   
> >       GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
> >   
> > -     DRM_DEBUG_DRIVER("resetting %s\n", engine->name);
> > +     dev_notice(engine->i915->drm.dev,
> > +                "Resetting %s after gpu hang\n", engine->name);
> > +     error->reset_engine_count[engine->id]++;
> >   
> 
> This will increment both the engine-reset-count and gpu-reset count in 
> the unlikely case that engine-reset gets promoted to full reset.
> 
> Not a problem per-se, but I wanted to point it out (plus it makes both 
> functions symmetric).

I felt it was justified, as then we always increment one counter or the
other on every attempt, not just on success, which was already the
behaviour for the global counter. I guess I should split that out since
it is unrelated.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index bc121a46ed9a..4b62fd012877 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1865,9 +1865,10 @@  void i915_reset(struct drm_i915_private *dev_priv)
 	if (!i915_gem_unset_wedged(dev_priv))
 		goto wakeup;
 
+	dev_notice(dev_priv->drm.dev,
+		   "Resetting chip after gpu hang\n");
 	error->reset_count++;
 
-	pr_notice("drm/i915: Resetting chip after gpu hang\n");
 	disable_irq(dev_priv->drm.irq);
 	ret = i915_gem_reset_prepare(dev_priv);
 	if (ret) {
@@ -1945,7 +1946,9 @@  int i915_reset_engine(struct intel_engine_cs *engine)
 
 	GEM_BUG_ON(!test_bit(I915_RESET_ENGINE + engine->id, &error->flags));
 
-	DRM_DEBUG_DRIVER("resetting %s\n", engine->name);
+	dev_notice(engine->i915->drm.dev,
+		   "Resetting %s after gpu hang\n", engine->name);
+	error->reset_engine_count[engine->id]++;
 
 	active_request = i915_gem_reset_prepare_engine(engine);
 	if (IS_ERR(active_request)) {
@@ -1978,7 +1981,6 @@  int i915_reset_engine(struct intel_engine_cs *engine)
 	if (ret)
 		goto out;
 
-	error->reset_engine_count[engine->id]++;
 out:
 	i915_gem_reset_finish_engine(engine);
 	return ret;