drm/i915/gt: Mention which device failed
diff mbox series

Message ID 20191115122343.821331-1-chris@chris-wilson.co.uk
State New
Headers show
Series
  • drm/i915/gt: Mention which device failed
Related show

Commit Message

Chris Wilson Nov. 15, 2019, 12:23 p.m. UTC
When telling the user that device power management is disabled, it is
helpful to say which device that was. At the same time, while it is a
mere inconvenience to the user, it is devastating to CI as this and
future tests may fail out of the blue.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Imre Deak Nov. 15, 2019, 1:11 p.m. UTC | #1
On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote:
> When telling the user that device power management is disabled, it is
> helpful to say which device that was. At the same time, while it is a
> mere inconvenience to the user, it is devastating to CI as this and
> future tests may fail out of the blue.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Imre Deak <imre.deak@intel.com>

Reviewed-by: Imre Deak <imre.deak@intel.com>

> ---
>  drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
> index 602a02d01850..b56a903431b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rc6.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
> @@ -540,7 +540,9 @@ void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6)
>  	if (!intel_rc6_ctx_corrupted(rc6))
>  		return;
>  
> -	DRM_NOTE("RC6 context corruption, disabling runtime power management\n");
> +	dev_notice(i915->drm.dev,
> +		   "RC6 context corruption, disabling runtime power management\n");
> +	add_taint_for_CI(TAINT_WARN);
>  
>  	intel_rc6_disable(rc6);
>  	rc6->ctx_corrupted = true;
> -- 
> 2.24.0
>
Imre Deak Nov. 15, 2019, 1:15 p.m. UTC | #2
On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote:
> On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote:
> > When telling the user that device power management is disabled, it is
> > helpful to say which device that was. At the same time, while it is a
> > mere inconvenience to the user, it is devastating to CI as this and
> > future tests may fail out of the blue.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Imre Deak <imre.deak@intel.com>
> 
> Reviewed-by: Imre Deak <imre.deak@intel.com>

Although we would need a way to test recovery - which we a have a
testcase for - so tainting for that case is not ok.

> 
> > ---
> >  drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
> > index 602a02d01850..b56a903431b8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_rc6.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
> > @@ -540,7 +540,9 @@ void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6)
> >  	if (!intel_rc6_ctx_corrupted(rc6))
> >  		return;
> >  
> > -	DRM_NOTE("RC6 context corruption, disabling runtime power management\n");
> > +	dev_notice(i915->drm.dev,
> > +		   "RC6 context corruption, disabling runtime power management\n");
> > +	add_taint_for_CI(TAINT_WARN);
> >  
> >  	intel_rc6_disable(rc6);
> >  	rc6->ctx_corrupted = true;
> > -- 
> > 2.24.0
> > 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Nov. 15, 2019, 1:22 p.m. UTC | #3
Quoting Imre Deak (2019-11-15 13:15:30)
> On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote:
> > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote:
> > > When telling the user that device power management is disabled, it is
> > > helpful to say which device that was. At the same time, while it is a
> > > mere inconvenience to the user, it is devastating to CI as this and
> > > future tests may fail out of the blue.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Imre Deak <imre.deak@intel.com>
> > 
> > Reviewed-by: Imre Deak <imre.deak@intel.com>
> 
> Although we would need a way to test recovery - which we a have a
> testcase for - so tainting for that case is not ok.

You put that test at the end of the queue. Fwiw, we don't seem to see
the corrupt state across a module reload; either that or I am blind.

We probably should add the register + known state to the error dump.
-Chris
Chris Wilson Nov. 15, 2019, 1:28 p.m. UTC | #4
Quoting Chris Wilson (2019-11-15 13:22:04)
> Quoting Imre Deak (2019-11-15 13:15:30)
> > On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote:
> > > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote:
> > > > When telling the user that device power management is disabled, it is
> > > > helpful to say which device that was. At the same time, while it is a
> > > > mere inconvenience to the user, it is devastating to CI as this and
> > > > future tests may fail out of the blue.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Imre Deak <imre.deak@intel.com>
> > > 
> > > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > 
> > Although we would need a way to test recovery - which we a have a
> > testcase for - so tainting for that case is not ok.
> 
> You put that test at the end of the queue. Fwiw, we don't seem to see
> the corrupt state across a module reload; either that or I am blind.

I guess you would prefer a debugfs...
-Chris
Imre Deak Nov. 15, 2019, 1:30 p.m. UTC | #5
On Fri, Nov 15, 2019 at 01:28:42PM +0000, Chris Wilson wrote:
> Quoting Chris Wilson (2019-11-15 13:22:04)
> > Quoting Imre Deak (2019-11-15 13:15:30)
> > > On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote:
> > > > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote:
> > > > > When telling the user that device power management is disabled, it is
> > > > > helpful to say which device that was. At the same time, while it is a
> > > > > mere inconvenience to the user, it is devastating to CI as this and
> > > > > future tests may fail out of the blue.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > Cc: Imre Deak <imre.deak@intel.com>
> > > > 
> > > > Reviewed-by: Imre Deak <imre.deak@intel.com>
> > > 
> > > Although we would need a way to test recovery - which we a have a
> > > testcase for - so tainting for that case is not ok.
> > 
> > You put that test at the end of the queue. Fwiw, we don't seem to see
> > the corrupt state across a module reload; either that or I am blind.

Hm, that's a problem then. From intel_rc6_init() we should check for the
corruption, which persist until reboot or S3 suspend/resume.

> 
> I guess you would prefer a debugfs...

Yes, would be clearer.

> -Chris

Patch
diff mbox series

diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c
index 602a02d01850..b56a903431b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_rc6.c
+++ b/drivers/gpu/drm/i915/gt/intel_rc6.c
@@ -540,7 +540,9 @@  void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6)
 	if (!intel_rc6_ctx_corrupted(rc6))
 		return;
 
-	DRM_NOTE("RC6 context corruption, disabling runtime power management\n");
+	dev_notice(i915->drm.dev,
+		   "RC6 context corruption, disabling runtime power management\n");
+	add_taint_for_CI(TAINT_WARN);
 
 	intel_rc6_disable(rc6);
 	rc6->ctx_corrupted = true;