Message ID | 20191115122343.821331-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915/gt: Mention which device failed | expand |
On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote: > When telling the user that device power management is disabled, it is > helpful to say which device that was. At the same time, while it is a > mere inconvenience to the user, it is devastating to CI as this and > future tests may fail out of the blue. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c > index 602a02d01850..b56a903431b8 100644 > --- a/drivers/gpu/drm/i915/gt/intel_rc6.c > +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c > @@ -540,7 +540,9 @@ void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6) > if (!intel_rc6_ctx_corrupted(rc6)) > return; > > - DRM_NOTE("RC6 context corruption, disabling runtime power management\n"); > + dev_notice(i915->drm.dev, > + "RC6 context corruption, disabling runtime power management\n"); > + add_taint_for_CI(TAINT_WARN); > > intel_rc6_disable(rc6); > rc6->ctx_corrupted = true; > -- > 2.24.0 >
On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote: > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote: > > When telling the user that device power management is disabled, it is > > helpful to say which device that was. At the same time, while it is a > > mere inconvenience to the user, it is devastating to CI as this and > > future tests may fail out of the blue. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Cc: Imre Deak <imre.deak@intel.com> > > Reviewed-by: Imre Deak <imre.deak@intel.com> Although we would need a way to test recovery - which we a have a testcase for - so tainting for that case is not ok. > > > --- > > drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c > > index 602a02d01850..b56a903431b8 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_rc6.c > > +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c > > @@ -540,7 +540,9 @@ void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6) > > if (!intel_rc6_ctx_corrupted(rc6)) > > return; > > > > - DRM_NOTE("RC6 context corruption, disabling runtime power management\n"); > > + dev_notice(i915->drm.dev, > > + "RC6 context corruption, disabling runtime power management\n"); > > + add_taint_for_CI(TAINT_WARN); > > > > intel_rc6_disable(rc6); > > rc6->ctx_corrupted = true; > > -- > > 2.24.0 > > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Imre Deak (2019-11-15 13:15:30) > On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote: > > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote: > > > When telling the user that device power management is disabled, it is > > > helpful to say which device that was. At the same time, while it is a > > > mere inconvenience to the user, it is devastating to CI as this and > > > future tests may fail out of the blue. > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > Cc: Imre Deak <imre.deak@intel.com> > > > > Reviewed-by: Imre Deak <imre.deak@intel.com> > > Although we would need a way to test recovery - which we a have a > testcase for - so tainting for that case is not ok. You put that test at the end of the queue. Fwiw, we don't seem to see the corrupt state across a module reload; either that or I am blind. We probably should add the register + known state to the error dump. -Chris
Quoting Chris Wilson (2019-11-15 13:22:04) > Quoting Imre Deak (2019-11-15 13:15:30) > > On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote: > > > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote: > > > > When telling the user that device power management is disabled, it is > > > > helpful to say which device that was. At the same time, while it is a > > > > mere inconvenience to the user, it is devastating to CI as this and > > > > future tests may fail out of the blue. > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > > Cc: Imre Deak <imre.deak@intel.com> > > > > > > Reviewed-by: Imre Deak <imre.deak@intel.com> > > > > Although we would need a way to test recovery - which we a have a > > testcase for - so tainting for that case is not ok. > > You put that test at the end of the queue. Fwiw, we don't seem to see > the corrupt state across a module reload; either that or I am blind. I guess you would prefer a debugfs... -Chris
On Fri, Nov 15, 2019 at 01:28:42PM +0000, Chris Wilson wrote: > Quoting Chris Wilson (2019-11-15 13:22:04) > > Quoting Imre Deak (2019-11-15 13:15:30) > > > On Fri, Nov 15, 2019 at 03:11:43PM +0200, Imre Deak wrote: > > > > On Fri, Nov 15, 2019 at 12:23:43PM +0000, Chris Wilson wrote: > > > > > When telling the user that device power management is disabled, it is > > > > > helpful to say which device that was. At the same time, while it is a > > > > > mere inconvenience to the user, it is devastating to CI as this and > > > > > future tests may fail out of the blue. > > > > > > > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > > > > Cc: Imre Deak <imre.deak@intel.com> > > > > > > > > Reviewed-by: Imre Deak <imre.deak@intel.com> > > > > > > Although we would need a way to test recovery - which we a have a > > > testcase for - so tainting for that case is not ok. > > > > You put that test at the end of the queue. Fwiw, we don't seem to see > > the corrupt state across a module reload; either that or I am blind. Hm, that's a problem then. From intel_rc6_init() we should check for the corruption, which persist until reboot or S3 suspend/resume. > > I guess you would prefer a debugfs... Yes, would be clearer. > -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_rc6.c b/drivers/gpu/drm/i915/gt/intel_rc6.c index 602a02d01850..b56a903431b8 100644 --- a/drivers/gpu/drm/i915/gt/intel_rc6.c +++ b/drivers/gpu/drm/i915/gt/intel_rc6.c @@ -540,7 +540,9 @@ void intel_rc6_ctx_wa_check(struct intel_rc6 *rc6) if (!intel_rc6_ctx_corrupted(rc6)) return; - DRM_NOTE("RC6 context corruption, disabling runtime power management\n"); + dev_notice(i915->drm.dev, + "RC6 context corruption, disabling runtime power management\n"); + add_taint_for_CI(TAINT_WARN); intel_rc6_disable(rc6); rc6->ctx_corrupted = true;
When telling the user that device power management is disabled, it is helpful to say which device that was. At the same time, while it is a mere inconvenience to the user, it is devastating to CI as this and future tests may fail out of the blue. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/gt/intel_rc6.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)