diff mbox

drm/i915: Silence debugging DRM_ERROR for failing to suspend vlv powerwells

Message ID 20180409094905.4516-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson April 9, 2018, 9:49 a.m. UTC
If we try to suspend a wedged device following a GPU reset failure, we
will also fail to turn off the rc6 powerwells (on vlv), leading to a
*ERROR*. This is quite expected in this case, so the best we can do is
shake our heads and reduce the *ERROR* to a debug so CI stops
complaining.

Testcase: igt/gem_eio/in-flight-suspend #vlv
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Chris Wilson April 9, 2018, 9:54 a.m. UTC | #1
Quoting Chris Wilson (2018-04-09 10:49:05)
> If we try to suspend a wedged device following a GPU reset failure, we
> will also fail to turn off the rc6 powerwells (on vlv), leading to a
> *ERROR*. This is quite expected in this case, so the best we can do is
> shake our heads and reduce the *ERROR* to a debug so CI stops
> complaining.
> 
> Testcase: igt/gem_eio/in-flight-suspend #vlv
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105583
-Chris
Chris Wilson May 2, 2018, 9:32 p.m. UTC | #2
Quoting Chris Wilson (2018-04-09 10:54:46)
> Quoting Chris Wilson (2018-04-09 10:49:05)
> > If we try to suspend a wedged device following a GPU reset failure, we
> > will also fail to turn off the rc6 powerwells (on vlv), leading to a
> > *ERROR*. This is quite expected in this case, so the best we can do is
> > shake our heads and reduce the *ERROR* to a debug so CI stops
> > complaining.
> > 
> > Testcase: igt/gem_eio/in-flight-suspend #vlv
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105583

Does anyone want to keep this *ERROR* that we can trigger at will?
-Chris
Jani Nikula May 3, 2018, 8:58 a.m. UTC | #3
On Wed, 02 May 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting Chris Wilson (2018-04-09 10:54:46)
>> Quoting Chris Wilson (2018-04-09 10:49:05)
>> > If we try to suspend a wedged device following a GPU reset failure, we
>> > will also fail to turn off the rc6 powerwells (on vlv), leading to a
>> > *ERROR*. This is quite expected in this case, so the best we can do is
>> > shake our heads and reduce the *ERROR* to a debug so CI stops
>> > complaining.
>> > 
>> > Testcase: igt/gem_eio/in-flight-suspend #vlv
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105583
>
> Does anyone want to keep this *ERROR* that we can trigger at will?

No strong feelings. I guess my question is, can this happen when there's
no GPU reset failures, and the change would silence that?

Anyway, if it makes CI less noisy,

Acked-by: Jani Nikula <jani.nikula@intel.com>
Chris Wilson May 3, 2018, 9:06 a.m. UTC | #4
Quoting Jani Nikula (2018-05-03 09:58:17)
> On Wed, 02 May 2018, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Quoting Chris Wilson (2018-04-09 10:54:46)
> >> Quoting Chris Wilson (2018-04-09 10:49:05)
> >> > If we try to suspend a wedged device following a GPU reset failure, we
> >> > will also fail to turn off the rc6 powerwells (on vlv), leading to a
> >> > *ERROR*. This is quite expected in this case, so the best we can do is
> >> > shake our heads and reduce the *ERROR* to a debug so CI stops
> >> > complaining.
> >> > 
> >> > Testcase: igt/gem_eio/in-flight-suspend #vlv
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105583
> >
> > Does anyone want to keep this *ERROR* that we can trigger at will?
> 
> No strong feelings. I guess my question is, can this happen when there's
> no GPU reset failures, and the change would silence that?

It hasn't occurred yet, but we see reset failures in the wild (just not
on byt afair). From the code pov, it is marked as a debug feature and
called before system suspend, so should be harmless (famous last words).
 
> Anyway, if it makes CI less noisy,
> 
> Acked-by: Jani Nikula <jani.nikula@intel.com>

Thanks,
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f770be18b2d7..db6fc176ec3c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2462,10 +2462,13 @@  static void vlv_wait_for_gt_wells(struct drm_i915_private *dev_priv,
 	/*
 	 * RC6 transitioning can be delayed up to 2 msec (see
 	 * valleyview_enable_rps), use 3 msec for safety.
+	 *
+	 * This can fail to turn off the rc6 if the GPU is stuck after a failed
+	 * reset and we are trying to force the machine to sleep.
 	 */
 	if (vlv_wait_for_pw_status(dev_priv, mask, val))
-		DRM_ERROR("timeout waiting for GT wells to go %s\n",
-			  onoff(wait_for_on));
+		DRM_DEBUG_DRIVER("timeout waiting for GT wells to go %s\n",
+				 onoff(wait_for_on));
 }
 
 static void vlv_check_no_gt_access(struct drm_i915_private *dev_priv)