
drm/i915: Show stack (by WARN) for hitting forcewake errors

Message ID 20180720111102.11549-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson July 20, 2018, 11:11 a.m. UTC
On Sandybridge, we need a workaround to wait for the CPU thread to wake
up before we are sure that we have enabled the GT power well. However,
we do see the errors being reported and failed reads returning spurious
results. To try to capture more details as it fails, promote the error
into a WARN so that we grab the stacktrace, and to try to reduce the
frequency of errors, increase the timeout from 500us to 5ms.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)
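
For context, this wait runs on every forcewake get on the affected platforms.
The caller visible in the hunk context, fw_domains_get_with_thread_status(),
sits roughly as in the sketch below (paraphrased from the surrounding file for
illustration, not part of this patch; fw_domains_get() is the existing helper
that asserts the forcewake request bits):

static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv,
					      enum forcewake_domains fw_domains)
{
	/* Request the GT power wells by asserting the forcewake bits */
	fw_domains_get(dev_priv, fw_domains);

	/*
	 * Then apply the w/a: poll the GT thread status register until the
	 * hardware thread has woken, so that subsequent MMIO reads do not
	 * sporadically return 0.
	 */
	__gen6_gt_wait_for_thread_c0(dev_priv);
}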

Comments

Tvrtko Ursulin July 24, 2018, 10:33 a.m. UTC | #1
On 20/07/2018 12:11, Chris Wilson wrote:
> On Sandybridge, we need a workaround to wait for the CPU thread to wake
> up before we are sure that we have enabled the GT power well. However,
> we do see the errors being reported and failed reads returning spurious
> results. To try to capture more details as it fails, promote the error
> into a WARN so that we grab the stacktrace, and to try to reduce the
> frequency of errors, increase the timeout from 500us to 5ms.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++----
>   1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index b892ca8396e8..284be151f645 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -283,14 +283,24 @@ fw_domains_reset(struct drm_i915_private *i915,
>   		fw_domain_reset(i915, d);
>   }
>   
> +static inline u32 gt_thread_status(struct drm_i915_private *dev_priv)
> +{
> +	u32 val;
> +
> +	val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG);
> +	val &= GEN6_GT_THREAD_STATUS_CORE_MASK;
> +
> +	return val;
> +}
> +
>   static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
>   {
> -	/* w/a for a sporadic read returning 0 by waiting for the GT
> +	/*
> +	 * w/a for a sporadic read returning 0 by waiting for the GT
>   	 * thread to wake up.
>   	 */
> -	if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) &
> -				GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500))
> -		DRM_ERROR("GT thread status wait timed out\n");
> +	WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000),
> +		  "GT thread status wait timed out\n");
>   }
>   
>   static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv,
> 

Quite a large increase - but I guess there's nothing we can do if it's hit, 
nor is it better to time out faster if the wait needs to be longer.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson July 24, 2018, 10:59 a.m. UTC | #2
Quoting Tvrtko Ursulin (2018-07-24 11:33:47)
> 
> On 20/07/2018 12:11, Chris Wilson wrote:
> > On Sandybridge, we need a workaround to wait for the CPU thread to wake
> > up before we are sure that we have enabled the GT power well. However,
> > we do see the errors being reported and failed reads returning spurious
> > results. To try to capture more details as it fails, promote the error
> > into a WARN so that we grab the stacktrace, and to try to reduce the
> > frequency of errors, increase the timeout from 500us to 5ms.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++----
> >   1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> > index b892ca8396e8..284be151f645 100644
> > --- a/drivers/gpu/drm/i915/intel_uncore.c
> > +++ b/drivers/gpu/drm/i915/intel_uncore.c
> > @@ -283,14 +283,24 @@ fw_domains_reset(struct drm_i915_private *i915,
> >               fw_domain_reset(i915, d);
> >   }
> >   
> > +static inline u32 gt_thread_status(struct drm_i915_private *dev_priv)
> > +{
> > +     u32 val;
> > +
> > +     val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG);
> > +     val &= GEN6_GT_THREAD_STATUS_CORE_MASK;
> > +
> > +     return val;
> > +}
> > +
> >   static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
> >   {
> > -     /* w/a for a sporadic read returning 0 by waiting for the GT
> > +     /*
> > +      * w/a for a sporadic read returning 0 by waiting for the GT
> >        * thread to wake up.
> >        */
> > -     if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) &
> > -                             GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500))
> > -             DRM_ERROR("GT thread status wait timed out\n");
> > +     WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000),
> > +               "GT thread status wait timed out\n");
> >   }
> >   
> >   static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv,
> > 
> 
> Quite a large increase - but I guess there's nothing we can do if it's hit, 
> nor is it better to time out faster if the wait needs to be longer.

It was a bit of a jump, but I felt that if we are prematurely declaring
the machine dead, we end up killing it anyway.

A long, long time ago, when we saw these errors it was terminal. The
recent sightings in CI, however, looked transient (hard to tell, as we
panicked shortly after), so I'm betting that a longer timeout will help
with the false positives (additional hardirq latency is better than death).
-Chris
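
For reference, wait_for_atomic_us() is a busy-wait, so the timeout bounds how
long a CPU may spin (potentially with interrupts off) on the failure path;
that is the hardirq latency trade-off mentioned above. A simplified sketch of
that kind of poll loop (poll_gt_thread_us() is an illustrative name, not the
actual i915 macro):

static int poll_gt_thread_us(struct drm_i915_private *dev_priv,
			     unsigned int timeout_us)
{
	ktime_t end = ktime_add_us(ktime_get_raw(), timeout_us);

	for (;;) {
		/* Success: the masked thread-status bits have cleared */
		if (gt_thread_status(dev_priv) == 0)
			return 0;
		/* Timed out: the caller promotes this into a WARN */
		if (ktime_after(ktime_get_raw(), end))
			return -ETIMEDOUT;
		/* Spin politely; we may be in atomic context here */
		cpu_relax();
	}
}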

Patch

diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index b892ca8396e8..284be151f645 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -283,14 +283,24 @@  fw_domains_reset(struct drm_i915_private *i915,
 		fw_domain_reset(i915, d);
 }
 
+static inline u32 gt_thread_status(struct drm_i915_private *dev_priv)
+{
+	u32 val;
+
+	val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG);
+	val &= GEN6_GT_THREAD_STATUS_CORE_MASK;
+
+	return val;
+}
+
 static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv)
 {
-	/* w/a for a sporadic read returning 0 by waiting for the GT
+	/*
+	 * w/a for a sporadic read returning 0 by waiting for the GT
 	 * thread to wake up.
 	 */
-	if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) &
-				GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500))
-		DRM_ERROR("GT thread status wait timed out\n");
+	WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000),
+		  "GT thread status wait timed out\n");
 }
 
 static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv,