Message ID | 20180720111102.11549-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 20/07/2018 12:11, Chris Wilson wrote: > On Sandybridge, we need a workaround to wait for the CPU thread to wake > up before we are sure that we have enabled the GT power well. However, > we do see the errors being reported and failed reads returning spurious > results. To try and capture more details as it fails, promote the error > into a WARN so we grab the stacktrace, and to try and reduce the > frequency of error increase the timeout from 500us to 5ms. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++---- > 1 file changed, 14 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > index b892ca8396e8..284be151f645 100644 > --- a/drivers/gpu/drm/i915/intel_uncore.c > +++ b/drivers/gpu/drm/i915/intel_uncore.c > @@ -283,14 +283,24 @@ fw_domains_reset(struct drm_i915_private *i915, > fw_domain_reset(i915, d); > } > > +static inline u32 gt_thread_status(struct drm_i915_private *dev_priv) > +{ > + u32 val; > + > + val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG); > + val &= GEN6_GT_THREAD_STATUS_CORE_MASK; > + > + return val; > +} > + > static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv) > { > - /* w/a for a sporadic read returning 0 by waiting for the GT > + /* > + * w/a for a sporadic read returning 0 by waiting for the GT > * thread to wake up. > */ > - if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) & > - GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500)) > - DRM_ERROR("GT thread status wait timed out\n"); > + WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000), > + "GT thread status wait timed out\n"); > } > > static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv, > Quite a large increase - but I guess nothing we can do if it's hit, nor it's better to time out faster if the wait needs to be longer. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
Quoting Tvrtko Ursulin (2018-07-24 11:33:47) > > On 20/07/2018 12:11, Chris Wilson wrote: > > On Sandybridge, we need a workaround to wait for the CPU thread to wake > > up before we are sure that we have enabled the GT power well. However, > > we do see the errors being reported and failed reads returning spurious > > results. To try and capture more details as it fails, promote the error > > into a WARN so we grab the stacktrace, and to try and reduce the > > frequency of error increase the timeout from 500us to 5ms. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++---- > > 1 file changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c > > index b892ca8396e8..284be151f645 100644 > > --- a/drivers/gpu/drm/i915/intel_uncore.c > > +++ b/drivers/gpu/drm/i915/intel_uncore.c > > @@ -283,14 +283,24 @@ fw_domains_reset(struct drm_i915_private *i915, > > fw_domain_reset(i915, d); > > } > > > > +static inline u32 gt_thread_status(struct drm_i915_private *dev_priv) > > +{ > > + u32 val; > > + > > + val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG); > > + val &= GEN6_GT_THREAD_STATUS_CORE_MASK; > > + > > + return val; > > +} > > + > > static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv) > > { > > - /* w/a for a sporadic read returning 0 by waiting for the GT > > + /* > > + * w/a for a sporadic read returning 0 by waiting for the GT > > * thread to wake up. > > */ > > - if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) & > > - GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500)) > > - DRM_ERROR("GT thread status wait timed out\n"); > > + WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000), > > + "GT thread status wait timed out\n"); > > } > > > > static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv, > > > > Quite a large increase - but I guess nothing we can do if it's hit, nor > it's better to time out faster if the wait needs to be longer. It was a bit of a jump, but I felt that if we are prematurely declaring the machine dead, we end up killing it anyway. A long, long time ago, when we saw these errors it was terminal. The recent sightings in CI however looked transient (hard to tell as we panicked shortly after), so I'm betting on a longer timeout will help the false positives (additional hardirq latency is better than death). -Chris
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index b892ca8396e8..284be151f645 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -283,14 +283,24 @@ fw_domains_reset(struct drm_i915_private *i915, fw_domain_reset(i915, d); } +static inline u32 gt_thread_status(struct drm_i915_private *dev_priv) +{ + u32 val; + + val = __raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG); + val &= GEN6_GT_THREAD_STATUS_CORE_MASK; + + return val; +} + static void __gen6_gt_wait_for_thread_c0(struct drm_i915_private *dev_priv) { - /* w/a for a sporadic read returning 0 by waiting for the GT + /* + * w/a for a sporadic read returning 0 by waiting for the GT * thread to wake up. */ - if (wait_for_atomic_us((__raw_i915_read32(dev_priv, GEN6_GT_THREAD_STATUS_REG) & - GEN6_GT_THREAD_STATUS_CORE_MASK) == 0, 500)) - DRM_ERROR("GT thread status wait timed out\n"); + WARN_ONCE(wait_for_atomic_us(gt_thread_status(dev_priv) == 0, 5000), + "GT thread status wait timed out\n"); } static void fw_domains_get_with_thread_status(struct drm_i915_private *dev_priv,
On Sandybridge, we need a workaround to wait for the CPU thread to wake up before we are sure that we have enabled the GT power well. However, we do see the errors being reported and failed reads returning spurious results. To try and capture more details as it fails, promote the error into a WARN so we grab the stacktrace, and to try and reduce the frequency of error increase the timeout from 500us to 5ms. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_uncore.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)