Message ID | 1461744121-27051-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |

On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
>
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

Testcase: igt/gem_concurrent_blit #crw

I haven't been able to narrow this down to a simpler scenario. Still not
happy that I understand how we are triggering the erratum with this
read, but that does appear to be the case.
-Chris

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
>
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

After 23 hours and 2078 GpuTest07 runs the box is healthy so:

Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);

I was thinking that comment would be needed for the casual reader.
But perhaps the blatant unorthodoxity is big enough warning sign
to tread carefully here.

-Mika

>  }
>
>  static u32
> --
> 2.8.1

On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
>
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1

4.7 is frozen, need to re-run dim fixes for this one:

$ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
Cc: drm-intel-fixes@lists.freedesktop.org

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>  }
>
>  static u32
> --
> 2.8.1

On Thu, Apr 28, 2016 at 10:23:00AM +0200, Daniel Vetter wrote:
> On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> > Faced with sporadic machine hangs on gen7, that mimic the issue of
> > concurrent writes to the same cacheline and seem to start with
> > commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> > barrier on legacy gen6+), let us restore the spinlock around the mmio
> > read.
> >
> > Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
>
> 4.7 is frozen, need to re-run dim fixes for this one:
>
> $ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
> Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> Cc: drm-intel-fixes@lists.freedesktop.org

Sigh, missed.
-Chris

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 66f69cdd1d36..ad5bd3808d8b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	 * interrupt (with the same net latency).
 	 */
 	struct drm_i915_private *dev_priv = engine->dev->dev_private;
+
+	spin_lock_irq(&dev_priv->uncore.lock);
 	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
+	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
 static u32

Faced with sporadic machine hangs on gen7, that mimic the issue of
concurrent writes to the same cacheline and seem to start with
commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
barrier on legacy gen6+), let us restore the spinlock around the mmio
read.

Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)
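
For readers unfamiliar with the pattern, the idea of the patch (every access to a shared register range goes through one lock, so no two accesses overlap) can be sketched in userspace. This is only an analogy under stated assumptions, not the driver code: `fake_uncore`, `fake_posting_read`, `fake_write` and `run_serialised_access_demo` are hypothetical names, a pthread mutex stands in for `dev_priv->uncore.lock`, and a plain struct field stands in for the RING_ACTHD register. The kernel's `spin_lock_irq()` also disables local interrupts, which this sketch does not model.

```c
#include <pthread.h>
#include <stdint.h>

#define ITERATIONS 100000

/* Hypothetical stand-in for the driver's uncore state. */
struct fake_uncore {
	pthread_mutex_t lock;     /* plays the role of uncore.lock */
	volatile uint32_t acthd;  /* plays the role of RING_ACTHD */
	int in_flight;            /* accesses currently inside the range */
	int max_in_flight;        /* largest overlap ever observed */
};

/* Record that an access has started; caller must hold u->lock. */
static void access_begin(struct fake_uncore *u)
{
	u->in_flight++;
	if (u->in_flight > u->max_in_flight)
		u->max_in_flight = u->in_flight;
}

/* Read the fake register with the lock held, as the patch does
 * around POSTING_READ_FW(). */
static uint32_t fake_posting_read(struct fake_uncore *u)
{
	uint32_t value;

	pthread_mutex_lock(&u->lock);
	access_begin(u);
	value = u->acthd;
	u->in_flight--;
	pthread_mutex_unlock(&u->lock);
	return value;
}

/* Write the fake register under the same lock. */
static void fake_write(struct fake_uncore *u, uint32_t value)
{
	pthread_mutex_lock(&u->lock);
	access_begin(u);
	u->acthd = value;
	u->in_flight--;
	pthread_mutex_unlock(&u->lock);
}

static void *reader_thread(void *arg)
{
	struct fake_uncore *u = arg;

	for (int i = 0; i < ITERATIONS; i++)
		(void)fake_posting_read(u);
	return NULL;
}

static void *writer_thread(void *arg)
{
	struct fake_uncore *u = arg;

	for (int i = 0; i < ITERATIONS; i++)
		fake_write(u, (uint32_t)i);
	return NULL;
}

/* Runs one reader and one writer concurrently and returns the largest
 * number of simultaneous accesses seen, or -1 on setup failure. */
static int run_serialised_access_demo(void)
{
	struct fake_uncore u = { .acthd = 0 };
	pthread_t reader, writer;

	if (pthread_mutex_init(&u.lock, NULL) != 0)
		return -1;
	if (pthread_create(&reader, NULL, reader_thread, &u) != 0) {
		pthread_mutex_destroy(&u.lock);
		return -1;
	}
	if (pthread_create(&writer, NULL, writer_thread, &u) != 0) {
		pthread_join(reader, NULL);
		pthread_mutex_destroy(&u.lock);
		return -1;
	}
	pthread_join(reader, NULL);
	pthread_join(writer, NULL);
	pthread_mutex_destroy(&u.lock);
	return u.max_in_flight;
}
```

Calling `run_serialised_access_demo()` from a `main()` and building with `-pthread` should report an overlap of 1, since both paths take the same lock. The sketch says nothing about why the unlocked read triggers the erratum on gen7, which the thread itself leaves open.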