drm/i915: Protect gen7 irq_seqno_barrier with uncore lock

Message ID 1461744121-27051-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson April 27, 2016, 8:02 a.m. UTC
Faced with sporadic machine hangs on gen7, that mimic the issue of
concurrent writes to the same cacheline and seem to start with
commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
barrier on legacy gen6+), let us restore the spinlock around the mmio
read.

Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chris Wilson April 27, 2016, 10:15 a.m. UTC | #1
On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
> 
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

Testcase: igt/gem_concurrent_blit #crw

I haven't been able to narrow this down to a simpler scenario. I am still
not confident that I understand how this read triggers the erratum, but
that does appear to be the case.
-Chris

Mika Kuoppala April 28, 2016, 7:50 a.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
>
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

After 23 hours and 2078 GpuTest07 runs, the box is healthy, so:

Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);

I was thinking that a comment would be needed for the casual reader.
But perhaps the blatant unorthodoxy is a big enough warning sign
to tread carefully here.

-Mika


>  }
>  
>  static u32
> -- 
> 2.8.1

Daniel Vetter April 28, 2016, 8:23 a.m. UTC | #3
On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
> 
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1

4.7 is frozen, need to re-run dim fixes for this one:

$ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
Cc: drm-intel-fixes@lists.freedesktop.org


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>  }
>  
>  static u32
> -- 
> 2.8.1

Chris Wilson April 28, 2016, 8:42 a.m. UTC | #4
On Thu, Apr 28, 2016 at 10:23:00AM +0200, Daniel Vetter wrote:
> On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> > Faced with sporadic machine hangs on gen7, that mimic the issue of
> > concurrent writes to the same cacheline and seem to start with
> > commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> > barrier on legacy gen6+), let us restore the spinlock around the mmio
> > read.
> > 
> > Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> 
> 4.7 is frozen, need to re-run dim fixes for this one:
> 
> $ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
> Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> Cc: drm-intel-fixes@lists.freedesktop.org

Sigh, missed.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 66f69cdd1d36..ad5bd3808d8b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1590,7 +1590,10 @@  gen6_seqno_barrier(struct intel_engine_cs *engine)
 	 * interrupt (with the same net latency).
 	 */
 	struct drm_i915_private *dev_priv = engine->dev->dev_private;
+
+	spin_lock_irq(&dev_priv->uncore.lock);
 	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
+	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
 static u32
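
For readers unfamiliar with the pattern, the idea of the patch (take one
lock around every access to the register so that no two accesses overlap)
can be sketched in userspace. This is only an analogy under stated
assumptions, not the i915 code: struct fake_device, locked_posting_read()
and run_readers() are hypothetical names, a pthread mutex stands in for
uncore.lock, and a plain struct field stands in for RING_ACTHD. Userspace
cannot model the interrupt disabling that spin_lock_irq() also performs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for the device state; not the i915 structures. */
struct fake_device {
	pthread_mutex_t lock;	/* plays the role of uncore.lock */
	uint32_t acthd;		/* plays the role of the mmio register */
	int in_flight;		/* accesses currently inside the locked region */
	int max_in_flight;	/* highest value in_flight ever reached */
};

struct worker_arg {
	struct fake_device *dev;
	int iterations;
};

/* Read the fake register with the lock held, as the patch does for mmio. */
static uint32_t locked_posting_read(struct fake_device *dev)
{
	uint32_t val;

	pthread_mutex_lock(&dev->lock);
	dev->in_flight++;
	if (dev->in_flight > dev->max_in_flight)
		dev->max_in_flight = dev->in_flight;
	assert(dev->in_flight == 1);	/* the lock admits one access at a time */
	val = dev->acthd;
	dev->in_flight--;
	pthread_mutex_unlock(&dev->lock);

	return val;
}

static void *worker(void *p)
{
	struct worker_arg *arg = p;

	for (int i = 0; i < arg->iterations; i++)
		(void)locked_posting_read(arg->dev);
	return NULL;
}

/*
 * Run nthreads concurrent readers and return the largest number of
 * accesses ever observed inside the locked region, or -1 on error.
 */
static int run_readers(int nthreads, int iterations)
{
	struct fake_device dev = { .acthd = 0x1000 };
	struct worker_arg arg = { &dev, iterations };
	pthread_t tids[MAX_THREADS];
	int started = 0;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;
	if (pthread_mutex_init(&dev.lock, NULL) != 0)
		return -1;

	for (; started < nthreads; started++)
		if (pthread_create(&tids[started], NULL, worker, &arg) != 0)
			break;
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&dev.lock);

	return started == nthreads ? dev.max_in_flight : -1;
}
```

Calling run_readers(4, 1000) should report that at most one access was
ever in flight, which is the property the restored spinlock is meant to
give the real register read. Build with -pthread where the toolchain
requires it.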