drm/i915: Protect gen7 irq_seqno_barrier with uncore lock

Message ID

1461744121-27051-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)

State

New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Wed, 27 Apr 2016 09:02:01 +0100
Message-Id: <1461744121-27051-1-git-send-email-chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Subject: [Intel-gfx] [PATCH] drm/i915: Protect gen7 irq_seqno_barrier with
	uncore lock
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

Chris Wilson April 27, 2016, 8:02 a.m. UTC

Faced with sporadic machine hangs on gen7, that mimic the issue of
concurrent writes to the same cacheline and seem to start with
commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
barrier on legacy gen6+), let us restore the spinlock around the mmio
read.

Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Chris Wilson April 27, 2016, 10:15 a.m. UTC | #1

On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
> 
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

Testcase: igt/gem_concurrent_blit #crw

I haven't been able to narrow this down to a simpler scenario. Still not
happy that I understand how we are triggering the erratum with this
read, but that does appear to be the case.
-Chris

Mika Kuoppala April 28, 2016, 7:50 a.m. UTC | #2

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
>
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>

After 23 hours and 2078 GpuTest07 runs the box is healthy so:

Tested-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>

> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);

I was thinking a comment would be needed for the casual reader.
But perhaps the blatant unorthodoxy is a big enough warning sign
to tread carefully here.

-Mika


>  }
>  
>  static u32
> -- 
> 2.8.1

Daniel Vetter April 28, 2016, 8:23 a.m. UTC | #3

On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> Faced with sporadic machine hangs on gen7, that mimic the issue of
> concurrent writes to the same cacheline and seem to start with
> commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> barrier on legacy gen6+), let us restore the spinlock around the mmio
> read.
> 
> Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1

4.7 is frozen, need to re-run dim fixes for this one:

$ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
Cc: drm-intel-fixes@lists.freedesktop.org


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 66f69cdd1d36..ad5bd3808d8b 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
>  	 * interrupt (with the same net latency).
>  	 */
>  	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	spin_lock_irq(&dev_priv->uncore.lock);
>  	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
> +	spin_unlock_irq(&dev_priv->uncore.lock);
>  }
>  
>  static u32
> -- 
> 2.8.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Chris Wilson April 28, 2016, 8:42 a.m. UTC | #4

On Thu, Apr 28, 2016 at 10:23:00AM +0200, Daniel Vetter wrote:
> On Wed, Apr 27, 2016 at 09:02:01AM +0100, Chris Wilson wrote:
> > Faced with sporadic machine hangs on gen7, that mimic the issue of
> > concurrent writes to the same cacheline and seem to start with
> > commit 9b9ed309 (drm/i915: Remove forcewake dance from seqno/irq
> > barrier on legacy gen6+), let us restore the spinlock around the mmio
> > read.
> > 
> > Fixes: 9b9ed3093613288247a27a55a6dd07f1222150f1
> 
> 4.7 is frozen, need to re-run dim fixes for this one:
> 
> $ dim fixes 9b9ed3093613288247a27a55a6dd07f1222150f1
> Fixes: 9b9ed3093613 ("drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+")
> Cc: drm-intel-fixes@lists.freedesktop.org

Sigh, missed.
-Chris
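The `Fixes:` line that `dim fixes` prints in Daniel's reply follows the kernel convention: at least the first 12 characters of the commit SHA-1, then the commit subject in parentheses and double quotes. The helper below, `format_fixes_tag()`, is hypothetical. It is not part of `dim` and is not how `dim` is implemented. It takes the subject as an argument instead of looking it up in git, and it emits only the `Fixes:` line, not the `Cc:` line that `dim` adds.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of dim): build a kernel-style Fixes: tag
 * from a full commit SHA and its subject line.
 * Returns what snprintf returns, or -1 if the SHA has fewer than 12 chars. */
static int format_fixes_tag(char *buf, size_t len, const char *sha,
			    const char *subject)
{
	if (strlen(sha) < 12)
		return -1;
	/* "%.12s" copies only the first 12 hex digits of the SHA. */
	return snprintf(buf, len, "Fixes: %.12s (\"%s\")", sha, subject);
}
```

Truncating with a printf precision keeps the helper free of manual string slicing; `snprintf` also bounds the output to the caller's buffer.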

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 66f69cdd1d36..ad5bd3808d8b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1590,7 +1590,10 @@ gen6_seqno_barrier(struct intel_engine_cs *engine)
 	 * interrupt (with the same net latency).
 	 */
 	struct drm_i915_private *dev_priv = engine->dev->dev_private;
+
+	spin_lock_irq(&dev_priv->uncore.lock);
 	POSTING_READ_FW(RING_ACTHD(engine->mmio_base));
+	spin_unlock_irq(&dev_priv->uncore.lock);
 }
 
 static u32
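The patch wraps a single posting read in the uncore spinlock so that it cannot overlap another register access that holds the same lock. A minimal user-space sketch of that pattern follows, under stated assumptions: `struct fake_uncore`, `fake_spin_lock()`, `fake_spin_unlock()` and `fake_seqno_barrier()` are hypothetical names invented here, a C11 `atomic_flag` spinlock stands in for `dev_priv->uncore.lock`, and a `volatile` array stands in for the MMIO registers. User space cannot disable interrupts, so the `_irq` half of `spin_lock_irq()` is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for dev_priv->uncore: one lock that every
 * register accessor is assumed to take. */
struct fake_uncore {
	atomic_flag lock;            /* plays the role of uncore.lock */
	volatile uint32_t regs[64];  /* plays the role of the MMIO registers */
};

static void fake_spin_lock(struct fake_uncore *u)
{
	/* Busy-wait until we own the flag; acquire ordering stops the
	 * register read below from being reordered before the lock. */
	while (atomic_flag_test_and_set_explicit(&u->lock, memory_order_acquire))
		;
}

static void fake_spin_unlock(struct fake_uncore *u)
{
	/* Release ordering publishes everything done under the lock. */
	atomic_flag_clear_explicit(&u->lock, memory_order_release);
}

/* Sketch of the patched barrier: the posting read happens with the lock
 * held, so it is serialized against other locked register accesses. */
static uint32_t fake_seqno_barrier(struct fake_uncore *u, unsigned int reg)
{
	uint32_t val;

	assert(reg < sizeof(u->regs) / sizeof(u->regs[0]));
	fake_spin_lock(u);
	val = u->regs[reg];  /* analogue of POSTING_READ_FW(RING_ACTHD(...)) */
	fake_spin_unlock(u);
	return val;
}
```

The design choice is serialization through a shared lock: the read is protected only against accessors that take the same lock, which is why the patch uses `uncore.lock` rather than a new private lock.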
