drm/i915: Invalidate TLBs for the rings after a reset

Message ID	1375812074-2665-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived

Headers

From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Tue,  6 Aug 2013 19:01:14 +0100
Message-Id: <1375812074-2665-1-git-send-email-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH] drm/i915: Invalidate TLBs for the rings after a
	reset
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org
Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org

Commit Message

Chris Wilson Aug. 6, 2013, 6:01 p.m. UTC

After any "soft gfx reset" we must manually invalidate the TLBs
associated with each ring. Empirically, it seems that a
suspend/resume or D3-D0 cycle counts as a "soft reset". The symptom is
that the hardware would fail to note the new address for its status
page, and so it would continue to write the shadow registers and
breadcrumbs into the old physical address (now used by something
completely different, scary). Meanwhile the driver would read the new
status page and never see any progress, so it would appear that the GPU
had hung immediately upon resume.

Based on a patch by naresh kumar kachhi <naresh.kumar.kacchi@intel.com>

Reported-by: Thiago Macieira <thiago@kde.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64725
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reg.h         |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 12 ++++++++++++
 2 files changed, 14 insertions(+)

Comments

Chris Wilson Aug. 16, 2013, 7:31 a.m. UTC | #1

On Tue, Aug 06, 2013 at 07:01:14PM +0100, Chris Wilson wrote:
> After any "soft gfx reset" we must manually invalidate the TLBs
> associated with each ring. Empirically, it seems that a
> suspend/resume or D3-D0 cycle counts as a "soft reset". The symptom is
> that the hardware would fail to note the new address for its status
> page, and so it would continue to write the shadow registers and
> breadcrumbs into the old physical address (now used by something
> completely different, scary). Meanwhile the driver would read the new
> status page and never see any progress, so it would appear that the GPU
> had hung immediately upon resume.
> 
> Based on a patch by naresh kumar kachhi <naresh.kumar.kacchi@intel.com>
> 
> Reported-by: Thiago Macieira <thiago@kde.org>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64725
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Thiago reports that early testing indicates success.

Anyone fancy acking this and sending it on to stable@?
-Chris

Daniel Vetter Aug. 16, 2013, 12:17 p.m. UTC | #2

On Fri, Aug 16, 2013 at 08:31:35AM +0100, Chris Wilson wrote:
> On Tue, Aug 06, 2013 at 07:01:14PM +0100, Chris Wilson wrote:
> > After any "soft gfx reset" we must manually invalidate the TLBs
> > associated with each ring. Empirically, it seems that a
> > suspend/resume or D3-D0 cycle counts as a "soft reset". The symptom is
> > that the hardware would fail to note the new address for its status
> > page, and so it would continue to write the shadow registers and
> > breadcrumbs into the old physical address (now used by something
> > completely different, scary). Meanwhile the driver would read the new
> > status page and never see any progress, so it would appear that the GPU
> > had hung immediately upon resume.
> > 
> > Based on a patch by naresh kumar kachhi <naresh.kumar.kacchi@intel.com>
> > 
> > Reported-by: Thiago Macieira <thiago@kde.org>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64725
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> 
> Thiago reports that early testing indicates success.
> 
> Anyone fancy acking this and sending it on to stable@?

Picked up for -fixes, thanks for the patch. I'll let it hang there a bit
though before forwarding, so I don't plan to update the -fixes pull
request I've just recently sent out.
-Daniel

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 714a909..ca82e5f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -752,6 +752,8 @@ 
 					will not assert AGPBUSY# and will only
 					be delivered when out of C3. */
 #define   INSTPM_FORCE_ORDERING				(1<<7) /* GEN6+ */
+#define   INSTPM_TLB_INVALIDATE	(1<<9)
+#define   INSTPM_SYNC_FLUSH	(1<<5)
 #define ACTHD	        0x020c8
 #define FW_BLC		0x020d8
 #define FW_BLC2		0x020dc
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index dbc1f7c..58eb6a0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -972,6 +972,18 @@  void intel_ring_setup_status_page(struct intel_ring_buffer *ring)
 
 	I915_WRITE(mmio, (u32)ring->status_page.gfx_addr);
 	POSTING_READ(mmio);
+
+	/* Flush the TLB for this page */
+	if (INTEL_INFO(dev)->gen >= 6) {
+		u32 reg = RING_INSTPM(ring->mmio_base);
+		I915_WRITE(reg,
+			   _MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE |
+					      INSTPM_SYNC_FLUSH));
+		if (wait_for((I915_READ(reg) & INSTPM_SYNC_FLUSH) == 0,
+			     1000))
+			DRM_ERROR("%s: wait for SyncFlush to complete for TLB invalidation timed out\n",
+				  ring->name);
+	}
 }
 
 static int
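
For readers unfamiliar with masked registers, the write-then-poll sequence in the hunk above can be sketched in user space. This is only an illustration, not driver code: `struct fake_instpm`, `fake_write()`, `fake_read()` and `invalidate_tlb()` are hypothetical names, and the behaviour of the fake hardware (clearing the SyncFlush bit after a fixed number of reads) is an assumption made so the poll loop has something to wait for. The two bit definitions come from the patch, and `MASKED_BIT_ENABLE()` follows the usual convention where the upper 16 bits of the written value select which of the lower 16 bits may change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions taken from the i915_reg.h hunk of the patch. */
#define INSTPM_TLB_INVALIDATE (1u << 9)
#define INSTPM_SYNC_FLUSH     (1u << 5)

/* Masked write: the upper 16 bits name the lower bits that may change. */
#define MASKED_BIT_ENABLE(a) (((uint32_t)(a) << 16) | (uint32_t)(a))

/* Hypothetical stand-in for a ring's INSTPM register. */
struct fake_instpm {
	uint32_t value;        /* current lower 16 bits of the register */
	int polls_until_done;  /* assumed: reads before SyncFlush clears */
};

/* Apply a masked write: only bits selected by the mask are updated. */
static void fake_write(struct fake_instpm *r, uint32_t v)
{
	uint32_t mask = v >> 16;

	r->value = (r->value & ~mask) | (v & mask);
}

/* Read the register; the fake hardware eventually clears SyncFlush. */
static uint32_t fake_read(struct fake_instpm *r)
{
	if ((r->value & INSTPM_SYNC_FLUSH) && r->polls_until_done-- <= 0)
		r->value &= ~INSTPM_SYNC_FLUSH;
	return r->value;
}

/*
 * Request a TLB invalidation together with a sync flush, then poll until
 * the SyncFlush bit reads back as zero. Returns 0 on success and -1 if
 * the bit is still set after max_polls reads (the timeout case).
 */
static int invalidate_tlb(struct fake_instpm *r, int max_polls)
{
	assert(r != NULL);

	fake_write(r, MASKED_BIT_ENABLE(INSTPM_TLB_INVALIDATE |
					INSTPM_SYNC_FLUSH));
	for (int i = 0; i < max_polls; i++)
		if ((fake_read(r) & INSTPM_SYNC_FLUSH) == 0)
			return 0;
	return -1;
}
```

Calling `invalidate_tlb()` on a `struct fake_instpm` initialised with `polls_until_done = 3` returns 0 after a handful of reads, whereas a very large `polls_until_done` exhausts `max_polls` and returns -1, which corresponds to the path where the patch logs its DRM_ERROR.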
