diff mbox

drm/i915/hsw: Implement Selective Write workaround

Message ID 1418955618-382-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky Dec. 19, 2014, 2:20 a.m. UTC
From: Ben Widawsky <ben@bwidawsk.net>

The docs specify this needs to be set on HSW GT1 parts. I've implemented it as
such since it should only be needed when using RC6, but it can probably go
anywhere.

This patch fixes extremely reproducible hangs on our Jenkins setup.

The interesting failure signature is:
  IPEHR: 0x780c0000 (3DSTATE_VF)
  INSTDONE_0: 0xffdfbffa (SVG + VS)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87138 (more?)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h | 2 ++
 drivers/gpu/drm/i915/i915_reg.h | 7 +++++++
 drivers/gpu/drm/i915/intel_pm.c | 9 +++++++++
 3 files changed, 18 insertions(+)

Comments

Chris Wilson Dec. 19, 2014, 8:20 a.m. UTC | #1
On Thu, Dec 18, 2014 at 06:20:18PM -0800, Ben Widawsky wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> The docs specify this needs to be set on HSW GT1 parts. I've implemented it as
> such since it should only be needed when using RC6, but it can probably go
> anywhere.
> 
> This patch fixes extremely reproducible hangs on our Jenkins setup.
> 
> The interesting failure signature is:
>   IPEHR: 0x780c0000 (3DSTATE_VF)
>   INSTDONE_0: 0xffdfbffa (SVG + VS)

We see the same bug on IVB, and the bug (verified on hsw gt1) persists with i915.enable_rc6=0.
-Chris
Ben Widawsky Dec. 19, 2014, 5:26 p.m. UTC | #2
On Fri, Dec 19, 2014 at 08:20:04AM +0000, Chris Wilson wrote:
> On Thu, Dec 18, 2014 at 06:20:18PM -0800, Ben Widawsky wrote:
> > From: Ben Widawsky <ben@bwidawsk.net>
> > 
> > The docs specify this needs to be set on HSW GT1 parts. I've implemented it as
> > such since it should only be needed when using RC6, but it can probably go
> > anywhere.
> > 
> > This patch fixes extremely reproducible hangs on our Jenkins setup.
> > 
> > The interesting failure signature is:
> >   IPEHR: 0x780c0000 (3DSTATE_VF)
> >   INSTDONE_0: 0xffdfbffa (SVG + VS)
> 
> We see the same bug on IVB, and the bug (verified on hsw gt1) persists with i915.enable_rc6=0.
> -Chris

I would claim it's not the same bug on IVB. The workaround very specifically
says HSW GT1 only. The failure signature itself is pretty generic, it just
suggests the VS went off and did something stupid, and the VF was unable to fill
the URB. (That's just my best guess) I do apologize if I inadvertently made a
connection to the IVB bug issue.

So, what is your point? I'm willing to concede we don't know all the facts, but
the [previous version of this] patch demonstrably fixes hangs on HSW GT1.
Chris Wilson Dec. 19, 2014, 6:34 p.m. UTC | #3
On Fri, Dec 19, 2014 at 09:26:01AM -0800, Ben Widawsky wrote:
> On Fri, Dec 19, 2014 at 08:20:04AM +0000, Chris Wilson wrote:
> > On Thu, Dec 18, 2014 at 06:20:18PM -0800, Ben Widawsky wrote:
> > > From: Ben Widawsky <ben@bwidawsk.net>
> > > 
> > > The docs specify this needs to be set on HSW GT1 parts. I've implemented it as
> > > such since it should only be needed when using RC6, but it can probably go
> > > anywhere.
> > > 
> > > This patch fixes extremely reproducible hangs on our Jenkins setup.
> > > 
> > > The interesting failure signature is:
> > >   IPEHR: 0x780c0000 (3DSTATE_VF)
> > >   INSTDONE_0: 0xffdfbffa (SVG + VS)
> > 
> > We see the same bug on IVB, and the bug (verified on hsw gt1) persists with i915.enable_rc6=0.
> > -Chris
> 
> I would claim it's not the same bug on IVB. The workaround very specifically
> says HSW GT1 only. The failure signature itself is pretty generic, it just
> suggests the VS went off and did something stupid, and the VF was unable to fill
> the URB. (That's just my best guess) I do apologize if I inadvertently made a
> connection to the IVB bug issue.

The failure message matches the context hang bug: "dies on last
instruction in the context restore" and probably has nothing to do with
the VS.
 
> So, what is your point? I'm willing to concede we don't know all the facts, but
> the [previous version of this] patch demonstrably fixes hangs on HSW GT1.

Just a patch a few days ago that to fix a very ontologically similar bug
for ivb+.
-Chris
Chris Wilson Dec. 19, 2014, 8:05 p.m. UTC | #4
On Fri, Dec 19, 2014 at 06:34:07PM +0000, Chris Wilson wrote:
> Just a patch a few days ago that to fix a very ontologically similar bug
> for ivb+.

Oh boy. I just double checked the error states from those bugs I marked
as ivb context restore hangs... So far I appear to have consistently
mislabled 0x402/0x406 as ivb and not hsw gt1. /o\
-Chris
Ben Widawsky Dec. 19, 2014, 10:14 p.m. UTC | #5
On Fri, Dec 19, 2014 at 08:05:30PM +0000, Chris Wilson wrote:
> On Fri, Dec 19, 2014 at 06:34:07PM +0000, Chris Wilson wrote:
> > Just a patch a few days ago that to fix a very ontologically similar bug
> > for ivb+.
> 
> Oh boy. I just double checked the error states from those bugs I marked
> as ivb context restore hangs... So far I appear to have consistently
> mislabled 0x402/0x406 as ivb and not hsw gt1. /o\
> -Chris

I don't know what to do with this patch now. It looks like all we needed was
your patch. Merge without the "stable" is what I'm thinking.

> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre
Daniel Vetter Jan. 5, 2015, 10:02 a.m. UTC | #6
On Fri, Dec 19, 2014 at 02:14:31PM -0800, Ben Widawsky wrote:
> On Fri, Dec 19, 2014 at 08:05:30PM +0000, Chris Wilson wrote:
> > On Fri, Dec 19, 2014 at 06:34:07PM +0000, Chris Wilson wrote:
> > > Just a patch a few days ago that to fix a very ontologically similar bug
> > > for ivb+.
> > 
> > Oh boy. I just double checked the error states from those bugs I marked
> > as ivb context restore hangs... So far I appear to have consistently
> > mislabled 0x402/0x406 as ivb and not hsw gt1. /o\
> > -Chris
> 
> I don't know what to do with this patch now. It looks like all we needed was
> your patch. Merge without the "stable" is what I'm thinking.

Makes sense if we really don't need it any more with Chris' patches in
-fixes. Can you please volunteer someone to cough up an r-b?

Thanks, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 921e4c5..f69984d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2282,6 +2282,8 @@  struct drm_i915_cmd_table {
 				 (INTEL_DEVID(dev) & 0xf) == 0xe))
 #define IS_BDW_GT3(dev)		(IS_BROADWELL(dev) && \
 				 (INTEL_DEVID(dev) & 0x00F0) == 0x0020)
+#define IS_HSW_GT1(dev)		(IS_HASWELL(dev) && \
+				 (INTEL_DEVID(dev) & 0x00F0) == 0x0)
 #define IS_HSW_ULT(dev)		(IS_HASWELL(dev) && \
 				 (INTEL_DEVID(dev) & 0xFF00) == 0x0A00)
 #define IS_HSW_GT3(dev)		(IS_HASWELL(dev) && \
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 40ca873..f9ff662 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1218,6 +1218,13 @@  enum punit_power_well {
 #define RING_DMA_FADD_UDW(base)	((base)+0x60) /* gen8+ */
 #define RING_INSTPM(base)	((base)+0xc0)
 #define RING_MI_MODE(base)	((base)+0x9c)
+#define RING_WAIT_FOR_RC6_EXIT(base)	((base)+0xcc)
+#define   RING_RC6_SEL_WRITE_ADDR_MASK		(0x7 << 4)
+#define   RING_RC6_SEL_WRITE_ADDR_MULTICAST	(0x0 << 4)
+#define   RING_RC6_SEL_WRITE_ADDR_UPPER_LEFT	(0x4 << 4)
+#define   RING_RC6_SEL_WRITE_ADDR_UPPER_RIGHT	(0x5 << 4)
+#define   RING_RC6_SEL_WRITE_ADDR_LOWER_LEFT	(0x6 << 4)
+#define   RING_RC6_SEL_WRITE_ADDR_LOWER_RIGHT	(0x7 << 4)
 #define INSTPS		0x02070 /* 965+ only */
 #define INSTDONE1	0x0207c /* 965+ only */
 #define ACTHD_I965	0x02074
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a3ebaa8..a27003c 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4259,6 +4259,15 @@  static void gen6_enable_rps(struct drm_device *dev)
 			DRM_ERROR("Couldn't fix incorrect rc6 voltage\n");
 	}
 
+	/* HSW GT1: "This field must be always [be] programmed to “100” , this
+	 * is required to address know [sic] HW issue." */
+	if (IS_HSW_GT1(dev)) {
+		for_each_ring(ring, dev_priv, i) {
+			I915_WRITE(RING_WAIT_FOR_RC6_EXIT(ring->mmio_base),
+				   _MASKED_FIELD(RING_RC6_SEL_WRITE_ADDR_MASK,
+						 RING_RC6_SEL_WRITE_ADDR_UPPER_LEFT));
+		}
+	}
 	gen6_gt_force_wake_put(dev_priv, FORCEWAKE_ALL);
 }