[v2,1/2] drm/i915/bxt: work around HW coherency issue when accessing GPU seqno

Message ID 1439555937-8016-2-git-send-email-imre.deak@intel.com (mailing list archive)
State New, archived

Commit Message

Imre Deak Aug. 14, 2015, 12:38 p.m. UTC
By running igt/store_dword_loop_render on BXT we can hit a coherency
problem where the seqno written at GPU command completion time is not
seen by the CPU. This results in __i915_wait_request seeing the stale
seqno and not completing the request (not considering the lost
interrupt/GPU reset mechanism). I also verified that this isn't a case
of a lost interrupt, or of the command somehow not completing: when
the coherency issue occurred I also read the seqno via an uncached GTT
mapping. While the cached version of the seqno still showed the stale
value, the one read via the uncached mapping was the correct one.

Work around this issue by clflushing the corresponding CPU cacheline
after any store of the seqno and before any read of it. On the read
side, do this only when the caller expects a coherent view.

v2:
- use the proper logical && instead of a bitwise & (Jani, Mika)
- limit the workaround to A steppings; on later steppings this HW issue
  is fixed

Testcase: igt/store_dword_loop_render
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 20 ++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  7 +++++++
 2 files changed, 27 insertions(+)

Comments

Chris Wilson Aug. 14, 2015, 1:12 p.m. UTC | #1
On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote:
> By running igt/store_dword_loop_render on BXT we can hit a coherency
> problem where the seqno written at GPU command completion time is not
> seen by the CPU. This results in __i915_wait_request seeing the stale
> seqno and not completing the request (not considering the lost
> interrupt/GPU reset mechanism). I also verified that this isn't a case
> of a lost interrupt, or that the command didn't complete somehow: when
> the coherency issue occurred I read the seqno via an uncached GTT mapping
> too. While the cached version of the seqno still showed the stale value
> the one read via the uncached mapping was the correct one.
> 
> Work around this issue by clflushing the corresponding CPU cacheline
> following any store of the seqno and preceding any reading of it. When
> reading it do this only when the caller expects a coherent view.
> 
> v2:
> - fix using the proper logical && instead of a bitwise & (Jani, Mika)
> - limit the workaround to A stepping, on later steppings this HW issue
>   is fixed

We have vfuncs in order to avoid the pointer dance (and boy is it a
pretty and quite convoluted dance).
-Chris

Imre Deak Aug. 14, 2015, 1:31 p.m. UTC | #2
On Fri, 2015-08-14 at 14:12 +0100, Chris Wilson wrote:
> On Fri, Aug 14, 2015 at 03:38:56PM +0300, Imre Deak wrote:
> > By running igt/store_dword_loop_render on BXT we can hit a coherency
> > problem where the seqno written at GPU command completion time is not
> > seen by the CPU. This results in __i915_wait_request seeing the stale
> > seqno and not completing the request (not considering the lost
> > interrupt/GPU reset mechanism). I also verified that this isn't a case
> > of a lost interrupt, or that the command didn't complete somehow: when
> > the coherency issue occurred I read the seqno via an uncached GTT mapping
> > too. While the cached version of the seqno still showed the stale value
> > the one read via the uncached mapping was the correct one.
> > 
> > Work around this issue by clflushing the corresponding CPU cacheline
> > following any store of the seqno and preceding any reading of it. When
> > reading it do this only when the caller expects a coherent view.
> > 
> > v2:
> > - fix using the proper logical && instead of a bitwise & (Jani, Mika)
> > - limit the workaround to A stepping, on later steppings this HW issue
> >   is fixed
> 
> We have vfuncs in order to avoid the pointer dance (and boy is it a
> pretty and quite convoluted dance).

Ok, I'll add new get_seqno/set_seqno vfuncs.

> -Chris
>
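
The vfunc approach discussed above can be sketched outside the driver
roughly as follows. This is a minimal userspace illustration, not the
follow-up patch: struct engine, engine_init(), flush_status_page(),
SEQNO_INDEX and the bxt_a_* function names are hypothetical stand-ins,
and the cache flush is simulated by a counter instead of a real clflush.
The point is only that the platform/stepping check happens once, when
the function pointers are chosen, rather than on every seqno access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot in the status page that holds the seqno. */
#define SEQNO_INDEX 4

/* Hypothetical, heavily simplified stand-in for struct intel_engine_cs. */
struct engine {
	uint32_t status_page[16];
	unsigned int flushes;	/* number of simulated cacheline flushes */
	uint32_t (*get_seqno)(struct engine *e, bool lazy_coherency);
	void (*set_seqno)(struct engine *e, uint32_t seqno);
};

/* Stand-in for clflushing the seqno cacheline; it only counts calls. */
static void flush_status_page(struct engine *e)
{
	e->flushes++;
}

/* Default variants: plain read and write, no workaround. */
static uint32_t gen8_get_seqno(struct engine *e, bool lazy_coherency)
{
	(void)lazy_coherency;
	return e->status_page[SEQNO_INDEX];
}

static void gen8_set_seqno(struct engine *e, uint32_t seqno)
{
	e->status_page[SEQNO_INDEX] = seqno;
}

/* Workaround variants: flush before a coherent read, after every write. */
static uint32_t bxt_a_get_seqno(struct engine *e, bool lazy_coherency)
{
	if (!lazy_coherency)
		flush_status_page(e);
	return e->status_page[SEQNO_INDEX];
}

static void bxt_a_set_seqno(struct engine *e, uint32_t seqno)
{
	e->status_page[SEQNO_INDEX] = seqno;
	flush_status_page(e);
}

/*
 * Pick the vfuncs once at init time. needs_seqno_flush stands in for
 * the "BXT and revision before B0" check from the patch.
 */
void engine_init(struct engine *e, bool needs_seqno_flush)
{
	assert(e != NULL);
	memset(e, 0, sizeof(*e));
	if (needs_seqno_flush) {
		e->get_seqno = bxt_a_get_seqno;
		e->set_seqno = bxt_a_set_seqno;
	} else {
		e->get_seqno = gen8_get_seqno;
		e->set_seqno = gen8_set_seqno;
	}
}
```

Callers would then use e->set_seqno(e, seqno) and
e->get_seqno(e, lazy_coherency) without testing the platform themselves,
which is what removes the repeated ring->dev lookups from the hot path.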

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 138964a..46f2be0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1691,12 +1691,32 @@  static int gen8_emit_flush_render(struct drm_i915_gem_request *request,
 
 static u32 gen8_get_seqno(struct intel_engine_cs *ring, bool lazy_coherency)
 {
+
+	/*
+	 * On BXT A steppings there is a HW coherency issue whereby the
+	 * MI_STORE_DATA_IMM storing the completed request's seqno
+	 * occasionally doesn't invalidate the CPU cache. Work around this by
+	 * clflushing the corresponding cacheline whenever the caller wants
+	 * the coherency to be guaranteed. Note that this cacheline is known
+	 * to be clean at this point, since we only write it in
+	 * gen8_set_seqno(), where we also do a clflush after the write. So
+	 * this clflush in practice becomes an invalidate operation.
+	 */
+
+	if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0 &&
+	    !lazy_coherency)
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
+
 	return intel_read_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static void gen8_set_seqno(struct intel_engine_cs *ring, u32 seqno)
 {
 	intel_write_status_page(ring, I915_GEM_HWS_INDEX, seqno);
+
+	/* See gen8_get_seqno() explaining the reason for the clflush. */
+	if (IS_BROXTON(ring->dev) && INTEL_REVID(ring->dev) < BXT_REVID_B0)
+		intel_flush_status_page(ring, I915_GEM_HWS_INDEX);
 }
 
 static int gen8_emit_request(struct drm_i915_gem_request *request)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e85fda..95b0b4b 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -377,6 +377,13 @@  intel_ring_sync_index(struct intel_engine_cs *ring,
 	return idx;
 }
 
+static inline void
+intel_flush_status_page(struct intel_engine_cs *ring, int reg)
+{
+	drm_clflush_virt_range(&ring->status_page.page_addr[reg],
+			       sizeof(uint32_t));
+}
+
 static inline u32
 intel_read_status_page(struct intel_engine_cs *ring,
 		       int reg)