diff mbox

[10/12] drm/i915: TLB invalidation with MI_FLUSH_SW requires a post-sync op

Message ID 1349217826-2538-11-git-send-email-jbarnes@virtuousgeek.org (mailing list archive)
State New, archived
Headers show

Commit Message

Jesse Barnes Oct. 2, 2012, 10:43 p.m. UTC
So store into the scratch space of the HWS to make sure the invalidate
occurs.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_reg.h         |    6 ++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c |   22 ++++++++++++++++++----
 2 files changed, 22 insertions(+), 6 deletions(-)

Comments

Ben Widawsky Oct. 3, 2012, 12:14 a.m. UTC | #1
s/MI_FLUSH_SW/MI_FLUSH_DW/

On Tue,  2 Oct 2012 17:43:44 -0500
Jesse Barnes <jbarnes@virtuousgeek.org> wrote:

> So store into the scratch space of the HWS to make sure the invalidate
> occurs.
> 
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_reg.h         |    6 ++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   22 ++++++++++++++++++----
>  2 files changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 3ceeb68..d98c989 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -241,8 +241,10 @@
>   */
>  #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*x-1)
>  #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
> -#define   MI_INVALIDATE_TLB	(1<<18)
> -#define   MI_INVALIDATE_BSD	(1<<7)
> +#define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
> +#define   MI_INVALIDATE_TLB		(1<<18)
> +#define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
> +#define   MI_INVALIDATE_BSD		(1<<7)
>  #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
>  #define   MI_BATCH_NON_SECURE	(1)
>  #define   MI_BATCH_NON_SECURE_I965 (1<<8)
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1718c54..d3b7129 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1395,10 +1395,17 @@ static int gen6_ring_flush(struct intel_ring_buffer *ring,
>  		return ret;
>  
>  	cmd = MI_FLUSH_DW;
> +	/*
> +	 * Bspec vol 1c.5 - video engine command streamer:
> +	 * "If ENABLED, all TLBs will be invalidated once the flush
> +	 * operation is complete. This bit is only valid when the
> +	 * Post-Sync Operation field is a value of 1h or 3h."
> +	 */
>  	if (invalidate & I915_GEM_GPU_DOMAINS)
> -		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
> +		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
> +			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
>  	intel_ring_emit(ring, cmd);
> -	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring, I915_GEM_SCRATCH_INDEX << 3);
>  	intel_ring_emit(ring, 0);
>  	intel_ring_emit(ring, MI_NOOP);
>  	intel_ring_advance(ring);
> @@ -1436,10 +1443,17 @@ static int blt_ring_flush(struct intel_ring_buffer *ring,
>  		return ret;
>  
>  	cmd = MI_FLUSH_DW;
> +	/*
> +	 * Bspec vol 1c.3 - blitter engine command streamer:
> +	 * "If ENABLED, all TLBs will be invalidated once the flush
> +	 * operation is complete. This bit is only valid when the
> +	 * Post-Sync Operation field is a value of 1h or 3h."
> +	 */
>  	if (invalidate & I915_GEM_DOMAIN_RENDER)
> -		cmd |= MI_INVALIDATE_TLB;
> +		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
> +			MI_FLUSH_DW_OP_STOREDW;
>  	intel_ring_emit(ring, cmd);
> -	intel_ring_emit(ring, 0);
> +	intel_ring_emit(ring, I915_GEM_SCRATCH_INDEX << 3);
>  	intel_ring_emit(ring, 0);
>  	intel_ring_emit(ring, MI_NOOP);
>  	intel_ring_advance(ring);


Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Daniel Vetter Oct. 3, 2012, 7:20 a.m. UTC | #2
On Tue, Oct 02, 2012 at 05:14:53PM -0700, Ben Widawsky wrote:
> s/MI_FLUSH_SW/MI_FLUSH_DW/

Applied, with spelling fixed. Thanks for patch&review.
-Daniel
Daniel Vetter Oct. 4, 2012, 8:32 a.m. UTC | #3
On Wed, Oct 03, 2012 at 09:20:20AM +0200, Daniel Vetter wrote:
> On Tue, Oct 02, 2012 at 05:14:53PM -0700, Ben Widawsky wrote:
> > s/MI_FLUSH_SW/MI_FLUSH_DW/
> 
> Applied, with spelling fixed. Thanks for patch&review.

This hard-hangs my snb here when X starts (so probably on the very first
batch). Impressive!

I've dropped this one from -fixes. And since no one piped up that the
other w/a patches fix anything, I've moved those two I've merged already
to dinq.

Totally unrelated, I think I'll instate harsher rules for w/a patches:
1. w/a patches only go in through -fixes if they indeed fix an issue we
(or a bug reporter) can reproduce.
2. w/a patches need testcases, too. Either a register check added to i-g-t
or if it's a runtime thing, a runtime assert at a nice place (where
feasible, ofc).
3. I'll randomly stall patches to bring 2. up to par for existing
workarounds.

Cheers, Daniel
Jesse Barnes Oct. 4, 2012, 2:39 p.m. UTC | #4
On Thu, 4 Oct 2012 10:32:13 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Wed, Oct 03, 2012 at 09:20:20AM +0200, Daniel Vetter wrote:
> > On Tue, Oct 02, 2012 at 05:14:53PM -0700, Ben Widawsky wrote:
> > > s/MI_FLUSH_SW/MI_FLUSH_DW/
> > 
> > Applied, with spelling fixed. Thanks for patch&review.
> 
> This hard-hangs my snb here when X starts (so probably on the very first
> batch). Impressive!
> 
> I've dropped this one from -fixes. And since no one piped up that the
> other w/a patches fix anything, I've moved those two I've merged already
> to dinq.

Yeah not -fixes material.  But it fixes i-g-t on VLV and *should* fix
issues we may not have seen yet on IVB, so we need to figure this one
out.

> Totally unrelated, I think I'll instate harsher rules for w/a patches:
> 1. w/a patches only go in through -fixes if they indeed fix an issue we
> (or a bug reporter) can reproduce.

Yeah, that's fine.

> 2. w/a patches need testcases, too. Either a register check added to i-g-t
> or if it's a runtime thing, a runtime assert at a nice place (where
> feasible, ofc).

A register check isn't that useful imo.  A real test case would be
ideal, but given how hard some of these issues are to hit, it's
unrealistic to spend weeks writing a test case for a workaround that's
already been documented to fix a specific issue.

> 3. I'll randomly stall patches to bring 2. up to par for existing
> workarounds.

Btw if you want to take this to its logical conclusion, we also
shouldn't be "fixing" issues that are obvious from code review but
people haven't hit in practice (this goes for a good chunk of the code
churn in our driver involving cleanups and fixes for potential
non-issues).  And that's not even including test case development for
any patch claiming it fixes anything.

That said, I definitely agree we want to add more test cases.  Just
don't block applying known workaround fixes or other stuff on those
test cases.

Jesse
Daniel Vetter Oct. 4, 2012, 2:49 p.m. UTC | #5
On Thu, Oct 4, 2012 at 4:39 PM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
>
>> 2. w/a patches need testcases, too. Either a register check added to i-g-t
>> or if it's a runtime thing, a runtime assert at a nice place (where
>> feasible, ofc).
>
> A register check isn't that useful imo.  A real test case would be
> ideal, but given how hard some of these issues are to hit, it's
> unrealistic to spend weeks writing a test case for a workaround that's
> already been documented to fix a specific issue.

That's pretty because of Ben's w/a patch to remove the w/a from the
part of the init sequence where the write sticks, and keep it at the
place where the write doesn't stick. Similarly, we've sometimes
managed to apply w/a not correctly after gpu reset or resume.

Exactly since for many of these we won't ever have a good test-case,
we should at least make sure that we actually succeed in setting the
right values everywhere. Shockingly, we've failed even at that :(

>> 3. I'll randomly stall patches to bring 2. up to par for existing
>> workarounds.
>
> Btw if you want to take this to its logical conclusion, we also
> shouldn't be "fixing" issues that are obvious from code review but
> people haven't hit in practice (this goes for a good chunk of the code
> churn in our driver involving cleanups and fixes for potential
> non-issues).  And that's not even including test case development for
> any patch claiming it fixes anything.

And most of those actually go through dinq, at least if the exact
impact is unclear and there's no testcase or bug to demonstrate the
issue. My gripes here are purely for pushing too much w/a patches to
-fixes, since both your patches and Ben's had regressions that hang
machines. Hence my grumpiness.

> That said, I definitely agree we want to add more test cases.  Just
> don't block applying known workaround fixes or other stuff on those
> test cases.

Sure, I'll usually try to come up with a random distribution not
biased against stuff that fixes real bugs ;-) But even for bugfixes I
want testcases first (as always, if feasible ofc), since ppl are so
easily distracted (and writing testcase is a royal pain).

Cheers, Daniel
Jesse Barnes Oct. 4, 2012, 2:54 p.m. UTC | #6
On Thu, 4 Oct 2012 16:49:42 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:
> > Btw if you want to take this to its logical conclusion, we also
> > shouldn't be "fixing" issues that are obvious from code review but
> > people haven't hit in practice (this goes for a good chunk of the code
> > churn in our driver involving cleanups and fixes for potential
> > non-issues).  And that's not even including test case development for
> > any patch claiming it fixes anything.
> 
> And most of those actually go through dinq, at least if the exact
> impact is unclear and there's no testcase or bug to demonstrate the
> issue. My gripes here are purely for pushing too much w/a patches to
> -fixes, since both your patches and Ben's had regressions that hang
> machines. Hence my grumpiness.

Yeah, not sure why you were pushing them straight to -fixes in the
first place. :)  -fixes is just for known bug fixes that people are
hitting, not speculative ones, especially workarounds we know work in
theory but haven't applied to a specific bug.  So we agree there.  I'm
hoping at least some of the known w/a's will be found to fix a known
bug report, then we can push to -fixes and/or stable as appropriate.

Jesse
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 3ceeb68..d98c989 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -241,8 +241,10 @@ 
  */
 #define MI_LOAD_REGISTER_IMM(x)	MI_INSTR(0x22, 2*x-1)
 #define MI_FLUSH_DW		MI_INSTR(0x26, 1) /* for GEN6 */
-#define   MI_INVALIDATE_TLB	(1<<18)
-#define   MI_INVALIDATE_BSD	(1<<7)
+#define   MI_FLUSH_DW_STORE_INDEX	(1<<21)
+#define   MI_INVALIDATE_TLB		(1<<18)
+#define   MI_FLUSH_DW_OP_STOREDW	(1<<14)
+#define   MI_INVALIDATE_BSD		(1<<7)
 #define MI_BATCH_BUFFER		MI_INSTR(0x30, 1)
 #define   MI_BATCH_NON_SECURE	(1)
 #define   MI_BATCH_NON_SECURE_I965 (1<<8)
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1718c54..d3b7129 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1395,10 +1395,17 @@  static int gen6_ring_flush(struct intel_ring_buffer *ring,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
+	/*
+	 * Bspec vol 1c.5 - video engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
 	if (invalidate & I915_GEM_GPU_DOMAINS)
-		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD;
+		cmd |= MI_INVALIDATE_TLB | MI_INVALIDATE_BSD |
+			MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;
 	intel_ring_emit(ring, cmd);
-	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, I915_GEM_SCRATCH_INDEX << 3);
 	intel_ring_emit(ring, 0);
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);
@@ -1436,10 +1443,17 @@  static int blt_ring_flush(struct intel_ring_buffer *ring,
 		return ret;
 
 	cmd = MI_FLUSH_DW;
+	/*
+	 * Bspec vol 1c.3 - blitter engine command streamer:
+	 * "If ENABLED, all TLBs will be invalidated once the flush
+	 * operation is complete. This bit is only valid when the
+	 * Post-Sync Operation field is a value of 1h or 3h."
+	 */
 	if (invalidate & I915_GEM_DOMAIN_RENDER)
-		cmd |= MI_INVALIDATE_TLB;
+		cmd |= MI_INVALIDATE_TLB | MI_FLUSH_DW_STORE_INDEX |
+			MI_FLUSH_DW_OP_STOREDW;
 	intel_ring_emit(ring, cmd);
-	intel_ring_emit(ring, 0);
+	intel_ring_emit(ring, I915_GEM_SCRATCH_INDEX << 3);
 	intel_ring_emit(ring, 0);
 	intel_ring_emit(ring, MI_NOOP);
 	intel_ring_advance(ring);