diff mbox

drm/i915: Remove Braswell GGTT update w/a

Message ID 20170220124718.14796-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Feb. 20, 2017, 12:47 p.m. UTC
Testing with concurrent GGTT accesses no longer show the coherency
problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
to GGTT with access through GGTT on Braswell"). My presumption is that
the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
Wait for writes through the GTT to land before reading back"), along
with the use of WC updates to the global gTT in commit 8448661d65f6
("drm/i915: Convert clflushed pagetables over to WC maps". Given
that the original symptoms can no longer be reproduced, time to remove
the workaround.

Testcase: igt/gem_concurrenct_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 28 ----------------------------
 1 file changed, 28 deletions(-)

Comments

Joonas Lahtinen Feb. 22, 2017, 11:27 a.m. UTC | #1
On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> Testing with concurrent GGTT accesses no longer show the coherency
> problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> to GGTT with access through GGTT on Braswell"). My presumption is that
> the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> Wait for writes through the GTT to land before reading back"), along
> with the use of WC updates to the global gTT in commit 8448661d65f6
> ("drm/i915: Convert clflushed pagetables over to WC maps". Given
> that the original symptoms can no longer be reproduced, time to remove
> the workaround.
> 
> Testcase: igt/gem_concurrenct_blit
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Makes one think if the original fix has been appropriate, when adding
stop_machine for a software bug :P

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas
Chris Wilson Feb. 22, 2017, 11:39 a.m. UTC | #2
On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > Testing with concurrent GGTT accesses no longer show the coherency
> > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > to GGTT with access through GGTT on Braswell"). My presumption is that
> > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > Wait for writes through the GTT to land before reading back"), along
> > with the use of WC updates to the global gTT in commit 8448661d65f6
> > ("drm/i915: Convert clflushed pagetables over to WC maps". Given
> > that the original symptoms can no longer be reproduced, time to remove
> > the workaround.
> > 
> > Testcase: igt/gem_concurrenct_blit
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Makes one think if the original fix has been appropriate, when adding
> stop_machine for a software bug :P

Depends if you consider the months of hair pulling trying to find where
the flush/stall was missing. It was a desperate patch to fix an annoying
corruption issue - and since it seems that we now just avoid the
dangerous path by taking a different route through hw, I don't think
it is was wholly a sw bug.
-Chris
Chris Wilson Feb. 22, 2017, 6:02 p.m. UTC | #3
On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote:
> On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > > Testing with concurrent GGTT accesses no longer show the coherency
> > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > > to GGTT with access through GGTT on Braswell"). My presumption is that
> > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > > Wait for writes through the GTT to land before reading back"), along
> > > with the use of WC updates to the global gTT in commit 8448661d65f6
> > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given
> > > that the original symptoms can no longer be reproduced, time to remove
> > > the workaround.
> > > 
> > > Testcase: igt/gem_concurrenct_blit
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Makes one think if the original fix has been appropriate, when adding
> > stop_machine for a software bug :P
> 
> Depends if you consider the months of hair pulling trying to find where
> the flush/stall was missing. It was a desperate patch to fix an annoying
> corruption issue - and since it seems that we now just avoid the
> dangerous path by taking a different route through hw, I don't think
> it is was wholly a sw bug.

Bah, after a few days of continuous testing, I've hit a workload that
shows the bug again. (Small numbers of large GTT objects, rather than
large number of small GTT objects.)
-Chris
Chris Wilson Feb. 22, 2017, 11:04 p.m. UTC | #4
On Wed, Feb 22, 2017 at 06:02:46PM +0000, Chris Wilson wrote:
> On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote:
> > On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> > > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > > > Testing with concurrent GGTT accesses no longer show the coherency
> > > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > > > to GGTT with access through GGTT on Braswell"). My presumption is that
> > > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > > > Wait for writes through the GTT to land before reading back"), along
> > > > with the use of WC updates to the global gTT in commit 8448661d65f6
> > > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given
> > > > that the original symptoms can no longer be reproduced, time to remove
> > > > the workaround.
> > > > 
> > > > Testcase: igt/gem_concurrenct_blit
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > 
> > > Makes one think if the original fix has been appropriate, when adding
> > > stop_machine for a software bug :P
> > 
> > Depends if you consider the months of hair pulling trying to find where
> > the flush/stall was missing. It was a desperate patch to fix an annoying
> > corruption issue - and since it seems that we now just avoid the
> > dangerous path by taking a different route through hw, I don't think
> > it is was wholly a sw bug.
> 
> Bah, after a few days of continuous testing, I've hit a workload that
> shows the bug again. (Small numbers of large GTT objects, rather than
> large number of small GTT objects.)

Hmm, and it also died with the w/a with nigh on identical symptoms. And
appears quite tempermental. Uncertainity is prevailing.
-Chris
Chris Wilson Feb. 23, 2017, 9:33 a.m. UTC | #5
On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > Testing with concurrent GGTT accesses no longer show the coherency
> > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > to GGTT with access through GGTT on Braswell"). My presumption is that
> > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > Wait for writes through the GTT to land before reading back"), along
> > with the use of WC updates to the global gTT in commit 8448661d65f6
> > ("drm/i915: Convert clflushed pagetables over to WC maps". Given
> > that the original symptoms can no longer be reproduced, time to remove
> > the workaround.
> > 
> > Testcase: igt/gem_concurrenct_blit
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Makes one think if the original fix has been appropriate, when adding
> stop_machine for a software bug :P

I can reliably kill the machine with and without the patch in the same
manner. I don't believe I am seeing the same corruption that prompted
the w/a in the first place, so I've bitten the bullet and applied.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 18c051a11b7d..8302dae0b9c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2085,32 +2085,6 @@  static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
 	ggtt->invalidate(vm->i915);
 }
 
-struct insert_entries {
-	struct i915_address_space *vm;
-	struct sg_table *st;
-	u64 start;
-	enum i915_cache_level level;
-	u32 flags;
-};
-
-static int gen8_ggtt_insert_entries__cb(void *_arg)
-{
-	struct insert_entries *arg = _arg;
-	gen8_ggtt_insert_entries(arg->vm, arg->st,
-				 arg->start, arg->level, arg->flags);
-	return 0;
-}
-
-static void gen8_ggtt_insert_entries__BKL(struct i915_address_space *vm,
-					  struct sg_table *st,
-					  u64 start,
-					  enum i915_cache_level level,
-					  u32 flags)
-{
-	struct insert_entries arg = { vm, st, start, level, flags };
-	stop_machine(gen8_ggtt_insert_entries__cb, &arg, NULL);
-}
-
 static void gen6_ggtt_insert_page(struct i915_address_space *vm,
 				  dma_addr_t addr,
 				  u64 offset,
@@ -2762,8 +2736,6 @@  static int gen8_gmch_probe(struct i915_ggtt *ggtt)
 		ggtt->base.clear_range = gen8_ggtt_clear_range;
 
 	ggtt->base.insert_entries = gen8_ggtt_insert_entries;
-	if (IS_CHERRYVIEW(dev_priv))
-		ggtt->base.insert_entries = gen8_ggtt_insert_entries__BKL;
 
 	ggtt->invalidate = gen6_ggtt_invalidate;