Message ID | 20170220124718.14796-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote: > Testing with concurrent GGTT accesses no longer show the coherency > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates > to GGTT with access through GGTT on Braswell"). My presumption is that > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: > Wait for writes through the GTT to land before reading back"), along > with the use of WC updates to the global gTT in commit 8448661d65f6 > ("drm/i915: Convert clflushed pagetables over to WC maps". Given > that the original symptoms can no longer be reproduced, time to remove > the workaround. > > Testcase: igt/gem_concurrenct_blit > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Makes one think if the original fix has been appropriate, when adding stop_machine for a software bug :P Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Regards, Joonas
On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote: > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote: > > Testing with concurrent GGTT accesses no longer show the coherency > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates > > to GGTT with access through GGTT on Braswell"). My presumption is that > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: > > Wait for writes through the GTT to land before reading back"), along > > with the use of WC updates to the global gTT in commit 8448661d65f6 > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given > > that the original symptoms can no longer be reproduced, time to remove > > the workaround. > > > > Testcase: igt/gem_concurrenct_blit > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > Makes one think if the original fix has been appropriate, when adding > stop_machine for a software bug :P Depends if you consider the months of hair pulling trying to find where the flush/stall was missing. It was a desperate patch to fix an annoying corruption issue - and since it seems that we now just avoid the dangerous path by taking a different route through hw, I don't think it is was wholly a sw bug. -Chris
On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote: > On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote: > > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote: > > > Testing with concurrent GGTT accesses no longer show the coherency > > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates > > > to GGTT with access through GGTT on Braswell"). My presumption is that > > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: > > > Wait for writes through the GTT to land before reading back"), along > > > with the use of WC updates to the global gTT in commit 8448661d65f6 > > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given > > > that the original symptoms can no longer be reproduced, time to remove > > > the workaround. > > > > > > Testcase: igt/gem_concurrenct_blit > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > > > Makes one think if the original fix has been appropriate, when adding > > stop_machine for a software bug :P > > Depends if you consider the months of hair pulling trying to find where > the flush/stall was missing. It was a desperate patch to fix an annoying > corruption issue - and since it seems that we now just avoid the > dangerous path by taking a different route through hw, I don't think > it is was wholly a sw bug. Bah, after a few days of continuous testing, I've hit a workload that shows the bug again. (Small numbers of large GTT objects, rather than large number of small GTT objects.) -Chris
On Wed, Feb 22, 2017 at 06:02:46PM +0000, Chris Wilson wrote: > On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote: > > On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote: > > > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote: > > > > Testing with concurrent GGTT accesses no longer show the coherency > > > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates > > > > to GGTT with access through GGTT on Braswell"). My presumption is that > > > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: > > > > Wait for writes through the GTT to land before reading back"), along > > > > with the use of WC updates to the global gTT in commit 8448661d65f6 > > > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given > > > > that the original symptoms can no longer be reproduced, time to remove > > > > the workaround. > > > > > > > > Testcase: igt/gem_concurrenct_blit > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > > > > > Makes one think if the original fix has been appropriate, when adding > > > stop_machine for a software bug :P > > > > Depends if you consider the months of hair pulling trying to find where > > the flush/stall was missing. It was a desperate patch to fix an annoying > > corruption issue - and since it seems that we now just avoid the > > dangerous path by taking a different route through hw, I don't think > > it is was wholly a sw bug. > > Bah, after a few days of continuous testing, I've hit a workload that > shows the bug again. (Small numbers of large GTT objects, rather than > large number of small GTT objects.) Hmm, and it also died with the w/a with nigh on identical symptoms. And appears quite tempermental. Uncertainity is prevailing. -Chris
On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote: > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote: > > Testing with concurrent GGTT accesses no longer show the coherency > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates > > to GGTT with access through GGTT on Braswell"). My presumption is that > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: > > Wait for writes through the GTT to land before reading back"), along > > with the use of WC updates to the global gTT in commit 8448661d65f6 > > ("drm/i915: Convert clflushed pagetables over to WC maps". Given > > that the original symptoms can no longer be reproduced, time to remove > > the workaround. > > > > Testcase: igt/gem_concurrenct_blit > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > > Makes one think if the original fix has been appropriate, when adding > stop_machine for a software bug :P I can reliably kill the machine with and without the patch in the same manner. I don't believe I am seeing the same corruption that prompted the w/a in the first place, so I've bitten the bullet and applied. -Chris
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 18c051a11b7d..8302dae0b9c6 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2085,32 +2085,6 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm, ggtt->invalidate(vm->i915); } -struct insert_entries { - struct i915_address_space *vm; - struct sg_table *st; - u64 start; - enum i915_cache_level level; - u32 flags; -}; - -static int gen8_ggtt_insert_entries__cb(void *_arg) -{ - struct insert_entries *arg = _arg; - gen8_ggtt_insert_entries(arg->vm, arg->st, - arg->start, arg->level, arg->flags); - return 0; -} - -static void gen8_ggtt_insert_entries__BKL(struct i915_address_space *vm, - struct sg_table *st, - u64 start, - enum i915_cache_level level, - u32 flags) -{ - struct insert_entries arg = { vm, st, start, level, flags }; - stop_machine(gen8_ggtt_insert_entries__cb, &arg, NULL); -} - static void gen6_ggtt_insert_page(struct i915_address_space *vm, dma_addr_t addr, u64 offset, @@ -2762,8 +2736,6 @@ static int gen8_gmch_probe(struct i915_ggtt *ggtt) ggtt->base.clear_range = gen8_ggtt_clear_range; ggtt->base.insert_entries = gen8_ggtt_insert_entries; - if (IS_CHERRYVIEW(dev_priv)) - ggtt->base.insert_entries = gen8_ggtt_insert_entries__BKL; ggtt->invalidate = gen6_ggtt_invalidate;
Testing with concurrent GGTT accesses no longer show the coherency problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates to GGTT with access through GGTT on Braswell"). My presumption is that the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915: Wait for writes through the GTT to land before reading back"), along with the use of WC updates to the global gTT in commit 8448661d65f6 ("drm/i915: Convert clflushed pagetables over to WC maps". Given that the original symptoms can no longer be reproduced, time to remove the workaround. Testcase: igt/gem_concurrenct_blit Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 28 ---------------------------- 1 file changed, 28 deletions(-)