drm/i915: Remove Braswell GGTT update w/a

Message ID	20170220124718.14796-1-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived
Headers	Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
	From: Chris Wilson <chris@chris-wilson.co.uk>
	To: intel-gfx@lists.freedesktop.org
	Date: Mon, 20 Feb 2017 12:47:18 +0000
	Message-Id: <20170220124718.14796-1-chris@chris-wilson.co.uk>
	Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
	Subject: [Intel-gfx] [PATCH] drm/i915: Remove Braswell GGTT update w/a
	Precedence: list
	MIME-Version: 1.0
	Content-Type: text/plain; charset="utf-8"
	Content-Transfer-Encoding: base64
	Errors-To: intel-gfx-bounces@lists.freedesktop.org
	Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Chris Wilson Feb. 20, 2017, 12:47 p.m. UTC

Testing with concurrent GGTT accesses no longer shows the coherency
problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
to GGTT with access through GGTT on Braswell"). My presumption is that
the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
Wait for writes through the GTT to land before reading back"), along
with the use of WC updates to the global gTT in commit 8448661d65f6
("drm/i915: Convert clflushed pagetables over to WC maps"). Given
that the original symptoms can no longer be reproduced, time to remove
the workaround.

Testcase: igt/gem_concurrent_blit
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 28 ----------------------------
 1 file changed, 28 deletions(-)

Joonas Lahtinen Feb. 22, 2017, 11:27 a.m. UTC | #1

On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> Testing with concurrent GGTT accesses no longer shows the coherency
> problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> to GGTT with access through GGTT on Braswell"). My presumption is that
> the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> Wait for writes through the GTT to land before reading back"), along
> with the use of WC updates to the global gTT in commit 8448661d65f6
> ("drm/i915: Convert clflushed pagetables over to WC maps"). Given
> that the original symptoms can no longer be reproduced, time to remove
> the workaround.
> 
> Testcase: igt/gem_concurrent_blit
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>

Makes one wonder whether the original fix was appropriate, adding
stop_machine for a software bug :P

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>

Regards, Joonas

Chris Wilson Feb. 22, 2017, 11:39 a.m. UTC | #2

On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > Testing with concurrent GGTT accesses no longer shows the coherency
> > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > to GGTT with access through GGTT on Braswell"). My presumption is that
> > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > Wait for writes through the GTT to land before reading back"), along
> > with the use of WC updates to the global gTT in commit 8448661d65f6
> > ("drm/i915: Convert clflushed pagetables over to WC maps"). Given
> > that the original symptoms can no longer be reproduced, time to remove
> > the workaround.
> > 
> > Testcase: igt/gem_concurrent_blit
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Makes one wonder whether the original fix was appropriate, adding
> stop_machine for a software bug :P

Depends if you consider the months of hair pulling trying to find where
the flush/stall was missing. It was a desperate patch to fix an annoying
corruption issue - and since it seems that we now just avoid the
dangerous path by taking a different route through hw, I don't think
it was wholly a sw bug.
-Chris

Chris Wilson Feb. 22, 2017, 6:02 p.m. UTC | #3

On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote:
> On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > > Testing with concurrent GGTT accesses no longer shows the coherency
> > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > > to GGTT with access through GGTT on Braswell"). My presumption is that
> > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > > Wait for writes through the GTT to land before reading back"), along
> > > with the use of WC updates to the global gTT in commit 8448661d65f6
> > > ("drm/i915: Convert clflushed pagetables over to WC maps"). Given
> > > that the original symptoms can no longer be reproduced, time to remove
> > > the workaround.
> > > 
> > > Testcase: igt/gem_concurrent_blit
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > 
> > Makes one wonder whether the original fix was appropriate, adding
> > stop_machine for a software bug :P
> 
> Depends if you consider the months of hair pulling trying to find where
> the flush/stall was missing. It was a desperate patch to fix an annoying
> corruption issue - and since it seems that we now just avoid the
> dangerous path by taking a different route through hw, I don't think
> it was wholly a sw bug.

Bah, after a few days of continuous testing, I've hit a workload that
shows the bug again. (Small numbers of large GTT objects, rather than
large numbers of small GTT objects.)
-Chris

Chris Wilson Feb. 22, 2017, 11:04 p.m. UTC | #4

On Wed, Feb 22, 2017 at 06:02:46PM +0000, Chris Wilson wrote:
> On Wed, Feb 22, 2017 at 11:39:30AM +0000, Chris Wilson wrote:
> > On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> > > On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > > > Testing with concurrent GGTT accesses no longer shows the coherency
> > > > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > > > to GGTT with access through GGTT on Braswell"). My presumption is that
> > > > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > > > Wait for writes through the GTT to land before reading back"), along
> > > > with the use of WC updates to the global gTT in commit 8448661d65f6
> > > > ("drm/i915: Convert clflushed pagetables over to WC maps"). Given
> > > > that the original symptoms can no longer be reproduced, time to remove
> > > > the workaround.
> > > > 
> > > > Testcase: igt/gem_concurrent_blit
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > 
> > > Makes one wonder whether the original fix was appropriate, adding
> > > stop_machine for a software bug :P
> > 
> > Depends if you consider the months of hair pulling trying to find where
> > the flush/stall was missing. It was a desperate patch to fix an annoying
> > corruption issue - and since it seems that we now just avoid the
> > dangerous path by taking a different route through hw, I don't think
> > it was wholly a sw bug.
> 
> Bah, after a few days of continuous testing, I've hit a workload that
> shows the bug again. (Small numbers of large GTT objects, rather than
> large numbers of small GTT objects.)

Hmm, and it also died with the w/a with nigh on identical symptoms. And
appears quite temperamental. Uncertainty is prevailing.
-Chris

Chris Wilson Feb. 23, 2017, 9:33 a.m. UTC | #5

On Wed, Feb 22, 2017 at 01:27:41PM +0200, Joonas Lahtinen wrote:
> On ma, 2017-02-20 at 12:47 +0000, Chris Wilson wrote:
> > Testing with concurrent GGTT accesses no longer shows the coherency
> > problems from yonder, commit 5bab6f60cb4d ("drm/i915: Serialise updates
> > to GGTT with access through GGTT on Braswell"). My presumption is that
> > the root cause was more likely fixed by commit 3b5724d702ef ("drm/i915:
> > Wait for writes through the GTT to land before reading back"), along
> > with the use of WC updates to the global gTT in commit 8448661d65f6
> > ("drm/i915: Convert clflushed pagetables over to WC maps"). Given
> > that the original symptoms can no longer be reproduced, time to remove
> > the workaround.
> > 
> > Testcase: igt/gem_concurrent_blit
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> 
> Makes one wonder whether the original fix was appropriate, adding
> stop_machine for a software bug :P

I can reliably kill the machine with and without the patch in the same
manner. I don't believe I am seeing the same corruption that prompted
the w/a in the first place, so I've bitten the bullet and applied.
-Chris
