drm/i915: disable cpu relocs on ilk and earlier

Message ID 1350288682-6236-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived

Commit Message

Daniel Vetter Oct. 15, 2012, 8:11 a.m. UTC
Hi Greg&stable-team,

The below patch papers over a graphics corruption issue in 3.5/3.6. The
regression happened due to pwrite tunings in 3.5, which made cpu relocations
much more likely.

The issue seems to have disappeared in 3.7-rc1, but it takes a few days to test
a patch, so we haven't figured out what exactly fixed things. Now users are
taking out their pitchforks already, so instead of wasting more days (maybe
weeks?) to fully understand the bug before backporting the fix, we've opted for
the below disable patch, which should have minimal impact (at most it undoes the
tuning improvements in 3.5).

Patch is tested by reporters & acked by all relevant ppl, please apply to
3.5/3.6 series kernels.

Thanks, Daniel

---

They seem to be implicated in render corruptions. And up to now no one
really seems to understand the issue, so let's just disable them for
now. Most of the machines exhibiting this issue have only a 128M gtt
mmio window, so increased pressure on the mappable part (and so higher
chance for cpu relocs) seems to be the key.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=852210
Tested-by: Dave Airlie <airlied@gmail.com>
Cc: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |    6 ++++++
 1 file changed, 6 insertions(+)

Comments

Greg KH Oct. 15, 2012, 3:11 p.m. UTC | #1
On Mon, Oct 15, 2012 at 10:11:22AM +0200, Daniel Vetter wrote:
> Hi Greg&stable-team,
> 
> The below patch papers over a graphics corruption issue in 3.5/3.6. The
> regression happened due to pwrite tunings in 3.5, which made cpu relocations
> much more likely.
> 
> The issue seems to have disappeared in 3.7-rc1, but it takes a few days to test
> a patch, so we haven't figured out what exactly fixed things. Now users are
> taking out their pitchforks already, so instead of wasting more days (maybe
> weeks?) to fully understand the bug before backporting the fix, we've opted for
> the below disable patch, which should have minimal impact (at most it undoes the
> tuning improvements in 3.5).
> 
> Patch is tested by reporters & acked by all relevant ppl, please apply to
> 3.5/3.6 series kernels.

No, I'd really like to wait until you figure out what is happening in
3.7-rc1 right now before applying the patch.  We have the rule, "it must
be in Linus's tree first" for a very good reason :)

So, I'll hold onto this until you say what's up with 3.7-rc1, ok?

thanks,

greg k-h

Daniel Vetter Oct. 15, 2012, 5:16 p.m. UTC | #2
On Mon, Oct 15, 2012 at 5:11 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> No, I'd really like to wait until you figure out what is happening in
> 3.7-rc1 right now before applying the patch.  We have the rule, "it must
> be in Linus's tree first" for a very good reason :)
>
> So, I'll hold onto this until you say what's up with 3.7-rc1, ok?

Can do, might send a few pitchforks I collect your way though ;-)

While I have your attention (and now that -rc1 is out), can you please
pick up my two console_lock patches into your tty tree for 3.8?

Thanks, Daniel

Greg KH Oct. 15, 2012, 5:34 p.m. UTC | #3
On Mon, Oct 15, 2012 at 07:16:26PM +0200, Daniel Vetter wrote:
> On Mon, Oct 15, 2012 at 5:11 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > No, I'd really like to wait until you figure out what is happening in
> > 3.7-rc1 right now before applying the patch.  We have the rule, "it must
> > be in Linus's tree first" for a very good reason :)
> >
> > So, I'll hold onto this until you say what's up with 3.7-rc1, ok?
> 
> Can do, might send a few pitchforks I collect your way though ;-)

No problem at all, I can handle them :)

> While I have your attention (and now that -rc1 is out), can you please
> pick up my two console_lock patches into your tty tree for 3.8?

Let me catch up on my 3.7 patches first please, I'm still on the road
traveling to conferences on different continents, and am in a conference
this week as well.  The fact that I'm waking up at the right time is
amazing...

greg k-h

Daniel Vetter Oct. 18, 2012, 7:34 a.m. UTC | #4
On Mon, Oct 15, 2012 at 5:11 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> No, I'd really like to wait until you figure out what is happening in
> 3.7-rc1 right now before applying the patch.  We have the rule, "it must
> be in Linus's tree first" for a very good reason :)

Ok, the verdict is in (thanks a lot Dave for testing all these
different patches) and it seems like

commit 504c7267a1e84b157cbd7e9c1b805e1bc0c2c846
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 23 13:12:52 2012 +0100

    drm/i915: Use cpu relocations if the object is in the GTT but not mappable

from upstream nicely papers over the issues. Please apply to 3.5/3.6
stable series (earlier kernels don't exhibit the problem).

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=852210
Tested-by: Dave Airlie <airlied@gmail.com>

Thanks, Daniel
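
For readers without the upstream tree at hand, the rule named in that commit's subject line ("use cpu relocations if the object is in the GTT but not mappable") can be sketched roughly as follows. This is not the commit's diff: the `mappable` flag, every `reloc_` name, and the way the rule is combined with the two existing conditions are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's domain and cache-level values. */
enum reloc_domain { RELOC_DOMAIN_CPU, RELOC_DOMAIN_GTT };
enum reloc_cache { RELOC_CACHE_NONE, RELOC_CACHE_LLC };

struct reloc_obj {
	enum reloc_domain write_domain;
	enum reloc_cache cache_level;
	/* Assumption: true if the object sits in the CPU-mappable part of the GTT. */
	bool mappable;
};

/*
 * Sketch of the rule from the subject line: in addition to the two
 * conditions already in use_cpu_reloc(), prefer a cpu relocation when
 * the object cannot be reached through the mappable window.
 */
static bool reloc_prefers_cpu(const struct reloc_obj *obj)
{
	return obj->write_domain == RELOC_DOMAIN_CPU ||
	       !obj->mappable ||
	       obj->cache_level != RELOC_CACHE_NONE;
}
```

Under these assumptions an uncached, GTT-domain object switches to a cpu relocation only once it is no longer mappable.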

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ff2819e..682156a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -268,6 +268,12 @@  eb_destroy(struct eb_objects *eb)
 
 static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 {
+	/* cpu relocs are implicated in some not-yet-understood render
+	 * corruptions on at least ilk, but probably also gm45. Until we know
+	 * what's going on, just disable them. */
+	if (INTEL_INFO(obj->base.dev)->gen < 6)
+		return false;
+
 	return (obj->base.write_domain == I915_GEM_DOMAIN_CPU ||
 		obj->cache_level != I915_CACHE_NONE);
 }
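
The effect of the hunk above can be tried outside the kernel with a small stand-alone model. This is only a sketch: the `sketch_` types and enum values are hypothetical stand-ins for `struct drm_i915_gem_object`, `INTEL_INFO()` and the `I915_*` constants, and the only hardware facts assumed are that ilk is gen 5 and gm45 is gen 4.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's enums; values are illustrative. */
enum sketch_domain { SKETCH_DOMAIN_CPU, SKETCH_DOMAIN_GTT };
enum sketch_cache_level { SKETCH_CACHE_NONE, SKETCH_CACHE_LLC };

struct sketch_obj {
	int gen; /* hardware generation: gm45 is 4, ilk is 5 */
	enum sketch_domain write_domain;
	enum sketch_cache_level cache_level;
};

/*
 * Mirrors the patched use_cpu_reloc(): never pick a cpu relocation
 * before gen 6, otherwise fall through to the pre-existing test.
 */
static bool sketch_use_cpu_reloc(const struct sketch_obj *obj)
{
	if (obj->gen < 6)
		return false;

	return obj->write_domain == SKETCH_DOMAIN_CPU ||
	       obj->cache_level != SKETCH_CACHE_NONE;
}
```

With this model, a gen 5 object is always refused a cpu relocation, even when its write domain is the CPU, while gen 6 and later behave exactly as before the patch.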