[1/2] drm/i915: tune the RC6 threshold for stability

Message ID 1370990967-22892-1-git-send-email-marcheu@chromium.org
State New, archived

Commit Message

Stéphane Marchesin June 11, 2013, 10:49 p.m. UTC
It's basically the same deal as the RC6+ issues on ivy bridge
except this time with RC6 on sandy bridge. Like last time the
core of the issue is that the timings don't work 100% with our
voltage regulator. So from time to time, the kernel will print
a warning message about the GPU not getting out of RC6. In
particular, I found this fairly easy to reproduce during
suspend/resume.

Changing the threshold to 150000 instead of 50000 seems to fix
the issue.

I also measured the idle power usage before/after this patch and
didn't see a difference on a sandy bridge laptop.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
---
 drivers/gpu/drm/i915/intel_pm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Chris Wilson June 12, 2013, 9:41 a.m. UTC | #1
On Tue, Jun 11, 2013 at 03:49:26PM -0700, Stéphane Marchesin wrote:
> It's basically the same deal as the RC6+ issues on ivy bridge
> except this time with RC6 on sandy bridge. Like last time the
> core of the issue is that the timings don't work 100% with our
> voltage regulator. So from time to time, the kernel will print
> a warning message about the GPU not getting out of RC6. In
> particular, I found this fairly easy to reproduce during
> suspend/resume.
> 
> Changing the threshold to 150000 instead of 50000 seems to fix
> the issue.
> 
> I also measured the idle power usage before/after this patch and
> didn't see a difference on a sandy bridge laptop.
> 
> Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>

One magic number for another with no idea what is blowing up - I fear we
are just changing the frequency of the hang. I've pinged a number of snb
rc6 bug reports to see if we get a bite.

FWIW,
Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Stéphane Marchesin June 14, 2013, 7:13 p.m. UTC | #2
On Wed, Jun 12, 2013 at 2:41 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Tue, Jun 11, 2013 at 03:49:26PM -0700, Stéphane Marchesin wrote:
>> It's basically the same deal as the RC6+ issues on ivy bridge
>> except this time with RC6 on sandy bridge. Like last time the
>> core of the issue is that the timings don't work 100% with our
>> voltage regulator. So from time to time, the kernel will print
>> a warning message about the GPU not getting out of RC6. In
>> particular, I found this fairly easy to reproduce during
>> suspend/resume.
>>
>> Changing the threshold to 150000 instead of 50000 seems to fix
>> the issue.
>>
>> I also measured the idle power usage before/after this patch and
>> didn't see a difference on a sandy bridge laptop.
>>
>> Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
>
> One magic number for another with no idea what is blowing up - I fear we
> are just changing the frequency of the hang. I've pinged a number of snb
> rc6 bug reports to see if we get a bite.

Yup, if only Intel documented those registers :)

Stéphane

>
> FWIW,
> Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
Daniel Vetter June 14, 2013, 7:32 p.m. UTC | #3
On Fri, Jun 14, 2013 at 9:13 PM, Stéphane Marchesin
<marcheu@chromium.org> wrote:
> On Wed, Jun 12, 2013 at 2:41 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> On Tue, Jun 11, 2013 at 03:49:26PM -0700, Stéphane Marchesin wrote:
>>> It's basically the same deal as the RC6+ issues on ivy bridge
>>> except this time with RC6 on sandy bridge. Like last time the
>>> core of the issue is that the timings don't work 100% with our
>>> voltage regulator. So from time to time, the kernel will print
>>> a warning message about the GPU not getting out of RC6. In
>>> particular, I found this fairly easy to reproduce during
>>> suspend/resume.
>>>
>>> Changing the threshold to 150000 instead of 50000 seems to fix
>>> the issue.
>>>
>>> I also measured the idle power usage before/after this patch and
>>> didn't see a difference on a sandy bridge laptop.
>>>
>>> Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
>>
>> One magic number for another with no idea what is blowing up - I fear we
>> are just changing the frequency of the hang. I've pinged a number of snb
>> rc6 bug reports to see if we get a bite.
>
> Yup, if only Intel documented those registers :)

We've spammed rc6 bugs in bugzilla, one reporter says that this patch
breaks rc6 from "sometimes it doesn't work after resume" to "always
broken":

https://bugs.freedesktop.org/show_bug.cgi?id=54089#c63

So I guess I can't merge this :(
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Stéphane Marchesin June 20, 2013, 12:43 a.m. UTC | #4
On Fri, Jun 14, 2013 at 12:32 PM, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Fri, Jun 14, 2013 at 9:13 PM, Stéphane Marchesin
> <marcheu@chromium.org> wrote:
>> On Wed, Jun 12, 2013 at 2:41 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>>> On Tue, Jun 11, 2013 at 03:49:26PM -0700, Stéphane Marchesin wrote:
>>>> It's basically the same deal as the RC6+ issues on ivy bridge
>>>> except this time with RC6 on sandy bridge. Like last time the
>>>> core of the issue is that the timings don't work 100% with our
>>>> voltage regulator. So from time to time, the kernel will print
>>>> a warning message about the GPU not getting out of RC6. In
>>>> particular, I found this fairly easy to reproduce during
>>>> suspend/resume.
>>>>
>>>> Changing the threshold to 150000 instead of 50000 seems to fix
>>>> the issue.
>>>>
>>>> I also measured the idle power usage before/after this patch and
>>>> didn't see a difference on a sandy bridge laptop.
>>>>
>>>> Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
>>>
>>> One magic number for another with no idea what is blowing up - I fear we
>>> are just changing the frequency of the hang. I've pinged a number of snb
>>> rc6 bug reports to see if we get a bite.
>>
>> Yup, if only Intel documented those registers :)
>
> We've spammed rc6 bugs in bugzilla, one reporter says that this patch
> breaks rc6 from "sometimes it doesn't work after resume" to "always
> broken":
>
> https://bugs.freedesktop.org/show_bug.cgi?id=54089#c63
>
> So I guess I can't merge this :(

Yeah, I was actually going to send an email to withdraw this patch, as
it prevents rc6 from working on some machines here. So I guess I found
out the same thing.

Stéphane
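
For anyone retesting a threshold change like this one, a rough way to
tell "rc6 sometimes fails after resume" from "rc6 never engages" is to
sample the i915 RC6 residency counter over a fixed window. The sketch
below is only an illustration and not part of the patch: the helper
names are made up, and the sysfs path it is meant to be used with
(/sys/class/drm/card0/power/rc6_residency_ms) assumes the GPU is card0
and that the running kernel exposes that file.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: percentage of a sampling window spent in RC6,
 * given two readings of a millisecond residency counter.
 * Returns -1.0 for an empty window or a counter that went backwards.
 */
double rc6_residency_percent(unsigned long long before_ms,
			     unsigned long long after_ms,
			     unsigned long long window_ms)
{
	if (window_ms == 0 || after_ms < before_ms)
		return -1.0;
	return 100.0 * (double)(after_ms - before_ms) / (double)window_ms;
}

/* Hypothetical helper: read one unsigned counter from a sysfs-style file. */
int read_counter_ms(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Sample the counter at 'path' twice, 'seconds' apart, and return the
 * residency percentage, or -1.0 if the counter cannot be read.
 * The window is approximate: sleep() may return early on a signal.
 */
double rc6_sample_percent(const char *path, unsigned int seconds)
{
	unsigned long long before, after;

	if (read_counter_ms(path, &before) != 0)
		return -1.0;
	sleep(seconds);
	if (read_counter_ms(path, &after) != 0)
		return -1.0;
	return rc6_residency_percent(before, after, seconds * 1000ULL);
}
```

Calling rc6_sample_percent() with the path above and a window of a few
seconds on an idle machine, once before and once after suspend/resume,
should give a value near 0 if RC6 has stopped engaging. What counts as
a healthy value depends on the workload, so compare against the same
machine without the patch.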

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index aa01128..52fe8f7 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2577,7 +2577,7 @@ static void gen6_enable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RC_SLEEP, 0);
 	I915_WRITE(GEN6_RC1e_THRESHOLD, 1000);
-	I915_WRITE(GEN6_RC6_THRESHOLD, 50000);
+	I915_WRITE(GEN6_RC6_THRESHOLD, 150000);
 	I915_WRITE(GEN6_RC6p_THRESHOLD, 150000);
 	I915_WRITE(GEN6_RC6pp_THRESHOLD, 64000); /* unused */