
drm/i915: Revert RPS UP_EI value for SandyBridge and IvyBridge

Message ID 1350830642-10923-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Oct. 21, 2012, 2:44 p.m. UTC
Even though we do not use the EI mode for determining when to change GPU
frequencies for RPS, changing this value results in no up interrupts being
generated whilst an OpenGL client runs.

Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Wed Aug 15 10:41:45 2012 +0200

    drm/i915: use hsw rps tuning values everywhere on gen6+

Reported-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/i915/intel_pm.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ben Widawsky Oct. 24, 2012, 7:05 p.m. UTC | #1
On Sun, 21 Oct 2012 15:44:02 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> Even though we do not use the EI mode for determining when to change GPU
> frequencies for RPS, changing this value causes no up interrupts to be
> generated whilst an OpenGL client runs.
> 
> Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> Date:   Wed Aug 15 10:41:45 2012 +0200
> 
>     drm/i915: use hsw rps tuning values everywhere on gen6+
> 
> Reported-by: Eric Anholt <eric@anholt.net>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Eric Anholt <eric@anholt.net>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/i915/intel_pm.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 81e88c2..15b585e 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
>  
>  	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
>  	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> -	I915_WRITE(GEN6_RP_UP_EI, 66000);
> +	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
>  	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
>  
>  	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);




I've not verified this with interrupts, but simply by reading back the
current frequency using the sysfs interface.
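
Watching it is as simple as polling that sysfs attribute; a minimal sketch
(it assumes the gt_cur_freq_mhz attribute i915 exports under card0, so
adjust the path for your machine):

/* Poll the current RPS frequency once a second via sysfs. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/class/drm/card0/gt_cur_freq_mhz";

	for (;;) {
		FILE *f = fopen(path, "r");
		int mhz;

		if (!f)
			return 1;
		if (fscanf(f, "%d", &mhz) == 1)
			printf("%d MHz\n", mhz);
		fclose(f);
		sleep(1);
	}
}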

What I've seen running xonotic on Ivybridge is that we do bump the
frequency initially, and then it stops. Nearing the end of the demo, we
again raise the frequency. Note that on my system, both before and after
this patch, I am able to get to the max GPU frequency with the xonotic demo.

Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650,
I see the following (the demo is roughly 2 minutes):

without patch:
Within a few seconds we cycle up to 750
Nothing for about 30 seconds
very slowly cycle up to 1100 (*just* before the demo ends)
demo ends; throttle down to 350 quickly

with patch:
Within a few seconds we cycle up to 1000
Nothing for about 30 seconds
cycle up to 1100
demo ends; throttle down to 350 slowly

I think if this fixes someone's critical issue, that's great, but
unfortunately I do not see the problem the patch claims to fix.
Furthermore, none of us can really make sense of why this has the effect
that it does, but I believe a lot of that is because the workloads we run
(in this case xonotic) are very much black boxes.

Personally, on this IVB, I think the behavior before the patch is more
desirable because it stays near RP1 for a longer period of time, and drops
back to the minimum frequency quickly (but it's definitely a matter of
opinion).
Jesse Barnes Dec. 20, 2012, 11:34 p.m. UTC | #2
On Wed, 24 Oct 2012 12:05:35 -0700
Ben Widawsky <ben@bwidawsk.net> wrote:

> On Sun, 21 Oct 2012 15:44:02 +0100
> Chris Wilson <chris@chris-wilson.co.uk> wrote:
> 
> > Even though we do not use the EI mode for determining when to change GPU
> > frequencies for RPS, changing this value causes no up interrupts to be
> > generated whilst an OpenGL client runs.
> > 
> > Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Date:   Wed Aug 15 10:41:45 2012 +0200
> > 
> >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > 
> > Reported-by: Eric Anholt <eric@anholt.net>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Eric Anholt <eric@anholt.net>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: stable@vger.kernel.org
> > ---
> >  drivers/gpu/drm/i915/intel_pm.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index 81e88c2..15b585e 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> >  
> >  	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> >  	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > -	I915_WRITE(GEN6_RP_UP_EI, 66000);
> > +	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> >  	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> >  
> >  	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> 
> 
> 
> 
> I've not verified this with interrupts, but simply reading back the
> current frequency using the sysfs interface.
> 
> What I've seen running xonotic on Ivybridge is that we do bump the
> frequency initially, and then stops. Nearing the end of the demo, we
> again raise the frequency. Note that on my system, both before, and
> after this patch, I am able to get to the max GPU frequency with the
> xonotic demo.
> 
> Specifically, on my IVB which has a range of 350->1100 with an RP1 of
> 650. I see the following (the demo is roughly 2 minutes)
> 
> without patch:
> Within a few seconds we cycle up to 750
> Nothing for about 30 seconds
> very slowly cycle up to 1100 (*just* before the demo ends)
> demo ends; throttle down to 350 quickly
> 
> with patch:
> Within a few seconds we cycle up to 1000
> Nothing for about 30 seconds
> cycle up to 1100
> demo ends; throttle down to 350 slowly
> 
> I think if this fixes someones critical issue, it's great, but
> unfortunately I do not see the problem the patch claims to fix.
> Furthermore, none of us can really make sense of why this has the effect
> that it does, but I believe a lot of that is because the workloads we
> run (in this case xonotic) are very blackbox.
> 
> Personally, on this IVB, I think the behavior before the patch is more
> desirable because it stays near RP1 for a longer period of time, and
> drops to RP0 quickly (but it's definitely a matter of opinion).

It's a balance between power and performance.  Running at the higher
freqs is definitely less power efficient, but without this patch we
definitely have a performance regression (running at 750 instead of
1000 MHz for most of the demo, plus the cases Chris saw).

But we also don't want to prevent RC6 entry on vulnerable systems, so
maybe we need two sets of values or a broader set of changes that work
better for a swath of workloads.

Also, the initial patch to use the HSW values implies that the values
are applicable across generations.  They're not.  They're very specific
to a given generation, and potentially even to particular versions within
a generation, so sharing the values is probably a bad idea in general.
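
One way to make that concrete is a small per-generation tuning table, along
the lines of the sketch below.  The struct and helper names are made up for
illustration, and the non-HSW entry only reverts UP_EI as this patch does;
nothing here is a freshly tuned set of numbers:

struct gen6_rps_tunings {
	u32 up_threshold;
	u32 down_threshold;
	u32 up_ei;
	u32 down_ei;
};

static const struct gen6_rps_tunings hsw_rps_tunings = {
	.up_threshold	= 59400,
	.down_threshold	= 245000,
	.up_ei		= 66000,
	.down_ei	= 350000,
};

static const struct gen6_rps_tunings snb_ivb_rps_tunings = {
	.up_threshold	= 59400,
	.down_threshold	= 245000,
	.up_ei		= 100000,	/* reverted, as in this patch */
	.down_ei	= 350000,
};

static void gen6_write_rps_tunings(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;
	const struct gen6_rps_tunings *t =
		IS_HASWELL(dev) ? &hsw_rps_tunings : &snb_ivb_rps_tunings;

	I915_WRITE(GEN6_RP_UP_THRESHOLD, t->up_threshold);
	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, t->down_threshold);
	I915_WRITE(GEN6_RP_UP_EI, t->up_ei);
	I915_WRITE(GEN6_RP_DOWN_EI, t->down_ei);
}
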
Ben Widawsky Dec. 21, 2012, 1:57 a.m. UTC | #3
On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> On Wed, 24 Oct 2012 12:05:35 -0700
> Ben Widawsky <ben@bwidawsk.net> wrote:
> 
> > On Sun, 21 Oct 2012 15:44:02 +0100
> > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > 
> > > Even though we do not use the EI mode for determining when to change GPU
> > > frequencies for RPS, changing this value causes no up interrupts to be
> > > generated whilst an OpenGL client runs.
> > > 
> > > Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Date:   Wed Aug 15 10:41:45 2012 +0200
> > > 
> > >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > > 
> > > Reported-by: Eric Anholt <eric@anholt.net>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Eric Anholt <eric@anholt.net>
> > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > Cc: stable@vger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/intel_pm.c |    2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > index 81e88c2..15b585e 100644
> > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > >  
> > >  	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > >  	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > -	I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > +	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > >  	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > >  
> > >  	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> > 
> > 
> > 
> > 
> > I've not verified this with interrupts, but simply reading back the
> > current frequency using the sysfs interface.
> > 
> > What I've seen running xonotic on Ivybridge is that we do bump the
> > frequency initially, and then stops. Nearing the end of the demo, we
> > again raise the frequency. Note that on my system, both before, and
> > after this patch, I am able to get to the max GPU frequency with the
> > xonotic demo.
> > 
> > Specifically, on my IVB which has a range of 350->1100 with an RP1 of
> > 650. I see the following (the demo is roughly 2 minutes)
> > 
> > without patch:
> > Within a few seconds we cycle up to 750
> > Nothing for about 30 seconds
> > very slowly cycle up to 1100 (*just* before the demo ends)
> > demo ends; throttle down to 350 quickly
> > 
> > with patch:
> > Within a few seconds we cycle up to 1000
> > Nothing for about 30 seconds
> > cycle up to 1100
> > demo ends; throttle down to 350 slowly
> > 
> > I think if this fixes someones critical issue, it's great, but
> > unfortunately I do not see the problem the patch claims to fix.
> > Furthermore, none of us can really make sense of why this has the effect
> > that it does, but I believe a lot of that is because the workloads we
> > run (in this case xonotic) are very blackbox.
> > 
> > Personally, on this IVB, I think the behavior before the patch is more
> > desirable because it stays near RP1 for a longer period of time, and
> > drops to RP0 quickly (but it's definitely a matter of opinion).
> 
> It's a balance between power and performance.  Running at the higher
> freqs is definitely less power efficient, but without this patch we
> definitely have a performance regression (running at 750 instead of
> 1000 MHz for most of the demo, plus the cases Chris saw).
> 
> But we also don't want to prevent RC6 entry on vulnerable systems, so
> maybe we need two sets of values or a broader set of changes that work
> better for a swath of workloads.

To me this sounds like an idea that is better on paper than in practice.
Unless we can generalize the various sets of values into knobs for users to
pick from, it will really be overwhelming (I feel), and will probably
provide more bad data than good. Fine-grained knobs are already available
via the registers, and really persistent users can toy with those, if they
want. Perhaps it's on us to add better comments to i915_reg.h to explain
how we think it works?
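
Something along those lines is the kind of comment I mean (a sketch only;
the semantics described here are our best understanding and would need to
be checked against the docs before going anywhere near i915_reg.h):

/*
 * RPS autothrottle tuning registers (sketch, to be verified):
 *
 * GEN6_RP_UP_EI / GEN6_RP_DOWN_EI
 *	Evaluation interval over which the hardware accumulates GPU
 *	busy/idle time before deciding whether to signal an up/down
 *	frequency change.
 *
 * GEN6_RP_UP_THRESHOLD / GEN6_RP_DOWN_THRESHOLD
 *	Busy (resp. idle) time within the corresponding evaluation
 *	interval above which an up (resp. down) event is raised.  With
 *	the HSW numbers the up threshold is 90% of the up interval
 *	(59400/66000) and the down threshold is 70% of the down
 *	interval (245000/350000).
 *
 * Even though we do not select the pure EI mode, Chris's report shows
 * the EI values still gate whether up interrupts are generated at all
 * on SNB/IVB.
 */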

> 
> Also, the initial patch to use the HSW values implies that the values
> are applicable across generations.  They're not.  They're very specific
> to a given generation and potentially even different versions of a
> given generation, so sharing the values is probably a bad idea in
> general.
> 
> -- 
> Jesse Barnes, Intel Open Source Technology Center

To set the record straight, I am pretty sure we can agree on a few things:
1. We don't really know wtf.
2. There will be no globally good set of values.
2a. There may even be no globally decent set of values.
3. One workload is insufficient to determine anything.

If a workload is obviously too GPU-render-heavy, I think a decent solution
is to just use the sysfs parameters to force a specific minimum frequency.
We don't really need to be skittish about setting this, as really *most*
things are too GPU-render-heavy for us at the moment. For the general case
I'll assert the performance 'profile' is the right thing unless someone
proves otherwise, and here we just want the set of values which throttles
down faster than it throttles up.
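
Forcing the floor that way is just a write to sysfs; a minimal sketch
(assuming the writable gt_min_freq_mhz attribute on card0 and a value, in
MHz, inside the supported range):

/* Pin the minimum RPS frequency, e.g. at RP1 (650 on the IVB above). */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/class/drm/card0/gt_min_freq_mhz", "w");

	if (!f)
		return 1;
	fprintf(f, "650\n");
	return fclose(f) ? 1 : 0;
}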

I think there is still some value in the proof-of-concept work you were
hoping to do around a system-wide GPU power governor. Maybe we can just set
the RPS autothrottle values to be as dumb as possible, and then use that
governor to set the min/max frequencies as it sees fit. Another similar
option is to try to use SW RC6. If we had an intern, I'd even suggest
picking some decent ranges for all the thresholds and having that intern
run benchmarks with all the different values.

Okay, done wasting your time now...
Ben Widawsky Dec. 21, 2012, 2:07 a.m. UTC | #4
On Thu, 20 Dec 2012 17:57:04 -0800
Ben Widawsky <ben@bwidawsk.net> wrote:

> On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> > On Wed, 24 Oct 2012 12:05:35 -0700
> > Ben Widawsky <ben@bwidawsk.net> wrote:
> > 
> > > On Sun, 21 Oct 2012 15:44:02 +0100
> > > Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > 
> > > > Even though we do not use the EI mode for determining when to
> > > > change GPU frequencies for RPS, changing this value causes no
> > > > up interrupts to be generated whilst an OpenGL client runs.
> > > > 
> > > > Fixes regression from commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > > Author: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Date:   Wed Aug 15 10:41:45 2012 +0200
> > > > 
> > > >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > > > 
> > > > Reported-by: Eric Anholt <eric@anholt.net>
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Eric Anholt <eric@anholt.net>
> > > > Cc: Ben Widawsky <ben@bwidawsk.net>
> > > > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > > > Cc: stable@vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pm.c |    2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 81e88c2..15b585e 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > > >  
> > > >  	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > > >  	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > > -	I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > > +	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > > >  	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > > >  
> > > >  	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> > > 
> > > 
> > > 
> > > 
> > > I've not verified this with interrupts, but simply reading back
> > > the current frequency using the sysfs interface.
> > > 
> > > What I've seen running xonotic on Ivybridge is that we do bump the
> > > frequency initially, and then stops. Nearing the end of the demo,
> > > we again raise the frequency. Note that on my system, both
> > > before, and after this patch, I am able to get to the max GPU
> > > frequency with the xonotic demo.
> > > 
> > > Specifically, on my IVB which has a range of 350->1100 with an
> > > RP1 of 650. I see the following (the demo is roughly 2 minutes)
> > > 
> > > without patch:
> > > Within a few seconds we cycle up to 750
> > > Nothing for about 30 seconds
> > > very slowly cycle up to 1100 (*just* before the demo ends)
> > > demo ends; throttle down to 350 quickly
> > > 
> > > with patch:
> > > Within a few seconds we cycle up to 1000
> > > Nothing for about 30 seconds
> > > cycle up to 1100
> > > demo ends; throttle down to 350 slowly
> > > 
> > > I think if this fixes someones critical issue, it's great, but
> > > unfortunately I do not see the problem the patch claims to fix.
> > > Furthermore, none of us can really make sense of why this has the
> > > effect that it does, but I believe a lot of that is because the
> > > workloads we run (in this case xonotic) are very blackbox.
> > > 
> > > Personally, on this IVB, I think the behavior before the patch is
> > > more desirable because it stays near RP1 for a longer period of
> > > time, and drops to RP0 quickly (but it's definitely a matter of
> > > opinion).
> > 
> > It's a balance between power and performance.  Running at the higher
> > freqs is definitely less power efficient, but without this patch we
> > definitely have a performance regression (running at 750 instead of
> > 1000 MHz for most of the demo, plus the cases Chris saw).
> > 
> > But we also don't want to prevent RC6 entry on vulnerable systems,
> > so maybe we need two sets of values or a broader set of changes
> > that work better for a swath of workloads.
> 
> To me this sounds like an idea which is better on paper than in
> practice. Unless we can generalize the various sets of values with
> knobs for users to pick from, it will really be overwhelming (I
> feel), and will probably provide more bad data than good. Fine
> grained knobs are already given via registers, and really pestering
> users can toy with those, if they want. Perhaps it's on us to add
> better comments to i915_reg.h to explain how we think it works a bit
> better?
> 
> > 
> > Also, the initial patch to use the HSW values implies that the
> > values are applicable across generations.  They're not.  They're
> > very specific to a given generation and potentially even different
> > versions of a given generation, so sharing the values is probably a
> > bad idea in general.
> > 
> > -- 
> > Jesse Barnes, Intel Open Source Technology Center
> 
> To set the record straight, I am pretty sure we can agree on a few
> things.
> 1. We don't really know wtf.
> 2. There will be no globally good set of values.
> 2a. There may even be no globally decent set of values.
> 3. 1 workload is insufficient to determine anything
> 
> If a workload is obviously too GPU render heavy, I think a decent
> solution is to just use the sysfs parameters to force a specific min.
> We don't really need to be skittish about setting this as really
> *most* things are too GPU render heavy for us at the moment. For the
> general case I'll assert the performance 'profile' is the right thing
> unless someone proves otherwise, and here we just want the set of
> values which throttle down faster than they throttle up.
> 

In the email in my head, that said "the power 'profile'"

> I think there is still some value in the proof of concept stuff you
> were hoping to do around a system wide GPU power governor thing.
> Maybe we can just set the RPS autothrottle values to be as dumb as
> possible, and then use that governor to set the min/max frequencies
> as it sees fit. Another similar option is to try to use SW RC6. If we
> had an intern, I'd even suggest to pick some decent ranges for all
> the thresholds and have that intern run benchmarks with all the
> different values.
> 
> Okay, done wasting your time now..

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 81e88c2..15b585e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2493,7 +2493,7 @@  static void gen6_enable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
 	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
+	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
 	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
 
 	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);