diff mbox

drm/i915: initialize ring frequency scaling table on SNB

Message ID 20110621152424.75f54a13@jbarnes-desktop (mailing list archive)
State New, archived
Headers show

Commit Message

Jesse Barnes June 21, 2011, 10:24 p.m. UTC
The ring frequency scaling table tells the PCU to treat certain GPU
frequencies as if they were a given CPU frequency for purposes of
scaling the ring frequency.  Normally the PCU will scale the ring
frequency based on the CPU P-state, but with the table present, it will
also take the GPU frequency into account.  The scaling_factor used in
this patch may not be ideal, but is enough to increase performance in
nexuiz on a 1366x768 panel by about 20%.

The main downside of keeping the ring frequency high while the CPU is
at a low frequency (or asleep altogether) is increased power
consumption.  But then if you're keeping your GPU busy, you probably
want the extra performance.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Comments

Chris Wilson June 22, 2011, 11:30 a.m. UTC | #1
On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> The ring frequency scaling table tells the PCU to treat certain GPU
> frequencies as if they were a given CPU frequency for purposes of
> scaling the ring frequency.  Normally the PCU will scale the ring
> frequency based on the CPU P-state, but with the table present, it will
> also take the GPU frequency into account.  The scaling_factor used in
> this patch may not be ideal, but is enough to increase performance in
> nexuiz on a 1366x768 panel by about 20%.

Am I right in thinking that the improvement offered by these new defaults
is dependent upon the previous values programmed by the BIOS? On my
desktop SNB, I only see a marginal improvement (<~1%) for games and
similarly small differences for cairo-gl/xlib.

Plus they also revealed my confusion over dev_priv->min_delay is actually
min_freq on SNB. Evil. :-p
-Chris
Jesse Barnes June 22, 2011, 4:49 p.m. UTC | #2
Chris Wilson <chris@chris-wilson.co.uk> wrote:

>On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes
><jbarnes@virtuousgeek.org> wrote:
>> The ring frequency scaling table tells the PCU to treat certain GPU
>> frequencies as if they were a given CPU frequency for purposes of
>> scaling the ring frequency.  Normally the PCU will scale the ring
>> frequency based on the CPU P-state, but with the table present, it
>will
>> also take the GPU frequency into account.  The scaling_factor used in
>> this patch may not be ideal, but is enough to increase performance in
>> nexuiz on a 1366x768 panel by about 20%.
>
>Am I right in thinking that the improvement offered by these new
>defaults
>is dependent upon the previous values programmed by the BIOS? On my
>desktop SNB, I only see a marginal improvement (<~1%) for games and
>similarly small differences for cairo-gl/xlib.
>
>Plus they also revealed my confusion over dev_priv->min_delay is
>actually
>min_freq on SNB. Evil. :-p
>-Chris
>
>-- 
>Chris Wilson, Intel Open Source Technology Centre

Yes, though on my machines the BIOS programs these values to 0, indicating no minimum ring freq.  The perf increase will be proportional to the bandwidth requirements not met by the existing frequency.  So for your tests, it's possible that the CPU is being kept busy enough to kee the ring freq up anyway (iow this patch is redundant for those loads) or that your workloads don't have high memory bw requirements to begin with, so the higher ring freq isn't a benefit.
Eric Anholt June 22, 2011, 6:42 p.m. UTC | #3
On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> The ring frequency scaling table tells the PCU to treat certain GPU
> frequencies as if they were a given CPU frequency for purposes of
> scaling the ring frequency.  Normally the PCU will scale the ring
> frequency based on the CPU P-state, but with the table present, it will
> also take the GPU frequency into account.  The scaling_factor used in
> this patch may not be ideal, but is enough to increase performance in
> nexuiz on a 1366x768 panel by about 20%.
> 
> The main downside of keeping the ring frequency high while the CPU is
> at a low frequency (or asleep altogether) is increased power
> consumption.  But then if you're keeping your GPU busy, you probably
> want the extra performance.

The intent of the patch sounds good, but it doesn't seem to be doing
anything here (graphs below).  If I run a "while true; do; done" loop to
keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6%
win, and nexuiz around 20%.  There are no complaints in dmesg.

x /home/anholt/nexuiz-before
+ /home/anholt/nexuiz-after
+-------------------------------------------------------------------------------+
| x       +                +    x                                x             +|
||_|_______________________M____MA_____A_________________________|_________|    |
+-------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   3     59.217461     59.775295     59.479322     59.490693    0.27909057
+   3     59.287311     59.893383     59.440451     59.540382    0.31515189
No difference proven at 95.0% confidence

x /home/anholt/oa-before
+ /home/anholt/oa-after
+-------------------------------------------------------------------------------+
|x  x                           +  +    +               x      *          +    x|
|    |________________________|_________MA_______A______M__________|________|   |
+-------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         184.2         187.2         186.3        185.72     1.3809417
+   5         185.4           187         185.7        186.04    0.71624018
No difference proven at 95.0% confidence

x /home/anholt/taiji-before
+ /home/anholt/taiji-after
+-------------------------------------------------------------------------------+
|x      x                  +     xx       +    x          + ++     +           x|
|    |____________________________A___|______________A______M_|_____|           |
+-------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   6       132.486       139.931       135.608       135.596     2.6966051
+   6       134.974       138.832       138.111     137.41183      1.434609
No difference proven at 95.0% confidence
Jesse Barnes June 22, 2011, 7:55 p.m. UTC | #4
On Wed, 22 Jun 2011 11:42:04 -0700
Eric Anholt <eric@anholt.net> wrote:

> On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> > The ring frequency scaling table tells the PCU to treat certain GPU
> > frequencies as if they were a given CPU frequency for purposes of
> > scaling the ring frequency.  Normally the PCU will scale the ring
> > frequency based on the CPU P-state, but with the table present, it will
> > also take the GPU frequency into account.  The scaling_factor used in
> > this patch may not be ideal, but is enough to increase performance in
> > nexuiz on a 1366x768 panel by about 20%.
> > 
> > The main downside of keeping the ring frequency high while the CPU is
> > at a low frequency (or asleep altogether) is increased power
> > consumption.  But then if you're keeping your GPU busy, you probably
> > want the extra performance.
> 
> The intent of the patch sounds good, but it doesn't seem to be doing
> anything here (graphs below).  If I run a "while true; do; done" loop to
> keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6%
> win, and nexuiz around 20%.  There are no complaints in dmesg.
> 
> x /home/anholt/nexuiz-before
> + /home/anholt/nexuiz-after
> +-------------------------------------------------------------------------------+
> | x       +                +    x                                x             +|
> ||_|_______________________M____MA_____A_________________________|_________|    |
> +-------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   3     59.217461     59.775295     59.479322     59.490693    0.27909057
> +   3     59.287311     59.893383     59.440451     59.540382    0.31515189
> No difference proven at 95.0% confidence
> 
> x /home/anholt/oa-before
> + /home/anholt/oa-after
> +-------------------------------------------------------------------------------+
> |x  x                           +  +    +               x      *          +    x|
> |    |________________________|_________MA_______A______M__________|________|   |
> +-------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   5         184.2         187.2         186.3        185.72     1.3809417
> +   5         185.4           187         185.7        186.04    0.71624018
> No difference proven at 95.0% confidence
> 
> x /home/anholt/taiji-before
> + /home/anholt/taiji-after
> +-------------------------------------------------------------------------------+
> |x      x                  +     xx       +    x          + ++     +           x|
> |    |____________________________A___|______________A______M_|_____|           |
> +-------------------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   6       132.486       139.931       135.608       135.596     2.6966051
> +   6       134.974       138.832       138.111     137.41183      1.434609
> No difference proven at 95.0% confidence
Jesse Barnes June 22, 2011, 7:56 p.m. UTC | #5
On Wed, 22 Jun 2011 11:42:04 -0700
Eric Anholt <eric@anholt.net> wrote:

> On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> > The ring frequency scaling table tells the PCU to treat certain GPU
> > frequencies as if they were a given CPU frequency for purposes of
> > scaling the ring frequency.  Normally the PCU will scale the ring
> > frequency based on the CPU P-state, but with the table present, it will
> > also take the GPU frequency into account.  The scaling_factor used in
> > this patch may not be ideal, but is enough to increase performance in
> > nexuiz on a 1366x768 panel by about 20%.
> > 
> > The main downside of keeping the ring frequency high while the CPU is
> > at a low frequency (or asleep altogether) is increased power
> > consumption.  But then if you're keeping your GPU busy, you probably
> > want the extra performance.
> 
> The intent of the patch sounds good, but it doesn't seem to be doing
> anything here (graphs below).  If I run a "while true; do; done" loop to
> keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6%
> win, and nexuiz around 20%.  There are no complaints in dmesg.

Ok I'll checck it out; I admit I tested it then made a few changes then
sent it out, so I must have broken something...

Jesse
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2f967af..15ee639 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3433,7 +3433,8 @@ 
 #define GEN6_PCODE_MAILBOX			0x138124
 #define   GEN6_PCODE_READY			(1<<31)
 #define   GEN6_READ_OC_PARAMS			0xc
-#define   GEN6_PCODE_WRITE_MIN_FREQ_TABLE	0x9
+#define   GEN6_PCODE_WRITE_MIN_FREQ_TABLE	0x8
 #define GEN6_PCODE_DATA				0x138128
+#define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT	8
 
 #endif /* _I915_REG_H_ */
diff --git a/drivers/gpu/drm/i915/i915_suspend.c b/drivers/gpu/drm/i915/i915_suspend.c
index 60a94d2..03f0fac 100644
--- a/drivers/gpu/drm/i915/i915_suspend.c
+++ b/drivers/gpu/drm/i915/i915_suspend.c
@@ -870,8 +870,10 @@  int i915_restore_state(struct drm_device *dev)
 		intel_init_emon(dev);
 	}
 
-	if (IS_GEN6(dev))
+	if (IS_GEN6(dev)) {
 		gen6_enable_rps(dev_priv);
+		gen6_update_ring_freq(dev_priv);
+	}
 
 	/* Cache mode state */
 	I915_WRITE (CACHE_MODE_0, dev_priv->saveCACHE_MODE_0 | 0xffff0000);
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 86a3ec1..05c28eb 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7159,6 +7159,46 @@  void gen6_enable_rps(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->dev->struct_mutex);
 }
 
+void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
+{
+	int min_freq = 15, max_freq = 32;
+	int gpu_freq, ia_freq;
+	int scaling_factor = 180;
+
+	mutex_lock(&dev_priv->dev->struct_mutex);
+	gen6_gt_force_wake_get(dev_priv);
+
+	/*
+	 * For each potential GPU frequency, load a ring frequency we'd like
+	 * to use for memory access.  We do this by specifying the IA frequency
+	 * the PCU should use as a reference to determine the ring frequency.
+	 */
+	for (gpu_freq = dev_priv->min_delay; gpu_freq <= dev_priv->max_delay;
+	     gpu_freq++) {
+		if (gpu_freq < min_freq) /* Use 800MHz min IA reference freq */
+			ia_freq = 800;
+		else if (gpu_freq > max_freq) /* Clamp to 3GHz IA ref. freq */
+			ia_freq = 3000;
+		else
+			ia_freq = (gpu_freq * scaling_factor) / 2;
+		ia_freq = DIV_ROUND_UP(ia_freq, 100);
+
+		I915_WRITE(GEN6_PCODE_DATA,
+			   (ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT) |
+			   gpu_freq);
+		I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY |
+			   GEN6_PCODE_WRITE_MIN_FREQ_TABLE);
+		if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) &
+			      GEN6_PCODE_READY) == 0, 10)) {
+			DRM_ERROR("pcode write of freq table timed out\n");
+			continue;
+		}
+	}
+
+	gen6_gt_force_wake_put(dev_priv);
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+}
+
 static void ironlake_init_clock_gating(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -7777,8 +7817,10 @@  void intel_modeset_init(struct drm_device *dev)
 		intel_init_emon(dev);
 	}
 
-	if (IS_GEN6(dev))
+	if (IS_GEN6(dev)) {
 		gen6_enable_rps(dev_priv);
+		gen6_update_ring_freq(dev_priv);
+	}
 
 	INIT_WORK(&dev_priv->idle_work, intel_idle_update);
 	setup_timer(&dev_priv->idle_timer, intel_gpu_idle_timer,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 9ffa61e..8ac3bd8 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -317,6 +317,7 @@  extern void intel_enable_clock_gating(struct drm_device *dev);
 extern void ironlake_enable_drps(struct drm_device *dev);
 extern void ironlake_disable_drps(struct drm_device *dev);
 extern void gen6_enable_rps(struct drm_i915_private *dev_priv);
+extern void gen6_update_ring_freq(struct drm_i915_private *dev_priv);
 extern void gen6_disable_rps(struct drm_device *dev);
 extern void intel_init_emon(struct drm_device *dev);