Message ID | 20110621152424.75f54a13@jbarnes-desktop (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > The ring frequency scaling table tells the PCU to treat certain GPU > frequencies as if they were a given CPU frequency for purposes of > scaling the ring frequency. Normally the PCU will scale the ring > frequency based on the CPU P-state, but with the table present, it will > also take the GPU frequency into account. The scaling_factor used in > this patch may not be ideal, but is enough to increase performance in > nexuiz on a 1366x768 panel by about 20%. Am I right in thinking that the improvement offered by these new defaults is dependent upon the previous values programmed by the BIOS? On my desktop SNB, I only see a marginal improvement (<~1%) for games and similarly small differences for cairo-gl/xlib. Plus they also revealed my confusion over dev_priv->min_delay is actually min_freq on SNB. Evil. :-p -Chris
Chris Wilson <chris@chris-wilson.co.uk> wrote: >On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes ><jbarnes@virtuousgeek.org> wrote: >> The ring frequency scaling table tells the PCU to treat certain GPU >> frequencies as if they were a given CPU frequency for purposes of >> scaling the ring frequency. Normally the PCU will scale the ring >> frequency based on the CPU P-state, but with the table present, it >will >> also take the GPU frequency into account. The scaling_factor used in >> this patch may not be ideal, but is enough to increase performance in >> nexuiz on a 1366x768 panel by about 20%. > >Am I right in thinking that the improvement offered by these new >defaults >is dependent upon the previous values programmed by the BIOS? On my >desktop SNB, I only see a marginal improvement (<~1%) for games and >similarly small differences for cairo-gl/xlib. > >Plus they also revealed my confusion over dev_priv->min_delay is >actually >min_freq on SNB. Evil. :-p >-Chris > >-- >Chris Wilson, Intel Open Source Technology Centre Yes, though on my machines the BIOS programs these values to 0, indicating no minimum ring freq. The perf increase will be proportional to the bandwidth requirements not met by the existing frequency. So for your tests, it's possible that the CPU is being kept busy enough to kee the ring freq up anyway (iow this patch is redundant for those loads) or that your workloads don't have high memory bw requirements to begin with, so the higher ring freq isn't a benefit.
On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > The ring frequency scaling table tells the PCU to treat certain GPU > frequencies as if they were a given CPU frequency for purposes of > scaling the ring frequency. Normally the PCU will scale the ring > frequency based on the CPU P-state, but with the table present, it will > also take the GPU frequency into account. The scaling_factor used in > this patch may not be ideal, but is enough to increase performance in > nexuiz on a 1366x768 panel by about 20%. > > The main downside of keeping the ring frequency high while the CPU is > at a low frequency (or asleep altogether) is increased power > consumption. But then if you're keeping your GPU busy, you probably > want the extra performance. The intent of the patch sounds good, but it doesn't seem to be doing anything here (graphs below). If I run a "while true; do; done" loop to keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6% win, and nexuiz around 20%. There are no complaints in dmesg. x /home/anholt/nexuiz-before + /home/anholt/nexuiz-after +-------------------------------------------------------------------------------+ | x + + x x +| ||_|_______________________M____MA_____A_________________________|_________| | +-------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 3 59.217461 59.775295 59.479322 59.490693 0.27909057 + 3 59.287311 59.893383 59.440451 59.540382 0.31515189 No difference proven at 95.0% confidence x /home/anholt/oa-before + /home/anholt/oa-after +-------------------------------------------------------------------------------+ |x x + + + x * + x| | |________________________|_________MA_______A______M__________|________| | +-------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 5 184.2 187.2 186.3 185.72 1.3809417 + 5 185.4 187 185.7 186.04 0.71624018 No difference proven at 95.0% confidence x /home/anholt/taiji-before + /home/anholt/taiji-after +-------------------------------------------------------------------------------+ |x x + xx + x + ++ + x| | |____________________________A___|______________A______M_|_____| | +-------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 6 132.486 139.931 135.608 135.596 2.6966051 + 6 134.974 138.832 138.111 137.41183 1.434609 No difference proven at 95.0% confidence
On Wed, 22 Jun 2011 11:42:04 -0700 Eric Anholt <eric@anholt.net> wrote: > On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > The ring frequency scaling table tells the PCU to treat certain GPU > > frequencies as if they were a given CPU frequency for purposes of > > scaling the ring frequency. Normally the PCU will scale the ring > > frequency based on the CPU P-state, but with the table present, it will > > also take the GPU frequency into account. The scaling_factor used in > > this patch may not be ideal, but is enough to increase performance in > > nexuiz on a 1366x768 panel by about 20%. > > > > The main downside of keeping the ring frequency high while the CPU is > > at a low frequency (or asleep altogether) is increased power > > consumption. But then if you're keeping your GPU busy, you probably > > want the extra performance. > > The intent of the patch sounds good, but it doesn't seem to be doing > anything here (graphs below). If I run a "while true; do; done" loop to > keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6% > win, and nexuiz around 20%. There are no complaints in dmesg. > > x /home/anholt/nexuiz-before > + /home/anholt/nexuiz-after > +-------------------------------------------------------------------------------+ > | x + + x x +| > ||_|_______________________M____MA_____A_________________________|_________| | > +-------------------------------------------------------------------------------+ > N Min Max Median Avg Stddev > x 3 59.217461 59.775295 59.479322 59.490693 0.27909057 > + 3 59.287311 59.893383 59.440451 59.540382 0.31515189 > No difference proven at 95.0% confidence > > x /home/anholt/oa-before > + /home/anholt/oa-after > +-------------------------------------------------------------------------------+ > |x x + + + x * + x| > | |________________________|_________MA_______A______M__________|________| | > +-------------------------------------------------------------------------------+ > N Min Max Median Avg Stddev > x 5 184.2 187.2 186.3 185.72 1.3809417 > + 5 185.4 187 185.7 186.04 0.71624018 > No difference proven at 95.0% confidence > > x /home/anholt/taiji-before > + /home/anholt/taiji-after > +-------------------------------------------------------------------------------+ > |x x + xx + x + ++ + x| > | |____________________________A___|______________A______M_|_____| | > +-------------------------------------------------------------------------------+ > N Min Max Median Avg Stddev > x 6 132.486 139.931 135.608 135.596 2.6966051 > + 6 134.974 138.832 138.111 137.41183 1.434609 > No difference proven at 95.0% confidence
On Wed, 22 Jun 2011 11:42:04 -0700 Eric Anholt <eric@anholt.net> wrote: > On Tue, 21 Jun 2011 15:24:24 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote: > > The ring frequency scaling table tells the PCU to treat certain GPU > > frequencies as if they were a given CPU frequency for purposes of > > scaling the ring frequency. Normally the PCU will scale the ring > > frequency based on the CPU P-state, but with the table present, it will > > also take the GPU frequency into account. The scaling_factor used in > > this patch may not be ideal, but is enough to increase performance in > > nexuiz on a 1366x768 panel by about 20%. > > > > The main downside of keeping the ring frequency high while the CPU is > > at a low frequency (or asleep altogether) is increased power > > consumption. But then if you're keeping your GPU busy, you probably > > want the extra performance. > > The intent of the patch sounds good, but it doesn't seem to be doing > anything here (graphs below). If I run a "while true; do; done" loop to > keep a CPU busy during measurement, though, OA gets a 15.9% +/- 0.6% > win, and nexuiz around 20%. There are no complaints in dmesg. Ok I'll checck it out; I admit I tested it then made a few changes then sent it out, so I must have broken something... Jesse
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 2f967af..15ee639 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -3433,7 +3433,8 @@ #define GEN6_PCODE_MAILBOX 0x138124 #define GEN6_PCODE_READY (1<<31) #define GEN6_READ_OC_PARAMS 0xc -#define GEN6_PCODE_WRITE_MIN_FREQ_TABLE 0x9 +#define GEN6_PCODE_WRITE_MIN_FREQ_TABLE 0x8 #define GEN6_PCODE_DATA 0x138128 +#define GEN6_PCODE_FREQ_IA_RATIO_SHIFT 8 #endif /* _I915_REG_H_ */ diff --git a/drivers/gpu/drm/i915/i915_suspend.c b/drivers/gpu/drm/i915/i915_suspend.c index 60a94d2..03f0fac 100644 --- a/drivers/gpu/drm/i915/i915_suspend.c +++ b/drivers/gpu/drm/i915/i915_suspend.c @@ -870,8 +870,10 @@ int i915_restore_state(struct drm_device *dev) intel_init_emon(dev); } - if (IS_GEN6(dev)) + if (IS_GEN6(dev)) { gen6_enable_rps(dev_priv); + gen6_update_ring_freq(dev_priv); + } /* Cache mode state */ I915_WRITE (CACHE_MODE_0, dev_priv->saveCACHE_MODE_0 | 0xffff0000); diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 86a3ec1..05c28eb 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -7159,6 +7159,46 @@ void gen6_enable_rps(struct drm_i915_private *dev_priv) mutex_unlock(&dev_priv->dev->struct_mutex); } +void gen6_update_ring_freq(struct drm_i915_private *dev_priv) +{ + int min_freq = 15, max_freq = 32; + int gpu_freq, ia_freq; + int scaling_factor = 180; + + mutex_lock(&dev_priv->dev->struct_mutex); + gen6_gt_force_wake_get(dev_priv); + + /* + * For each potential GPU frequency, load a ring frequency we'd like + * to use for memory access. We do this by specifying the IA frequency + * the PCU should use as a reference to determine the ring frequency. + */ + for (gpu_freq = dev_priv->min_delay; gpu_freq <= dev_priv->max_delay; + gpu_freq++) { + if (gpu_freq < min_freq) /* Use 800MHz min IA reference freq */ + ia_freq = 800; + else if (gpu_freq > max_freq) /* Clamp to 3GHz IA ref. freq */ + ia_freq = 3000; + else + ia_freq = (gpu_freq * scaling_factor) / 2; + ia_freq = DIV_ROUND_UP(ia_freq, 100); + + I915_WRITE(GEN6_PCODE_DATA, + (ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT) | + gpu_freq); + I915_WRITE(GEN6_PCODE_MAILBOX, GEN6_PCODE_READY | + GEN6_PCODE_WRITE_MIN_FREQ_TABLE); + if (wait_for((I915_READ(GEN6_PCODE_MAILBOX) & + GEN6_PCODE_READY) == 0, 10)) { + DRM_ERROR("pcode write of freq table timed out\n"); + continue; + } + } + + gen6_gt_force_wake_put(dev_priv); + mutex_unlock(&dev_priv->dev->struct_mutex); +} + static void ironlake_init_clock_gating(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -7777,8 +7817,10 @@ void intel_modeset_init(struct drm_device *dev) intel_init_emon(dev); } - if (IS_GEN6(dev)) + if (IS_GEN6(dev)) { gen6_enable_rps(dev_priv); + gen6_update_ring_freq(dev_priv); + } INIT_WORK(&dev_priv->idle_work, intel_idle_update); setup_timer(&dev_priv->idle_timer, intel_gpu_idle_timer, diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index 9ffa61e..8ac3bd8 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -317,6 +317,7 @@ extern void intel_enable_clock_gating(struct drm_device *dev); extern void ironlake_enable_drps(struct drm_device *dev); extern void ironlake_disable_drps(struct drm_device *dev); extern void gen6_enable_rps(struct drm_i915_private *dev_priv); +extern void gen6_update_ring_freq(struct drm_i915_private *dev_priv); extern void gen6_disable_rps(struct drm_device *dev); extern void intel_init_emon(struct drm_device *dev);
The ring frequency scaling table tells the PCU to treat certain GPU frequencies as if they were a given CPU frequency for purposes of scaling the ring frequency. Normally the PCU will scale the ring frequency based on the CPU P-state, but with the table present, it will also take the GPU frequency into account. The scaling_factor used in this patch may not be ideal, but is enough to increase performance in nexuiz on a 1366x768 panel by about 20%. The main downside of keeping the ring frequency high while the CPU is at a low frequency (or asleep altogether) is increased power consumption. But then if you're keeping your GPU busy, you probably want the extra performance. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>