drm/i915: set cache sharing policy to max sharing on SNB+

Message ID 1309563307-5480-1-git-send-email-jbarnes@virtuousgeek.org (mailing list archive)
State New, archived
Headers show

Commit Message

Jesse Barnes July 1, 2011, 11:35 p.m. UTC
By default, the GPU will only share a very small portion of the CPU
cache.  With this change, both the GPU and CPU will have full access to
the cache, which should help (sometimes a lot) in most cases.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_reg.h      |    5 +++++
 drivers/gpu/drm/i915/intel_display.c |    7 +++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

Comments

Chris Wilson July 2, 2011, 7:20 a.m. UTC | #1
On Fri,  1 Jul 2011 16:35:07 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> By default, the GPU will only share a very small portion of the CPU
> cache.  With this change, both the GPU and CPU will have full access to
> the cache, which should help (sometimes a lot) in most cases.

What's the trade off? Is the GPU data in the cache treated differently
than CPU data when it comes to cache eviction and so this asymmetrically
hurts CPU bound applications? At the least it will force more CPU data out
of the cache, which will be enough to make some people scream and howl.

Do we want to expose this as a parameter whilst we test various
configurations? Is this just yet another step on the path to a
coordinated cpu-gpu governor?
-Chris
Chris Wilson July 2, 2011, 10:47 a.m. UTC | #2
On Fri,  1 Jul 2011 16:35:07 -0700, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> By default, the GPU will only share a very small portion of the CPU
> cache.  With this change, both the GPU and CPU will have full access to
> the cache, which should help (sometimes a lot) in most cases.

Joy, this looks to be at best a mixed blessing. For CPU bound games like
padman, it degrades performance by about 5% on my desktop SNB. But for
nexuiz, there appears to be little change. The ddx shows further regressions
of the order of 10%. The immediate suspect is that it hurts the use of
pixman for trapezoid mask generation, which, whilst being less-than-ideal
behaviour that will be fixed in the near future, is indicative of the sort
of negative impact this change will have on CPU-memory bound applications.
Conversely, the equivalent spans-based code is about the only example I
found that is sped up by the patch, by about 3%.

Having just checked up on 0x900c, I'm even more confused. From my old
specs, the register is SNPCR, the snoop control register, which makes
more sense than MBC, and 0<<21 is for the maximum uncore resources,
which is the default setting and the default on my SNB, with 1<<21 being
the medium setting. Now, the only reference I have is the register dump,
with no explanation of what resource is actually being controlled...
-Chris
Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 4a446b1..eac59f1 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -78,6 +78,11 @@ 
 #define  GRDOM_RENDER	(1<<2)
 #define  GRDOM_MEDIA	(3<<2)
 
+#define GEN6_MBCUNIT_CFG	0x900c /* for LLC config */
+#define   GEN6_MBC_LLC_CFG_MASK	(3<<21)
+#define   GEN6_MBC_LLC_CFG_FULL	(1<<21) /* full sharing of 16/16ths of the cache */
+#define   GEN6_MBC_LLC_CFG_MIN	(3<<21) /* only 1/16th of the cache is shared */
+
 #define GEN6_GDRST	0x941c
 #define  GEN6_GRDOM_FULL		(1 << 0)
 #define  GEN6_GRDOM_RENDER		(1 << 1)
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 823b8d9..0ed4ed2 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7279,6 +7279,7 @@  void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 	int min_freq = 15;
 	int gpu_freq, ia_freq, max_ia_freq;
 	int scaling_factor = 180;
+	u32 mbccfg;
 
 	max_ia_freq = cpufreq_quick_get_max(0);
 	/*
@@ -7293,6 +7294,12 @@  void gen6_update_ring_freq(struct drm_i915_private *dev_priv)
 
 	mutex_lock(&dev_priv->dev->struct_mutex);
 
+	/* Update the cache sharing policy here as well */
+	mbccfg = I915_READ(GEN6_MBCUNIT_CFG);
+	mbccfg &= ~GEN6_MBC_LLC_CFG_MASK;
+	mbccfg |= GEN6_MBC_LLC_CFG_FULL;
+	I915_WRITE(GEN6_MBCUNIT_CFG, mbccfg);
+
 	/*
 	 * For each potential GPU frequency, load a ring frequency we'd like
 	 * to use for memory access.  We do this by specifying the IA frequency