From patchwork Wed Apr 10 19:22:23 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 2422991
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Wed, 10 Apr 2013 20:22:23 +0100
Message-Id: <1365621743-2761-1-git-send-email-chris@chris-wilson.co.uk>
X-Mailer: git-send-email 1.7.10.4
Subject: [Intel-gfx] [PATCH] drm/i915: Scale ring, rather than ia, frequency on Haswell
List-Id: Intel graphics driver community testing & development

Haswell introduces a separate
frequency domain for the ring (uncore). So where we used to increase the
CPU (IA) clock with GPU busyness, we now need to scale the ring frequency
directly instead. As the ring limits our memory bandwidth, it is vital
for performance that when the GPU is busy, we increase the frequency of
the ring to increase the available memory bandwidth.

Signed-off-by: Chris Wilson
Cc: Jesse Barnes
---
 drivers/gpu/drm/i915/i915_debugfs.c |    7 +++++--
 drivers/gpu/drm/i915/i915_reg.h     |    4 ++++
 drivers/gpu/drm/i915/intel_pm.c     |   39 ++++++++++++++++++++++-----------------
 3 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 7da45aa..6220d97 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1357,7 +1357,7 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 	if (ret)
 		return ret;
 
-	seq_printf(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\n");
+	seq_printf(m, "GPU freq (MHz)\tEffective CPU freq (MHz)\tEffective Ring freq (MHz)\n");
 
 	for (gpu_freq = dev_priv->rps.min_delay;
 	     gpu_freq <= dev_priv->rps.max_delay;
@@ -1366,7 +1366,10 @@ static int i915_ring_freq_table(struct seq_file *m, void *unused)
 		sandybridge_pcode_read(dev_priv,
 				       GEN6_PCODE_READ_MIN_FREQ_TABLE,
 				       &ia_freq);
-		seq_printf(m, "%d\t\t%d\n", gpu_freq * GT_FREQUENCY_MULTIPLIER, ia_freq * 100);
+		seq_printf(m, "%d\t\t%d\t\t\t\t%d\n",
+			   gpu_freq * GT_FREQUENCY_MULTIPLIER,
+			   ((ia_freq >> 0) & 0xff) * 100,
+			   ((ia_freq >> 8) & 0xff) * 100);
 	}
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index e0fc070..077d40f 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1210,6 +1210,9 @@
 
 #define MCHBAR_MIRROR_BASE_SNB	0x140000
 
+/* Memory controller frequency in MCHBAR for Haswell (possibly SNB+) */
+#define DCLK 0x5e04
+
 /** 915-945 and GM965 MCH register controlling DRAM channel access */
 #define DCC			0x10200
 #define DCC_ADDRESSING_MODE_SINGLE_CHANNEL		(0 << 0)
@@ -4390,6 +4393,7 @@
 #define GEN6_DECODE_RC6_VID(vids)		(((vids) * 5) + 245)
 #define GEN6_PCODE_DATA				0x138128
 #define   GEN6_PCODE_FREQ_IA_RATIO_SHIFT	8
+#define   GEN6_PCODE_FREQ_RING_RATIO_SHIFT	16
 
 #define VLV_IOSF_DOORBELL_REQ			0x182100
 #define   IOSF_DEVFN_SHIFT			24
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index baea4fc..59c3443 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -2684,8 +2684,8 @@ static void gen6_update_ring_freq(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	int min_freq = 15;
-	int gpu_freq;
-	unsigned int ia_freq, max_ia_freq;
+	unsigned int gpu_freq;
+	unsigned int max_ia_freq, max_ring_freq;
 	int scaling_factor = 180;
 
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
@@ -2701,6 +2701,10 @@ static void gen6_update_ring_freq(struct drm_device *dev)
 	/* Convert from kHz to MHz */
 	max_ia_freq /= 1000;
 
+	max_ring_freq = I915_READ(MCHBAR_MIRROR_BASE_SNB + DCLK);
+	/* convert DDR frequency from units of 133.3MHz to bandwidth */
+	max_ring_freq = (2 * 4 * max_ring_freq + 2) / 3;
+
 	/*
 	 * For each potential GPU frequency, load a ring frequency we'd like
 	 * to use for memory access.  We do this by specifying the IA frequency
@@ -2709,21 +2713,28 @@ static void gen6_update_ring_freq(struct drm_device *dev)
 	for (gpu_freq = dev_priv->rps.max_delay; gpu_freq >= dev_priv->rps.min_delay;
 	     gpu_freq--) {
 		int diff = dev_priv->rps.max_delay - gpu_freq;
-
-		/*
-		 * For GPU frequencies less than 750MHz, just use the lowest
-		 * ring freq.
-		 */
-		if (gpu_freq < min_freq)
-			ia_freq = 800;
-		else
-			ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
-		ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
-		ia_freq <<= GEN6_PCODE_FREQ_IA_RATIO_SHIFT;
+		unsigned int ia_freq = 0, ring_freq = 0;
+
+		if (IS_HASWELL(dev)) {
+			ring_freq = (gpu_freq * 5 + 1) / 2;
+			ring_freq = max(max_ring_freq, ring_freq);
+		} else {
+			/*
+			 * For GPU frequencies less than 750MHz,
+			 * just use the lowest ring freq.
+			 */
+			if (gpu_freq < min_freq)
+				ia_freq = 800;
+			else
+				ia_freq = max_ia_freq - ((diff * scaling_factor) / 2);
+			ia_freq = DIV_ROUND_CLOSEST(ia_freq, 100);
+		}
 
 		sandybridge_pcode_write(dev_priv,
 					GEN6_PCODE_WRITE_MIN_FREQ_TABLE,
-					ia_freq | gpu_freq);
+					ia_freq << GEN6_PCODE_FREQ_IA_RATIO_SHIFT |
+					ring_freq << GEN6_PCODE_FREQ_RING_RATIO_SHIFT |
+					gpu_freq);
 	}
 }
 
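
For reviewers who want to check the arithmetic outside the driver, here is a
minimal userspace sketch of the two integer computations the patch relies on:
packing one frequency-table entry for the pcode write, and the rounded-up
8/3 scaling applied to the DCLK reading. The shift values are the ones the
patch uses (8 for IA, 16 for ring). The helper names pack_min_freq_entry()
and scale_dclk() are hypothetical and exist only for this illustration; they
are not part of the driver.

```c
#include <stdint.h>

/* Shift values as defined in i915_reg.h by this patch. */
#define FREQ_IA_RATIO_SHIFT	8
#define FREQ_RING_RATIO_SHIFT	16

/*
 * Hypothetical helper: build the word handed to the pcode write.
 * Layout assumed from the patch: GPU ratio in bits 0-7, IA ratio in
 * bits 8-15, ring ratio in bits 16-23.
 */
static inline uint32_t pack_min_freq_entry(uint32_t gpu_freq,
					   uint32_t ia_freq,
					   uint32_t ring_freq)
{
	return ia_freq << FREQ_IA_RATIO_SHIFT |
	       ring_freq << FREQ_RING_RATIO_SHIFT |
	       gpu_freq;
}

/*
 * Hypothetical helper: the (2 * 4 * x + 2) / 3 expression from the patch,
 * i.e. x * 8 / 3 rounded up to the next integer.
 */
static inline uint32_t scale_dclk(uint32_t dclk)
{
	return (2 * 4 * dclk + 2) / 3;
}
```

With these, a GPU ratio of 22 and a ring ratio of 55 (which is what
(22 * 5 + 1) / 2 evaluates to) pack to 0x00370016 when the IA field is left
at zero, and a raw DCLK value of 6 scales to 16.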