From patchwork Mon Sep 23 20:33:28 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Rodrigo Vivi X-Patchwork-Id: 2929941 Return-Path: X-Original-To: patchwork-intel-gfx@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 6AEB4BFF05 for ; Mon, 23 Sep 2013 20:40:40 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id DFE0E203E6 for ; Mon, 23 Sep 2013 20:40:38 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by mail.kernel.org (Postfix) with ESMTP id 9AE16203AD for ; Mon, 23 Sep 2013 20:40:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6FE7AE76C2 for ; Mon, 23 Sep 2013 13:40:37 -0700 (PDT) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mail-yh0-f48.google.com (mail-yh0-f48.google.com [209.85.213.48]) by gabe.freedesktop.org (Postfix) with ESMTP id DB75EE6921 for ; Mon, 23 Sep 2013 13:34:02 -0700 (PDT) Received: by mail-yh0-f48.google.com with SMTP id f10so1560080yha.7 for ; Mon, 23 Sep 2013 13:34:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-type:content-transfer-encoding; bh=8cWn2K+loL4YuuVcThI8f4eXycEug+kamalu2xg0dhE=; b=WuAQF7TylEmlmeO4ioXYSIbPfgftJLvV5iYSzCh2ucA8Ipr/7kXMcJ8YFQR8RbfveA zsnUxHK6R2WoQvJuGVa/DJUPU5LhmJN8Y+WMIMZWggzqCb2JTKUuKBbbboRu+HBNUqvk RsHLTFGeqJ6zwY3kGa+fU+8zXXfNVbDwqEU60V64xV5CjVMt8qzuW6ZueU7tGgjbU4ZB ldESMqcgAnlfHxd95640lx/K7f+iHHZDnkbff0gHk1YnBeOs/dSS/jJVi9GbZLn3Tq3Q ZToIlWc1HoM+j0tHF9Q7owykI/rIbzzj0Qp162cApkfmpmcP+0sAVxF57aUhzBxdzC0m RhPw== X-Received: by 10.236.192.73 with SMTP id h49mr3717yhn.164.1379968442301; Mon, 23 Sep 2013 13:34:02 -0700 (PDT) Received: from localhost.localdomain ([186.204.164.107]) by mx.google.com with ESMTPSA id v96sm38875107yhp.3.1969.12.31.16.00.00 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Mon, 23 Sep 2013 13:34:01 -0700 (PDT) From: Rodrigo Vivi To: intel-gfx@lists.freedesktop.org Date: Mon, 23 Sep 2013 17:33:28 -0300 Message-Id: <1379968410-14428-12-git-send-email-rodrigo.vivi@gmail.com> X-Mailer: git-send-email 1.8.1.4 In-Reply-To: <1379968410-14428-1-git-send-email-rodrigo.vivi@gmail.com> References: <1379968410-14428-1-git-send-email-rodrigo.vivi@gmail.com> MIME-Version: 1.0 Cc: Owen Taylor , =?UTF-8?q?St=C3=A9phane=20Marchesin?= , "Zhuang, Lena" Subject: [Intel-gfx] [PATCH 11/13] drm/i915: Tweak RPS thresholds to more aggressively downclock X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.13 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org X-Spam-Status: No, score=-6.4 required=5.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, T_DKIM_INVALID, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Chris Wilson After applying wait-boost we often find ourselves stuck at higher clocks than required. The current threshold value requires the GPU to be continuously and completely idle for 313ms before it is dropped by one bin. Conversely, we require the GPU to be busy for an average of 90% over a 84ms period before we upclock. So the current thresholds almost never downclock the GPU, and respond very slowly to sudden demands for more power. It is easy to observe that we currently lock into the wrong bin and both underperform in benchmarks and consume more power than optimal (just by repeating the task and measuring the different results). An alternative approach, as discussed in the bspec, is to use a continuous threshold for upclocking, and an average value for downclocking. This is good for quickly detecting and reacting to state changes within a frame, however it fails with the common throttling method of waiting upon the outstanding frame - at least it is difficult to choose a threshold that works well at 15,000fps and at 60fps. So continue to use average busy/idle loads to determine frequency change. v2: Use 3 power zones to keep frequencies low in steady-state mostly idle (e.g. scrolling, interactive 2D drawing), and frequencies high for demanding games. In between those end-states, we use a fast-reclocking algorithm to converge more quickly on the desired bin. v3: Bug fixes - make sure we reset adj after switching power zones. v4: Tune - drop the continuous busy thresholds as it prevents us from choosing the right frequency for glxgears style swap benchmarks. Instead the goal is to be able to find the right clocks irrespective of the wait-boost. Signed-off-by: Chris Wilson Cc: Kenneth Graunke Cc: Stéphane Marchesin Cc: Owen Taylor Cc: "Meng, Mengmeng" Cc: "Zhuang, Lena" Signed-off-by: Rodrigo Vivi --- drivers/gpu/drm/i915/i915_drv.h | 5 ++ drivers/gpu/drm/i915/i915_irq.c | 46 ++++++++++---- drivers/gpu/drm/i915/i915_reg.h | 2 +- drivers/gpu/drm/i915/intel_pm.c | 135 ++++++++++++++++++++++++++++++---------- 4 files changed, 141 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 82658f8..0a1d249 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -850,8 +850,13 @@ struct intel_gen6_power_mgmt { u8 min_delay; u8 max_delay; u8 rpe_delay; + u8 rp1_delay; + u8 rp0_delay; u8 hw_max; + int last_adj; + enum { LOW_POWER, BETWEEN, HIGH_POWER } power; + struct delayed_work delayed_resume_work; /* diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 68c0b08..e930366 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -818,7 +818,7 @@ static void gen6_pm_rps_work(struct work_struct *work) drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t, rps.work); u32 pm_iir; - u8 new_delay; + int new_delay, adj; spin_lock_irq(&dev_priv->irq_lock); pm_iir = dev_priv->rps.pm_iir; @@ -835,29 +835,49 @@ static void gen6_pm_rps_work(struct work_struct *work) mutex_lock(&dev_priv->rps.hw_lock); + adj = dev_priv->rps.last_adj; if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) { - new_delay = dev_priv->rps.cur_delay + 1; + if (adj > 0) + adj *= 2; + else + adj = 1; + new_delay = dev_priv->rps.cur_delay + adj; /* * For better performance, jump directly * to RPe if we're below it. */ - if (IS_VALLEYVIEW(dev_priv->dev) && - dev_priv->rps.cur_delay < dev_priv->rps.rpe_delay) + if (new_delay < dev_priv->rps.rpe_delay) + new_delay = dev_priv->rps.rpe_delay; + } else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) { + if (dev_priv->rps.cur_delay > dev_priv->rps.rpe_delay) new_delay = dev_priv->rps.rpe_delay; - } else - new_delay = dev_priv->rps.cur_delay - 1; + else + new_delay = dev_priv->rps.min_delay; + adj = 0; + } else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) { + if (adj < 0) + adj *= 2; + else + adj = -1; + new_delay = dev_priv->rps.cur_delay + adj; + } else { /* unknown event */ + new_delay = dev_priv->rps.cur_delay; + } /* sysfs frequency interfaces may have snuck in while servicing the * interrupt */ - if (new_delay >= dev_priv->rps.min_delay && - new_delay <= dev_priv->rps.max_delay) { - if (IS_VALLEYVIEW(dev_priv->dev)) - valleyview_set_rps(dev_priv->dev, new_delay); - else - gen6_set_rps(dev_priv->dev, new_delay); - } + if (new_delay < (int)dev_priv->rps.min_delay) + new_delay = dev_priv->rps.min_delay; + if (new_delay > (int)dev_priv->rps.max_delay) + new_delay = dev_priv->rps.max_delay; + dev_priv->rps.last_adj = new_delay - dev_priv->rps.cur_delay; + + if (IS_VALLEYVIEW(dev_priv->dev)) + valleyview_set_rps(dev_priv->dev, new_delay); + else + gen6_set_rps(dev_priv->dev, new_delay); mutex_unlock(&dev_priv->rps.hw_lock); } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c4f9bef..6248bd8 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -4675,7 +4675,7 @@ #define GEN6_RP_UP_IDLE_MIN (0x1<<3) #define GEN6_RP_UP_BUSY_AVG (0x2<<3) #define GEN6_RP_UP_BUSY_CONT (0x4<<3) -#define GEN7_RP_DOWN_IDLE_AVG (0x2<<0) +#define GEN6_RP_DOWN_IDLE_AVG (0x2<<0) #define GEN6_RP_DOWN_IDLE_CONT (0x1<<0) #define GEN6_RP_UP_THRESHOLD 0xA02C #define GEN6_RP_DOWN_THRESHOLD 0xA030 diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index d78a415..12951bc 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -3295,6 +3295,98 @@ static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 val) return limits; } +static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val) +{ + int new_power; + + new_power = dev_priv->rps.power; + switch (dev_priv->rps.power) { + case LOW_POWER: + if (val > dev_priv->rps.rpe_delay + 1 && val > dev_priv->rps.cur_delay) + new_power = BETWEEN; + break; + + case BETWEEN: + if (val <= dev_priv->rps.rpe_delay && val < dev_priv->rps.cur_delay) + new_power = LOW_POWER; + else if (val >= dev_priv->rps.rp0_delay && val > dev_priv->rps.cur_delay) + new_power = HIGH_POWER; + break; + + case HIGH_POWER: + if (val < (dev_priv->rps.rp1_delay + dev_priv->rps.rp0_delay) >> 1 && val < dev_priv->rps.cur_delay) + new_power = BETWEEN; + break; + } + /* Max/min bins are special */ + if (val == dev_priv->rps.min_delay) + new_power = LOW_POWER; + if (val == dev_priv->rps.max_delay) + new_power = HIGH_POWER; + if (new_power == dev_priv->rps.power) + return; + + /* Note the units here are not exactly 1us, but 1280ns. */ + switch (new_power) { + case LOW_POWER: + /* Upclock if more than 95% busy over 16ms */ + I915_WRITE(GEN6_RP_UP_EI, 12500); + I915_WRITE(GEN6_RP_UP_THRESHOLD, 11800); + + /* Downclock if less than 85% busy over 32ms */ + I915_WRITE(GEN6_RP_DOWN_EI, 25000); + I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 21250); + + I915_WRITE(GEN6_RP_CONTROL, + GEN6_RP_MEDIA_TURBO | + GEN6_RP_MEDIA_HW_NORMAL_MODE | + GEN6_RP_MEDIA_IS_GFX | + GEN6_RP_ENABLE | + GEN6_RP_UP_BUSY_AVG | + GEN6_RP_DOWN_IDLE_AVG); + break; + + case BETWEEN: + /* Upclock if more than 90% busy over 13ms */ + I915_WRITE(GEN6_RP_UP_EI, 10250); + I915_WRITE(GEN6_RP_UP_THRESHOLD, 9225); + + /* Downclock if less than 75% busy over 32ms */ + I915_WRITE(GEN6_RP_DOWN_EI, 25000); + I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 18750); + + I915_WRITE(GEN6_RP_CONTROL, + GEN6_RP_MEDIA_TURBO | + GEN6_RP_MEDIA_HW_NORMAL_MODE | + GEN6_RP_MEDIA_IS_GFX | + GEN6_RP_ENABLE | + GEN6_RP_UP_BUSY_AVG | + GEN6_RP_DOWN_IDLE_AVG); + break; + + case HIGH_POWER: + /* Upclock if more than 85% busy over 10ms */ + I915_WRITE(GEN6_RP_UP_EI, 8000); + I915_WRITE(GEN6_RP_UP_THRESHOLD, 6800); + + /* Downclock if less than 60% busy over 32ms */ + I915_WRITE(GEN6_RP_DOWN_EI, 25000); + I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 15000); + + I915_WRITE(GEN6_RP_CONTROL, + GEN6_RP_MEDIA_TURBO | + GEN6_RP_MEDIA_HW_NORMAL_MODE | + GEN6_RP_MEDIA_IS_GFX | + GEN6_RP_ENABLE | + GEN6_RP_UP_BUSY_AVG | + GEN6_RP_DOWN_IDLE_AVG); + break; + } + + dev_priv->rps.power = new_power; + dev_priv->rps.last_adj = 0; +} + void gen6_set_rps(struct drm_device *dev, u8 val) { struct drm_i915_private *dev_priv = dev->dev_private; @@ -3306,6 +3398,8 @@ void gen6_set_rps(struct drm_device *dev, u8 val) if (val == dev_priv->rps.cur_delay) return; + gen6_set_rps_thresholds(dev_priv, val); + if (IS_HASWELL(dev)) I915_WRITE(GEN6_RPNSWREQ, HSW_FREQUENCY(val)); @@ -3527,7 +3621,10 @@ static void gen6_enable_rps(struct drm_device *dev) /* In units of 50MHz */ dev_priv->rps.hw_max = dev_priv->rps.max_delay = rp_state_cap & 0xff; - dev_priv->rps.min_delay = (rp_state_cap & 0xff0000) >> 16; + dev_priv->rps.min_delay = (rp_state_cap >> 16) & 0xff; + dev_priv->rps.rp1_delay = (rp_state_cap >> 8) & 0xff; + dev_priv->rps.rp0_delay = (rp_state_cap >> 0) & 0xff; + dev_priv->rps.rpe_delay = dev_priv->rps.rp1_delay; dev_priv->rps.cur_delay = 0; /* disable the counters and set deterministic thresholds */ @@ -3575,38 +3672,9 @@ static void gen6_enable_rps(struct drm_device *dev) GEN6_RC_CTL_EI_MODE(1) | GEN6_RC_CTL_HW_ENABLE); - if (IS_HASWELL(dev)) { - I915_WRITE(GEN6_RPNSWREQ, - HSW_FREQUENCY(10)); - I915_WRITE(GEN6_RC_VIDEO_FREQ, - HSW_FREQUENCY(12)); - } else { - I915_WRITE(GEN6_RPNSWREQ, - GEN6_FREQUENCY(10) | - GEN6_OFFSET(0) | - GEN6_AGGRESSIVE_TURBO); - I915_WRITE(GEN6_RC_VIDEO_FREQ, - GEN6_FREQUENCY(12)); - } - - I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000); - I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, - dev_priv->rps.max_delay << 24 | - dev_priv->rps.min_delay << 16); - - I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400); - I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000); - I915_WRITE(GEN6_RP_UP_EI, 66000); - I915_WRITE(GEN6_RP_DOWN_EI, 350000); - + /* Power down if completely idle for over 50ms */ + I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 50000); I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10); - I915_WRITE(GEN6_RP_CONTROL, - GEN6_RP_MEDIA_TURBO | - GEN6_RP_MEDIA_HW_NORMAL_MODE | - GEN6_RP_MEDIA_IS_GFX | - GEN6_RP_ENABLE | - GEN6_RP_UP_BUSY_AVG | - (IS_HASWELL(dev) ? GEN7_RP_DOWN_IDLE_AVG : GEN6_RP_DOWN_IDLE_CONT)); ret = sandybridge_pcode_write(dev_priv, GEN6_PCODE_WRITE_MIN_FREQ_TABLE, 0); if (!ret) { @@ -3622,7 +3690,8 @@ static void gen6_enable_rps(struct drm_device *dev) DRM_DEBUG_DRIVER("Failed to set the min frequency\n"); } - gen6_set_rps(dev_priv->dev, (gt_perf_status & 0xff00) >> 8); + dev_priv->rps.power = HIGH_POWER; /* force a reset */ + gen6_set_rps(dev_priv->dev, dev_priv->rps.min_delay); gen6_enable_rps_interrupts(dev);