diff mbox

drm/i915: Enabling RC6 immediately during init/resume

Message ID 1440190188-12629-1-git-send-email-namrta.salonie@intel.com (mailing list archive)
State New, archived

Commit Message

Namrta Salonie Aug. 21, 2015, 8:49 p.m. UTC
Since RC6 enabling does not involve PCU communication overhead,
it can be enabled immediately during the resume time.
This will help save additional power & meet power requirements
for active Idle KPI where power is evaluated over
number of transitions of suspend/resume.

Signed-off-by: Namrta Salonie <namrta.salonie@intel.com>
Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/intel_pm.c |   94 ++++++++++++++++++++++++++-------------
 1 file changed, 63 insertions(+), 31 deletions(-)

Comments

Chris Wilson Aug. 21, 2015, 12:41 p.m. UTC | #1
On Sat, Aug 22, 2015 at 02:19:48AM +0530, Namrta Salonie wrote:
> Since RC6 enabling does not involve PCU communication overhead,
> it can be enabled immediately during the resume time.
> This will help save additional power & meet power requirements
> for active Idle KPI where power is evaluated over
> number of transitions of suspend/resume.
> 
> Signed-off-by: Namrta Salonie <namrta.salonie@intel.com>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>

You can pull out gen9 rc6 as well, and apply a similar transformation to
gen6-8. So instead of putting the if-chain in
intel_enable_gt_powersave(), add intel_enable_rc6() and start placing
the ready functions there.

Reviewing the comments we only need the rpm lock until after rc6
enabling and as you keep that wakelock, you are not getting the full
improvement you seek. If you keep refactoring the remaining two rc6
functions, you can then drop the wakelock.
-Chris
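
A minimal sketch of the dispatcher shape suggested above, written as a standalone C mock rather than driver code. The enum, the struct device_stub and the stub bodies are hypothetical stand-ins; only the names intel_enable_rc6(), valleyview_enable_rc6() and cherryview_enable_rc6() come from this thread. Platforms whose RC6 setup has not been split out yet fall through and report 0, so the caller can keep using the deferred path for them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's IS_VALLEYVIEW()/IS_CHERRYVIEW() checks. */
enum platform { PLAT_GEN6, PLAT_VALLEYVIEW, PLAT_CHERRYVIEW, PLAT_GEN9 };

/* Hypothetical stand-in for struct drm_device; it only records what was enabled. */
struct device_stub {
	enum platform plat;
	int rc6_enabled;	/* set by the per-platform RC6 hook */
	int rps_enabled;	/* would be set later by the deferred RPS work */
};

/* Stubs: the real functions program the RC6 registers under forcewake. */
static void valleyview_enable_rc6(struct device_stub *dev)
{
	dev->rc6_enabled = 1;
}

static void cherryview_enable_rc6(struct device_stub *dev)
{
	dev->rc6_enabled = 1;
}

/*
 * One place for every platform whose RC6 setup is "ready" to run
 * immediately at init/resume. Returns 1 if RC6 was enabled here and
 * 0 if the platform still depends on the deferred path.
 */
static int intel_enable_rc6(struct device_stub *dev)
{
	assert(dev != NULL);

	switch (dev->plat) {
	case PLAT_CHERRYVIEW:
		cherryview_enable_rc6(dev);
		return 1;
	case PLAT_VALLEYVIEW:
		valleyview_enable_rc6(dev);
		return 1;
	default:
		/* gen6-8 and gen9 would be added here once refactored. */
		return 0;
	}
}
```

With this shape, intel_enable_gt_powersave() would make a single intel_enable_rc6() call instead of carrying the if-chain itself.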
Daniel Vetter Aug. 25, 2015, 2:32 p.m. UTC | #2
On Fri, Aug 21, 2015 at 01:41:26PM +0100, Chris Wilson wrote:
> On Sat, Aug 22, 2015 at 02:19:48AM +0530, Namrta Salonie wrote:
> > Since RC6 enabling does not involve PCU communication overhead,
> > it can be enabled immediately during the resume time.
> > This will help save additional power & meet power requirements
> > for active Idle KPI where power is evaluated over
> > number of transitions of suspend/resume.
> > 
> > Signed-off-by: Namrta Salonie <namrta.salonie@intel.com>
> > Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
> 
> You can pull out gen9 rc6 as well, and apply a similar transformation to
> gen6-8. So instead of putting the if-chain in
> intel_enable_gt_powersave(), add intel_enable_rc6() and start placing
> the ready functions there.
> 
> Reviewing the comments we only need the rpm lock until after rc6
> enabling and as you keep that wakelock, you are not getting the full
> improvement you seek. If you keep refactoring the remaining two rc6
> functions, you can then drop the wakelock.

Since this seems to not have much of a benefit due to the missing removal
of the wakelock I wonder how this was tested ... Next patch should have
(relative, we're not allowed to publish absolute) performance data
attached, e.g. "Over 100 suspend/resume cycles with 5s of idle time in
between each suspend/resume, this results in a reduction of $number
$unit."

Without this, this patch is just unjustified tuning and I won't take it.
-Daniel
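
One way to produce the relative figure requested above is sketched here. It assumes one power (or energy) sample per suspend/resume cycle, taken on an unpatched and a patched kernel; the function names and the sampling scheme are illustrative assumptions, not an existing tool. Only the ratio is reported, so no absolute value needs to be published.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *samples, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(samples != NULL && n > 0);
	for (i = 0; i < n; i++)
		sum += samples[i];
	return sum / (double)n;
}

/*
 * Relative reduction, in percent, of the patched mean versus the
 * baseline mean. A positive result means the patched kernel used less.
 * Both arrays hold one sample per suspend/resume cycle.
 */
static double relative_reduction_pct(const double *baseline,
				     const double *patched, size_t n)
{
	double base = mean(baseline, n);

	assert(base > 0.0);
	return 100.0 * (base - mean(patched, n)) / base;
}
```

The sample arrays would come from whatever measurement rig is used across the 100 cycles; the units cancel out in the ratio.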
Namrta Salonie Aug. 31, 2015, 11:38 a.m. UTC | #3
Hi Chris, Daniel.

Thanks for your inputs.
I agree that we need to amend the patch. I will make the following changes:
1.	The RPM ref count is not needed with immediate enabling of RC6, so I will 
remove it.
2.	I will extend this to the other GENs as well.

This was one of a set of optimizations we implemented for BYT Android. 
Together they gave an improvement of ~5mW over 30 minutes for the Active 
Idle WLAN KPI, and about 5mW for the other airplane, wifi and radio 
suspend scenarios.

The other optimizations included:
1.	Reduction of autosuspend delay to 500ms from 10ms (on BYT, display D3 
should happen in suspend, as Punit initiates the S0iX flow only when it sees 
Display D3). Because of this reduction, Display D3 will happen 
immediately. This can be controlled by user mode in Android. However, 
shall we bring this value to Linux as well?

2.	Deferring RC6 disabling from early_resume callback to resume callback 
to reduce the delay for which the wells had to stay ON – We verified the 
HDMI case and it worked without issues.
3.	During resume, perform modeset based on the DPMS state, so that 
Display remains Off for the intermediate wake ups where no DPMS ON/OFF 
happens.

Also, can we port the optimizations 2 & 3 to the upstream kernel?

Thanks,
Namrta

On 8/25/2015 8:02 PM, Daniel Vetter wrote:
> On Fri, Aug 21, 2015 at 01:41:26PM +0100, Chris Wilson wrote:
>> On Sat, Aug 22, 2015 at 02:19:48AM +0530, Namrta Salonie wrote:
>>> Since RC6 enabling does not involve PCU communication overhead,
>>> it can be enabled immediately during the resume time.
>>> This will help save additional power & meet power requirements
>>> for active Idle KPI where power is evaluated over
>>> number of transitions of suspend/resume.
>>>
>>> Signed-off-by: Namrta Salonie <namrta.salonie@intel.com>
>>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble@intel.com>
>>
>> You can pull out gen9 rc6 as well, and apply a similar transformation to
>> gen6-8. So instead of putting the if-chain in
>> intel_enable_gt_powersave(), add intel_enable_rc6() and start placing
>> the ready functions there.
>>
>> Reviewing the comments we only need the rpm lock until after rc6
>> enabling and as you keep that wakelock, you are not getting the full
>> improvement you seek. If you keep refactoring the remaining two rc6
>> functions, you can then drop the wakelock.
>
> Since this seems to not have much of a benefit due to the missing removal
> of the wakelock I wonder how this was tested ... Next patch should have
> (relative, we're not allowed to publish absolute) performance data
> attached, e.g. "Over 100 suspend/resume cycles with 5s of idle time in
> between each suspend/resume, this results in a reduction of $number
> $unit."
>
> Without this this patch is just unjustified tuning and I won't take it.
> -Daniel
>
Daniel Vetter Sept. 2, 2015, 8:30 a.m. UTC | #4
On Mon, Aug 31, 2015 at 05:08:36PM +0530, Salonie, Namrta wrote:
> Hi Chris, Daniel.
> 
> Thanks for your inputs.
> I agree that we need to amend the patch. Will do following changes.
> 1.	RPM ref count is not needed with immediate enabling of RC6, I will remove
> that.
> 2.	I will extend this to other GEN as well.
> 
> This was one of the set of optimization we implemented for BYT Android. All
> of these
> gave improvement of ~5mW for 30minutes for Active Idle WLAN KPI. And about
> ~5mW for other airplane, wifi, radio suspend scenarios.
> 
> The other optimizations included :-
> 1.	Reduction of autosuspend delay to 500ms from 10ms (On BYT, display D3

I guess you meant 10 s runtime pm autosuspend default?

> should happen in suspend as Punit initiates S0iX flow only considering
> Display D3). Because of this reduction Display D3 will happen immediately:
> This can be controlled by user mode in android. However shall we bring this
> value for Linux as well?

I have a patch for that (including enabling runtime pm by default), but
it's blocked because atm runtime pm is broken in upstream.

> 2.	Deferring RC6 disabling from early_resume callback to resume callback to
> reduce the delay for which the wells had to stay ON – We verified the HDMI
> case and it worked without issues.

I think the big trouble there is that right now we don't handle power well
references correctly in the suspend/resume code at all - we just
force-enable them all. Definitely something we want to fix, but will be a
lot of work to make sure it works everywhere. I'd like to see an overall
approach to this though since I fear if we just move around individual
rpm references (like rc6 or specific power wells) the end-result will be a
really complicated and fragile design. Suspend/resume is already one of
the most fragile parts of the driver as-is.

> 3.	During resume, perform modeset based on the DPMS state, so that Display
> remains Off for the intermediate wake ups where no DPMS ON/OFF happens.

That /should/ be how it's supposed to work. I.e. for dpms off, we should
not try to enable things again. Might have been broken, but with latest
atomic it really should work correctly.

> Also, can we port the optimizations 2 & 3 to the upstream kernel?

Sure, sounds good. Usual caveat applies though since upstream
needs to work everywhere, so probably some more work is needed to make
sure the patches don't break anything.
-Daniel
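
The DPMS-based resume behaviour discussed under point 3 can be sketched as follows. This is a standalone mock under stated assumptions: enum dpms_state, struct crtc_stub and restore_display_on_resume() are hypothetical names, and the counter increment stands in for a real modeset. It only shows the decision, not how the atomic code implements it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DPMS state saved for each pipe at suspend time. */
enum dpms_state { DPMS_ON, DPMS_OFF };

/* Hypothetical stand-in for a CRTC; modeset_count records restores. */
struct crtc_stub {
	enum dpms_state saved_dpms;
	int modeset_count;
};

/*
 * On resume, only pipes that were DPMS on at suspend get a modeset.
 * Pipes that were DPMS off stay off, so an intermediate wake-up
 * does not light up the display.
 */
static void restore_display_on_resume(struct crtc_stub *crtcs, size_t n)
{
	size_t i;

	assert(crtcs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (crtcs[i].saved_dpms == DPMS_ON)
			crtcs[i].modeset_count++;	/* stands in for a real modeset */
	}
}
```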

Patch

diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index fff0c22..f1164c0 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5468,14 +5468,13 @@  static void valleyview_cleanup_gt_powersave(struct drm_device *dev)
 	valleyview_cleanup_pctx(dev);
 }
 
-static void cherryview_enable_rps(struct drm_device *dev)
+static void cherryview_enable_rc6(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	u32 gtfifodbg, val, rc6_mode = 0, pcbr;
+	u32 gtfifodbg, rc6_mode = 0, pcbr;
 	int i;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 
 	gtfifodbg = I915_READ(GTFIFODBG);
 	if (gtfifodbg) {
@@ -5486,9 +5485,9 @@  static void cherryview_enable_rps(struct drm_device *dev)
 
 	cherryview_check_pctx(dev_priv);
 
-	/* 1a & 1b: Get forcewake during program sequence. Although the driver
-	 * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+       /* 1: Get forcewake during program sequence. Although the driver
+         * hasn't enabled a state yet where we need forcewake, BIOS may have.*/
+        intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 
 	/*  Disable RC states. */
 	I915_WRITE(GEN6_RC_CONTROL, 0);
@@ -5520,8 +5519,21 @@  static void cherryview_enable_rps(struct drm_device *dev)
 		rc6_mode = GEN7_RC_CTL_TO_MODE;
 
 	I915_WRITE(GEN6_RC_CONTROL, rc6_mode);
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
 
-	/* 4 Program defaults and thresholds for RPS*/
+static void cherryview_enable_rps(struct drm_device *dev)
+{
+        struct drm_i915_private *dev_priv = dev->dev_private;
+        u32 val;
+
+	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
+
+        /* 1: Get forcewake during program sequence. As Driver would have enabled RC6
+	 * by now before Turbo enabling sequence */
+        intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	/* 2: Program defaults and thresholds for RPS*/
 	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
 	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
 	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
@@ -5530,7 +5542,7 @@  static void cherryview_enable_rps(struct drm_device *dev)
 
 	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
 
-	/* 5: Enable RPS */
+	/* 3: Enable RPS */
 	I915_WRITE(GEN6_RP_CONTROL,
 		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
 		   GEN6_RP_MEDIA_IS_GFX |
@@ -5538,7 +5550,7 @@  static void cherryview_enable_rps(struct drm_device *dev)
 		   GEN6_RP_UP_BUSY_AVG |
 		   GEN6_RP_DOWN_IDLE_AVG);
 
-	/* Setting Fixed Bias */
+	/* 4: Setting Fixed Bias */
 	val = VLV_OVERRIDE_EN |
 		  VLV_SOC_TDP_EN |
 		  CHV_BIAS_CPU_50_SOC_50;
@@ -5546,7 +5558,7 @@  static void cherryview_enable_rps(struct drm_device *dev)
 
 	val = vlv_punit_read(dev_priv, PUNIT_REG_GPU_FREQ_STS);
 
-	/* RPS code assumes GPLL is used */
+	/* 5: RPS code assumes GPLL is used */
 	WARN_ONCE((val & GPLLENABLE) == 0, "GPLL not enabled\n");
 
 	DRM_DEBUG_DRIVER("GPLL enabled? %s\n", val & GPLLENABLE ? "yes" : "no");
@@ -5566,14 +5578,13 @@  static void cherryview_enable_rps(struct drm_device *dev)
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 }
 
-static void valleyview_enable_rps(struct drm_device *dev)
+static void valleyview_enable_rc6(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	u32 gtfifodbg, val, rc6_mode = 0;
+	u32 gtfifodbg, rc6_mode = 0;
 	int i;
 
-	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 
 	valleyview_check_pctx(dev_priv);
 
@@ -5583,28 +5594,12 @@  static void valleyview_enable_rps(struct drm_device *dev)
 		I915_WRITE(GTFIFODBG, gtfifodbg);
 	}
 
-	/* If VLV, Forcewake all wells, else re-direct to regular path */
-	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+        /* If VLV, Forcewake all wells */
+        intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
 
 	/*  Disable RC states. */
 	I915_WRITE(GEN6_RC_CONTROL, 0);
 
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
-
-	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   GEN6_RP_DOWN_IDLE_CONT);
-
 	I915_WRITE(GEN6_RC6_WAKE_RATE_LIMIT, 0x00280000);
 	I915_WRITE(GEN6_RC_EVALUATION_INTERVAL, 125000);
 	I915_WRITE(GEN6_RC_IDLE_HYSTERSIS, 25);
@@ -5627,6 +5622,34 @@  static void valleyview_enable_rps(struct drm_device *dev)
 	intel_print_rc6_info(dev, rc6_mode);
 
 	I915_WRITE(GEN6_RC_CONTROL, rc6_mode);
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+}
+
+static void valleyview_enable_rps(struct drm_device *dev)
+{
+        struct drm_i915_private *dev_priv = dev->dev_private;
+        u32 val;
+
+        WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
+
+        /* If VLV, Forcewake all wells, else re-direct to regular path */
+        intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+        I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
+        I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
+        I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
+        I915_WRITE(GEN6_RP_UP_EI, 66000);
+        I915_WRITE(GEN6_RP_DOWN_EI, 350000);
+
+        I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
+
+        I915_WRITE(GEN6_RP_CONTROL,
+                   GEN6_RP_MEDIA_TURBO |
+                   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+                   GEN6_RP_MEDIA_IS_GFX |
+                   GEN6_RP_ENABLE |
+                   GEN6_RP_UP_BUSY_AVG |
+                   GEN6_RP_DOWN_IDLE_CONT);
 
 	/* Setting Fixed Bias */
 	val = VLV_OVERRIDE_EN |
@@ -6274,6 +6297,15 @@  void intel_enable_gt_powersave(struct drm_device *dev)
 		mutex_unlock(&dev->struct_mutex);
 	} else if (INTEL_INFO(dev)->gen >= 6) {
 		/*
+		 * Enabling RC6 for CHV/VLV here itself and only deferring Turbo
+		 * enabling.
+		 */
+		if (IS_CHERRYVIEW(dev))
+			cherryview_enable_rc6(dev);
+		else if (IS_VALLEYVIEW(dev))
+			valleyview_enable_rc6(dev);
+
+		/*
 		 * PCU communication is slow and this doesn't need to be
 		 * done at any specific time, so do this out of our fast path
 		 * to make resume and init faster.