diff mbox

i915: pm: Be less agressive with clockfreq changes on Bay Trail

Message ID 20171109185101.5653-1-hdegoede@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Hans de Goede Nov. 9, 2017, 6:51 p.m. UTC
Bay Trail devices are known to hang when changing the frequency often,
this is discussed in great length in:
https://bugzilla.kernel.org/show_bug.cgi?id=109051

Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds
on Baytrail v3") is an attempt to workaround this. Several users in
bko109051 report that an earlier version of this patch, v1:
https://bugzilla.kernel.org/attachment.cgi?id=251471

Works better for them and they still see hangs with the merged v3.

Comparing the 2 versions shows that they are indeed not equivalent,
v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps,
as v3 does. It also contained these modifications to i915_irq.c:

     if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
         if (!vlv_c0_above(dev_priv,
                   &dev_priv->rps.down_ei, &now,
-                  dev_priv->rps.down_threshold))
+                  VLV_RP_DOWN_EI_THRESHOLD))
             events |= GEN6_PM_RP_DOWN_THRESHOLD;
         dev_priv->rps.down_ei = now;
     }

     if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
         if (vlv_c0_above(dev_priv,
                  &dev_priv->rps.up_ei, &now,
-                 dev_priv->rps.up_threshold))
+                 VLV_RP_UP_EI_THRESHOLD))
             events |= GEN6_PM_RP_UP_THRESHOLD;
         dev_priv->rps.up_ei = now;
     }

Which use less aggressive up/down thresholds, which results in less
GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() ->
valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ).
With the last call being the likely cause of the hang.

This commit hardcodes the threshold_up and _down values for Bay Trail to
less aggressive values, reducing the amount of clock frequency changes,
thus avoiding the hangs some people are still seeing with the merged fix.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 drivers/gpu/drm/i915/i915_reg.h | 3 +++
 drivers/gpu/drm/i915/intel_pm.c | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Chris Wilson Nov. 10, 2017, 2:56 p.m. UTC | #1
Quoting Hans de Goede (2017-11-09 18:51:01)
> Bay Trail devices are known to hang when changing the frequency often,
> this is discussed in great length in:
> https://bugzilla.kernel.org/show_bug.cgi?id=109051
> 
> Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds
> on Baytrail v3") is an attempt to workaround this. Several users in
> bko109051 report that an earlier version of this patch, v1:
> https://bugzilla.kernel.org/attachment.cgi?id=251471
> 
> Works better for them and they still see hangs with the merged v3.
> 
> Comparing the 2 versions shows that they are indeed not equivalent,
> v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps,
> as v3 does. It also contained these modifications to i915_irq.c:
> 
>      if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>          if (!vlv_c0_above(dev_priv,
>                    &dev_priv->rps.down_ei, &now,
> -                  dev_priv->rps.down_threshold))
> +                  VLV_RP_DOWN_EI_THRESHOLD))
>              events |= GEN6_PM_RP_DOWN_THRESHOLD;
>          dev_priv->rps.down_ei = now;
>      }
> 
>      if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>          if (vlv_c0_above(dev_priv,
>                   &dev_priv->rps.up_ei, &now,
> -                 dev_priv->rps.up_threshold))
> +                 VLV_RP_UP_EI_THRESHOLD))
>              events |= GEN6_PM_RP_UP_THRESHOLD;
>          dev_priv->rps.up_ei = now;
>      }
> 
> Which use less aggressive up/down thresholds, which results in less
> GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() ->
> valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ).
> With the last call being the likely cause of the hang.
> 
> This commit hardcodes the threshold_up and _down values for Bay Trail to
> less aggressive values, reducing the amount of clock frequency changes,
> thus avoiding the hangs some people are still seeing with the merged fix.
> 
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  drivers/gpu/drm/i915/i915_reg.h | 3 +++
>  drivers/gpu/drm/i915/intel_pm.c | 5 ++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 68a58cce6ab1..2561af075ebb 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -1355,6 +1355,9 @@ enum i915_power_well_id {
>  #define        VLV_BIAS_CPU_125_SOC_875 (6 << 2)
>  #define        CHV_BIAS_CPU_50_SOC_50 (3 << 2)
>  
> +#define VLV_RP_UP_EI_THRESHOLD                 90
> +#define VLV_RP_DOWN_EI_THRESHOLD               70
> +
>  /* vlv2 north clock has */
>  #define CCK_FUSE_REG                           0x8
>  #define  CCK_FUSE_HPLL_FREQ_MASK               0x3
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 01966b89be14..177b6caa0a38 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6096,8 +6096,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
>         /* When byt can survive without system hang with dynamic
>          * sw freq adjustments, this restriction can be lifted.
>          */
> -       if (IS_VALLEYVIEW(dev_priv))
> +       if (IS_VALLEYVIEW(dev_priv)) {
> +               threshold_up = VLV_RP_UP_EI_THRESHOLD;
> +               threshold_down = VLV_RP_DOWN_EI_THRESHOLD;

Basically you want a limit on the frequency of ... pcode access?
As presented, you ultimately do not trust any write and the only
solution is to disable all writes. No RPS whatsoever, run at max and
hope rc6 works (maybe even decrease the rc6 threshold).

One of the ideas we floated was moving the pcode access to a worker and
ratelimiting the updates.
-Chris
Hans de Goede Nov. 10, 2017, 2:59 p.m. UTC | #2
Hi,

On 10-11-17 15:56, Chris Wilson wrote:
> Quoting Hans de Goede (2017-11-09 18:51:01)
>> Bay Trail devices are known to hang when changing the frequency often,
>> this is discussed in great length in:
>> https://bugzilla.kernel.org/show_bug.cgi?id=109051
>>
>> Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds
>> on Baytrail v3") is an attempt to workaround this. Several users in
>> bko109051 report that an earlier version of this patch, v1:
>> https://bugzilla.kernel.org/attachment.cgi?id=251471
>>
>> Works better for them and they still see hangs with the merged v3.
>>
>> Comparing the 2 versions shows that they are indeed not equivalent,
>> v1 not only skips writing the GEN6_RP* registers from valleyview_set_rps,
>> as v3 does. It also contained these modifications to i915_irq.c:
>>
>>       if (pm_iir & GEN6_PM_RP_DOWN_EI_EXPIRED) {
>>           if (!vlv_c0_above(dev_priv,
>>                     &dev_priv->rps.down_ei, &now,
>> -                  dev_priv->rps.down_threshold))
>> +                  VLV_RP_DOWN_EI_THRESHOLD))
>>               events |= GEN6_PM_RP_DOWN_THRESHOLD;
>>           dev_priv->rps.down_ei = now;
>>       }
>>
>>       if (pm_iir & GEN6_PM_RP_UP_EI_EXPIRED) {
>>           if (vlv_c0_above(dev_priv,
>>                    &dev_priv->rps.up_ei, &now,
>> -                 dev_priv->rps.up_threshold))
>> +                 VLV_RP_UP_EI_THRESHOLD))
>>               events |= GEN6_PM_RP_UP_THRESHOLD;
>>           dev_priv->rps.up_ei = now;
>>       }
>>
>> Which use less aggressive up/down thresholds, which results in less
>> GEN6_PM_RP_*_THRESHOLD events and thus in less calls to intel_set_rps() ->
>> valleyview_set_rps() -> vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ).
>> With the last call being the likely cause of the hang.
>>
>> This commit hardcodes the threshold_up and _down values for Bay Trail to
>> less aggressive values, reducing the amount of clock frequency changes,
>> thus avoiding the hangs some people are still seeing with the merged fix.
>>
>> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=109051
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>   drivers/gpu/drm/i915/i915_reg.h | 3 +++
>>   drivers/gpu/drm/i915/intel_pm.c | 5 ++++-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
>> index 68a58cce6ab1..2561af075ebb 100644
>> --- a/drivers/gpu/drm/i915/i915_reg.h
>> +++ b/drivers/gpu/drm/i915/i915_reg.h
>> @@ -1355,6 +1355,9 @@ enum i915_power_well_id {
>>   #define        VLV_BIAS_CPU_125_SOC_875 (6 << 2)
>>   #define        CHV_BIAS_CPU_50_SOC_50 (3 << 2)
>>   
>> +#define VLV_RP_UP_EI_THRESHOLD                 90
>> +#define VLV_RP_DOWN_EI_THRESHOLD               70
>> +
>>   /* vlv2 north clock has */
>>   #define CCK_FUSE_REG                           0x8
>>   #define  CCK_FUSE_HPLL_FREQ_MASK               0x3
>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>> index 01966b89be14..177b6caa0a38 100644
>> --- a/drivers/gpu/drm/i915/intel_pm.c
>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>> @@ -6096,8 +6096,11 @@ static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
>>          /* When byt can survive without system hang with dynamic
>>           * sw freq adjustments, this restriction can be lifted.
>>           */
>> -       if (IS_VALLEYVIEW(dev_priv))
>> +       if (IS_VALLEYVIEW(dev_priv)) {
>> +               threshold_up = VLV_RP_UP_EI_THRESHOLD;
>> +               threshold_down = VLV_RP_DOWN_EI_THRESHOLD;
> 
> Basically you want a limit on the frequency of ... pcode access?
> As presented, you ultimately do not trust any write and the only
> solution is to disable all writes. No RPS whatsoever, run at max and
> hope rc6 works (maybe even decrease the rc6 threshold).
> 
> One of the ideas we floated was moving the pcode access to a worker and
> ratelimiting the updates.

Yes ratelimiting the amount of vlv_punit_write(PUNIT_REG_GPU_FREQ_REQ)
calls is also something which came to my mind as an alternative solution.

As I tried to explain in the commit message my main inspiration for this
patch was that people were reporting different success rates with v1
and v3 of the Commit 6067a27d1f01 ("drm/i915: Avoid tweaking evaluation thresholds
on Baytrail v3") patch.

Regards,

Hans



> -Chris
>
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 68a58cce6ab1..2561af075ebb 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -1355,6 +1355,9 @@  enum i915_power_well_id {
 #define 	VLV_BIAS_CPU_125_SOC_875 (6 << 2)
 #define 	CHV_BIAS_CPU_50_SOC_50 (3 << 2)
 
+#define VLV_RP_UP_EI_THRESHOLD			90
+#define VLV_RP_DOWN_EI_THRESHOLD		70
+
 /* vlv2 north clock has */
 #define CCK_FUSE_REG				0x8
 #define  CCK_FUSE_HPLL_FREQ_MASK		0x3
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 01966b89be14..177b6caa0a38 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6096,8 +6096,11 @@  static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
 	/* When byt can survive without system hang with dynamic
 	 * sw freq adjustments, this restriction can be lifted.
 	 */
-	if (IS_VALLEYVIEW(dev_priv))
+	if (IS_VALLEYVIEW(dev_priv)) {
+		threshold_up = VLV_RP_UP_EI_THRESHOLD;
+		threshold_down = VLV_RP_DOWN_EI_THRESHOLD;
 		goto skip_hw_write;
+	}
 
 	I915_WRITE(GEN6_RP_UP_EI,
 		   GT_INTERVAL_FROM_US(dev_priv, ei_up));