diff mbox

[v2,2/3] cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy

Message ID 2188688.SPioTUuSuO@aspire.rjw.lan (mailing list archive)
State Superseded, archived
Delegated to: Rafael Wysocki
Headers show

Commit Message

Rafael J. Wysocki Feb. 28, 2017, 11:09 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

If the current P-state selection algorithm is set to "performance"
in intel_pstate_set_policy(), the limits may be initialized from
scratch, but only if no_turbo is not set and the maximum frequency
allowed for the given CPU (i.e. the policy object representing it)
is at least equal to the max frequency supported by the CPU.  In all
of the other cases, the limits will not be updated.

For example, the following can happen:

 # cat intel_pstate/status
 active
 # echo performance > cpufreq/policy0/scaling_governor
 # cat intel_pstate/min_perf_pct
 100
 # echo 94 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 100
 # cat cpufreq/policy0/scaling_max_freq
 3100000
 echo 3000000 > cpufreq/policy0/scaling_max_freq
 # cat intel_pstate/min_perf_pct
 94
 # echo 95 > intel_pstate/min_perf_pct
 # cat intel_pstate/min_perf_pct
 95

That is confusing for two reasons.  First, the initial attempt to
change min_perf_pct to 94 seems to have no effect, even though
setting the global limits should always work.  Second, after
changing scaling_max_freq for policy0 the global min_perf_pct
attribute shows 94, even though it should have not been affected
by that operation in principle.

Moreover, the final attempt to change min_perf_pct to 95 worked
as expected, because scaling_max_freq for the only policy with
scaling_governor equal to "performance" was different from the
maximum at that time.

To make all that confusion go away, modify intel_pstate_set_policy()
so that it doesn't reinitialize the limits at all.

At the same time, change intel_pstate_set_performance_limits() to
set min_sysfs_pct to 100 in the "performance" limits set so that
switching the P-state selection algorithm to "performance" causes
intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
min_sysfs_pct in the "performance" limits is set to later).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

-> v2: No changes

---
 drivers/cpufreq/intel_pstate.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

Comments

Rafael J. Wysocki March 2, 2017, 5:18 p.m. UTC | #1
On Wed, Mar 1, 2017 at 12:09 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the current P-state selection algorithm is set to "performance"
> in intel_pstate_set_policy(), the limits may be initialized from
> scratch, but only if no_turbo is not set and the maximum frequency
> allowed for the given CPU (i.e. the policy object representing it)
> is at least equal to the max frequency supported by the CPU.  In all
> of the other cases, the limits will not be updated.
>
> For example, the following can happen:
>
>  # cat intel_pstate/status
>  active
>  # echo performance > cpufreq/policy0/scaling_governor
>  # cat intel_pstate/min_perf_pct
>  100
>  # echo 94 > intel_pstate/min_perf_pct
>  # cat intel_pstate/min_perf_pct
>  100
>  # cat cpufreq/policy0/scaling_max_freq
>  3100000
>  echo 3000000 > cpufreq/policy0/scaling_max_freq
>  # cat intel_pstate/min_perf_pct
>  94
>  # echo 95 > intel_pstate/min_perf_pct
>  # cat intel_pstate/min_perf_pct
>  95
>
> That is confusing for two reasons.  First, the initial attempt to
> change min_perf_pct to 94 seems to have no effect, even though
> setting the global limits should always work.  Second, after
> changing scaling_max_freq for policy0 the global min_perf_pct
> attribute shows 94, even though it should have not been affected
> by that operation in principle.
>
> Moreover, the final attempt to change min_perf_pct to 95 worked
> as expected, because scaling_max_freq for the only policy with
> scaling_governor equal to "performance" was different from the
> maximum at that time.
>
> To make all that confusion go away, modify intel_pstate_set_policy()
> so that it doesn't reinitialize the limits at all.
>
> At the same time, change intel_pstate_set_performance_limits() to
> set min_sysfs_pct to 100 in the "performance" limits set so that
> switching the P-state selection algorithm to "performance" causes
> intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
> min_sysfs_pct in the "performance" limits is set to later).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> -> v2: No changes
>
> ---
>  drivers/cpufreq/intel_pstate.c |   10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -382,6 +382,7 @@ static void intel_pstate_set_performance
>         intel_pstate_init_limits(limits);
>         limits->min_perf_pct = 100;
>         limits->min_perf = int_ext_tofp(1);
> +       limits->min_sysfs_pct = 100;

This change breaks the per-CPU limits, so the patch is not correct.

Withdrawing.

Thanks,
Rafael
Rafael J. Wysocki March 2, 2017, 5:22 p.m. UTC | #2
On Thu, Mar 2, 2017 at 6:18 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Mar 1, 2017 at 12:09 AM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> If the current P-state selection algorithm is set to "performance"
>> in intel_pstate_set_policy(), the limits may be initialized from
>> scratch, but only if no_turbo is not set and the maximum frequency
>> allowed for the given CPU (i.e. the policy object representing it)
>> is at least equal to the max frequency supported by the CPU.  In all
>> of the other cases, the limits will not be updated.
>>
>> For example, the following can happen:
>>
>>  # cat intel_pstate/status
>>  active
>>  # echo performance > cpufreq/policy0/scaling_governor
>>  # cat intel_pstate/min_perf_pct
>>  100
>>  # echo 94 > intel_pstate/min_perf_pct
>>  # cat intel_pstate/min_perf_pct
>>  100
>>  # cat cpufreq/policy0/scaling_max_freq
>>  3100000
>>  echo 3000000 > cpufreq/policy0/scaling_max_freq
>>  # cat intel_pstate/min_perf_pct
>>  94
>>  # echo 95 > intel_pstate/min_perf_pct
>>  # cat intel_pstate/min_perf_pct
>>  95
>>
>> That is confusing for two reasons.  First, the initial attempt to
>> change min_perf_pct to 94 seems to have no effect, even though
>> setting the global limits should always work.  Second, after
>> changing scaling_max_freq for policy0 the global min_perf_pct
>> attribute shows 94, even though it should have not been affected
>> by that operation in principle.
>>
>> Moreover, the final attempt to change min_perf_pct to 95 worked
>> as expected, because scaling_max_freq for the only policy with
>> scaling_governor equal to "performance" was different from the
>> maximum at that time.
>>
>> To make all that confusion go away, modify intel_pstate_set_policy()
>> so that it doesn't reinitialize the limits at all.
>>
>> At the same time, change intel_pstate_set_performance_limits() to
>> set min_sysfs_pct to 100 in the "performance" limits set so that
>> switching the P-state selection algorithm to "performance" causes
>> intel_pstate/min_perf_pct in sysfs to go to 100 (or whatever value
>> min_sysfs_pct in the "performance" limits is set to later).
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> ---
>>
>> -> v2: No changes
>>
>> ---
>>  drivers/cpufreq/intel_pstate.c |   10 +++-------
>>  1 file changed, 3 insertions(+), 7 deletions(-)
>>
>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>> @@ -382,6 +382,7 @@ static void intel_pstate_set_performance
>>         intel_pstate_init_limits(limits);
>>         limits->min_perf_pct = 100;
>>         limits->min_perf = int_ext_tofp(1);
>> +       limits->min_sysfs_pct = 100;
>
> This change breaks the per-CPU limits, so the patch is not correct.
>
> Withdrawing.

Actually, it appears to be fixable, so I will send a new version
later, most likely.

Thanks,
Rafael
Doug Smythies March 29, 2017, 10:01 p.m. UTC | #3
Hi Rafael,

Sorry for the delay, but I didn't notice until today that this
commit causes a regression, at least in my computer.

I have not figured out exactly what is wrong, as I must
admit I am finding these policy interactions difficult
to follow.

On 2017.03.02 14:29 Rafael J. Wysocki wrote:
>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> If the current P-state selection algorithm is set to "performance"
> in intel_pstate_set_policy(), the limits may be initialized from

... [cut] ...

Going back to kernel 4.11-rc1 I get this after boot**:

$ uname -a
Linux s15 4.11.0-rc1-stock #217 SMP Sun Mar 5 15:34:38 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

$ grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100  <<< Correct
/sys/devices/system/cpu/intel_pstate/min_perf_pct:43   <<< Correct
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:23
/sys/devices/system/cpu/intel_pstate/status:active
/sys/devices/system/cpu/intel_pstate/turbo_pct:18

$ grep . /sys/devices/system/cpu/cpu0/cpufreq/*
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
grep: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq: Permission denied
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:3800000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1600000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4294967295
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1600805
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave  <<< Notice
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:3800000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1600000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>

$ sudo rdmsr --bitfield 15:8 -d -a 0x199 <<< Requested P-States
16
16
17
17
16
16
16
16

Going back to kernel 4.11-rc2 I get this after boot**:

~$ uname -a
Linux s15 4.11.0-rc2-stock #218 SMP Sun Mar 12 23:57:44 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux

$ grep . /sys/devices/system/cpu/intel_pstate/*
/sys/devices/system/cpu/intel_pstate/max_perf_pct:100  <<< Correct
/sys/devices/system/cpu/intel_pstate/min_perf_pct:100  <<< Incorrect
/sys/devices/system/cpu/intel_pstate/no_turbo:0
/sys/devices/system/cpu/intel_pstate/num_pstates:23
/sys/devices/system/cpu/intel_pstate/status:active
/sys/devices/system/cpu/intel_pstate/turbo_pct:18

$ grep . /sys/devices/system/cpu/cpu0/cpufreq/*
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
grep: /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq: Permission denied
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:3800000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1600000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4294967295
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:performance powersave
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1600805
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave  <<< Notice
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:3800000  <<< Correct
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:3800000  <<< Incorrect
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>

$ sudo rdmsr --bitfield 15:8 -d -a 0x199  <<< Requested P-States
38   <<<< All are pinned, system is idle
38
38
38
38
38
38
38

**: After boot means > 1 minute after boot, because my distro (Ubtunu)
starts up using the performance governor and then changes to powersave
after 1 minute.

... Doug
Doug Smythies March 29, 2017, 10:16 p.m. UTC | #4
Hi  Rafael,

Disregard.
I see this was fixed in kernel 4.11-rc4.
While kernel 4.11-rc4 was where I thought I started from earlier,
actually I was running 4.11-rc2 at the time.

sorry for the noise.

On 2017.03.29 15:01 Doug Smythies wrote:
> Sorry for the delay, but I didn't notice until today that this
> commit causes a regression, at least in my computer.
>
> I have not figured out exactly what is wrong, as I must
> admit I am finding these policy interactions difficult
> to follow.
>
> On 2017.03.02 14:29 Rafael J. Wysocki wrote:
>>From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> If the current P-state selection algorithm is set to "performance"
>> in intel_pstate_set_policy(), the limits may be initialized from
>
> ... [cut] ...
>
> Going back to kernel 4.11-rc1 I get this after boot**:
>
> $ uname -a
> Linux s15 4.11.0-rc1-stock #217 SMP Sun Mar 5 15:34:38 PST 2017 x86_64 x86_64 x86_64 GNU/Linux
Rafael J. Wysocki March 29, 2017, 10:17 p.m. UTC | #5
On Thu, Mar 30, 2017 at 12:16 AM, Doug Smythies <dsmythies@telus.net> wrote:
> Hi  Rafael,
>
> Disregard.
> I see this was fixed in kernel 4.11-rc4.
> While kernel 4.11-rc4 was where I thought I started from earlier,
> actually I was running 4.11-rc2 at the time.
>
> sorry for the noise.

No worries.

Thanks,
Rafael
diff mbox

Patch

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -382,6 +382,7 @@  static void intel_pstate_set_performance
 	intel_pstate_init_limits(limits);
 	limits->min_perf_pct = 100;
 	limits->min_perf = int_ext_tofp(1);
+	limits->min_sysfs_pct = 100;
 }
 
 static DEFINE_MUTEX(intel_pstate_driver_lock);
@@ -2146,16 +2147,11 @@  static int intel_pstate_set_policy(struc
 	mutex_lock(&intel_pstate_limits_lock);
 
 	if (policy->policy == CPUFREQ_POLICY_PERFORMANCE) {
+		pr_debug("set performance\n");
 		if (!perf_limits) {
 			limits = &performance_limits;
 			perf_limits = limits;
 		}
-		if (policy->max >= policy->cpuinfo.max_freq &&
-		    !limits->no_turbo) {
-			pr_debug("set performance\n");
-			intel_pstate_set_performance_limits(perf_limits);
-			goto out;
-		}
 	} else {
 		pr_debug("set powersave\n");
 		if (!perf_limits) {
@@ -2166,7 +2162,7 @@  static int intel_pstate_set_policy(struc
 	}
 
 	intel_pstate_update_perf_limits(policy, perf_limits);
- out:
+
 	if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE) {
 		/*
 		 * NOHZ_FULL CPUs need this as the governor callback may not