diff mbox

[RFC,7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core

Message ID 000801d1f261$d69fc740$83df55c0$@net (mailing list archive)
State RFC, archived
Headers show

Commit Message

Doug Smythies Aug. 9, 2016, 5:16 p.m. UTC
On 2016.08.05 17:02 Rafael J. Wysocki wrote:
> On 2016.08.03 21:19 Doug Smythies wrote:
>> On 2016.07.31 16:49 Rafael J. Wysocki wrote:
>>
>> 
>>> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>> +{
>>> +	struct sample *sample = &cpu->sample;
>>> +	int32_t busy_frac;
>>> +	int pstate;
>>> +
>>> +	busy_frac = div_fp(sample->mperf, sample->tsc);
>>> +	sample->busy_scaled = busy_frac * 100;
>>> +
>>> +	if (busy_frac < cpu->iowait_boost)
>>> +		busy_frac = cpu->iowait_boost;
>>> +
>>> +	cpu->iowait_boost >>= 1;
>>> +
>>> +	pstate = cpu->pstate.turbo_pstate;
>>> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
>>> +}
>>> +
>> 
>> The response curve is not normalized on the lower end to the minimum
>> pstate for the processor, meaning the overall response will vary
>> between processors as a function of minimum pstate.

> But that's OK IMO.
>
> Mapping busy_frac = 0 to the minimum P-state would over-provision workloads
> with small values of busy_frac.

Agreed, mapping busy_frac = 0 to the minimum Pstate would be a bad thing to do.

However, that is not what I meant. I meant that the mapping of busy-frac = N to
the minimum pstate for the processor should give the same "N" (within granularity),
regardless of the processor.

Example, my processor, i7-2600K: max pstate = 38; min pstate = 16.
Load before going to pstate, 17:  17 = (38 + 38/4) * load
Load = N = 35.8 %

Example, something like, i5-3337U (I think, I don't actually have one):
max pstate = 27; min pstate = 8.
Load before going to pstate, 9: 9 = (27 + 27/4) * load
Load =  N = 26.7 %

It was a couple of years ago, so I should re-do the sensitivity
analysis/testing, but I concluded that the performance / energy tradeoff
was somewhat sensitive to "N".
 
I am suggesting that the response curve, or transfer function,
needs to be normalized, for any processor, to:

  Max pstate |              __________
             |             /
             |            /
             |           /
             |          /
             |         /
   Min pstate| _______/
             |__________________________
              |       |     |          |
              0%      N     M         100%
                      CPU load

Currently M ~= 80%

One time (not re-based since kernel 4.3) I did have a proposed solution [1],
but it was expensive in terms of extra multiplies and divides.

[1]: http://marc.info/?l=linux-pm&m=142881187323474&w=2

>> The clamping at maximum pstate at about 80% load seems at little high
>> to me. Work I have done in various attempts to bring back the use of actual load
>> has always ended up achieving maximum pstate before 80% load for best results.
>> Even the get_target_pstate_cpu_load people reach the max pstate faster, and they
>> are more about energy than performance.
>> What was the criteria for the decision here? Are test results available for review
>> and/or duplication by others?

> This follows the coefficient used by the schedutil governor, but then the
> metric is different, so quite possibly a different value may work better here.
> 
> We'll test other values before applying this for sure. :-)

I am now testing this change to the code (for M ~= 67%; N ~= 30% (my CPU); N ~= 22% (i5-3337U)):


> 
> Several tests were done with this patch set.
...[cut]...

>> Questions:
>> Is there a migration plan?
>
> Not yet.  We have quite a lot of testing to do first.
>
>> i.e. will there be an attempt to merge the current cpu_load method
>> and this method into one method?
>
> Quite possibly if the results are good enough.
>
>> Then possibly the PID controller could be eliminated.
>
> Right.

I think this change is important, and I'll help with it as best as I can.

... Doug

A related CPU frequency Vs. Load graph will be sent to Rafael and Srinivas off-list.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8b2bdb7..909d441 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1313,7 +1313,7 @@  static inline int32_t get_target_pstate_default(struct cpudata *cpu)
        cpu->iowait_boost >>= 1;

        pstate = cpu->pstate.turbo_pstate;
-       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
+       return fp_toint((pstate + (pstate >> 1)) * busy_frac);
 }

 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)