diff mbox

[RFC,7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core

Message ID 000301d1f57b$9e1fed10$da5fc730$@net (mailing list archive)
State RFC, archived
Headers show

Commit Message

Doug Smythies Aug. 13, 2016, 3:59 p.m. UTC
On 2016.08.05 17:02 Rafael J. Wysocki wrote:
>> On 2016.08.03 21:19 Doug Smythies wrote:
>>> On 2016.07.31 16:49 Rafael J. Wysocki wrote:
>>> 
>>> The PID-base P-state selection algorithm used by intel_pstate for
>>> Core processors is based on very weak foundations.
>>
>> ...[cut]...
>> 
>>> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>> +{
>>> +	struct sample *sample = &cpu->sample;
>>> +	int32_t busy_frac;
>>> +	int pstate;
>>> +
>>> +	busy_frac = div_fp(sample->mperf, sample->tsc);
>>> +	sample->busy_scaled = busy_frac * 100;
>>> +
>>> +	if (busy_frac < cpu->iowait_boost)
>>> +		busy_frac = cpu->iowait_boost;
>>> +
>>> +	cpu->iowait_boost >>= 1;
>>> +
>>> +	pstate = cpu->pstate.turbo_pstate;
>>> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
>>> +}
>>> +
>>
My previous replies (and see below) have suggested that some filtering
is needed on the target pstate, otherwise, and dependant on the type of
workload, it tends to oscillate.

I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:


The filter introduces a trade-off between step function load response time
and the tendency for the target pstate to oscillate.

...[cut]...

>> Several tests were done with this patch set.
>> The patch set would not apply to kernel 4.7, but did apply fine to a 4.7+ kernel
>> (I did as of 7a66ecf) from a few days ago.
>>
>> Test 1: Phoronix ffmpeg test (less time is better):
>> Reason: Because it suffers from rotating amongst CPUs in an odd way, challenging for CPU frequency scaling drivers.
>> This test tends to be an indicator of potential troubles with some games.
>> Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq - ondemand.
>> With patch set: 15.8 Seconds average and 24.51 package watts.
>> Without patch set: 11.61 Seconds average and 27.59 watts.
>> Conclusion: Significant reduction in performance with proposed patch set.

With the filter this become even worse at ~18 seconds.
I used to fix this by moving the response curve to the left.
I have not tested this:

+       unfiltered_target = (pstate + (pstate >> 1)) * busy_frac;

which moves the response curve left a little, yet. I will test it.

...[cut]...

>> Test 9: Doug's_specpower simulator (20% load):
>> Time is fixed, less energy is better.
>> Reason: During the long
>> "[intel-pstate driver regression] processor frequency very high even if in idle"
>> and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
>> discussion / thread(s), some sort of test was needed to try to mimic what Srinivas
>> was getting on his fancy SpecPower test platform. So far at least, this test does that.
>> Only the 20% load case was created, because that was the biggest problem case back then.
>> With patch set: 4 tests at an average of 7197 Joules per test, relatively high CPU frequencies.
>> Without the patch set: 4 tests at an average of 5956 Joules per test, relatively low CPU frequencies.
>> Conclusion: 21% energy regression with the patch set.
>> Note: Newer processors might do better than my older i7-2600K.

Patch set + above and IIR gain = 10%: 5670 Joules.
Conclusion: Energy regression eliminated.

Other gains:

gain =  5%: 5342 Joules; Busy MHz: 2172
gain = 10%: 5670 Joules; Busy MHz: 2285
gain = 20%: 6126 Joules; Busy MHz: 2560
gain = 30%: 6426 Joules; Busy MHz: 2739
gain = 40%: 6674 Joules; Busy MHz: 2912
gain = 70%: 7109 Joules; Busy MHz: 3199

locked at minimum pstate (reference): 4653 Joules; Busy MHz: 1600
Performance mode (reference): 7808 Joules; Busy MHz: 3647

>> Test 10: measure the frequency response curve, fixed work packet method,
>> 75 hertz work / sleep frequency (all CPU, no IOWAIT):
>> Reason: To compare to some older data and observe overall.
>> png graph NOT attached.
>> Conclusions: Tends to oscillate, suggesting some sort of damping is needed.
>> However, any filtering tends to increase the step function load rise time
>> (see test 11 below, I think there is some wiggle room here).
>> See also graph which has: with and without patch set; performance mode (for reference);
>> Philippe Longepe's cpu_load method also with setpoint 40 (for reference); one of my previous
>> attempts at a load related patch set from quite some time ago (for reference).

As expected, the filter damps out the oscillation.
New graphs will be sent to Rafael and Srinivas off-list.

>> 
>> Test 11: Look at the step function load response. From no load to 100% on one CPU (CPU load only, no IO).
>> While there is a graph, it is not attached:
>> Conclusion: The step function response is greatly improved (virtually one sample time max).
>> It would probably be O.K. to slow it down a little with a filter so as to reduce the
>> tendency to oscillate under periodic load conditions (to a point, at least. A low enough frequency will
>> always oscillate) (see the graph for test10).

I haven't done this test yet, but from previous work, a gain setting of 10 to 15% gives a
load step function response time similar to the current PID based filter.

The other tests gave similar results with or without the filter.

... Doug


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Peter Zijlstra Aug. 19, 2016, 2:47 p.m. UTC | #1
On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
> My previous replies (and see below) have suggested that some filtering
> is needed on the target pstate, otherwise, and dependant on the type of
> workload, it tends to oscillate.
> 
> I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:

One question though; why is this filter intel_pstate specific? Should we
not do this in generic code?

> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index c43ef55..262ec5f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>         cpu->iowait_boost >>= 1;
> 
>         pstate = cpu->pstate.turbo_pstate;

> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;

> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> +
> +       scaled_gain = div_u64(int_tofp(duration_ns) *
> +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));

Drop int_to_fp() on one of the dividend terms and in the divisor. Same
end result since they divide away against one another but reduces the
risk of overflow.

Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
called period_ns ?

> +       if (scaled_gain > int_tofp(100))
> +               scaled_gain = int_tofp(100);

> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
> +
> +       /*
> +        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
> +        * Use a smple IIR (Infinite Impulse Response) filter.
> +        */
> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
> +                       cpu->sample.target + scaled_gain *
> +                       unfiltered_target, int_tofp(100));


Really hard to read that stuff, maybe cure with a comment:

	/*
	 *       g = dt*p / period
	 *
	 * target' = (1 - g)*target  +  g*input
	 */

> +
> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>  }
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Aug. 20, 2016, 1:06 a.m. UTC | #2
On Friday, August 19, 2016 04:47:29 PM Peter Zijlstra wrote:
> On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
> > My previous replies (and see below) have suggested that some filtering
> > is needed on the target pstate, otherwise, and dependant on the type of
> > workload, it tends to oscillate.
> > 
> > I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:
> 
> One question though; why is this filter intel_pstate specific? Should we
> not do this in generic code?

The intel_pstate algorithm is based on the feedback registers and I'm not sure
if the same effect appears if utilization is computed in a different way.

> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index c43ef55..262ec5f 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
> >         cpu->iowait_boost >>= 1;
> > 
> >         pstate = cpu->pstate.turbo_pstate;
> 
> > +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
> 
> > +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> > +
> > +       scaled_gain = div_u64(int_tofp(duration_ns) *
> > +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
> 
> Drop int_to_fp() on one of the dividend terms and in the divisor. Same
> end result since they divide away against one another but reduces the
> risk of overflow.
> 
> Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
> called period_ns ?

That's an old name that hasn't been changed for quite a while.  That said
"period" or "interval" would be better.

Thanks,
Rafael

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Aug. 20, 2016, 6:40 a.m. UTC | #3
On 2016.08.19 07:47 Peter Zijlstra wrote:
> On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
>> My previous replies (and see below) have suggested that some filtering
>> is needed on the target pstate, otherwise, and dependant on the type of
>> workload, it tends to oscillate.
>> 
>> I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:
>
> One question though; why is this filter intel_pstate specific? Should we
> not do this in generic code?

I wouldn't know. I'm not familiar with the other CPU frequency scaling drivers
or what filtering, if any, they already have.

>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index c43ef55..262ec5f 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>         cpu->iowait_boost >>= 1;
>> 
>>         pstate = cpu->pstate.turbo_pstate;
>> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
>> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
>> +
>> +       scaled_gain = div_u64(int_tofp(duration_ns) *
>> +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
>
> Drop int_to_fp() on one of the dividend terms and in the divisor. Same
> end result since they divide away against one another but reduces the
> risk of overflow.

Yes of course. Thanks.

> Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
> called period_ns ?

Agreed (strongly), however and as Rafael mentioned on his reply, this stuff
has been around for a long time, including the externally available:

/sys/kernel/debug/pstate_snb/sample_rate_ms

Which be referenced by some documentation and scripts (I have some).
Myself, I'd be O.K. to change it all to "period".

>> +       if (scaled_gain > int_tofp(100))
>> +               scaled_gain = int_tofp(100);
>> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
>> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
>> +
>> +       /*
>> +        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
>> +        * Use a smple IIR (Infinite Impulse Response) filter.
>> +        */
>> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
>> +                       cpu->sample.target + scaled_gain *
>> +                       unfiltered_target, int_tofp(100));
>
> Really hard to read that stuff, maybe cure with a comment:
>
>	/*
>	 *       g = dt*p / period
>	 *
>	 * target' = (1 - g)*target  +  g*input
>	 */

Yes, O.K. I'll add more comments if this continues towards a formal
patch submission.

>> +
>> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>>  }



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
srinivas pandruvada Aug. 22, 2016, 6:53 p.m. UTC | #4
Hi Doug,

I am not able to apply this patch. Can you send as a patch on top of
Rafael's RFC 7/7. Since test takes long time, I want to apply correct
patch.

Thanks,
Srinivas

On Sat, 2016-08-13 at 08:59 -0700, Doug Smythies wrote:
> On 2016.08.05 17:02 Rafael J. Wysocki wrote:
> > 
> > > 
> > > On 2016.08.03 21:19 Doug Smythies wrote:
> > > > 
> > > > On 2016.07.31 16:49 Rafael J. Wysocki wrote:
> > > > 
> > > > The PID-base P-state selection algorithm used by intel_pstate
> > > > for
> > > > Core processors is based on very weak foundations.
> > > ...[cut]...
> > > 
> > > > 
> > > > +static inline int32_t get_target_pstate_default(struct cpudata
> > > > *cpu)
> > > > +{
> > > > +	struct sample *sample = &cpu->sample;
> > > > +	int32_t busy_frac;
> > > > +	int pstate;
> > > > +
> > > > +	busy_frac = div_fp(sample->mperf, sample->tsc);
> > > > +	sample->busy_scaled = busy_frac * 100;
> > > > +
> > > > +	if (busy_frac < cpu->iowait_boost)
> > > > +		busy_frac = cpu->iowait_boost;
> > > > +
> > > > +	cpu->iowait_boost >>= 1;
> > > > +
> > > > +	pstate = cpu->pstate.turbo_pstate;
> > > > +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> > > > +}
> > > > +
> My previous replies (and see below) have suggested that some
> filtering
> is needed on the target pstate, otherwise, and dependant on the type
> of
> workload, it tends to oscillate.
> 
> I added the IIR (Infinite Impulse Response) filter that I have
> suggested in the past:
> 
> diff --git a/drivers/cpufreq/intel_pstate.c
> b/drivers/cpufreq/intel_pstate.c
> index c43ef55..262ec5f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -98,6 +98,7 @@ static inline u64 div_ext_fp(u64 x, u64 y)
>   * @tsc:               Difference of time stamp counter between last
> and
>   *                     current sample
>   * @time:              Current time from scheduler
> + * @target:            target pstate filtered.
>   *
>   * This structure is used in the cpudata structure to store
> performance sample
>   * data for choosing next P State.
> @@ -108,6 +109,7 @@ struct sample {
>         u64 aperf;
>         u64 mperf;
>         u64 tsc;
> +       u64 target;
>         u64 time;
>  };
> 
> @@ -1168,6 +1170,7 @@ static void intel_pstate_get_cpu_pstates(struct
> cpudata *cpu)
>                 pstate_funcs.get_vid(cpu);
> 
>         intel_pstate_set_min_pstate(cpu);
> +       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
>  }
> 
>  static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
> @@ -1301,8 +1304,10 @@ static inline int32_t
> get_target_pstate_use_performance(struct cpudata *cpu)
>  static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>  {
>         struct sample *sample = &cpu->sample;
> +       int64_t scaled_gain, unfiltered_target;
>         int32_t busy_frac;
>         int pstate;
> +       u64 duration_ns;
> 
>         busy_frac = div_fp(sample->mperf, sample->tsc);
>         sample->busy_scaled = busy_frac * 100;
> @@ -1313,7 +1318,74 @@ static inline int32_t
> get_target_pstate_default(struct cpudata *cpu)
>         cpu->iowait_boost >>= 1;
> 
>         pstate = cpu->pstate.turbo_pstate;
> -       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> +       /* To Do: I think the above should be:
> +        *
> +        * if (limits.no_turbo || limits.turbo_disabled)
> +        *      pstate = cpu->pstate.max_pstate;
> +        * else
> +        *      pstate = cpu->pstate.turbo_pstate;
> +        *
> +        * figure it out.
> +        *
> +        * no clamps. Pre-filter clamping was needed in past
> implementations.
> +        * To Do: Is any pre-filter clamping needed here? */
> +
> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
> +
> +       /*
> +        * Idle check.
> +        * We have a deferrable timer. Very long durations can be
> +        * either due to long idle (C0 time near 0),
> +        * or due to short idle times that spanned jiffy boundaries
> +        * (C0 time not near zero).
> +        *
> +        * To Do: As of the utilization stuff, I do not think the
> +        * spanning jiffy boundaries thing is true anymore.
> +        * Check, and fix the comment.
> +        *
> +        * The very long durations are 0.4 seconds or more.
> +        * Either way, a very long duration will effectively flush
> +        * the IIR filter, otherwise falling edge load response times
> +        * can be on the order of tens of seconds, because this
> driver
> +        * runs very rarely. Furthermore, for higher periodic loads
> that
> +        * just so happen to not be in the C0 state on jiffy
> boundaries,
> +        * the long ago history should be forgotten.
> +        * For cases of durations that are a few times the set sample
> +        * period, increase the IIR filter gain so as to weight
> +        * the current sample more appropriately.
> +        *
> +        * To Do: sample_time should be forced to be accurate. For
> +        * example if the kernel is a 250 Hz kernel, then a
> +        * sample_rate_ms of 10 should result in a sample_time of 12.
> +        *
> +        * To Do: Check that the IO Boost case is not filtered too
> much.
> +        *        It might be that a filter by-pass is needed for the
> boost case.
> +        *        However, the existing gain = f(duration) might be
> good enough.
> +        */
> +
> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> +
> +       scaled_gain = div_u64(int_tofp(duration_ns) *
> +               int_tofp(pid_params.p_gain_pct),
> int_tofp(pid_params.sample_rate_ns));
> +       if (scaled_gain > int_tofp(100))
> +               scaled_gain = int_tofp(100);
> +       /*
> +        * This code should not be required,
> +        * but short duration times have been observed
> +        * To Do: Check if this code is actually still needed. I
> don't think so.
> +        */
> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
> +
> +       /*
> +        * Bandwidth limit the output. For now, re-task p_gain_pct
> for this purpose.
> +        * Use a smple IIR (Infinite Impulse Response) filter.
> +        */
> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
> +                       cpu->sample.target + scaled_gain *
> +                       unfiltered_target, int_tofp(100));
> +
> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>  }
> 
>  static inline void intel_pstate_update_pstate(struct cpudata *cpu,
> int pstate)
> @@ -1579,6 +1651,7 @@ static void intel_pstate_stop_cpu(struct
> cpufreq_policy *policy)
>                 return;
> 
>         intel_pstate_set_min_pstate(cpu);
> +       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
>  }
> 
>  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
> 
> The filter introduces a trade-off between step function load response
> time
> and the tendency for the target pstate to oscillate.
> 
> ...[cut]...
> 
> > 
> > > 
> > > Several tests were done with this patch set.
> > > The patch set would not apply to kernel 4.7, but did apply fine
> > > to a 4.7+ kernel
> > > (I did as of 7a66ecf) from a few days ago.
> > > 
> > > Test 1: Phoronix ffmpeg test (less time is better):
> > > Reason: Because it suffers from rotating amongst CPUs in an odd
> > > way, challenging for CPU frequency scaling drivers.
> > > This test tends to be an indicator of potential troubles with
> > > some games.
> > > Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq -
> > > ondemand.
> > > With patch set: 15.8 Seconds average and 24.51 package watts.
> > > Without patch set: 11.61 Seconds average and 27.59 watts.
> > > Conclusion: Significant reduction in performance with proposed
> > > patch set.
> With the filter this become even worse at ~18 seconds.
> I used to fix this by moving the response curve to the left.
> I have not tested this:
> 
> +       unfiltered_target = (pstate + (pstate >> 1)) * busy_frac;
> 
> which moves the response curve left a little, yet. I will test it.
> 
> ...[cut]...
> 
> > 
> > > 
> > > Test 9: Doug's_specpower simulator (20% load):
> > > Time is fixed, less energy is better.
> > > Reason: During the long
> > > "[intel-pstate driver regression] processor frequency very high
> > > even if in idle"
> > > and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
> > > discussion / thread(s), some sort of test was needed to try to
> > > mimic what Srinivas
> > > was getting on his fancy SpecPower test platform. So far at
> > > least, this test does that.
> > > Only the 20% load case was created, because that was the biggest
> > > problem case back then.
> > > With patch set: 4 tests at an average of 7197 Joules per test,
> > > relatively high CPU frequencies.
> > > Without the patch set: 4 tests at an average of 5956 Joules per
> > > test, relatively low CPU frequencies.
> > > Conclusion: 21% energy regression with the patch set.
> > > Note: Newer processors might do better than my older i7-2600K.
> Patch set + above and IIR gain = 10%: 5670 Joules.
> Conclusion: Energy regression eliminated.
> 
> Other gains:
> 
> gain =  5%: 5342 Joules; Busy MHz: 2172
> gain = 10%: 5670 Joules; Busy MHz: 2285
> gain = 20%: 6126 Joules; Busy MHz: 2560
> gain = 30%: 6426 Joules; Busy MHz: 2739
> gain = 40%: 6674 Joules; Busy MHz: 2912
> gain = 70%: 7109 Joules; Busy MHz: 3199
> 
> locked at minimum pstate (reference): 4653 Joules; Busy MHz: 1600
> Performance mode (reference): 7808 Joules; Busy MHz: 3647
> 
> > 
> > > 
> > > Test 10: measure the frequency response curve, fixed work packet
> > > method,
> > > 75 hertz work / sleep frequency (all CPU, no IOWAIT):
> > > Reason: To compare to some older data and observe overall.
> > > png graph NOT attached.
> > > Conclusions: Tends to oscillate, suggesting some sort of damping
> > > is needed.
> > > However, any filtering tends to increase the step function load
> > > rise time
> > > (see test 11 below, I think there is some wiggle room here).
> > > See also graph which has: with and without patch set; performance
> > > mode (for reference);
> > > Philippe Longepe's cpu_load method also with setpoint 40 (for
> > > reference); one of my previous
> > > attempts at a load related patch set from quite some time ago
> > > (for reference).
> As expected, the filter damps out the oscillation.
> New graphs will be sent to Rafael and Srinivas off-list.
> 
> > 
> > > 
> > > 
> > > Test 11: Look at the step function load response. From no load to
> > > 100% on one CPU (CPU load only, no IO).
> > > While there is a graph, it is not attached:
> > > Conclusion: The step function response is greatly improved
> > > (virtually one sample time max).
> > > It would probably be O.K. to slow it down a little with a filter
> > > so as to reduce the
> > > tendency to oscillate under periodic load conditions (to a point,
> > > at least. A low enough frequency will
> > > always oscillate) (see the graph for test10).
> I haven't done this test yet, but from previous work, a gain setting
> of 10 to 15% gives a
> load step function response time similar to the current PID based
> filter.
> 
> The other tests gave similar results with or without the filter.
> 
> ... Doug
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Doug Smythies Aug. 22, 2016, 10:53 p.m. UTC | #5
On 2016.08.22 11:54 Srinivas Pandruvada wrote:

> I am not able to apply this patch. Can you send as a patch on top of
> Rafael's RFC 7/7. Since test takes long time, I want to apply correct
> patch.

...[cut]...

O.K. I have included a couple of changes as per the feedback from Peter Zijlstra,
but all of my "To Do" notes are still there.

Coming shortly as patch 8 of 7 as per below:

doug@s15:~/temp-k-git/linux$ git log --oneline

305e905 cpufreq: intel_pstate: add iir filter to pstate.
a29457f cpufreq: intel_pstate: Change P-state selection algorithm for Core
75cf618 cpufreq: schedutil: Add iowait boosting
01b35b0 cpufreq / sched: UUF_IO flag to indicate iowait condition
81d0b42 cpufreq / sched: Add flags argument to cpufreq_update_util()
cb8f8e0 cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
ffa049b cpufreq / sched: Drop cpufreq_trigger_update()
a3c19c3 cpufreq / sched: Make schedutil access utilization data directly
fa8410b Linux 4.8-rc3

... Doug


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c43ef55..262ec5f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -98,6 +98,7 @@  static inline u64 div_ext_fp(u64 x, u64 y)
  * @tsc:               Difference of time stamp counter between last and
  *                     current sample
  * @time:              Current time from scheduler
+ * @target:            target pstate filtered.
  *
  * This structure is used in the cpudata structure to store performance sample
  * data for choosing next P State.
@@ -108,6 +109,7 @@  struct sample {
        u64 aperf;
        u64 mperf;
        u64 tsc;
+       u64 target;
        u64 time;
 };

@@ -1168,6 +1170,7 @@  static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
                pstate_funcs.get_vid(cpu);

        intel_pstate_set_min_pstate(cpu);
+       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }

 static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
@@ -1301,8 +1304,10 @@  static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 static inline int32_t get_target_pstate_default(struct cpudata *cpu)
 {
        struct sample *sample = &cpu->sample;
+       int64_t scaled_gain, unfiltered_target;
        int32_t busy_frac;
        int pstate;
+       u64 duration_ns;

        busy_frac = div_fp(sample->mperf, sample->tsc);
        sample->busy_scaled = busy_frac * 100;
@@ -1313,7 +1318,74 @@  static inline int32_t get_target_pstate_default(struct cpudata *cpu)
        cpu->iowait_boost >>= 1;

        pstate = cpu->pstate.turbo_pstate;
-       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
+       /* To Do: I think the above should be:
+        *
+        * if (limits.no_turbo || limits.turbo_disabled)
+        *      pstate = cpu->pstate.max_pstate;
+        * else
+        *      pstate = cpu->pstate.turbo_pstate;
+        *
+        * figure it out.
+        *
+        * no clamps. Pre-filter clamping was needed in past implementations.
+        * To Do: Is any pre-filter clamping needed here? */
+
+       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
+
+       /*
+        * Idle check.
+        * We have a deferrable timer. Very long durations can be
+        * either due to long idle (C0 time near 0),
+        * or due to short idle times that spanned jiffy boundaries
+        * (C0 time not near zero).
+        *
+        * To Do: As of the utilization stuff, I do not think the
+        * spanning jiffy boundaries thing is true anymore.
+        * Check, and fix the comment.
+        *
+        * The very long durations are 0.4 seconds or more.
+        * Either way, a very long duration will effectively flush
+        * the IIR filter, otherwise falling edge load response times
+        * can be on the order of tens of seconds, because this driver
+        * runs very rarely. Furthermore, for higher periodic loads that
+        * just so happen to not be in the C0 state on jiffy boundaries,
+        * the long ago history should be forgotten.
+        * For cases of durations that are a few times the set sample
+        * period, increase the IIR filter gain so as to weight
+        * the current sample more appropriately.
+        *
+        * To Do: sample_time should be forced to be accurate. For
+        * example if the kernel is a 250 Hz kernel, then a
+        * sample_rate_ms of 10 should result in a sample_time of 12.
+        *
+        * To Do: Check that the IO Boost case is not filtered too much.
+        *        It might be that a filter by-pass is needed for the boost case.
+        *        However, the existing gain = f(duration) might be good enough.
+        */
+
+       duration_ns = cpu->sample.time - cpu->last_sample_time;
+
+       scaled_gain = div_u64(int_tofp(duration_ns) *
+               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
+       if (scaled_gain > int_tofp(100))
+               scaled_gain = int_tofp(100);
+       /*
+        * This code should not be required,
+        * but short duration times have been observed
+        * To Do: Check if this code is actually still needed. I don't think so.
+        */
+       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
+               scaled_gain = int_tofp(pid_params.p_gain_pct);
+
+       /*
+        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
+        * Use a smple IIR (Infinite Impulse Response) filter.
+        */
+       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
+                       cpu->sample.target + scaled_gain *
+                       unfiltered_target, int_tofp(100));
+
+       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
 }

 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
@@ -1579,6 +1651,7 @@  static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
                return;

        intel_pstate_set_min_pstate(cpu);
+       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }

 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)