[v2,1/3] cpufreq: ondemand: Change the calculation of target frequency

Message ID 51ACF2EA.70001@semaphore.gr (mailing list archive)
State Changes Requested, archived

Commit Message

Stratos Karafotis June 3, 2013, 7:47 p.m. UTC
Ondemand calculates load in terms of frequency and increases it only
if the load_freq is greater than up_threshold multiplied by current
or average frequency. This seems to produce oscillations of frequency
between min and max because, for example, a relatively small load can
easily saturate minimum frequency and lead the CPU to max. Then, the
CPU will decrease back to min due to a small load_freq.

This patch changes the calculation method of load and target frequency
considering 2 points:
- Load computation should be independent of the current or average
measured frequency. For example, an absolute load of 80% at 100MHz is
not necessarily equivalent to 8% at 1000MHz in the next sampling interval.
- Target frequency should be increased to any value in the frequency
table, proportional to absolute load, instead of only to the max. Thus:

Target frequency = C * load

where C = policy->cpuinfo.max_freq / 100
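
As a minimal sketch of the two points above (not the patch itself), the
load and target computations could be written as standalone C. The
function names absolute_load and target_freq and the kHz unit are
assumptions made for illustration; the load formula mirrors the one
used in dbs_check_cpu().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Absolute load in percent (0..100) over one sampling interval.
 * wall_time and idle_time are elapsed times in the same unit.
 */
static unsigned int absolute_load(uint64_t wall_time, uint64_t idle_time)
{
	assert(wall_time > 0 && idle_time <= wall_time);
	return (unsigned int)(100 * (wall_time - idle_time) / wall_time);
}

/*
 * Target frequency = C * load, with C = max_freq / 100.
 * The multiplication is done first (in 64 bits) so that the integer
 * division does not throw away precision.
 */
static unsigned int target_freq(unsigned int load, unsigned int max_freq_khz)
{
	assert(load <= 100);
	return (unsigned int)((uint64_t)load * max_freq_khz / 100);
}
```

For example, target_freq(50, 1000000) yields 500000, i.e. 500MHz on a
CPU whose maximum frequency is 1000MHz.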

Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase of ~1.5% in performance. cpufreq_stats (time_in_state) shows
that middle frequencies are used more with this patch. Highest
and lowest frequencies were used less by ~9%.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
---
 drivers/cpufreq/cpufreq_governor.c | 10 +---------
 drivers/cpufreq/cpufreq_governor.h |  1 -
 drivers/cpufreq/cpufreq_ondemand.c | 39 +++++++-------------------------------
 3 files changed, 8 insertions(+), 42 deletions(-)

Comments

David C Niemi June 3, 2013, 8:38 p.m. UTC | #1
Interesting analysis; I just got back from vacation and have not had a chance to comment until now.

I like Stratos' general idea of making the decision to upshift or downshift independent of current frequency, as it makes things simpler and potentially more stable.  But I believe it will be important to measure performance and power consumption in a wider range of use cases to know whether it is an overall win (or whether it can at least be tuned to match the status quo for various use cases).

In my main use case (network servers), I don't think using more middle frequencies is a good thing at all; as soon as a load gets heavy even briefly I want the CPU doing all it can until the load has clearly abated.  The main competition in this use case is between using ondemand (tuned for performance at the cost of some extra power consumption) or the "performance" governor (which cannot be tuned at all, and where C-states are the only hope for moderating power consumption).

A couple of additional points -- it is possible to get excellent overall performance and avoid oscillation using ondemand right now by using a low up_threshold and a sampling_down_factor of around 100; in this case you spend most of your time at either the lowest or highest possible frequency and you spend very little time thinking about slowing down.  The main downside of this is an increase in power consumption, so it is not a battery-friendly approach, but someone will need to also measure power consumption if we want to justify a change from the status quo on that basis.  There are dozens of ways to save power at the expense of performance or vice versa, so any major change like this needs to be analyzed for both, in case your patch just results in running at higher average frequencies and gets its performance boost from that.

David C Niemi

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stratos Karafotis June 4, 2013, 3:20 p.m. UTC | #2
Hi David, 
Thanks for your comments!

In your case, the behavior of ondemand will not change for the worse.
up_threshold and sampling_down_factor remain as they are.
So, for loads above up_threshold ondemand will behave the same.

For loads lower than up_threshold, with the old method the CPU will
remain at the lowest frequency or downshift to a middle one.
After this patch, the CPU will remain at the lowest frequency,
downshift to a middle frequency, or upshift to a middle frequency.
So, I think we will have better performance with the patch.

I know that CPU load tends to be chaotic, but please let me try to explain
my logic with a theoretical example that compares ondemand with and without
this patch, and which I think will be valid in many cases.

Let's assume for simplicity a single core CPU with available
frequencies 100-1000MHz in steps of 100MHz. The architecture does
not support APERF/MPERF to measure average frequency. All tunables
are at their default values. As the initial state we consider that the
CPU is idling at 100MHz with load = 0 (ideally).

A process needs CPU time and in the next iteration ondemand calculates
the load of the previous sampling interval.
There are 3 different possible paths:
1) Load is greater than up_threshold: with or without the patch, CPU will increase to max.
2) Load is lower than 10: with or without the patch, CPU will remain at the lowest freq.
3) Load is between 10 and up_threshold, for example 50:
	without this patch, CPU will remain at 100MHz
	with this patch, CPU will increase to a frequency that is directly
	proportional to load (500MHz)
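
The three paths can be sketched as standalone C for this 100-1000MHz
example. This is an illustration, not the kernel code: old_next_khz,
new_next_khz and clamp_khz are hypothetical names, the thresholds are
the non-micro-accounting defaults from the DEF_ macros (80 and 10), and
the clamp stands in for the policy limits. Snapping the result to a
frequency table entry is left out.

```c
#include <assert.h>

#define MIN_KHZ        100000u	/* 100MHz, from the example */
#define MAX_KHZ       1000000u	/* 1000MHz, from the example */
#define UP_THRESHOLD       80u	/* DEF_FREQUENCY_UP_THRESHOLD */
#define DOWN_DIFF          10u	/* DEF_FREQUENCY_DOWN_DIFFERENTIAL */

/* Keep a requested frequency inside [MIN_KHZ, MAX_KHZ]. */
static unsigned int clamp_khz(unsigned int f)
{
	if (f < MIN_KHZ)
		return MIN_KHZ;
	if (f > MAX_KHZ)
		return MAX_KHZ;
	return f;
}

/* Old method: load and thresholds are scaled by the current frequency. */
static unsigned int old_next_khz(unsigned int load, unsigned int cur_khz)
{
	unsigned int load_freq = load * cur_khz;
	unsigned int adj_up_threshold = UP_THRESHOLD - DOWN_DIFF;

	assert(load <= 100);
	if (load_freq > UP_THRESHOLD * cur_khz)
		return MAX_KHZ;		/* path 1: jump to max */
	if (cur_khz == MIN_KHZ)
		return cur_khz;		/* cannot go lower, stay at min */
	if (load_freq < adj_up_threshold * cur_khz)
		return clamp_khz(load_freq / adj_up_threshold);
	return cur_khz;
}

/* New method: target depends only on load and the maximum frequency. */
static unsigned int new_next_khz(unsigned int load)
{
	assert(load <= 100);
	if (load > UP_THRESHOLD)
		return MAX_KHZ;		/* path 1: jump to max */
	return clamp_khz(load * MAX_KHZ / 100);
}
```

Starting from 100MHz with load 50, old_next_khz(50, 100000) stays at
100000 while new_next_khz(50) gives 500000, which is path 3 above.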

If we are concerned about performance, ondemand will behave better with this patch
for case 3. But what about power consumption? I would say that this depends
on the duration of load:

3a) Suppose that the process causes a CPU load of 50 for 5 sampling periods without this patch.
Without this patch, the CPU will remain at 100MHz for 5 sampling periods.
With this patch, the CPU will increase to 500MHz, most probably for ~1 sampling period.

3b) The process causes a CPU load of 50 for 1 sampling period.
Without this patch, the CPU will remain at 100MHz for 1 sampling period.
With this patch, the CPU will increase to 500MHz for 1 sampling period.

3c) The process causes a CPU load of 50, and then increases to 100 for next iterations
(most probably because the process started in the middle of sampling period).
Without this patch the CPU will remain at 100MHz for the 1st period and then
it will increase to 1000MHz for the next iterations.
With this patch the CPU will increase to 500MHz for the 1st period and then
it will increase to 1000MHz for the next iterations.

The only case in which the new method will be less power efficient is 3b), but I think there will be
a significant improvement in performance for 3a) and 3c).
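
A rough way to see why case 3a) is expected to take about one sampling
period with the patch: under the simplifying assumption that the
process needs a fixed number of CPU cycles and that its run time scales
linearly with frequency, the busy time at the new frequency can be
estimated as below. The function name busy_periods and this cycle model
are assumptions for illustration only; real workloads may be limited by
memory or I/O.

```c
#include <assert.h>

/*
 * Busy time, in sampling periods, needed at new_freq_mhz to execute
 * the cycles that kept the CPU at `load` percent for `periods`
 * sampling periods at freq_mhz.
 */
static double busy_periods(unsigned int load, unsigned int freq_mhz,
			   double periods, unsigned int new_freq_mhz)
{
	double work;

	assert(load <= 100 && new_freq_mhz > 0);
	/* Work is expressed in units of MHz * sampling periods. */
	work = (load / 100.0) * freq_mhz * periods;
	return work / new_freq_mhz;
}
```

With load 50 at 100MHz for 5 periods, busy_periods(50, 100, 5.0, 500)
is about half a period, so under this model the work fits inside a
single sampling period at 500MHz.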

The results will be similar when the governor upshifts from any other frequency.

Starting from the highest frequency, the proposed method will downshift to lower
frequencies, because with the 'old' method the calculation depends on the current
frequency and up_threshold.
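
To make the downshift difference concrete, here is a small sketch of
both formulas for a CPU sitting at the highest frequency. The function
names are hypothetical, and the value 70 used for adj_up_threshold in
the usage note assumes the non-micro-accounting defaults (up_threshold
80 minus a down differential of 10).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old method: freq_next = load_freq / adj_up_threshold, where
 * load_freq = load * cur. Only applies when load < adj_up_threshold.
 */
static unsigned int old_downshift_khz(unsigned int load, unsigned int cur_khz,
				      unsigned int adj_up_threshold)
{
	assert(adj_up_threshold > 0 && load < adj_up_threshold);
	return (unsigned int)((uint64_t)load * cur_khz / adj_up_threshold);
}

/* New method: freq_next = load * max_freq / 100, independent of cur. */
static unsigned int new_downshift_khz(unsigned int load, unsigned int max_khz)
{
	assert(load <= 100);
	return (unsigned int)((uint64_t)load * max_khz / 100);
}
```

At 1000MHz with load 50, old_downshift_khz(50, 1000000, 70) requests
714285 kHz, while new_downshift_khz(50, 1000000) requests 500000 kHz,
so in this example the new method lands on a lower frequency.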

In this simplified example the new method seems to have a better ratio of
performance/power consumption.

I don't know if it is appropriate to mention that I use the proposed method
in 3.4.47 and 3.0.80 kernels for two embedded devices (smart phones). There are
about 2,000 installations and no reports of increased power consumption (so far).
Of course this is not proof, but maybe an indication.
Unfortunately, I don't have measurements of the performance/consumption ratio.

Thanks,
Stratos

David C Niemi June 4, 2013, 5 p.m. UTC | #3
Thanks for the detailed response.  I am heartened by the edge cases (true idle and above up_threshold) remaining the same; that sounds like it should cover a lot of ground well.

David C Niemi




Patch

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 7532570..a2a56c4 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -53,7 +53,7 @@  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 	policy = cdbs->cur_policy;
 
-	/* Get Absolute Load (in terms of freq for ondemand gov) */
+	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_common_info *j_cdbs;
 		u64 cur_wall_time, cur_idle_time;
@@ -104,14 +104,6 @@  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 		load = 100 * (wall_time - idle_time) / wall_time;
 
-		if (dbs_data->cdata->governor == GOV_ONDEMAND) {
-			int freq_avg = __cpufreq_driver_getavg(policy, j);
-			if (freq_avg <= 0)
-				freq_avg = policy->cur;
-
-			load *= freq_avg;
-		}
-
 		if (load > max_load)
 			max_load = load;
 	}
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index e7bbf76..c305cad 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -169,7 +169,6 @@  struct od_dbs_tuners {
 	unsigned int sampling_rate;
 	unsigned int sampling_down_factor;
 	unsigned int up_threshold;
-	unsigned int adj_up_threshold;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
 };
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 4b9bb5d..62e67a9 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -29,11 +29,9 @@ 
 #include "cpufreq_governor.h"
 
 /* On-demand governor macros */
-#define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
 #define DEF_SAMPLING_DOWN_FACTOR		(1)
 #define MAX_SAMPLING_DOWN_FACTOR		(100000)
-#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL	(3)
 #define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE		(10000)
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
@@ -159,14 +157,10 @@  static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
 
 /*
  * Every sampling_rate, we check, if current idle time is less than 20%
- * (default), then we try to increase frequency. Every sampling_rate, we look
- * for the lowest frequency which can sustain the load while keeping idle time
- * over 30%. If such a frequency exist, we try to decrease to this frequency.
- *
- * Any frequency increase takes it to the maximum frequency. Frequency reduction
- * happens at minimum steps of 5% (default) of current frequency
+ * (default), then we try to increase frequency. Else, we adjust the frequency
+ * proportional to load.
  */
-static void od_check_cpu(int cpu, unsigned int load_freq)
+static void od_check_cpu(int cpu, unsigned int load)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
@@ -176,29 +170,17 @@  static void od_check_cpu(int cpu, unsigned int load_freq)
 	dbs_info->freq_lo = 0;
 
 	/* Check for frequency increase */
-	if (load_freq > od_tuners->up_threshold * policy->cur) {
+	if (load > od_tuners->up_threshold) {
 		/* If switching to max speed, apply sampling_down_factor */
 		if (policy->cur < policy->max)
 			dbs_info->rate_mult =
 				od_tuners->sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 		return;
-	}
-
-	/* Check for frequency decrease */
-	/* if we cannot reduce the frequency anymore, break out early */
-	if (policy->cur == policy->min)
-		return;
-
-	/*
-	 * The optimal frequency is the frequency that is the lowest that can
-	 * support the current CPU usage without triggering the up policy. To be
-	 * safe, we focus 10 points under the threshold.
-	 */
-	if (load_freq < od_tuners->adj_up_threshold
-			* policy->cur) {
+	} else {
+		/* Calculate the next frequency proportional to load */
 		unsigned int freq_next;
-		freq_next = load_freq / od_tuners->adj_up_threshold;
+		freq_next = load * policy->cpuinfo.max_freq / 100;
 
 		/* No longer fully busy, reset rate_mult */
 		dbs_info->rate_mult = 1;
@@ -372,9 +354,6 @@  static ssize_t store_up_threshold(struct dbs_data *dbs_data, const char *buf,
 			input < MIN_FREQUENCY_UP_THRESHOLD) {
 		return -EINVAL;
 	}
-	/* Calculate the new adj_up_threshold */
-	od_tuners->adj_up_threshold += input;
-	od_tuners->adj_up_threshold -= od_tuners->up_threshold;
 
 	od_tuners->up_threshold = input;
 	return count;
@@ -523,8 +502,6 @@  static int od_init(struct dbs_data *dbs_data)
 	if (idle_time != -1ULL) {
 		/* Idle micro accounting is supported. Use finer thresholds */
 		tuners->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = MICRO_FREQUENCY_UP_THRESHOLD -
-			MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
 		/*
 		 * In nohz/micro accounting case we set the minimum frequency
 		 * not depending on HZ, but fixed (very low). The deferred
@@ -533,8 +510,6 @@  static int od_init(struct dbs_data *dbs_data)
 		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
 	} else {
 		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = DEF_FREQUENCY_UP_THRESHOLD -
-			DEF_FREQUENCY_DOWN_DIFFERENTIAL;
 
 		/* For correct statistics, we need 10 ticks for each measure */
 		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *