
[v1,5/5] cpufreq: intel_pstate: try load instead of busy_scaled

Message ID 1448284006-13596-11-git-send-email-philippe.longepe@linux.intel.com (mailing list archive)
State Superseded, archived

Commit Message

plongepe Nov. 23, 2015, 1:06 p.m. UTC
From: Philippe Longepe <philippe.longepe@intel.com>

- use the load instead of the scaled busy value
- reactivate the io boost

The time spent in iowait is converted into cycles at the max
frequency, so it increases the load during I/O.

Signed-off-by: Philippe Longepe <philippe.longepe@intel.com>
Signed-off-by: Stephane Gasparini <stephane.gasparini@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 38 +++++++++++++++-----------------------
 1 file changed, 15 insertions(+), 23 deletions(-)
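
The conversion the changelog describes can be checked in isolation:
iowait microseconds are turned into the number of cycles the core
would have retired at its maximum (non-turbo) frequency over that
interval, and those cycles are added to mperf. Below is a minimal
standalone sketch of that arithmetic, not part of the patch; the
function name and the sample values (max_pstate 24, scaling 100000,
i.e. a 2.4 GHz part) are made up for illustration. The real kernel
code is in the diff at the bottom of this page.

#include <stdint.h>
#include <stdio.h>

/*
 * scaling is in kHz per P-state step (typically 100000, i.e. 100 MHz),
 * so max_pstate * scaling is the max frequency in kHz, and
 * us * kHz / 1000 yields cycles.
 */
static uint64_t iowait_us_to_mperf(uint64_t delta_iowait_us,
				   uint64_t max_pstate, uint64_t scaling)
{
	return delta_iowait_us * scaling * max_pstate / 1000;
}

int main(void)
{
	/* 5 ms of iowait at 2.4 GHz -> 12,000,000 "cycles" added to mperf */
	uint64_t cycles = iowait_us_to_mperf(5000, 24, 100000);

	printf("%llu\n", (unsigned long long)cycles);
	return 0;
}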

Comments

Doug Smythies Nov. 23, 2015, 4:10 p.m. UTC | #1
On 2015.11.23 05:47 Philippe Longepe wrote:

> static inline int32_t intel_pstate_calc_scaled_busy(struct cpudata *cpu)
>  {
> +	u64 cummulative_iowait, delta_iowait_us;
> +	u64 delta_iowait_mperf;
> +	u64 mperf, now;
>  	struct sample *sample = &cpu->sample;
>  	struct pstate_data *pstate = &cpu->pstate;

To prevent an "unused variable 'pstate'" build warning, I suggest deleting that last line.
As of this patch (5 of 5) it is no longer used, i.e.:

-  	struct pstate_data *pstate = &cpu->pstate;


plongepe Nov. 23, 2015, 4:26 p.m. UTC | #2
Thank you, Smythies,

I found another small error in patch 5/5.

Let me do a few tests on a CherryTrail and I'll resubmit a version 2
of this patch.

Thx,
Philippe

Doug Smythies Nov. 24, 2015, 1:33 a.m. UTC | #3
On 2015.11.23 05:07 Philippe Longepe wrote:

> The time spent in iowait is converted into cycles at max freq,
> so it increases the load during IOs.
...
> +	cummulative_iowait = get_cpu_iowait_time_us(cpu->cpu, &now);
...

Did you consider using get_cpu_idle_time_us(cpu_num, NULL) instead?
Why? Because it doesn't include iowait time, so you can just subtract
it from the elapsed wall-clock time to get a load that counts iowait
as busy.

Crudely, below is what I have been trying for a month now (it also
makes the "is mperf / tsc allowed or not" discussion moot, because
neither is used). Note that it can include iowait or not, as shown by
the commented-out lines (I have been testing both methods):

@@ -726,21 +737,32 @@ static inline void intel_pstate_sample(struct cpudata *cpu)
 
 	tsc = rdtsc();
 	local_irq_restore(flags);
+	time = ktime_get();
+	idle_us = get_cpu_idle_time_us(cpu_num, NULL);
+//	io_wait_us = get_cpu_iowait_time_us(cpu_num, NULL);
+	if (idle_us == -1ULL) {
+		/* !NO_HZ so we can rely on cpustat.idle */
+		idle = kcpustat_cpu(cpu_num).cpustat[CPUTIME_IDLE];
+//		io_wait = kcpustat_cpu(cpu_num).cpustat[CPUTIME_IOWAIT];
+		idle_us = cputime_to_usecs(idle);
+//		io_wait_us = cputime_to_usecs(io_wait);
+	}
...
+	cpu->sample.duration_us = ktime_us_delta(time, cpu->prev_time);
+	cpu->sample.idle_us = idle_us - cpu->prev_idle_us;
+//	cpu->sample.io_wait_us = io_wait_us - cpu->prev_io_wait_us;

And then the load calculation becomes (I use units of tenths of a percent in my stuff):

-	core_pct = int_tofp(sample->mperf) * int_tofp(1000);
-	core_pct = div64_u64(core_pct, int_tofp(sample->tsc));
+//	core_pct = sample->duration_us - sample->idle_us - sample->io_wait_us;
+	core_pct = sample->duration_us - sample->idle_us;
+	if (core_pct < 0) core_pct = 0;
+	core_pct = int_tofp(core_pct) * int_tofp(1000);
+	core_pct = div64_u64(core_pct, int_tofp(sample->duration_us));

In the above, notice the special !NO_HZ fallback. I don't know this
area well, but my understanding is that the !NO_HZ case requires
special code (see also fs/proc/stat.c, which is where I stole this
from). Note that I haven't actually tested that mode yet.
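
A quick standalone check of the load formula above, with made-up
numbers and the same tenths-of-a-percent units:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t duration_us = 10000;	/* 10 ms sample window */
	uint64_t idle_us = 7500;	/* idle time, optionally minus iowait */
	uint64_t load;

	/* load = 1000 * (duration - idle) / duration */
	load = (duration_us - idle_us) * 1000 / duration_us;
	printf("%llu\n", (unsigned long long)load);	/* 250, i.e. 25.0% */
	return 0;
}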

... Doug



Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4f08fa7..61b31d4 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -114,6 +114,7 @@  struct cpudata {
 	u64	prev_mperf;
 	u64	prev_tsc;
 	struct sample sample;
+	u64 prev_cummulative_iowait;
 };
 
 static struct cpudata **all_cpu_data;
@@ -931,12 +932,22 @@  static inline void intel_pstate_set_sample_time(struct cpudata *cpu)
 	mod_timer_pinned(&cpu->timer, jiffies + delay);
 }
 
-
 static inline int32_t intel_pstate_calc_scaled_busy(struct cpudata *cpu)
 {
+	u64 cummulative_iowait, delta_iowait_us;
+	u64 delta_iowait_mperf;
+	u64 mperf, now;
 	struct sample *sample = &cpu->sample;
 	struct pstate_data *pstate = &cpu->pstate;
-	int64_t core_busy_ratio;
+
+	cummulative_iowait = get_cpu_iowait_time_us(cpu->cpu, &now);
+	/* Convert iowait time into number of IO cycles spent at max_freq */
+	delta_iowait_us = cummulative_iowait - cpu->prev_cummulative_iowait;
+	delta_iowait_mperf = div64_u64(delta_iowait_us * cpu->pstate.scaling *
+		cpu->pstate.max_pstate, 1000);
+
+	mperf = cpu->sample.mperf + delta_iowait_mperf;
+	cpu->prev_cummulative_iowait = cummulative_iowait;
 
 	/*
 	 * The load can be estimated as the ratio of the mperf counter
@@ -944,30 +955,11 @@  static inline int32_t intel_pstate_calc_scaled_busy(struct cpudata *cpu)
 	 * (C0) and the time stamp counter running at the same frequency
 	 * also during C-states.
 	 */
-	sample->cpu_load = div64_u64(100 * sample->mperf, sample->tsc);
-
-	/*
-	 * The target P-state can be estimated with the following formula:
-	 * PercentPerformance = PercentBusy * (delta_aperf/delta_mperf);
-	 * (see Section 14.2 from Intel Software Developer Manual)
-	 * with PercentBusy = 100 * (delta_mperf / delta_tsc) and
-	 * PercentPerformance can be simplified with:
-	 * (delta_mperf * delta_aperf) / (delta_tsc * delta_mperf) =
-	 * delta_aperf / delta_tsc. Finally, we normalize core_busy_ratio,
-	 * which was our actual percent performance to what we requested
-	 * during the last sample period. The result will be a percentage of
-	 * busy at a specified pstate.
-	 */
-	core_busy_ratio = div64_u64(int_tofp(100) * sample->aperf *
-		pstate->max_pstate, sample->tsc * pstate->current_pstate);
+	sample->cpu_load = div64_u64(100 * mperf, sample->tsc);
 
-	sample->freq = div64_u64(sample->aperf * pstate->max_pstate *
-		pstate->scaling, sample->mperf);
-
-	return core_busy_ratio;
+	return sample->cpu_load;
 }
 
-
 static inline int32_t intel_pstate_get_scaled_busy_estimate(struct cpudata *cpu)
 {
 	int32_t core_busy, max_pstate, current_pstate, sample_ratio;