[PATCHv2,2/3] cpupower: mperf_monitor: Introduce per_cpu_schedule flag

Message ID: bb4b0e4137b62651b9d028925fa8f09ca5fbd989.1570819652.git.Janakarajan.Natarajan@amd.com (mailing list archive)
State: Changes Requested
Delegated to: Shuah Khan
Series: Update cpupower and make it more accurate

Commit Message

Janakarajan Natarajan Oct. 11, 2019, 7:37 p.m. UTC
The per_cpu_schedule flag is used to move the cpupower process to the cpu
whose APERF/MPERF registers are about to be read.

This prevents read_msr() from generating IPIs, since we are already
running on the cpu of interest.

Ex: If cpupower is running on CPU 0 and we execute

    read_msr(20, MSR_APERF, val)
    read_msr(20, MSR_MPERF, val)

then the msr module will generate an IPI from CPU 0 to CPU 20 to query
MSR_APERF, and a second IPI to query MSR_MPERF.

The delay between the two reads, introduced by the IPI latency, may cause
the APERF and MPERF values to go out of sync.

Using the per_cpu_schedule flag reduces the probability of this
happening. It comes at the cost of a negligible increase in cpu
consumption caused by migrating cpupower across each of the cpus
of the system.
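
Note that bind_cpu() itself is not added by this patch. As a rough,
hypothetical illustration only, such a helper could be built on
sched_setaffinity():

    #define _GNU_SOURCE
    #include <sched.h>

    /*
     * Illustrative sketch: pin the calling process to @cpu so that
     * subsequent read_msr() calls execute locally and need no IPI.
     */
    static int bind_cpu(int cpu)
    {
            cpu_set_t set;

            CPU_ZERO(&set);
            CPU_SET(cpu, &set);

            /* pid 0 means the calling process */
            return sched_setaffinity(0, sizeof(set), &set);
    }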

Signed-off-by: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com>
---
 .../utils/idle_monitor/cpupower-monitor.h     |  1 +
 .../utils/idle_monitor/mperf_monitor.c        | 42 ++++++++++++++-----
 2 files changed, 33 insertions(+), 10 deletions(-)

Comments

Thomas Renninger Oct. 25, 2019, 10:39 a.m. UTC | #1
Hi Natarajan,

Sorry for answering so late.
I am posting on top as this doesn't fit the patch context:

While I like the two other patches, especially the first one, which prepares
for a generic "ensure we always run on the measured CPU at measure time"
interface, this patch makes use of it in a rather static manner.

I then tried to make this more generic, but without any outcome for now.

If someone likes to play with this, my idea would be:

- the monitors need cpu_start() and cpu_stop() callbacks to register
- either start()/stop() and/or cpu_start()/cpu_stop() callbacks have to
  be provided by a monitor
- the current behavior is start/stop only, which means the whole per_cpu logic
  resides inside the monitor
- if cpu_start/cpu_stop is provided, iterating over all cpus is done in
  fork_it, and the general start/stop functions become optional entry points
  before and after the per_cpu calls

Then the cpu binding can be done from outside.
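
Roughly, with made-up names just to illustrate the split, the hooks could
look like:

    /*
     * Hypothetical sketch only: hooks a monitor could register.
     * If cpu_start/cpu_stop are non-NULL, fork_it binds to each cpu
     * and invokes them there; start/stop stay optional global hooks
     * around the per-cpu loop.
     */
    struct cpuidle_monitor_hooks {
            int (*start)(void);
            int (*stop)(void);
            int (*cpu_start)(unsigned int cpu);
            int (*cpu_stop)(unsigned int cpu);
    };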
Another enhancement could then be to fork as many processes as there are CPUs
in the per_cpu_schedule case (or behind an extra param/flag) and then:

- Bind these forked processes to each cpu
- Execute the start measurements via the forked processes on each cpu
- Execute the test executable (which already runs in yet another fork)
- Execute the stop measurements via the forked processes on each cpu

This should be an ideal environment for not interfering with the tested
executable. It would also allow a nicer program structure.

Just some ideas. But no time right now to look deeper into this.

I'll give my ack in a reply to the first, summarizing commit message.


    Thomas
Shuah Oct. 25, 2019, 3:33 p.m. UTC | #2
On 10/25/19 4:39 AM, Thomas Renninger wrote:
> [Thomas's proposal snipped; see #1 above]

It will be good to capture these ideas in the ToDo file.

Natarajan! Would you like to send a patch updating the ToDo file with
these ideas?

thanks,
-- Shuah
Janakarajan Natarajan Oct. 28, 2019, 4:37 p.m. UTC | #3
On 10/25/2019 10:33 AM, shuah wrote:
> On 10/25/19 4:39 AM, Thomas Renninger wrote:
>> [Thomas's proposal snipped; see #1 above]
>
> It will be good to capture these ideas in the ToDo file.
>
> Natarajan! Would you like to send a patch updating the ToDo file with
> these ideas?


Sure. I can send out a patch capturing these ideas.


-Janak



Patch

diff --git a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h
index 9b612d999660..5b5eb1da0cce 100644
--- a/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h
+++ b/tools/power/cpupower/utils/idle_monitor/cpupower-monitor.h
@@ -62,6 +62,7 @@  struct cpuidle_monitor {
 	unsigned int overflow_s;
 	struct {
 		unsigned int needs_root:1;
+		unsigned int per_cpu_schedule:1;
 	} flags;
 };
 
diff --git a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
index 7cae74202a4d..afb2e6f8edd3 100644
--- a/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
+++ b/tools/power/cpupower/utils/idle_monitor/mperf_monitor.c
@@ -86,15 +86,35 @@  static int mperf_get_tsc(unsigned long long *tsc)
 	return ret;
 }
 
+static int get_aperf_mperf(int cpu, unsigned long long *aval,
+				    unsigned long long *mval)
+{
+	int ret;
+
+	/*
+	 * Running on the cpu from which we read the registers will
+	 * prevent APERF/MPERF from going out of sync because of IPI
+	 * latency introduced by read_msr()s.
+	 */
+	if (mperf_monitor.flags.per_cpu_schedule) {
+		if (bind_cpu(cpu))
+			return 1;
+	}
+
+	ret  = read_msr(cpu, MSR_APERF, aval);
+	ret |= read_msr(cpu, MSR_MPERF, mval);
+
+	return ret;
+}
+
 static int mperf_init_stats(unsigned int cpu)
 {
-	unsigned long long val;
+	unsigned long long aval, mval;
 	int ret;
 
-	ret = read_msr(cpu, MSR_APERF, &val);
-	aperf_previous_count[cpu] = val;
-	ret |= read_msr(cpu, MSR_MPERF, &val);
-	mperf_previous_count[cpu] = val;
+	ret = get_aperf_mperf(cpu, &aval, &mval);
+	aperf_previous_count[cpu] = aval;
+	mperf_previous_count[cpu] = mval;
 	is_valid[cpu] = !ret;
 
 	return 0;
@@ -102,13 +122,12 @@  static int mperf_init_stats(unsigned int cpu)
 
 static int mperf_measure_stats(unsigned int cpu)
 {
-	unsigned long long val;
+	unsigned long long aval, mval;
 	int ret;
 
-	ret = read_msr(cpu, MSR_APERF, &val);
-	aperf_current_count[cpu] = val;
-	ret |= read_msr(cpu, MSR_MPERF, &val);
-	mperf_current_count[cpu] = val;
+	ret = get_aperf_mperf(cpu, &aval, &mval);
+	aperf_current_count[cpu] = aval;
+	mperf_current_count[cpu] = mval;
 	is_valid[cpu] = !ret;
 
 	return 0;
@@ -305,6 +324,9 @@  struct cpuidle_monitor *mperf_register(void)
 	if (init_maxfreq_mode())
 		return NULL;
 
+	if (cpupower_cpu_info.vendor == X86_VENDOR_AMD)
+		mperf_monitor.flags.per_cpu_schedule = 1;
+
 	/* Free this at program termination */
 	is_valid = calloc(cpu_count, sizeof(int));
 	mperf_previous_count = calloc(cpu_count, sizeof(unsigned long long));
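
For completeness (not part of the patch): with per_cpu_schedule now set on
AMD systems, the Mperf monitor can be exercised as usual and will take the
bind-then-read path automatically, e.g.:

    cpupower monitor -m Mperf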