diff mbox

[7/8] cpufreq: Frequency invariant scheduler load-tracking support

Message ID 1457932932-28444-8-git-send-email-mturquette+renesas@baylibre.com (mailing list archive)
State RFC, archived
Headers show

Commit Message

Michael Turquette March 14, 2016, 5:22 a.m. UTC
From: Dietmar Eggemann <dietmar.eggemann@arm.com>

Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
frequency scaling correction factor for more accurate load-tracking.

The factor is:

	current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)

In fact, freq_scale should be a struct cpufreq_policy data member. But
this would require that the scheduler hot path (__update_load_avg()) would
have to grab the cpufreq lock. This can be avoided by using per-cpu data
initialized to SCHED_CAPACITY_SCALE for freq_scale.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com>
---
I'm not as sure about patches 7 & 8, but I included them since I needed
frequency invariance while testing.

As mentioned by myself in 2014 and Rafael last month, the
arch_scale_freq_capacity hook is awkward, because this behavior may vary
within an architecture.

I re-introduce Dietmar's generic cpufreq implementation of the frequency
invariance hook in this patch,  and change the preprocessor magic in
sched.h to favor the cpufreq implementation over arch- or
platform-specific ones in the next patch.

If run-time selection of ops is needed them someone will need to write
that code.

I think that this negates the need for the arm arch hooks[0-2], and
hopefully Morten and Dietmar can weigh in on this.

[0] lkml.kernel.org/r/1436293469-25707-2-git-send-email-morten.rasmussen@arm.com
[1] lkml.kernel.org/r/1436293469-25707-6-git-send-email-morten.rasmussen@arm.com
[2] lkml.kernel.org/r/1436293469-25707-8-git-send-email-morten.rasmussen@arm.com

 drivers/cpufreq/cpufreq.c | 29 +++++++++++++++++++++++++++++
 include/linux/cpufreq.h   |  3 +++
 2 files changed, 32 insertions(+)

Comments

Dietmar Eggemann March 15, 2016, 7:13 p.m. UTC | #1
Hi Mike,

On 14/03/16 05:22, Michael Turquette wrote:
> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
> frequency scaling correction factor for more accurate load-tracking.
> 
> The factor is:
> 
> 	current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
> 
> In fact, freq_scale should be a struct cpufreq_policy data member. But
> this would require that the scheduler hot path (__update_load_avg()) would
> have to grab the cpufreq lock. This can be avoided by using per-cpu data
> initialized to SCHED_CAPACITY_SCALE for freq_scale.
> 
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com>
> ---
> I'm not as sure about patches 7 & 8, but I included them since I needed
> frequency invariance while testing.
> 
> As mentioned by myself in 2014 and Rafael last month, the
> arch_scale_freq_capacity hook is awkward, because this behavior may vary
> within an architecture.
> 
> I re-introduce Dietmar's generic cpufreq implementation of the frequency
> invariance hook in this patch,  and change the preprocessor magic in
> sched.h to favor the cpufreq implementation over arch- or
> platform-specific ones in the next patch.

Maybe it is worth mentioning that this patch is from EAS RFC5.2
(linux-arm.org/linux-power.git energy_model_rfc_v5.2) which hasn't been
posted to LKML. The last EAS RFCv5 has the Frequency Invariant Engine
(FEI) based on the cpufreq notifier calls (cpufreq_callback,
cpufreq_policy_callback) in the ARM arch code.

> If run-time selection of ops is needed them someone will need to write
> that code.

Right now I see 3 different implementations of the FEI. 1) The X86
aperf/mperf based one (https://lkml.org/lkml/2016/3/3/589), 2) This one
in cpufreq.c and 3) the one based on cpufreq notifiers in ARCH (ARM,
ARM64) code.

I guess with sched_util we do need a solution for all platforms
(different archs, x86 w/ and w/o X86_FEATURE_APERFMPERF, ...).

> I think that this negates the need for the arm arch hooks[0-2], and
> hopefully Morten and Dietmar can weigh in on this.

It's true that we tried to get rid of the usage of the cpufreq callbacks
(cpufreq_callback, cpufreq_policy_callback) with this patch. Plus we
didn't want to implement it twice (for ARM and ARM64).

But 2) would have to work for other ARCHs as well. Maybe as a fall-back
for X86 w/o X86_FEATURE_APERFMPERF feature?

[...]
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael Turquette March 15, 2016, 8:19 p.m. UTC | #2
Quoting Dietmar Eggemann (2016-03-15 12:13:46)
> Hi Mike,
> 
> On 14/03/16 05:22, Michael Turquette wrote:
> > From: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > 
> > Implements cpufreq_scale_freq_capacity() to provide the scheduler with a
> > frequency scaling correction factor for more accurate load-tracking.
> > 
> > The factor is:
> > 
> >       current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu)
> > 
> > In fact, freq_scale should be a struct cpufreq_policy data member. But
> > this would require that the scheduler hot path (__update_load_avg()) would
> > have to grab the cpufreq lock. This can be avoided by using per-cpu data
> > initialized to SCHED_CAPACITY_SCALE for freq_scale.
> > 
> > Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com>
> > ---
> > I'm not as sure about patches 7 & 8, but I included them since I needed
> > frequency invariance while testing.
> > 
> > As mentioned by myself in 2014 and Rafael last month, the
> > arch_scale_freq_capacity hook is awkward, because this behavior may vary
> > within an architecture.
> > 
> > I re-introduce Dietmar's generic cpufreq implementation of the frequency
> > invariance hook in this patch,  and change the preprocessor magic in
> > sched.h to favor the cpufreq implementation over arch- or
> > platform-specific ones in the next patch.
> 
> Maybe it is worth mentioning that this patch is from EAS RFC5.2
> (linux-arm.org/linux-power.git energy_model_rfc_v5.2) which hasn't been
> posted to LKML. The last EAS RFCv5 has the Frequency Invariant Engine
> (FEI) based on the cpufreq notifier calls (cpufreq_callback,
> cpufreq_policy_callback) in the ARM arch code.

Oops, my apologies. I got a little mixed up while developing these
patches and I should have at least asked you about this one before
posting.

I'm really quite happy to drop #7 and #8 if they are too contentious or
if patch #7 is deemed as not-ready by you.

> 
> > If run-time selection of ops is needed them someone will need to write
> > that code.
> 
> Right now I see 3 different implementations of the FEI. 1) The X86
> aperf/mperf based one (https://lkml.org/lkml/2016/3/3/589), 2) This one
> in cpufreq.c and 3) the one based on cpufreq notifiers in ARCH (ARM,
> ARM64) code.
> 
> I guess with sched_util we do need a solution for all platforms
> (different archs, x86 w/ and w/o X86_FEATURE_APERFMPERF, ...).
> 
> > I think that this negates the need for the arm arch hooks[0-2], and
> > hopefully Morten and Dietmar can weigh in on this.
> 
> It's true that we tried to get rid of the usage of the cpufreq callbacks
> (cpufreq_callback, cpufreq_policy_callback) with this patch. Plus we
> didn't want to implement it twice (for ARM and ARM64).
> 
> But 2) would have to work for other ARCHs as well. Maybe as a fall-back
> for X86 w/o X86_FEATURE_APERFMPERF feature?

That's what I had in mind. I guess that some day there will be a need to
select implementations at run-time for both cpufreq (e.g. different
cpufreq drivers might implement arch_scale_freq_capacity) and for the
!CONFIG_CPU_FREQ case (e.g. different platforms might implement
arch_scale_freq_capcity within the same arch).

The cpufreq approach seems the most generic, hence patch #8 to make it
the default.

Regards,
Mike

> 
> [...]
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra March 15, 2016, 9:32 p.m. UTC | #3
On Tue, Mar 15, 2016 at 01:19:17PM -0700, Michael Turquette wrote:
> That's what I had in mind. I guess that some day there will be a need to
> select implementations at run-time for both cpufreq (e.g. different
> cpufreq drivers might implement arch_scale_freq_capacity) and for the
> !CONFIG_CPU_FREQ case (e.g. different platforms might implement
> arch_scale_freq_capcity within the same arch).

No, no runtime selection. That gets us function pointers and other
indirect mess.

We should be trying very hard to get rid of that cpufreq_util_update()
pointer, not add more of that gunk.

Use self modifying code if you have to do something.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra March 15, 2016, 9:34 p.m. UTC | #4
On Sun, Mar 13, 2016 at 10:22:11PM -0700, Michael Turquette wrote:

> +static DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
> +
> +unsigned long cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu)
> +{
> +	return per_cpu(freq_scale, cpu);
> +}

These should be in a header so that we can avoid having to do a function
call.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dietmar Eggemann March 16, 2016, 6:33 p.m. UTC | #5
On 15/03/16 20:19, Michael Turquette wrote:
> Quoting Dietmar Eggemann (2016-03-15 12:13:46)
>> Hi Mike,
>>
>> On 14/03/16 05:22, Michael Turquette wrote:
>>> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
>>>

[...]

>> Maybe it is worth mentioning that this patch is from EAS RFC5.2
>> (linux-arm.org/linux-power.git energy_model_rfc_v5.2) which hasn't been
>> posted to LKML. The last EAS RFCv5 has the Frequency Invariant Engine
>> (FEI) based on the cpufreq notifier calls (cpufreq_callback,
>> cpufreq_policy_callback) in the ARM arch code.
> 
> Oops, my apologies. I got a little mixed up while developing these
> patches and I should have at least asked you about this one before
> posting.

No need to apologize, just wanted to set the context right for people
following the EAS stuff.

> I'm really quite happy to drop #7 and #8 if they are too contentious or
> if patch #7 is deemed as not-ready by you.

If people think that driving frequency invariance from within cpufreq.c
is the right thing to so then this patch can stay.

That said, I'm not sure if we can do a better job of integrating
freq_scale into existing cpufreq data structures.


>>> If run-time selection of ops is needed them someone will need to write
>>> that code.
>>
>> Right now I see 3 different implementations of the FEI. 1) The X86
>> aperf/mperf based one (https://lkml.org/lkml/2016/3/3/589), 2) This one
>> in cpufreq.c and 3) the one based on cpufreq notifiers in ARCH (ARM,
>> ARM64) code.

I rechecked the functionality of this implementation (2) versus (1) and
(3) by running them all together on an X86 system w/
X86_FEATURE_APERFMPERF and acpi-cpufreq (i5 CPU M520) w/o actually doing
the wiring towards arch_scale_freq_capacity and they all behave similar
or equal. The update of (1) happens much more often obviously since its
driven from the scheduler tick.

...
arch_scale_freq_tick: APERF/MPERF: aperf=5864236223 mperf=6009442434
acnt=200073 mcnt=362360 arch_cpu_freq=565
arch_scale_freq_tick: APERF/MPERF: aperf=5864261886 mperf=6009474724
acnt=25663 mcnt=32290 arch_cpu_freq=813
scale_freq_capacity: CPUFREQ.C: cpu=1 freq_scale=910
cpufreq_callback: NOTIFIER: cpu=1 curr_freq=2133000 max_freq=2400000
freq_scale=910
arch_scale_freq_tick: APERF/MPERF: aperf=5864524610 mperf=6009802492
acnt=262724 mcnt=327768 arch_cpu_freq=820
arch_scale_freq_tick: APERF/MPERF: aperf=5864581063 mperf=6009862268
acnt=56453 mcnt=59776 arch_cpu_freq=967
...

[...]
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index b1ca9c4..e67584f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -306,6 +306,31 @@  static void adjust_jiffies(unsigned long val, struct cpufreq_freqs *ci)
 #endif
 }
 
+/*********************************************************************
+ *               FREQUENCY INVARIANT CPU CAPACITY                    *
+ *********************************************************************/
+
+static DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
+
+static void
+scale_freq_capacity(struct cpufreq_policy *policy, struct cpufreq_freqs *freqs)
+{
+	unsigned long cur = freqs ? freqs->new : policy->cur;
+	unsigned long scale = (cur << SCHED_CAPACITY_SHIFT) / policy->max;
+	int cpu;
+
+	pr_debug("cpus %*pbl cur/cur max freq %lu/%u kHz freq scale %lu\n",
+		 cpumask_pr_args(policy->cpus), cur, policy->max, scale);
+
+	for_each_cpu(cpu, policy->cpus)
+		per_cpu(freq_scale, cpu) = scale;
+}
+
+unsigned long cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(freq_scale, cpu);
+}
+
 static void __cpufreq_notify_transition(struct cpufreq_policy *policy,
 		struct cpufreq_freqs *freqs, unsigned int state)
 {
@@ -409,6 +434,8 @@  wait:
 
 	spin_unlock(&policy->transition_lock);
 
+	scale_freq_capacity(policy, freqs);
+
 	cpufreq_notify_transition(policy, freqs, CPUFREQ_PRECHANGE);
 }
 EXPORT_SYMBOL_GPL(cpufreq_freq_transition_begin);
@@ -2125,6 +2152,8 @@  static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_NOTIFY, new_policy);
 
+	scale_freq_capacity(new_policy, NULL);
+
 	policy->min = new_policy->min;
 	policy->max = new_policy->max;
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 0e39499..72833be 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -583,4 +583,7 @@  unsigned int cpufreq_generic_get(unsigned int cpu);
 int cpufreq_generic_init(struct cpufreq_policy *policy,
 		struct cpufreq_frequency_table *table,
 		unsigned int transition_latency);
+
+struct sched_domain;
+unsigned long cpufreq_scale_freq_capacity(struct sched_domain *sd, int cpu);
 #endif /* _LINUX_CPUFREQ_H */