[RFCv7,01/10] sched: Compute cpu capacity available at current frequency

Message ID: 1456190570-4475-2-git-send-email-smuckle@linaro.org
State: RFC, archived

Commit Message

Steve Muckle Feb. 23, 2016, 1:22 a.m. UTC
From: Morten Rasmussen <morten.rasmussen@arm.com>

capacity_orig_of() returns the max available compute capacity of a cpu.
For scale-invariant utilization tracking and energy-aware scheduling
decisions it is useful to know the compute capacity available at the
current OPP of a cpu.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/fair.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Rafael J. Wysocki Feb. 23, 2016, 1:41 a.m. UTC | #1
On Tue, Feb 23, 2016 at 2:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> From: Morten Rasmussen <morten.rasmussen@arm.com>
>
> capacity_orig_of() returns the max available compute capacity of a cpu.
> For scale-invariant utilization tracking and energy-aware scheduling
> decisions it is useful to know the compute capacity available at the
> current OPP of a cpu.
>
> cc: Ingo Molnar <mingo@redhat.com>
> cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/fair.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7ce24a4..3437e01 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4821,6 +4821,17 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
>  #endif
>
>  /*
> + * Returns the current capacity of cpu after applying both
> + * cpu and freq scaling.
> + */
> +static unsigned long capacity_curr_of(int cpu)
> +{
> +       return cpu_rq(cpu)->cpu_capacity_orig *
> +              arch_scale_freq_capacity(NULL, cpu)

What about architectures that don't have this?

Why is that an architecture feature?

I can easily imagine two x86 platforms using different
scale_freq_capacity(), for example.

> +              >> SCHED_CAPACITY_SHIFT;
> +}
> +
> +/*
>   * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
>   * A waker of many should wake a different task than the one last awakened
>   * at a frequency roughly N times higher than one of its wakees.  In order
> --


Thanks,
Rafael
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra Feb. 23, 2016, 9:19 a.m. UTC | #2
On Tue, Feb 23, 2016 at 02:41:20AM +0100, Rafael J. Wysocki wrote:
> >  /*
> > + * Returns the current capacity of cpu after applying both
> > + * cpu and freq scaling.
> > + */
> > +static unsigned long capacity_curr_of(int cpu)
> > +{
> > +       return cpu_rq(cpu)->cpu_capacity_orig *
> > +              arch_scale_freq_capacity(NULL, cpu)
> 
> What about architectures that don't have this?

They get the 'default' which is a constant SCHED_CAPACITY_SCALE unit.

> Why is that an architecture feature?

Because not all archs can tell the frequency the same way. On some, you
program the DVFS state and they really run at that speed; for those you
can simply report it back.

For others, x86 for example, you program a DVFS 'hint' and the hardware
does whatever; we'd have to take APERF/MPERF samples to get an idea of the
actual frequency we ran at.
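
A rough sketch of how such APERF/MPERF samples could be turned into a
frequency scale factor follows. It is illustrative only: the function name
freq_scale_from_deltas() is hypothetical, reading the counters is left out,
and it assumes that the MPERF reference clock corresponds to the maximum
frequency (no turbo), which does not hold on many real parts.

```c
#include <stdint.h>

#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical helper, not a kernel API.
 *
 * delta_aperf: counter ticks at the actual clock over the sample window
 * delta_mperf: counter ticks at the fixed reference clock over the same window
 *
 * Returns the ratio actual/reference as a fixed-point fraction of
 * SCHED_CAPACITY_SCALE, clamped to SCHED_CAPACITY_SCALE.
 */
static unsigned long freq_scale_from_deltas(uint64_t delta_aperf,
					    uint64_t delta_mperf)
{
	uint64_t scale;

	/* No reference ticks in the window: fall back to full speed. */
	if (delta_mperf == 0)
		return SCHED_CAPACITY_SCALE;

	/* Assumes a short window, so the shift cannot overflow 64 bits. */
	scale = (delta_aperf << SCHED_CAPACITY_SHIFT) / delta_mperf;

	/* Assumption: reference clock == max frequency, so clamp at 100%. */
	if (scale > SCHED_CAPACITY_SCALE)
		scale = SCHED_CAPACITY_SCALE;

	return (unsigned long)scale;
}
```

With these assumptions, 500 APERF ticks against 1000 MPERF ticks would
report half of SCHED_CAPACITY_SCALE.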

Also, having this makes the load tracking slightly more
expensive: instead of compile-time constants we get function calls and
actual multiplications. It's not _too_ bad, but still.

> I can easily imagine two x86 platforms using different
> scale_freq_capacity(), for example.

That's up to the arch: if different x86 platforms need different
implementations, the arch implementation needs to offer a selector. This
isn't 'hard'.
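
The 'default' described in this reply can be pictured as a compile-time
fallback that an architecture overrides by defining a macro. The sketch
below is a user-space approximation of that pattern, not a copy of the
kernel source. The value of SCHED_CAPACITY_SHIFT and the opaque
struct sched_domain declaration are assumptions made so the fragment
compiles on its own.

```c
#include <stddef.h>

#define SCHED_CAPACITY_SHIFT 10	/* assumed value */
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

struct sched_domain;	/* opaque here; only a pointer is passed around */

/*
 * An architecture that can report its current frequency would define
 * the macro before this point, for example:
 *   #define arch_scale_freq_capacity my_arch_scale_freq_capacity
 * (my_arch_scale_freq_capacity is a hypothetical name).
 */
#ifndef arch_scale_freq_capacity
/* Generic fallback: a constant, meaning "always at full frequency". */
static inline unsigned long
arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	(void)sd;
	(void)cpu;
	return SCHED_CAPACITY_SCALE;
}
#endif
```

Because the fallback is a constant in an inline function, a caller such as
capacity_curr_of() would reduce to returning cpu_capacity_orig unchanged
on architectures that do not provide their own hook.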
Rafael J. Wysocki Feb. 26, 2016, 1:37 a.m. UTC | #3
On Tuesday, February 23, 2016 10:19:16 AM Peter Zijlstra wrote:
> On Tue, Feb 23, 2016 at 02:41:20AM +0100, Rafael J. Wysocki wrote:
> > >  /*
> > > + * Returns the current capacity of cpu after applying both
> > > + * cpu and freq scaling.
> > > + */
> > > +static unsigned long capacity_curr_of(int cpu)
> > > +{
> > > +       return cpu_rq(cpu)->cpu_capacity_orig *
> > > +              arch_scale_freq_capacity(NULL, cpu)
> > 
> > What about architectures that don't have this?
> 
> They get the 'default' which is a constant SCHED_CAPACITY_SCALE unit.
> 
> > Why is that an architecture feature?
> 
> Because not all archs can tell the frequency the same way. On some, you
> program the DVFS state and they really run at that speed; for those you
> can simply report it back.
> 
> For others, x86 for example, you program a DVFS 'hint' and the hardware
> does whatever; we'd have to take APERF/MPERF samples to get an idea of the
> actual frequency we ran at.
> 
> Also, having this makes the load tracking slightly more
> expensive: instead of compile-time constants we get function calls and
> actual multiplications. It's not _too_ bad, but still.

That's all correct, but my question should rather be: is arch the right
granularity?

In theory, there may be ARM64-based platforms using ACPI and behaving
like x86 in that respect in the future.

Thanks,
Rafael

Peter Zijlstra Feb. 26, 2016, 9:14 a.m. UTC | #4
On Fri, Feb 26, 2016 at 02:37:19AM +0100, Rafael J. Wysocki wrote:

> That's all correct, but my question should rather be: is arch the right
> granularity?
> 
> In theory, there may be ARM64-based platforms using ACPI and behaving
> like x86 in that respect in the future.

Ah, so I started these hooks way before the cpufreq/cpuidle etc.
integration push.

Maybe we should look at something like that, but performance is really
critical: you most definitely do not want 3 indirections just because of
abstract framework crap; that's measurable overhead on these callsites.

Hence the current inline with a constant value or a single function call.
If archs want a selector, I would recommend boot-time call instruction
rewrites à la alternatives/paravirt.
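
Boot-time rewriting of call instructions cannot be shown in portable C, so
the sketch below swaps in a simpler stand-in: a single function pointer
chosen once at init time. That still costs one indirect call on the hot
path, which is exactly the overhead the patched-call approach would avoid.
It only illustrates the "select once, then call" shape. All names here
(freq_scale_hook, select_freq_scale(), freq_scale_platform_x() and so on)
are hypothetical.

```c
#define SCHED_CAPACITY_SHIFT 10	/* assumed value */
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

typedef unsigned long (*freq_scale_fn)(int cpu);

/* Default: constant full capacity, as for archs without a hook. */
static unsigned long freq_scale_default(int cpu)
{
	(void)cpu;
	return SCHED_CAPACITY_SCALE;
}

/* Hypothetical platform variant; pretends every CPU runs at half speed. */
static unsigned long freq_scale_platform_x(int cpu)
{
	(void)cpu;
	return SCHED_CAPACITY_SCALE / 2;
}

/* Written once during init, read on every call afterwards. */
static freq_scale_fn freq_scale_hook = freq_scale_default;

/* Init-time selector: pick the implementation for the detected platform. */
static void select_freq_scale(int platform_reports_freq)
{
	freq_scale_hook = platform_reports_freq ? freq_scale_platform_x
						 : freq_scale_default;
}

/* Hot-path entry point: exactly one indirection, no framework layers. */
static inline unsigned long scale_freq_capacity(int cpu)
{
	return freq_scale_hook(cpu);
}
```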

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7ce24a4..3437e01 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4821,6 +4821,17 @@  static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 #endif
 
 /*
+ * Returns the current capacity of cpu after applying both
+ * cpu and freq scaling.
+ */
+static unsigned long capacity_curr_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_capacity_orig *
+	       arch_scale_freq_capacity(NULL, cpu)
+	       >> SCHED_CAPACITY_SHIFT;
+}
+
+/*
  * Detect M:N waker/wakee relationships via a switching-frequency heuristic.
  * A waker of many should wake a different task than the one last awakened
  * at a frequency roughly N times higher than one of its wakees.  In order
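
As a worked illustration of the arithmetic in capacity_curr_of(), the
following user-space sketch takes both factors as plain arguments instead
of reading them from the runqueue and the arch hook. The helper name
capacity_curr() and the value of SCHED_CAPACITY_SHIFT are assumptions made
for the example.

```c
#include <assert.h>

#define SCHED_CAPACITY_SHIFT 10	/* assumed value */
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/*
 * Hypothetical stand-in for capacity_curr_of().
 *
 * capacity_orig: max compute capacity of the cpu (cpu scaling applied)
 * freq_scale:    current frequency as a fixed-point fraction of the max,
 *                where SCHED_CAPACITY_SCALE means 100%
 */
static unsigned long capacity_curr(unsigned long capacity_orig,
				   unsigned long freq_scale)
{
	assert(freq_scale <= SCHED_CAPACITY_SCALE);

	/* Multiply first, then shift away the fixed-point scale. */
	return (capacity_orig * freq_scale) >> SCHED_CAPACITY_SHIFT;
}
```

For example, a cpu with an original capacity of 430 running at half of its
maximum frequency (freq_scale of 512) would have a current capacity of
(430 * 512) >> 10 = 215.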