
[5/6] arm64: topology: Tell the scheduler about the relative power of cores

Message ID 1386767606-6391-5-git-send-email-broonie@kernel.org (mailing list archive)
State New, archived

Commit Message

Mark Brown Dec. 11, 2013, 1:13 p.m. UTC
From: Mark Brown <broonie@linaro.org>

In heterogeneous systems like big.LITTLE systems the scheduler will be
able to make better use of the available cores if we provide power numbers
to it. Do this by parsing the CPU nodes in the DT.

The power numbers are the same as for ARMv7 since it seems that the
expected differential between the big and little cores is very similar on
both ARMv7 and ARMv8.

Signed-off-by: Mark Brown <broonie@linaro.org>
---
 arch/arm64/kernel/topology.c | 164 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 164 insertions(+)
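
For context, here is a sketch of the kind of DT cpu nodes the new code
consumes. The compatible strings match the table added by the patch; the
reg and clock-frequency values are purely illustrative:

    cpus {
        #address-cells = <2>;
        #size-cells = <0>;

        cpu@0 {
            device_type = "cpu";
            compatible = "arm,cortex-a57";
            reg = <0x0 0x0>;
            clock-frequency = <1100000000>;	/* illustrative: 1.1 GHz */
        };

        cpu@100 {
            device_type = "cpu";
            compatible = "arm,cortex-a53";
            reg = <0x0 0x100>;
            clock-frequency = <850000000>;	/* illustrative: 850 MHz */
        };
    };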

Comments

Catalin Marinas Dec. 11, 2013, 2:47 p.m. UTC | #1
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
> The power numbers are the same as for ARMv7 since it seems that the
> expected differential between the big and little cores is very similar on
> both ARMv7 and ARMv8.

I have no idea ;). We don't have real silicon yet, so that's just a wild
guess.

> +/*
> + * Table of relative efficiency of each processor.
> + * The efficiency value must fit in 20 bits and the final
> + * cpu_scale value must be in the range
> + *   0 < cpu_scale < 3*SCHED_POWER_SCALE/2
> + * in order to return at most 1 when DIV_ROUND_CLOSEST
> + * is used to compute the capacity of a CPU.
> + * Processors that are not defined in the table
> + * use the default SCHED_POWER_SCALE value for cpu_scale.
> + */
> +static const struct cpu_efficiency table_efficiency[] = {
> +	{ "arm,cortex-a57", 3891 },
> +	{ "arm,cortex-a53", 2048 },
> +	{ NULL, },
> +};

I also don't think we can just have absolute numbers here. I'm pretty
sure these were generated on TC2 but other platforms may have different
max CPU frequencies, memory subsystem, level and size of caches. The
"average" efficiency and difference will be different.

Can we define this via DT? It's a bit strange since that's a constant
used by the Linux scheduler but highly related to hardware.
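
As a sanity check on the 20-bit constraint in the quoted comment: the patch
computes capacity as (clock-frequency >> 20) * efficiency, and clock-frequency
is read as a 32-bit value, so the bounds work out as

    rate       <= 0xffffffff          u32 from the DT property
    rate >> 20 <= 4095                at most 12 significant bits remain
    efficiency <  2^20                the 20-bit limit
    capacity   <  2^12 * 2^20 = 2^32  the product fits in 32 bits

so the per-cpu capacity arithmetic never overflows.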
Mark Brown Dec. 11, 2013, 5:31 p.m. UTC | #2
On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
> On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:

> > The power numbers are the same as for ARMv7 since it seems that the
> > expected differential between the big and little cores is very similar on
> > both ARMv7 and ARMv8.

> I have no idea ;). We don't have real silicon yet, so that's just a wild
> guess.

I was going on some typical DMIPS/MHz numbers that I'd found so
hopefully it's not a complete guess, though it will vary and that's just
one benchmark with all the realism problems that entails.  The ratio
seemed to be about the same as the equivalent for the ARMv7 cores so
given that it's a finger in the air thing it didn't seem worth drilling
down much further.

> > +static const struct cpu_efficiency table_efficiency[] = {
> > +	{ "arm,cortex-a57", 3891 },
> > +	{ "arm,cortex-a53", 2048 },
> > +	{ NULL, },
> > +};

> I also don't think we can just have absolute numbers here. I'm pretty
> sure these were generated on TC2 but other platforms may have different
> max CPU frequencies, memory subsystem, level and size of caches. The
> "average" efficiency and difference will be different.

The CPU frequencies at least are taken care of already, these numbers
get scaled for each core.  Once we're talking about things like the
memory I'd also start worrying about application specific effects.
There's also going to be stuff like thermal management which gets fed in
here and which varies during runtime.

I don't know where the numbers came from for v7.

> Can we define this via DT? It's a bit strange since that's a constant
> used by the Linux scheduler but highly related to hardware.

I really don't think that's a good idea at this point, it seems better
for the DT to stick to factual descriptions of what's present rather
than putting tuning numbers in there.  If the wild guesses are in the
kernel source it's fairly easy to improve them, if they're baked into
system DTs that becomes harder.

I think it's important not to overthink what we're doing here - the
information we're trying to convey is that the A57s are a lot faster
than the A53s.  Getting the numbers "right" is good and helpful but it's
not so critical that we should let perfect be the enemy of good.  This
should at least give ARMv8 implementations about equivalent performance
to ARMv7 with this stuff.

I'm also worried about putting numbers into the DT now with all the
scheduler work going on, this time next year we may well have a
completely different idea of what we want to tell the scheduler.  It may
be that we end up being able to explicitly tell the scheduler about
things like the memory architecture, or that the scheduler just gets
smarter and can estimate all this stuff at runtime.  

Customisation seems better provided at runtime than in the DT, that's
more friendly to application specific tuning and it means that we're
less committed to what's in the DT so we can improve things as our
understanding increases.  If it was punting to platform data and we
could just update it if we decided it wasn't ideal it'd be less of an
issue but punting to something that ought to be an ABI isn't awesome.

Once we've got more experience with the silicon and the scheduler work
has progressed we might decide it's helpful to put tuning controls into
DT but starting from that point feels like it's more likely to cause
problems than help.  With where we are now something simple and in the
ballpark is going to get us a long way.
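
For a concrete feel for the per-core scaling described above, plugging
illustrative clock rates (A57 at 1.1 GHz, A53 at 850 MHz; invented numbers,
not measurements) into the patch's arithmetic gives:

    capacity(A57)   = (1100000000 >> 20) * 3891 = 1049 * 3891 = 4081659
    capacity(A53)   = ( 850000000 >> 20) * 2048 =  810 * 2048 = 1658880
    middle_capacity = (1658880 + 4081659) >> (SCHED_POWER_SHIFT + 1) = 2802
    cpu_power(A57)  = 4081659 / 2802 = 1456
    cpu_power(A53)  = 1658880 / 2802 =  592

The two cpu_power values average to exactly 1024 (SCHED_POWER_SCALE), which
is what the middle_capacity scaling is designed to achieve.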
Nicolas Pitre Dec. 11, 2013, 7:27 p.m. UTC | #3
On Wed, 11 Dec 2013, Mark Brown wrote:

> On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
> > On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
> 
> > > The power numbers are the same as for ARMv7 since it seems that the
> > > expected differential between the big and little cores is very similar on
> > > both ARMv7 and ARMv8.
> 
> > I have no idea ;). We don't have real silicon yet, so that's just a wild
> > guess.
> 
> I was going on some typical DMIPS/MHz numbers that I'd found so
> hopefully it's not a complete guess, though it will vary and that's just
> one benchmark with all the realism problems that entails.  The ratio
> seemed to be about the same as the equivalent for the ARMv7 cores so
> given that it's a finger in the air thing it didn't seem worth drilling
> down much further.
> 
> > > +static const struct cpu_efficiency table_efficiency[] = {
> > > +	{ "arm,cortex-a57", 3891 },
> > > +	{ "arm,cortex-a53", 2048 },
> > > +	{ NULL, },
> > > +};
> 
> > I also don't think we can just have absolute numbers here. I'm pretty
> > sure these were generated on TC2 but other platforms may have different
> > max CPU frequencies, memory subsystem, level and size of caches. The
> > "average" efficiency and difference will be different.
> 
> The CPU frequencies at least are taken care of already, these numbers
> get scaled for each core.  Once we're talking about things like the
> memory I'd also start worrying about application specific effects.
> There's also going to be stuff like thermal management which gets fed in
> here and which varies during runtime.
> 
> I don't know where the numbers came from for v7.
> 
> > Can we define this via DT? It's a bit strange since that's a constant
> > used by the Linux scheduler but highly related to hardware.
> 
> I really don't think that's a good idea at this point, it seems better
> for the DT to stick to factual descriptions of what's present rather
> than putting tuning numbers in there.  If the wild guesses are in the
> kernel source it's fairly easy to improve them, if they're baked into
> system DTs that becomes harder.

I really think putting such things into DT is wrong.

If those numbers were derived from benchmark results, then it is most 
probably best to try to come up with some kind of equivalent benchmark 
in the kernel to qualify CPUs at run time.  After all, this is what 
actually matters, i.e. how CPUs perform relative to each other, and that 
may vary with many factors that people will forget to update when 
copying DT content to enable a new board.

And that wouldn't be the first time some benchmark is used at boot time.  
Different crypto/RAID algorithms are tested to determine the best one to 
use, etc.

> I'm also worried about putting numbers into the DT now with all the
> scheduler work going on, this time next year we may well have a
> completely different idea of what we want to tell the scheduler.  It may
> be that we end up being able to explicitly tell the scheduler about
> things like the memory architecture, or that the scheduler just gets
> smarter and can estimate all this stuff at runtime.  

Exactly.  Which is why the kernel had better be self-sufficient to determine 
such params.  DT should be used only for things that cannot be probed 
at run time.  The relative performance of a CPU certainly can be probed 
at run time.

Obviously the specifics of the actual benchmark might be debated, but 
the same can be said about static numbers.


Nicolas
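
A minimal userspace sketch of the kind of boot-time measurement Nicolas
describes, assuming nothing beyond POSIX plus Linux's sched_setaffinity();
the busy-loop body and iteration count are arbitrary choices, and a real
in-kernel version would look quite different:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    /* Time a fixed busy loop pinned to one CPU; a shorter time means a
     * relatively faster CPU.  Returns elapsed nanoseconds, or -1 on error. */
    static long long bench_cpu(int cpu)
    {
    	cpu_set_t set;
    	struct timespec t0, t1;
    	volatile unsigned long x = 0;

    	CPU_ZERO(&set);
    	CPU_SET(cpu, &set);
    	if (sched_setaffinity(0, sizeof(set), &set))
    		return -1;

    	clock_gettime(CLOCK_MONOTONIC, &t0);
    	for (unsigned long i = 0; i < 50000000UL; i++)
    		x += i;			/* volatile keeps the loop alive */
    	clock_gettime(CLOCK_MONOTONIC, &t1);

    	return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
    	       (t1.tv_nsec - t0.tv_nsec);
    }

    int main(void)
    {
    	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

    	for (int cpu = 0; cpu < ncpus; cpu++)
    		printf("cpu%d: %lld ns\n", cpu, bench_cpu(cpu));
    	return 0;
    }

Relative cpu_power values could then be derived from the inverse ratios of
the measured times, much as the static table derives them from DMIPS/MHz
ratios.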
Morten Rasmussen Dec. 12, 2013, 11:56 a.m. UTC | #4
On Wed, Dec 11, 2013 at 07:27:09PM +0000, Nicolas Pitre wrote:
> On Wed, 11 Dec 2013, Mark Brown wrote:
> 
> > On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
> > > On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
> > 
> > > > The power numbers are the same as for ARMv7 since it seems that the
> > > > expected differential between the big and little cores is very similar on
> > > > both ARMv7 and ARMv8.
> > 
> > > I have no idea ;). We don't have real silicon yet, so that's just a wild
> > > guess.
> > 
> > I was going on some typical DMIPS/MHz numbers that I'd found so
> > hopefully it's not a complete guess, though it will vary and that's just
> > one benchmark with all the realism problems that entails.  The ratio
> > seemed to be about the same as the equivalent for the ARMv7 cores so
> > given that it's a finger in the air thing it didn't seem worth drilling
> > down much further.
> > 
> > > > +static const struct cpu_efficiency table_efficiency[] = {
> > > > +	{ "arm,cortex-a57", 3891 },
> > > > +	{ "arm,cortex-a53", 2048 },
> > > > +	{ NULL, },
> > > > +};
> > 
> > > I also don't think we can just have absolute numbers here. I'm pretty
> > > sure these were generated on TC2 but other platforms may have different
> > > max CPU frequencies, memory subsystem, level and size of caches. The
> > > "average" efficiency and difference will be different.
> > 
> > The CPU frequencies at least are taken care of already, these numbers
> > get scaled for each core.  Once we're talking about things like the
> > memory I'd also start worrying about application specific effects.
> > There's also going to be stuff like thermal management which gets fed in
> > here and which varies during runtime.
> > 
> > I don't know where the numbers came from for v7.

I'm fairly sure that they are guesstimates based on TC2. Vincent should
know. I wouldn't consider them accurate in any way as the relative
performance varies wildly depending on the workload. However, they are
better than having no information at all.

> > 
> > > Can we define this via DT? It's a bit strange since that's a constant
> > > used by the Linux scheduler but highly related to hardware.
> > 
> > I really don't think that's a good idea at this point, it seems better
> > for the DT to stick to factual descriptions of what's present rather
> > than putting tuning numbers in there.  If the wild guesses are in the
> > kernel source it's fairly easy to improve them, if they're baked into
> > system DTs that becomes harder.
> 
> I really think putting such things into DT is wrong.
> 
> If those numbers were derived from benchmark results, then it is most 
> probably best to try to come up with some kind of equivalent benchmark 
> in the kernel to qualify CPUs at run time.  After all, this is what 
> actually matters, i.e. how CPUs perform relative to each other, and that 
> may vary with many factors that people will forget to update when 
> copying DT content to enable a new board.
> 
> And that wouldn't be the first time some benchmark is used at boot time.  
> Different crypto/RAID algorithms are tested to determine the best one to 
> use, etc.
> 
> > I'm also worried about putting numbers into the DT now with all the
> > scheduler work going on, this time next year we may well have a
> > completely different idea of what we want to tell the scheduler.  It may
> > be that we end up being able to explicitly tell the scheduler about
> > things like the memory architecture, or that the scheduler just gets
> > smarter and can estimate all this stuff at runtime.  

I agree. We need to sort the scheduler side out first before we commit
to anything. If we are worried about including code into v8 that we are
going to change later, then it is probably better to leave this part
out. See my response to Mark's patch subset with the same patch for
details (I didn't see this thread until afterwards - sorry).

> 
> Exactly.  Which is why the kernel had better be self-sufficient to determine 
> such params.  DT should be used only for things that cannot be probed 
> at run time.  The relative performance of a CPU certainly can be probed 
> at run time.
> 
> Obviously the specifics of the actual benchmark might be debated, but 
> the same can be said about static numbers.

Indeed.

Morten
Mark Brown Dec. 12, 2013, 12:22 p.m. UTC | #5
On Thu, Dec 12, 2013 at 11:56:40AM +0000, Morten Rasmussen wrote:

> > > I'm also worried about putting numbers into the DT now with all the
> > > scheduler work going on, this time next year we may well have a
> > > completely different idea of what we want to tell the scheduler.  It may
> > > be that we end up being able to explicitly tell the scheduler about
> > > things like the memory architecture, or that the scheduler just gets
> > > smarter and can estimate all this stuff at runtime.  

> I agree. We need to sort the scheduler side out first before we commit
> to anything. If we are worried about including code into v8 that we are
> going to change later, then it is probably better to leave this part
> out. See my response to Mark's patch subset with the same patch for
> details (I didn't see this thread until afterwards - sorry).

My take on change is that we should be doing as good a job as we can
with the scheduler we have so users get whatever we're able to deliver
at the current time.  Having to change in-kernel code shouldn't be that
big a deal, especially with something like this where the scheduler is
free to ignore what it's told without churning the interface.
Morten Rasmussen Dec. 12, 2013, 1:42 p.m. UTC | #6
On Thu, Dec 12, 2013 at 12:22:36PM +0000, Mark Brown wrote:
> On Thu, Dec 12, 2013 at 11:56:40AM +0000, Morten Rasmussen wrote:
> 
> > > > I'm also worried about putting numbers into the DT now with all the
> > > > scheduler work going on, this time next year we may well have a
> > > > completely different idea of what we want to tell the scheduler.  It may
> > > > be that we end up being able to explicitly tell the scheduler about
> > > > things like the memory architecture, or that the scheduler just gets
> > > > smarter and can estimate all this stuff at runtime.  
> 
> > I agree. We need to sort the scheduler side out first before we commit
> > to anything. If we are worried about including code into v8 that we are
> > going to change later, then it is probably better to leave this part
> > out. See my response to Mark's patch subset with the same patch for
> > details (I didn't see this thread until afterwards - sorry).
> 
> My take on change is that we should be doing as good a job as we can
> with the scheduler we have so users get whatever we're able to deliver
> at the current time.  Having to change in-kernel code shouldn't be that
> big a deal, especially with something like this where the scheduler is
> free to ignore what it's told without churning the interface.

Fair enough. I just wanted to make sure that people knew about the
cpu_power issues before deciding whether to do the same for v8.

Morten
Vincent Guittot Dec. 12, 2013, 2:26 p.m. UTC | #7
On 12 December 2013 12:56, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Wed, Dec 11, 2013 at 07:27:09PM +0000, Nicolas Pitre wrote:
>> On Wed, 11 Dec 2013, Mark Brown wrote:
>>
>> > On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
>> > > On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
>> >
>> > > > The power numbers are the same as for ARMv7 since it seems that the
>> > > > expected differential between the big and little cores is very similar on
>> > > > both ARMv7 and ARMv8.
>> >
>> > > I have no idea ;). We don't have real silicon yet, so that's just a wild
>> > > guess.
>> >
>> > I was going on some typical DMIPS/MHz numbers that I'd found so
>> > hopefully it's not a complete guess, though it will vary and that's just
>> > one benchmark with all the realism problems that entails.  The ratio
>> > seemed to be about the same as the equivalent for the ARMv7 cores so
>> > given that it's a finger in the air thing it didn't seem worth drilling
>> > down much further.
>> >
>> > > > +static const struct cpu_efficiency table_efficiency[] = {
>> > > > +       { "arm,cortex-a57", 3891 },
>> > > > +       { "arm,cortex-a53", 2048 },
>> > > > +       { NULL, },
>> > > > +};
>> >
>> > > I also don't think we can just have absolute numbers here. I'm pretty
>> > > sure these were generated on TC2 but other platforms may have different
>> > > max CPU frequencies, memory subsystem, level and size of caches. The
>> > > "average" efficiency and difference will be different.
>> >
>> > The CPU frequencies at least are taken care of already, these numbers
>> > get scaled for each core.  Once we're talking about things like the
>> > memory I'd also start worrying about application specific effects.
>> > There's also going to be stuff like thermal management which gets fed in
>> > here and which varies during runtime.
>> >
>> > I don't know where the numbers came from for v7.
>
> I'm fairly sure that they are guesstimates based on TC2. Vincent should
> know. I wouldn't consider them accurate in any way as the relative

The values are not based on TC2 but on the DMIPS/MHz figures from ARM.

Vincent

> performance varies wildly depending on the workload. However, they are
> better than having no information at all.
>
>> >
>> > > Can we define this via DT? It's a bit strange since that's a constant
>> > > used by the Linux scheduler but highly related to hardware.
>> >
>> > I really don't think that's a good idea at this point, it seems better
>> > for the DT to stick to factual descriptions of what's present rather
>> > than putting tuning numbers in there.  If the wild guesses are in the
>> > kernel source it's fairly easy to improve them, if they're baked into
>> > system DTs that becomes harder.
>>
>> I really think putting such things into DT is wrong.
>>
>> If those numbers were derived from benchmark results, then it is most
>> probably best to try to come up with some kind of equivalent benchmark
>> in the kernel to qualify CPUs at run time.  After all, this is what
>> actually matters, i.e. how CPUs perform relative to each other, and that
>> may vary with many factors that people will forget to update when
>> copying DT content to enable a new board.
>>
>> And that wouldn't be the first time some benchmark is used at boot time.
>> Different crypto/RAID algorithms are tested to determine the best one to
>> use, etc.
>>
>> > I'm also worried about putting numbers into the DT now with all the
>> > scheduler work going on, this time next year we may well have a
>> > completely different idea of what we want to tell the scheduler.  It may
>> > be that we end up being able to explicitly tell the scheduler about
>> > things like the memory architecture, or that the scheduler just gets
>> > smarter and can estimate all this stuff at runtime.
>
> I agree. We need to sort the scheduler side out first before we commit
> to anything. If we are worried about including code into v8 that we are
> going to change later, then it is probably better to leave this part
> out. See my response to Mark's patch subset with the same patch for
> details (I didn't see this thread until afterwards - sorry).
>
>>
>> Exactly.  Which is why the kernel had better be self-sufficient to determine
>> such params.  DT should be used only for things that cannot be probed
>> at run time.  The relative performance of a CPU certainly can be probed
>> at run time.
>>
>> Obviously the specifics of the actual benchmark might be debated, but
>> the same can be said about static numbers.
>
> Indeed.
>
> Morten
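
For reference, the ratio encoded by the table is 3891 / 2048, roughly 1.9,
the same big-to-little ratio the arch/arm table uses for Cortex-A15 versus
Cortex-A7, which is consistent with the commit message's claim that the
expected differential is similar on ARMv7 and ARMv8.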

Patch

diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index e0b40f48b448..f08bb2306cd4 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -18,6 +18,7 @@ 
 #include <linux/percpu.h>
 #include <linux/node.h>
 #include <linux/nodemask.h>
+#include <linux/of.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 
@@ -26,6 +27,163 @@ 
 #include <asm/topology.h>
 
 /*
+ * cpu power scale management
+ */
+
+/*
+ * cpu power table
+ * This per cpu data structure describes the relative capacity of each core.
+ * On a heterogeneous system, cores don't have the same computation capacity
+ * and we reflect that difference in the cpu_power field so the scheduler can
+ * take this difference into account during load balance. A per cpu structure
+ * is preferred because each CPU updates its own cpu_power field during the
+ * load balance except for idle cores. One idle core is selected to run the
+ * rebalance_domains for all idle cores and the cpu_power can be updated
+ * during this sequence.
+ */
+static DEFINE_PER_CPU(unsigned long, cpu_scale);
+
+unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(cpu_scale, cpu);
+}
+
+static void set_power_scale(unsigned int cpu, unsigned long power)
+{
+	per_cpu(cpu_scale, cpu) = power;
+}
+
+#ifdef CONFIG_OF
+struct cpu_efficiency {
+	const char *compatible;
+	unsigned long efficiency;
+};
+
+/*
+ * Table of relative efficiency of each processor.
+ * The efficiency value must fit in 20 bits and the final
+ * cpu_scale value must be in the range
+ *   0 < cpu_scale < 3*SCHED_POWER_SCALE/2
+ * in order to return at most 1 when DIV_ROUND_CLOSEST
+ * is used to compute the capacity of a CPU.
+ * Processors that are not defined in the table
+ * use the default SCHED_POWER_SCALE value for cpu_scale.
+ */
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
+
+static unsigned long *__cpu_capacity;
+#define cpu_capacity(cpu)	__cpu_capacity[cpu]
+
+static unsigned long middle_capacity = 1;
+
+/*
+ * Iterate over all CPUs' descriptors in the DT and compute the efficiency
+ * (as per table_efficiency). Also calculate a middle efficiency
+ * as close as possible to (max{eff_i} + min{eff_i}) / 2.
+ * This is later used to scale the cpu_power field such that an
+ * 'average' CPU is of middle power. Also see the comments near
+ * table_efficiency[] and update_cpu_power().
+ */
+static void __init parse_dt_topology(void)
+{
+	const struct cpu_efficiency *cpu_eff;
+	struct device_node *cn = NULL;
+	unsigned long min_capacity = (unsigned long)(-1);
+	unsigned long max_capacity = 0;
+	unsigned long capacity = 0;
+	int alloc_size, cpu;
+
+	alloc_size = nr_cpu_ids * sizeof(*__cpu_capacity);
+	__cpu_capacity = kzalloc(alloc_size, GFP_NOWAIT);
+
+	for_each_possible_cpu(cpu) {
+		const u32 *rate;
+		int len;
+
+		/* Too early to use cpu->of_node */
+		cn = of_get_cpu_node(cpu, NULL);
+		if (!cn) {
+			pr_err("Missing device node for CPU %d\n", cpu);
+			continue;
+		}
+
+		/* check if the cpu is marked as "disabled", if so ignore */
+		if (!of_device_is_available(cn))
+			continue;
+
+		for (cpu_eff = table_efficiency; cpu_eff->compatible; cpu_eff++)
+			if (of_device_is_compatible(cn, cpu_eff->compatible))
+				break;
+
+		if (cpu_eff->compatible == NULL) {
+			pr_warn("%s: Unknown CPU type\n", cn->full_name);
+			continue;
+		}
+
+		rate = of_get_property(cn, "clock-frequency", &len);
+		if (!rate || len != 4) {
+			pr_err("%s: Missing clock-frequency property\n",
+				cn->full_name);
+			continue;
+		}
+
+		capacity = ((be32_to_cpup(rate)) >> 20) * cpu_eff->efficiency;
+
+		/* Save min capacity of the system */
+		if (capacity < min_capacity)
+			min_capacity = capacity;
+
+		/* Save max capacity of the system */
+		if (capacity > max_capacity)
+			max_capacity = capacity;
+
+		cpu_capacity(cpu) = capacity;
+	}
+
+	/* If min and max capacities are equal we bypass the update of the
+	 * cpu_scale because all CPUs have the same capacity. Otherwise, we
+	 * compute a middle_capacity factor that will ensure that the capacity
+	 * of an 'average' CPU of the system will be as close as possible to
+	 * SCHED_POWER_SCALE, which is the default value, but with the
+	 * constraint explained near table_efficiency[].
+	 */
+	if (min_capacity == max_capacity)
+		return;
+	else if (4 * max_capacity < (3 * (max_capacity + min_capacity)))
+		middle_capacity = (min_capacity + max_capacity)
+				>> (SCHED_POWER_SHIFT+1);
+	else
+		middle_capacity = ((max_capacity / 3)
+				>> (SCHED_POWER_SHIFT-1)) + 1;
+
+}
+
+/*
+ * Look for a custom capacity for a CPU in the __cpu_capacity table during
+ * boot. The update of all CPUs is in O(n^2) for a heterogeneous system but
+ * the function returns directly for an SMP system.
+ */
+static void update_cpu_power(unsigned int cpu, unsigned long hwid)
+{
+	if (!cpu_capacity(cpu))
+		return;
+
+	set_power_scale(cpu, cpu_capacity(cpu) / middle_capacity);
+
+	pr_info("CPU%u: update cpu_power %lu\n",
+		cpu, arch_scale_freq_power(NULL, cpu));
+}
+
+#else
+static inline void parse_dt_topology(void) {}
+static inline void update_cpu_power(unsigned int cpuid, unsigned long hwid) {}
+#endif
+
+/*
  * cpu topology table
  */
 struct cputopo_arm cpu_topology[NR_CPUS];
@@ -88,6 +246,8 @@  void store_cpu_topology(unsigned int cpuid)
 
 	update_siblings_masks(cpuid);
 
+	update_cpu_power(cpuid, mpidr & MPIDR_HWID_BITMASK);
+
 	pr_info("CPU%u: cpu %d, socket %d mapped using MPIDR %llx\n",
 		cpuid, cpu_topology[cpuid].core_id,
 		cpu_topology[cpuid].socket_id, mpidr);
@@ -138,6 +298,10 @@  void __init init_cpu_topology(void)
 		cpu_topo->socket_id = -1;
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_clear(&cpu_topo->thread_sibling);
+
+		set_power_scale(cpu, SCHED_POWER_SCALE);
 	}
 	smp_wmb();
+
+	parse_dt_topology();
 }
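
To experiment with the capacity arithmetic above without a DT, here is a
small standalone userspace sketch of the same computation. The efficiencies
come from the patch's table; the clock rates are invented, and
SCHED_POWER_SHIFT is assumed to be 10 (i.e. SCHED_POWER_SCALE == 1024) as in
the scheduler of this era:

    #include <stdio.h>

    #define SCHED_POWER_SHIFT 10	/* assumed; SCHED_POWER_SCALE == 1024 */

    int main(void)
    {
    	/* Efficiencies from the patch's table; clock rates are made up. */
    	const char *compat[] = { "arm,cortex-a57", "arm,cortex-a53" };
    	unsigned long rate[] = { 1100000000UL, 850000000UL };
    	unsigned long eff[]  = { 3891, 2048 };
    	unsigned long cap[2], min = -1UL, max = 0, middle;

    	for (int i = 0; i < 2; i++) {
    		cap[i] = (rate[i] >> 20) * eff[i];
    		if (cap[i] < min)
    			min = cap[i];
    		if (cap[i] > max)
    			max = cap[i];
    	}

    	/* Same middle_capacity selection as parse_dt_topology() above. */
    	if (4 * max < 3 * (max + min))
    		middle = (min + max) >> (SCHED_POWER_SHIFT + 1);
    	else
    		middle = ((max / 3) >> (SCHED_POWER_SHIFT - 1)) + 1;

    	for (int i = 0; i < 2; i++)
    		printf("%s: cpu_power %lu\n", compat[i], cap[i] / middle);

    	return 0;
    }

With the invented rates above it prints cpu_power 1456 for the A57 and 592
for the A53, matching the worked numbers earlier in the thread.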