diff mbox

[RFCv5,28/46] sched: Count number of shallower idle-states in struct sched_group_energy

Message ID 1436293469-25707-29-git-send-email-morten.rasmussen@arm.com (mailing list archive)
State RFC
Headers show

Commit Message

Morten Rasmussen July 7, 2015, 6:24 p.m. UTC
cpuidle associates all idle-states with each cpu while the energy model
associates them with the sched_group covering the cpus coordinating
entry to the idle-state. To look up the idle-state power consumption in
the energy model it is therefore necessary to translate from cpuidle
idle-state index to energy model index. For this purpose it is helpful
to know how many idle-states that are listed in lower level sched_groups
(in struct sched_group_energy).

Example: ARMv8 big.LITTLE JUNO (Cortex A57, A53) idle-states:
Idle-state              cpuidle         Energy model table indices
                        index           per-cpu sg      per-cluster sg
WFI                     0               0               (0)
Core power-down         1               1               0*
Cluster power-down      2               (1)             1

For per-cpu sgs no translation is required. If cpuidle reports state
index 0 or 1, the cpu is in WFI or core power-down, respectively. We can
look the idle-power up directly in the sg energy model table. Idle-state
cluster power-down, is represented in the per-cluster sg energy model
table as index 1. Index 0* is reserved for cluster power consumption
when the cpus all are in state 0 or 1, but cpuidle decided not to go for
cluster power-down. Given the index from cpuidle we can compute the
correct index in the energy model tables for the sgs at each level if we
know how many states are in the tables in the child sgs. The actual
translation is implemented in a later patch.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   | 12 ++++++++++++
 2 files changed, 13 insertions(+)

Comments

Peter Zijlstra Aug. 13, 2015, 6:10 p.m. UTC | #1
On Tue, Jul 07, 2015 at 07:24:11PM +0100, Morten Rasmussen wrote:
> cpuidle associates all idle-states with each cpu while the energy model
> associates them with the sched_group covering the cpus coordinating
> entry to the idle-state. To look up the idle-state power consumption in
> the energy model it is therefore necessary to translate from cpuidle
> idle-state index to energy model index. For this purpose it is helpful
> to know how many idle-states that are listed in lower level sched_groups
> (in struct sched_group_energy).
> 
> Example: ARMv8 big.LITTLE JUNO (Cortex A57, A53) idle-states:
> Idle-state              cpuidle         Energy model table indices
>                         index           per-cpu sg      per-cluster sg
> WFI                     0               0               (0)
> Core power-down         1               1               0*
> Cluster power-down      2               (1)             1
> 
> For per-cpu sgs no translation is required. If cpuidle reports state
> index 0 or 1, the cpu is in WFI or core power-down, respectively. We can
> look the idle-power up directly in the sg energy model table.

OK..

> Idle-state
> cluster power-down, is represented in the per-cluster sg energy model
> table as index 1. Index 0* is reserved for cluster power consumption
> when the cpus all are in state 0 or 1, but cpuidle decided not to go for
> cluster power-down.

0* is not an integer.

> Given the index from cpuidle we can compute the
> correct index in the energy model tables for the sgs at each level if we
> know how many states are in the tables in the child sgs. The actual
> translation is implemented in a later patch.

And you've lost me... I've looked at that later patch (its the next one)
and I cannot say I'm less confused.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sai Aug. 14, 2015, 7:08 p.m. UTC | #2
On 08/13/2015 11:10 AM, Peter Zijlstra wrote:
> On Tue, Jul 07, 2015 at 07:24:11PM +0100, Morten Rasmussen wrote:
>> cpuidle associates all idle-states with each cpu while the energy model
>> associates them with the sched_group covering the cpus coordinating
>> entry to the idle-state. To look up the idle-state power consumption in
>> the energy model it is therefore necessary to translate from cpuidle
>> idle-state index to energy model index. For this purpose it is helpful
>> to know how many idle-states that are listed in lower level sched_groups
>> (in struct sched_group_energy).
>>
>> Example: ARMv8 big.LITTLE JUNO (Cortex A57, A53) idle-states:
>> Idle-state              cpuidle         Energy model table indices
>>                         index           per-cpu sg      per-cluster sg
>> WFI                     0               0               (0)
>> Core power-down         1               1               0*
>> Cluster power-down      2               (1)             1
>>
>> For per-cpu sgs no translation is required. If cpuidle reports state
>> index 0 or 1, the cpu is in WFI or core power-down, respectively. We can
>> look the idle-power up directly in the sg energy model table.
> 
> OK..
> 
>> Idle-state
>> cluster power-down, is represented in the per-cluster sg energy model
>> table as index 1. Index 0* is reserved for cluster power consumption
>> when the cpus all are in state 0 or 1, but cpuidle decided not to go for
>> cluster power-down.
> 
> 0* is not an integer.
> 
>> Given the index from cpuidle we can compute the
>> correct index in the energy model tables for the sgs at each level if we
>> know how many states are in the tables in the child sgs. The actual
>> translation is implemented in a later patch.
> 
> And you've lost me... I've looked at that later patch (its the next one)
> and I cannot say I'm less confused.
> 

I think I understand this roughly but I don't understand why this isn't as simple as describing the power consumption at core and cluster level for each cpuidle state. If a particular cpuidle state has no real impact at the cluster level, then can't we just describe that in the power model? Sorry if this has already been discussed.

Every cpuidle state is essentially a Cx/CCx combination. For the WFI/core pd case above, the cluster state turns out to be CC0 (cluster 'active').

Thanks,
-Sai
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8ac2db8..6b5e44d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1031,6 +1031,7 @@  struct sched_group_energy {
 	atomic_t ref;
 	unsigned int nr_idle_states;	/* number of idle states */
 	struct idle_state *idle_states;	/* ptr to idle state array */
+	unsigned int nr_idle_states_below; /* number idle states in lower groups */
 	unsigned int nr_cap_states;	/* number of capacity states */
 	struct capacity_state *cap_states; /* ptr to capacity state array */
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe4b361..c13fa9c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6117,6 +6117,7 @@  static void init_sched_energy(int cpu, struct sched_domain *sd,
 	struct sched_group_energy *sge = sg->sge;
 	sched_domain_energy_f fn = tl->energy;
 	struct cpumask *mask = sched_group_cpus(sg);
+	int nr_idle_states_below = 0;
 
 	if (fn && sd->child && !sd->child->groups->sge) {
 		pr_err("BUG: EAS setup broken for CPU%d\n", cpu);
@@ -6140,7 +6141,18 @@  static void init_sched_energy(int cpu, struct sched_domain *sd,
 	if (cpumask_weight(mask) > 1)
 		check_sched_energy_data(cpu, fn, mask);
 
+	/* Figure out the number of true cpuidle states below current group */
+	sd = sd->child;
+	for_each_lower_domain(sd) {
+		nr_idle_states_below += sd->groups->sge->nr_idle_states;
+
+		/* Disregard non-cpuidle 'active' idle states */
+		if (sd->child)
+			nr_idle_states_below--;
+	}
+
 	sge->nr_idle_states = fn(cpu)->nr_idle_states;
+	sge->nr_idle_states_below = nr_idle_states_below;
 	sge->nr_cap_states = fn(cpu)->nr_cap_states;
 	sge->idle_states = (struct idle_state *)
 			   ((void *)&sge->cap_states +