| Message ID | 20180521142505.6522-10-quentin.perret@arm.com (mailing list archive) |
|---|---|
| State | Superseded, archived |
Hi,

On 21/05/18 15:25, Quentin Perret wrote:

[...]

> +static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> +{
> +	unsigned long cur_energy, prev_energy, best_energy, cpu_cap, task_util;
> +	int cpu, best_energy_cpu = prev_cpu;
> +	struct sched_energy_fd *sfd;
> +	struct sched_domain *sd;
> +
> +	sync_entity_load_avg(&p->se);
> +
> +	task_util = task_util_est(p);
> +	if (!task_util)
> +		return prev_cpu;
> +
> +	/*
> +	 * Energy-aware wake-up happens on the lowest sched_domain starting
> +	 * from sd_ea spanning over this_cpu and prev_cpu.
> +	 */
> +	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
> +	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
> +		sd = sd->parent;
> +	if (!sd)
> +		return -1;

Shouldn't this be return prev_cpu?

> +
> +	if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
> +		prev_energy = best_energy = compute_energy(p, prev_cpu);
> +	else
> +		prev_energy = best_energy = ULONG_MAX;
> +
> +	for_each_freq_domain(sfd) {
> +		unsigned long spare_cap, max_spare_cap = 0;
> +		int max_spare_cap_cpu = -1;
> +		unsigned long util;
> +
> +		/* Find the CPU with the max spare cap in the freq. dom. */

I understand this being a heuristic to cut some overhead, but shouldn't
the model tell between packing vs. spreading?

Thanks,

- Juri
On Friday 08 Jun 2018 at 12:24:46 (+0200), Juri Lelli wrote:
> Hi,
>
> On 21/05/18 15:25, Quentin Perret wrote:
>
> [...]
>
> > +	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
> > +	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
> > +		sd = sd->parent;
> > +	if (!sd)
> > +		return -1;
>
> Shouldn't this be return prev_cpu?

Well, you shouldn't be entering this function without an sd_ea pointer,
so this case is a sort of bug I think. By returning -1 I think we should
end up picking a CPU using select_fallback_rq(), which sort of makes
sense?

> > +	for_each_freq_domain(sfd) {
> > +		unsigned long spare_cap, max_spare_cap = 0;
> > +		int max_spare_cap_cpu = -1;
> > +		unsigned long util;
> > +
> > +		/* Find the CPU with the max spare cap in the freq. dom. */
>
> I understand this being a heuristic to cut some overhead, but shouldn't
> the model tell between packing vs. spreading?

Ah, that's a very interesting one :-) !

So, with only active costs of the CPUs in the model, we can't really
tell what's best between packing or spreading between identical CPUs if
the migration of the task doesn't change the OPP request.

In a frequency domain, all the "best" CPU candidates for a task are
those for which we'll request a low OPP. When there are several CPUs for
which the OPP request will be the same, we just don't know which one to
pick from an energy standpoint, because we don't have other energy costs
(for idle states, for example) to break the tie.

With this EM, the interesting thing is that if you assume that OPP
requests follow utilization, you are _guaranteed_ that the CPU with
the max spare capacity in a freq domain will always be among the best
candidates of this freq domain. And since we don't know how to
differentiate those candidates, why not use this one?

Yes, it _might_ be better from an energy standpoint to pack small tasks
on a CPU in order to let other CPUs go into deeper idle states. But that
also hurts your chances to go cluster idle. Which solution is the best?
It depends, and we have no way to tell with this EM.

This approach basically favors cluster-packing, and spreading inside a
cluster. That should at least be a good thing for latency, and this is
consistent with the idea that most of the energy savings come from the
asymmetry of the system, and not so much from breaking the tie between
identical CPUs. That's also the reason why EAS is enabled only if your
system has SD_ASYM_CPUCAPACITY set, as we already discussed for patch
05/10 :-).

Does that make sense?

Thanks,
Quentin
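[Editor's note] The tie-breaking rule described above — among the feasible CPUs of a frequency domain, pick the one with the maximum spare capacity — can be sketched outside the kernel as a standalone function. Everything below (the arrays, the capacity numbers, the function name) is illustrative, not kernel code; only the fit test `cap * 1024 < util * capacity_margin` mirrors the patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Standalone sketch of the "max spare capacity" candidate selection.
 * cap[] and util[] are hypothetical per-CPU capacity and utilization
 * values; task_util is the waking task's utilization contribution.
 * Returns the index of the feasible CPU with the most spare capacity,
 * or -1 if no CPU can take the task within the capacity margin.
 */
static int max_spare_cap_cpu(const unsigned long *cap,
			     const unsigned long *util,
			     size_t nr_cpus, unsigned long task_util,
			     unsigned long capacity_margin)
{
	unsigned long max_spare_cap = 0;
	int best = -1;

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		unsigned long new_util = util[cpu] + task_util;

		/* Skip CPUs that would be overutilized (same test as the patch). */
		if (cap[cpu] * 1024 < new_util * capacity_margin)
			continue;

		if (cap[cpu] - new_util > max_spare_cap) {
			max_spare_cap = cap[cpu] - new_util;
			best = (int)cpu;
		}
	}
	return best;
}
```

With utilization following OPP requests, the CPU this returns is guaranteed to be among the lowest-OPP candidates of the domain, which is the whole point of the heuristic.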
On 08/06/18 12:19, Quentin Perret wrote:
> On Friday 08 Jun 2018 at 12:24:46 (+0200), Juri Lelli wrote:

<snip>

> > > +	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
> > > +	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
> > > +		sd = sd->parent;
> > > +	if (!sd)
> > > +		return -1;
> >
> > Shouldn't this be return prev_cpu?
>
> Well, you shouldn't be entering this function without an sd_ea pointer,
> so this case is a sort of bug I think. By returning -1 I think we should
> end up picking a CPU using select_fallback_rq(), which sort of makes
> sense?

I fear cpumask_test_cpu() and such won't be happy with a -1 arg.
If it's a recoverable bug, I'd say return prev and WARN_ON_ONCE()?

> > I understand this being a heuristic to cut some overhead, but shouldn't
> > the model tell between packing vs. spreading?
>
> Ah, that's a very interesting one :-) !
>
> [...]
>
> This approach basically favors cluster-packing, and spreading inside a
> cluster. [...]
>
> Does that make sense?

Yes, thanks for the explanation. It would probably make sense to copy
and paste your text above somewhere in comment/doc for future ref.
On Friday 08 Jun 2018 at 13:59:28 (+0200), Juri Lelli wrote:
> On 08/06/18 12:19, Quentin Perret wrote:

<snip>

> > > +	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
> > > +	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
> > > +		sd = sd->parent;
> > > +	if (!sd)
> > > +		return -1;
> >
> > Shouldn't this be return prev_cpu?
> >
> > Well, you shouldn't be entering this function without an sd_ea pointer,
> > so this case is a sort of bug I think. By returning -1 I think we should
> > end up picking a CPU using select_fallback_rq(), which sort of makes
> > sense?
>
> I fear cpumask_test_cpu() and such won't be happy with a -1 arg.
> If it's a recoverable bug, I'd say return prev and WARN_ON_ONCE()?

Hmmm, yes, prev + WARN_ON_ONCE is probably appropriate here then.
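[Editor's note] The fix agreed on here — warn once and fall back to prev_cpu instead of returning -1 — could look roughly like the following against the posted code. This is a sketch of what a later revision might contain, not the actual follow-up patch; it relies on the kernel idiom that WARN_ON_ONCE() evaluates to its condition, so it can sit directly in the if():

	 	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
	 	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
	 		sd = sd->parent;
	-	if (!sd)
	-		return -1;
	+	if (WARN_ON_ONCE(!sd))
	+		return prev_cpu;

This keeps the "shouldn't happen" case loud in the logs while handing callers a CPU number that cpumask helpers can safely consume.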
<snip>

> > > I understand this being a heuristic to cut some overhead, but shouldn't
> > > the model tell between packing vs. spreading?
> >
> > [...]
> >
> > This approach basically favors cluster-packing, and spreading inside a
> > cluster. That should at least be a good thing for latency, and this is
> > consistent with the idea that most of the energy savings come from the
> > asymmetry of the system, and not so much from breaking the tie between
> > identical CPUs. That's also the reason why EAS is enabled only if your
> > system has SD_ASYM_CPUCAPACITY set, as we already discussed for patch
> > 05/10 :-).
> >
> > Does that make sense?
>
> Yes, thanks for the explanation. It would probably make sense to copy
> and paste your text above somewhere in comment/doc for future ref.

OK, will do. Thanks !

Quentin
On Mon, May 21, 2018 at 03:25:04PM +0100, Quentin Perret wrote:

<snip>

> +	if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
> +		prev_energy = best_energy = compute_energy(p, prev_cpu);
> +	else
> +		prev_energy = best_energy = ULONG_MAX;
> +
> +	for_each_freq_domain(sfd) {
> +		unsigned long spare_cap, max_spare_cap = 0;
> +		int max_spare_cap_cpu = -1;
> +		unsigned long util;
> +
> +		/* Find the CPU with the max spare cap in the freq. dom. */
> +		for_each_cpu_and(cpu, freq_domain_span(sfd), sched_domain_span(sd)) {
> +			if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
> +				continue;
> +
> +			if (cpu == prev_cpu)
> +				continue;
> +
> +			/* Skip CPUs that will be overutilized */
> +			util = cpu_util_wake(cpu, p) + task_util;
> +			cpu_cap = capacity_of(cpu);
> +			if (cpu_cap * 1024 < util * capacity_margin)
> +				continue;
> +
> +			spare_cap = cpu_cap - util;
> +			if (spare_cap > max_spare_cap) {
> +				max_spare_cap = spare_cap;
> +				max_spare_cap_cpu = cpu;
> +			}
> +		}
> +
> +		/* Evaluate the energy impact of using this CPU. */
> +		if (max_spare_cap_cpu >= 0) {
> +			cur_energy = compute_energy(p, max_spare_cap_cpu);
> +			if (cur_energy < best_energy) {
> +				best_energy = cur_energy;
> +				best_energy_cpu = max_spare_cap_cpu;
> +			}
> +		}
> +	}
> +
> +	/*
> +	 * We pick the best CPU only if it saves at least 1.5% of the
> +	 * energy used by prev_cpu.
> +	 */
> +	if ((prev_energy - best_energy) > (prev_energy >> 6))
> +		return best_energy_cpu;
> +
> +	return prev_cpu;
> +}

We are comparing the best_energy_cpu against prev_cpu even when prev_cpu
can't accommodate the waking task. Is this intentional? Should we not
discard prev_cpu if it can't accommodate the task?

This can potentially make a BIG task run on a lower-capacity CPU until
the load balancer kicks in and corrects the situation.

Thanks,
Pavan
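[Editor's note] On the threshold quoted above: `prev_energy >> 6` is `prev_energy / 64`, i.e. roughly 1.56%, which is where the "1.5%" in the comment comes from. The acceptance test is trivially checkable in isolation (a standalone illustration, not kernel code):

```c
#include <assert.h>

/*
 * The acceptance test from the patch: switch away from prev_cpu only
 * if the best candidate saves strictly more than prev_energy/64
 * (~1.56%) of energy.
 */
static int prefer_best_cpu(unsigned long prev_energy,
			   unsigned long best_energy)
{
	return (prev_energy - best_energy) > (prev_energy >> 6);
}
```

For example, with prev_energy = 6400 the hurdle is 6400 >> 6 = 100: a saving of exactly 100 units (1.5625%) is not enough, while 101 units is.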
Hi Pavan,

On Tuesday 19 Jun 2018 at 10:36:01 (+0530), Pavan Kondeti wrote:
> On Mon, May 21, 2018 at 03:25:04PM +0100, Quentin Perret wrote:
>
> <snip>
>
> > +	/*
> > +	 * We pick the best CPU only if it saves at least 1.5% of the
> > +	 * energy used by prev_cpu.
> > +	 */
> > +	if ((prev_energy - best_energy) > (prev_energy >> 6))
> > +		return best_energy_cpu;
> > +
> > +	return prev_cpu;
> > +}
>
> We are comparing the best_energy_cpu against prev_cpu even when prev_cpu
> can't accommodate the waking task. Is this intentional? Should we not
> discard prev_cpu if it can't accommodate the task?
>
> This can potentially make a BIG task run on a lower-capacity CPU until
> the load balancer kicks in and corrects the situation.
We shouldn't enter find_energy_efficient_cpu() in the first place if the
system is overutilized, so that shouldn't be too much of an issue in
general.

But yeah, there is one small corner case where prev_cpu is overutilized
and the system has not been flagged as such yet (when the task wakes up
shortly before the tick, for example). I think it's possible to cover
this case by extending the
"if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))" condition at the top
of the function with a check on capacity_margin. I'll change that in v4.

Thanks !
Quentin
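[Editor's note] The extension suggested here — requiring prev_cpu to pass the same capacity-margin fit test as the other candidates before it is allowed to compete — could look roughly like the following. This is a hypothetical sketch of what v4 might do, written as a standalone predicate rather than actual kernel code; only the `cap * 1024 >= util * capacity_margin` test mirrors the patch:

```c
#include <assert.h>

/*
 * Sketch of the extended feasibility test for prev_cpu: it must be in
 * the task's allowed mask AND able to hold the task's utilization
 * within the capacity margin (the same fit test the patch applies to
 * other candidate CPUs). All values are illustrative.
 */
static int prev_cpu_feasible(int in_cpus_allowed, unsigned long cpu_cap,
			     unsigned long cpu_util, unsigned long task_util,
			     unsigned long capacity_margin)
{
	unsigned long util = cpu_util + task_util;

	if (!in_cpus_allowed)
		return 0;

	/* Same overutilization test as in the candidate loop. */
	return cpu_cap * 1024 >= util * capacity_margin;
}
```

When this predicate fails, prev_energy would be set to ULONG_MAX as in the not-allowed case, so an overloaded prev_cpu can never win the energy comparison.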
On Tue, Jun 19, 2018 at 08:57:23AM +0100, Quentin Perret wrote:
> Hi Pavan,
>
> On Tuesday 19 Jun 2018 at 10:36:01 (+0530), Pavan Kondeti wrote:

<snip>

> > We are comparing the best_energy_cpu against prev_cpu even when prev_cpu
> > can't accommodate the waking task. Is this intentional? Should we not
> > discard prev_cpu if it can't accommodate the task?
> >
> > This can potentially make a BIG task run on a lower-capacity CPU until
> > the load balancer kicks in and corrects the situation.
>
> We shouldn't enter find_energy_efficient_cpu() in the first place if the
> system is overutilized, so that shouldn't be too much of an issue in
> general.

With UTIL_EST enabled, the overutilization condition may get turned off
when that one BIG task goes to sleep. When it wakes up again, we may
place it on the previous CPU due to the above mentioned issue.

It is not just about an existing overutilization condition: by placing
this task on prev_cpu, we may enter the overutilized state.

> But yeah, there is one small corner case where prev_cpu is overutilized
> and the system has not been flagged as such yet (when the task wakes up
> shortly before the tick, for example). I think it's possible to cover
> this case by extending the
> "if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))" condition at the top
> of the function with a check on capacity_margin.

LGTM.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1f7029258df2..eb44829be17f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6683,6 +6683,80 @@ static long compute_energy(struct task_struct *p, int dst_cpu)
 	return energy;
 }
 
+static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
+{
+	unsigned long cur_energy, prev_energy, best_energy, cpu_cap, task_util;
+	int cpu, best_energy_cpu = prev_cpu;
+	struct sched_energy_fd *sfd;
+	struct sched_domain *sd;
+
+	sync_entity_load_avg(&p->se);
+
+	task_util = task_util_est(p);
+	if (!task_util)
+		return prev_cpu;
+
+	/*
+	 * Energy-aware wake-up happens on the lowest sched_domain starting
+	 * from sd_ea spanning over this_cpu and prev_cpu.
+	 */
+	sd = rcu_dereference(*this_cpu_ptr(&sd_ea));
+	while (sd && !cpumask_test_cpu(prev_cpu, sched_domain_span(sd)))
+		sd = sd->parent;
+	if (!sd)
+		return -1;
+
+	if (cpumask_test_cpu(prev_cpu, &p->cpus_allowed))
+		prev_energy = best_energy = compute_energy(p, prev_cpu);
+	else
+		prev_energy = best_energy = ULONG_MAX;
+
+	for_each_freq_domain(sfd) {
+		unsigned long spare_cap, max_spare_cap = 0;
+		int max_spare_cap_cpu = -1;
+		unsigned long util;
+
+		/* Find the CPU with the max spare cap in the freq. dom. */
+		for_each_cpu_and(cpu, freq_domain_span(sfd), sched_domain_span(sd)) {
+			if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
+				continue;
+
+			if (cpu == prev_cpu)
+				continue;
+
+			/* Skip CPUs that will be overutilized */
+			util = cpu_util_wake(cpu, p) + task_util;
+			cpu_cap = capacity_of(cpu);
+			if (cpu_cap * 1024 < util * capacity_margin)
+				continue;
+
+			spare_cap = cpu_cap - util;
+			if (spare_cap > max_spare_cap) {
+				max_spare_cap = spare_cap;
+				max_spare_cap_cpu = cpu;
+			}
+		}
+
+		/* Evaluate the energy impact of using this CPU. */
+		if (max_spare_cap_cpu >= 0) {
+			cur_energy = compute_energy(p, max_spare_cap_cpu);
+			if (cur_energy < best_energy) {
+				best_energy = cur_energy;
+				best_energy_cpu = max_spare_cap_cpu;
+			}
+		}
+	}
+
+	/*
+	 * We pick the best CPU only if it saves at least 1.5% of the
+	 * energy used by prev_cpu.
+	 */
+	if ((prev_energy - best_energy) > (prev_energy >> 6))
+		return best_energy_cpu;
+
+	return prev_cpu;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -6701,16 +6775,23 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
 	struct sched_domain *tmp, *sd = NULL;
 	int cpu = smp_processor_id();
 	int new_cpu = prev_cpu;
-	int want_affine = 0;
+	int want_affine = 0, want_energy = 0;
 	int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
+		want_energy = sched_energy_enabled() &&
+			      !READ_ONCE(cpu_rq(cpu)->rd->overutilized);
 		want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu) &&
 			      cpumask_test_cpu(cpu, &p->cpus_allowed);
 	}
 
 	rcu_read_lock();
+	if (want_energy) {
+		new_cpu = find_energy_efficient_cpu(p, prev_cpu);
+		goto unlock;
+	}
+
 	for_each_domain(cpu, tmp) {
 		if (!(tmp->flags & SD_LOAD_BALANCE))
 			break;
@@ -6745,6 +6826,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
 		if (want_affine)
 			current->recent_used_cpu = cpu;
 	}
+unlock:
 	rcu_read_unlock();
 
 	return new_cpu;
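[Editor's note] For readers following the diff outside the kernel tree, the overall control flow of the new function can be modelled with a toy energy model. Everything below — the four-CPU big.LITTLE-like system, the capacities, and the linear per-CPU cost function — is made up for illustration; compute_energy_toy() is a stand-in for the patch's compute_energy(), not its actual formula, and the freq-domain / max-spare-cap inner loop is collapsed into a flat scan over CPUs:

```c
#include <assert.h>

#define NR_CPUS 4
#define CAPACITY_MARGIN 1280	/* same ~25% margin idea as the kernel's capacity_margin */

/* Hypothetical per-CPU state: capacity and current utilization. */
static const unsigned long cap[NR_CPUS] = { 512, 512, 1024, 1024 };
static unsigned long cpu_util[NR_CPUS];	/* all idle in this example */

/*
 * Toy energy model: system energy if the task (task_util) runs on
 * dst_cpu. Big CPUs (higher capacity) cost 3x more per unit of
 * utilization than little ones.
 */
static unsigned long compute_energy_toy(unsigned long task_util, int dst_cpu)
{
	unsigned long energy = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned long util = cpu_util[cpu];

		if (cpu == dst_cpu)
			util += task_util;
		energy += util * (cap[cpu] > 512 ? 3 : 1);
	}
	return energy;
}

/* Mirror of find_energy_efficient_cpu()'s overall flow on the toy model. */
static int find_energy_efficient_cpu_toy(unsigned long task_util, int prev_cpu)
{
	unsigned long prev_energy, best_energy, cur_energy;
	int best_energy_cpu = prev_cpu;

	prev_energy = best_energy = compute_energy_toy(task_util, prev_cpu);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		unsigned long util = cpu_util[cpu] + task_util;

		if (cpu == prev_cpu)
			continue;
		/* Skip CPUs that would be overutilized. */
		if (cap[cpu] * 1024 < util * CAPACITY_MARGIN)
			continue;

		cur_energy = compute_energy_toy(task_util, cpu);
		if (cur_energy < best_energy) {
			best_energy = cur_energy;
			best_energy_cpu = cpu;
		}
	}

	/* Move away from prev_cpu only for a >1.5% energy saving. */
	if ((prev_energy - best_energy) > (prev_energy >> 6))
		return best_energy_cpu;
	return prev_cpu;
}
```

On this model a small task (util 100) previously on big CPU 2 migrates to little CPU 0, while a task too big for the littles (util 450) stays on CPU 2 — the same qualitative behavior the patch aims for on asymmetric systems.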
If an energy model is available, and if the system isn't overutilized,
waking tasks are re-routed into a new energy-aware placement algorithm.
The selection of an energy-efficient CPU for a task is achieved by
estimating the impact on system-level active energy resulting from the
placement of the task on the CPU with the highest spare capacity in each
frequency domain. This strategy spreads tasks in a frequency domain and
avoids overly aggressive task packing. The best CPU energy-wise is then
selected if it saves a large enough amount of energy with respect to
prev_cpu.

Although it has already shown significant benefits on some existing
targets, this approach cannot scale to platforms with numerous CPUs.
This patch is an attempt to do something useful, as writing a fast
heuristic that performs reasonably well on a broad spectrum of
architectures isn't an easy task. As such, the scope of usability of the
energy-aware wake-up path is restricted to systems with the
SD_ASYM_CPUCAPACITY flag set, and where the EM isn't too complex.

In addition, the energy-aware wake-up path is accessible only if
sched_energy_enabled() is true. For systems which don't meet all
dependencies for EAS (CONFIG_ENERGY_MODEL, for example) at compile time,
sched_energy_enabled() defaults to a constant "false" value, hence
letting the compiler remove the unused EAS code entirely.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 kernel/sched/fair.c | 84 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 83 insertions(+), 1 deletion(-)