[v7,08/14] sched/topology: Disable EAS on inappropriate platforms

Message ID 20180912091309.7551-9-quentin.perret@arm.com (mailing list archive)
State Changes Requested, archived
Series Energy Aware Scheduling

Commit Message

Quentin Perret Sept. 12, 2018, 9:13 a.m. UTC
Energy Aware Scheduling (EAS) in its current form is most relevant on
platforms with asymmetric CPU topologies (e.g. Arm big.LITTLE) since
this is where there is a lot of potential for saving energy through
scheduling. This is particularly true since the Energy Model only
includes the active power costs of CPUs, hence not providing enough data
to compare packing-vs-spreading strategies.

As such, disable EAS on root domains where the SD_ASYM_CPUCAPACITY flag
is not set. While at it, disable EAS on systems where the complexity of
the Energy Model is too high since that could lead to unacceptable
scheduling overhead.

All in all, EAS can be used on a root domain if and only if:
  1. the ENERGY_AWARE sched_feat is enabled;
  2. the root domain has an asymmetric CPU capacity topology;
  3. the complexity of the root domain's EM is low enough to keep
     scheduling overheads low.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
---
 kernel/sched/topology.c | 50 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)
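
For reference, the complexity bound described above can be worked through in
userspace. The following is a minimal sketch, not kernel code: it reuses the
EM_MAX_COMPLEXITY value and the formula from this patch, while the helper
names em_complexity() and em_too_complex() are hypothetical and exist only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Same threshold as the one introduced by this patch. */
#define EM_MAX_COMPLEXITY 2048

/*
 * C = nr_pd * (nr_cpus + nr_cs), where nr_cs is the sum of the number of
 * capacity states over all performance domains.
 * Hypothetical helper, named for illustration only.
 */
static int em_complexity(int nr_pd, int nr_cpus, int nr_cs)
{
	assert(nr_pd >= 0 && nr_cpus >= 0 && nr_cs >= 0);
	return nr_pd * (nr_cs + nr_cpus);
}

/* Mirrors the "> EM_MAX_COMPLEXITY" bail-out test (hypothetical helper). */
static bool em_too_complex(int nr_pd, int nr_cpus, int nr_cs)
{
	return em_complexity(nr_pd, nr_cpus, nr_cs) > EM_MAX_COMPLEXITY;
}
```

With 16 CPUs, per-CPU DVFS and 7 capacity states each, nr_pd = 16,
nr_cpus = 16 and nr_cs = 112, so C = 16 * (16 + 112) = 2048, which is not
above the limit. With 8 capacity states each, C = 16 * (16 + 128) = 2304 and
EAS would be disabled. A system with 2 performance domains, 8 CPUs and 10
capacity states per domain gives C = 2 * (8 + 20) = 56.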

Comments

Peter Zijlstra Oct. 3, 2018, 4:27 p.m. UTC | #1
On Wed, Sep 12, 2018 at 10:13:03AM +0100, Quentin Perret wrote:
> @@ -288,6 +321,21 @@ static void build_perf_domains(const struct cpumask *cpu_map)
>  			goto free;
>  		tmp->next = pd;
>  		pd = tmp;
> +
> +		/*
> +		 * Count performance domains and capacity states for the
> +		 * complexity check.
> +		 */
> +		nr_pd++;
> +		nr_cs += em_pd_nr_cap_states(pd->obj);
> +	}
> +
> +	/* Bail out if the Energy Model complexity is too high. */
> +	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
> +		if (sched_debug())
> +			pr_info("rd %*pbl: EM complexity is too high\n ",
> +						cpumask_pr_args(cpu_map));
> +		goto free;
>  	}

I would make that an unconditional WARN; we do not really expect that to
trigger, but when it does, we really don't want to hide it.
Quentin Perret Oct. 4, 2018, 9:10 a.m. UTC | #2
On Wednesday 03 Oct 2018 at 18:27:19 (+0200), Peter Zijlstra wrote:
> On Wed, Sep 12, 2018 at 10:13:03AM +0100, Quentin Perret wrote:
> > @@ -288,6 +321,21 @@ static void build_perf_domains(const struct cpumask *cpu_map)
> >  			goto free;
> >  		tmp->next = pd;
> >  		pd = tmp;
> > +
> > +		/*
> > +		 * Count performance domains and capacity states for the
> > +		 * complexity check.
> > +		 */
> > +		nr_pd++;
> > +		nr_cs += em_pd_nr_cap_states(pd->obj);
> > +	}
> > +
> > +	/* Bail out if the Energy Model complexity is too high. */
> > +	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
> > +		if (sched_debug())
> > +			pr_info("rd %*pbl: EM complexity is too high\n ",
> > +						cpumask_pr_args(cpu_map));
> > +		goto free;
> >  	}
> 
> I would make that an unconditional WARN; we do not really expect that to
> trigger, but when it does, we really don't want to hide it.

OTOH that also means that some people with big asymmetric machines can
get a WARN message every time they boot, and even if they don't want to
use EAS.

Now, that shouldn't happen any time soon, so it's maybe a good thing if
we get reports when/if people start to hit that one, so why not ...

Thanks,
Quentin
Peter Zijlstra Oct. 4, 2018, 9:38 a.m. UTC | #3
On Thu, Oct 04, 2018 at 10:10:48AM +0100, Quentin Perret wrote:
> On Wednesday 03 Oct 2018 at 18:27:19 (+0200), Peter Zijlstra wrote:
> > On Wed, Sep 12, 2018 at 10:13:03AM +0100, Quentin Perret wrote:
> > > @@ -288,6 +321,21 @@ static void build_perf_domains(const struct cpumask *cpu_map)
> > >  			goto free;
> > >  		tmp->next = pd;
> > >  		pd = tmp;
> > > +
> > > +		/*
> > > +		 * Count performance domains and capacity states for the
> > > +		 * complexity check.
> > > +		 */
> > > +		nr_pd++;
> > > +		nr_cs += em_pd_nr_cap_states(pd->obj);
> > > +	}
> > > +
> > > +	/* Bail out if the Energy Model complexity is too high. */
> > > +	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
> > > +		if (sched_debug())
> > > +			pr_info("rd %*pbl: EM complexity is too high\n ",
> > > +						cpumask_pr_args(cpu_map));
> > > +		goto free;
> > >  	}
> > 
> > I would make that an unconditional WARN; we do not really expect that to
> > trigger, but when it does, we really don't want to hide it.
> 
> OTOH that also means that some people with big asymmetric machines can
> get a WARN message every time they boot, and even if they don't want to
> use EAS.
> 
> Now, that shouldn't happen any time soon, so it's maybe a good thing if
> we get reports when/if people start to hit that one, so why not ...

Right, and if it becomes a real problem we can think of a solution (like
maybe a DT thingy that says to not use EAS, or a 'better' EAS
algorithm).
Quentin Perret Oct. 4, 2018, 9:45 a.m. UTC | #4
On Thursday 04 Oct 2018 at 11:38:48 (+0200), Peter Zijlstra wrote:
> On Thu, Oct 04, 2018 at 10:10:48AM +0100, Quentin Perret wrote:
> > On Wednesday 03 Oct 2018 at 18:27:19 (+0200), Peter Zijlstra wrote:
> > > On Wed, Sep 12, 2018 at 10:13:03AM +0100, Quentin Perret wrote:
> > > > @@ -288,6 +321,21 @@ static void build_perf_domains(const struct cpumask *cpu_map)
> > > >  			goto free;
> > > >  		tmp->next = pd;
> > > >  		pd = tmp;
> > > > +
> > > > +		/*
> > > > +		 * Count performance domains and capacity states for the
> > > > +		 * complexity check.
> > > > +		 */
> > > > +		nr_pd++;
> > > > +		nr_cs += em_pd_nr_cap_states(pd->obj);
> > > > +	}
> > > > +
> > > > +	/* Bail out if the Energy Model complexity is too high. */
> > > > +	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
> > > > +		if (sched_debug())
> > > > +			pr_info("rd %*pbl: EM complexity is too high\n ",
> > > > +						cpumask_pr_args(cpu_map));
> > > > +		goto free;
> > > >  	}
> > > 
> > > I would make that an unconditional WARN; we do not really expect that to
> > > trigger, but when it does, we really don't want to hide it.
> > 
> > OTOH that also means that some people with big asymmetric machines can
> > get a WARN message every time they boot, and even if they don't want to
> > use EAS.
> > 
> > Now, that shouldn't happen any time soon, so it's maybe a good thing if
> > we get reports when/if people start to hit that one, so why not ...
> 
> Right, and if it becomes a real problem we can think of a solution (like
> maybe a DT thingy that says to not use EAS, or a 'better' EAS
> algorithm).

That works for me. I'll switch to a plain WARN in v8.

Thanks,
Quentin
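
The behavioral difference discussed in this thread can be illustrated with a
small userspace sketch. This is not the v8 code and does not use the kernel's
WARN() or pr_info(): debug_enabled stands in for sched_debug(), and
check_complexity_v7(), check_complexity_warn() and messages_emitted are
hypothetical names introduced only to show the gated versus unconditional
reporting.

```c
#include <stdbool.h>
#include <stdio.h>

/* Same threshold as the one introduced by this patch. */
#define EM_MAX_COMPLEXITY 2048

/* Hypothetical stand-in for the kernel's sched_debug() switch. */
static bool debug_enabled;

/* Counts emitted messages so the two behaviors can be compared. */
static int messages_emitted;

/* v7 behavior: bail out, but only report when debugging is enabled. */
static bool check_complexity_v7(int complexity)
{
	if (complexity > EM_MAX_COMPLEXITY) {
		if (debug_enabled) {
			fprintf(stderr, "EM complexity is too high\n");
			messages_emitted++;
		}
		return false;
	}
	return true;
}

/* Suggested behavior: bail out and always report. */
static bool check_complexity_warn(int complexity)
{
	if (complexity > EM_MAX_COMPLEXITY) {
		fprintf(stderr, "WARNING: EM complexity is too high\n");
		messages_emitted++;
		return false;
	}
	return true;
}
```

With debug_enabled left false, check_complexity_v7(2304) returns false
without printing anything, whereas check_complexity_warn(2304) returns false
and always emits a message, which is the visibility being asked for above.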

Patch

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 10e37ffea19a..0d18d69b719c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -270,12 +270,45 @@  static void destroy_perf_domain_rcu(struct rcu_head *rp)
 	free_pd(pd);
 }
 
+/*
+ * EAS can be used on a root domain if it meets all the following conditions:
+ *    1. the ENERGY_AWARE sched_feat is enabled;
+ *    2. the SD_ASYM_CPUCAPACITY flag is set in the sched_domain hierarchy;
+ *    3. the EM complexity is low enough to keep scheduling overheads low.
+ *
+ * The complexity of the Energy Model is defined as:
+ *
+ *              C = nr_pd * (nr_cpus + nr_cs)
+ *
+ * with parameters defined as:
+ *  - nr_pd:    the number of performance domains
+ *  - nr_cpus:  the number of CPUs
+ *  - nr_cs:    the sum of the number of capacity states of all performance
+ *              domains (for example, on a system with 2 performance domains,
+ *              with 10 capacity states each, nr_cs = 2 * 10 = 20).
+ *
+ * It is generally not a good idea to use such a model in the wake-up path on
+ * very complex platforms because of the associated scheduling overheads. The
+ * arbitrary constraint below prevents that. It makes EAS usable up to 16 CPUs
+ * with per-CPU DVFS and fewer than 8 capacity states each, for example.
+ */
+#define EM_MAX_COMPLEXITY 2048
+
 static void build_perf_domains(const struct cpumask *cpu_map)
 {
+	int i, nr_pd = 0, nr_cs = 0, nr_cpus = cpumask_weight(cpu_map);
 	struct perf_domain *pd = NULL, *tmp;
 	int cpu = cpumask_first(cpu_map);
 	struct root_domain *rd = cpu_rq(cpu)->rd;
-	int i;
+
+	/* EAS is enabled for asymmetric CPU capacity topologies. */
+	if (!per_cpu(sd_asym_cpucapacity, cpu)) {
+		if (sched_debug()) {
+			pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
+					cpumask_pr_args(cpu_map));
+		}
+		goto free;
+	}
 
 	for_each_cpu(i, cpu_map) {
 		/* Skip already covered CPUs. */
@@ -288,6 +321,21 @@  static void build_perf_domains(const struct cpumask *cpu_map)
 			goto free;
 		tmp->next = pd;
 		pd = tmp;
+
+		/*
+		 * Count performance domains and capacity states for the
+		 * complexity check.
+		 */
+		nr_pd++;
+		nr_cs += em_pd_nr_cap_states(pd->obj);
+	}
+
+	/* Bail out if the Energy Model complexity is too high. */
+	if (nr_pd * (nr_cs + nr_cpus) > EM_MAX_COMPLEXITY) {
+		if (sched_debug())
+			pr_info("rd %*pbl: EM complexity is too high\n",
+						cpumask_pr_args(cpu_map));
+		goto free;
 	}
 
 	perf_domain_debug(cpu_map, pd);