Message ID | 1457932932-28444-7-git-send-email-mturquette+renesas@baylibre.com (mailing list archive) |
---|---|
State | RFC, archived |
On Sun, Mar 13, 2016 at 10:22:10PM -0700, Michael Turquette wrote:

> +static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
> +{
> +	enum sched_class_util sc;
> +
> +	/* sum the utilization of all sched classes */
> +	sg_cpu->total_util = 0;
> +	for (sc = 0; sc < nr_util_types; sc++)
> +		sg_cpu->total_util += sg_cpu->util[sc];
> +
> +	return sg_cpu->total_util;
> +}

> @@ -153,7 +172,7 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
>  		if ((s64)delta_ns > NSEC_PER_SEC / HZ)
>  			continue;
> 
> -		j_util = j_sg_cpu->util;
> +		j_util = j_sg_cpu->total_util;
>  		j_max = j_sg_cpu->max;
>  		if (j_util > j_max)
>  			return max_f;

So while not strictly wrong, I think we can do so much better.

Changelog doesn't mention anything useful, like that this is indeed very
rough and what we really should be doing etc..

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
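The summing Peter quotes is simple enough to exercise outside the kernel. Below is a userspace sketch of the same loop; the enum members, struct layout, and `NR_UTIL_TYPES` name are stand-ins for the kernel's `sched_class_util` types, not the real definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's per-class utilization types. */
enum sched_class_util { UTIL_CFS, UTIL_RT, UTIL_DL, NR_UTIL_TYPES };

struct sugov_cpu {
	unsigned long util[NR_UTIL_TYPES];	/* per-sched_class utilization */
	unsigned long total_util;		/* cached sum of the above */
};

/* Same shape as the patch: zero the cache, accumulate every class. */
static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
{
	enum sched_class_util sc;

	sg_cpu->total_util = 0;
	for (sc = 0; sc < NR_UTIL_TYPES; sc++)
		sg_cpu->total_util += sg_cpu->util[sc];

	return sg_cpu->total_util;
}
```

A single class update then only has to store its own slot and re-run the sum, which is what the patch does in both the single and shared update paths.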
On Tue, Mar 15, 2016 at 03:09:51PM -0700, Michael Turquette wrote:
> Quoting Peter Zijlstra (2016-03-15 14:29:26)
> > On Sun, Mar 13, 2016 at 10:22:10PM -0700, Michael Turquette wrote:
> > 
> > > +static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
> > > +{
> > > +	enum sched_class_util sc;
> > > +
> > > +	/* sum the utilization of all sched classes */
> > > +	sg_cpu->total_util = 0;
> > > +	for (sc = 0; sc < nr_util_types; sc++)
> > > +		sg_cpu->total_util += sg_cpu->util[sc];
> > > +
> > > +	return sg_cpu->total_util;
> > > +}
> > 
> > > @@ -153,7 +172,7 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
> > >  		if ((s64)delta_ns > NSEC_PER_SEC / HZ)
> > >  			continue;
> > > 
> > > -		j_util = j_sg_cpu->util;
> > > +		j_util = j_sg_cpu->total_util;
> > >  		j_max = j_sg_cpu->max;
> > >  		if (j_util > j_max)
> > >  			return max_f;
> > 
> > So while not strictly wrong, I think we can do so much better.
> > 
> > Changelog doesn't mention anything useful, like that this is indeed very
> > rough and what we really should be doing etc..
> 
> What should we really be doing? Summing the scheduler class
> contributions seems correct to me.
> 
> Are you referring to the fact that dl and rt are passing bogus values
> into cpufreq_update_util()? If so I'm happy to add a note about that in
> the changelog.

Somewhere in the giant discussions I mentioned that we should be looking
at a CPPC like interface and pass {min,max} tuples to the cpufreq
selection thingy.

In that same discussion I also mentioned that we must compute min as the
hard dl reservation, but that for max we can actually use the avg dl +
avg rt + avg cfs.

That way there is far more room for selecting a sensible frequency.
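Peter's {min,max} proposal reduces to simple arithmetic. The sketch below is purely illustrative — the struct and function names are hypothetical and do not exist in the kernel; the point is only the policy he describes: the floor comes from the hard dl reservation, the ceiling from the summed averages.

```c
/* Hypothetical CPPC-like frequency request: a range rather than a point. */
struct freq_request {
	unsigned long min;	/* hard floor: worst-case dl reservation */
	unsigned long max;	/* best estimate: avg dl + avg rt + avg cfs */
};

/*
 * Compute the tuple from per-class inputs. dl_reserved is the admitted
 * worst-case deadline bandwidth; the avg_* values are the running
 * average utilizations of each class.
 */
static struct freq_request make_request(unsigned long dl_reserved,
					unsigned long avg_dl,
					unsigned long avg_rt,
					unsigned long avg_cfs)
{
	struct freq_request req;

	req.min = dl_reserved;
	req.max = avg_dl + avg_rt + avg_cfs;
	return req;
}
```

Because avg dl is typically well below the reserved worst case, max can legitimately come out below min; a real implementation would have to decide how to clamp that, which is part of why Peter calls the summing approach rough.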
On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
> Somewhere in the giant discussions I mentioned that we should be looking
> at a CPPC like interface and pass {min,max} tuples to the cpufreq
> selection thingy.
> 
> In that same discussion I also mentioned that we must compute min as the
> hard dl reservation, but that for max we can actually use the avg dl +
> avg rt + avg cfs.
> 
> That way there is far more room for selecting a sensible frequency.

Doesn't the above min/max policy mean that the platform will likely
underserve the task load? If avg dl+rt+cfs represents our best estimate
of the work to be done, I would think that should be the min.
On Wed, Mar 16, 2016 at 11:20:45AM -0700, Steve Muckle wrote:
> On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
> > Somewhere in the giant discussions I mentioned that we should be looking
> > at a CPPC like interface and pass {min,max} tuples to the cpufreq
> > selection thingy.
> > 
> > In that same discussion I also mentioned that we must compute min as the
> > hard dl reservation, but that for max we can actually use the avg dl +
> > avg rt + avg cfs.
> > 
> > That way there is far more room for selecting a sensible frequency.
> 
> Doesn't the above min/max policy mean that the platform will likely
> underserve the task load? If avg dl+rt+cfs represents our best estimate
> of the work to be done, I would think that should be the min.

Can't be the min, avg_dl might (and typically will be) much lower than
the worst case utilization estimates.

However if we use that as our min, peaks in DL utilization will not
complete, because we run at too low a frequency.

Therefore, the min must be given by our worst case utilization
reservation, not by the actual avg consumed.
On 03/16/2016 11:36 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 11:20:45AM -0700, Steve Muckle wrote:
>> On 03/16/2016 12:38 AM, Peter Zijlstra wrote:
>>> Somewhere in the giant discussions I mentioned that we should be looking
>>> at a CPPC like interface and pass {min,max} tuples to the cpufreq
>>> selection thingy.
>>>
>>> In that same discussion I also mentioned that we must compute min as the
>>> hard dl reservation, but that for max we can actually use the avg dl +
>>> avg rt + avg cfs.
>>>
>>> That way there is far more room for selecting a sensible frequency.
>>
>> Doesn't the above min/max policy mean that the platform will likely
>> underserve the task load? If avg dl+rt+cfs represents our best estimate
>> of the work to be done, I would think that should be the min.
> 
> Can't be the min, avg_dl might (and typically will be) much lower than
> the worst case utilization estimates.
> 
> However if we use that as our min, peaks in DL utilization will not
> complete, because we run at too low a frequency.

Doesn't that mean the max (if one is specified) should also use hard dl?
I.e. hard dl + rt + cfs.

> Therefore, the min must be given by our worst case utilization
> reservation, not by the actual avg consumed.

Ok sure. My concern was more about the platform potentially ignoring the
CFS and RT capacity requests. So given the point about DL bw I'd think
the min would then be hard dl + rt + cfs.
diff --git a/drivers/cpufreq/cpufreq_schedutil.c b/drivers/cpufreq/cpufreq_schedutil.c
index 18d9ca3..b9234e1 100644
--- a/drivers/cpufreq/cpufreq_schedutil.c
+++ b/drivers/cpufreq/cpufreq_schedutil.c
@@ -46,8 +46,10 @@ struct sugov_cpu {
 	struct freq_update_hook update_hook;
 	struct sugov_policy *sg_policy;
 
+	unsigned long util[nr_util_types];
+	unsigned long total_util;
+
 	/* The fields below are only needed when sharing a policy. */
-	unsigned long util;
 	unsigned long max;
 	u64 last_update;
 };
@@ -106,6 +108,18 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
 		trace_cpu_frequency(freq, smp_processor_id());
 }
 
+static unsigned long sugov_sum_total_util(struct sugov_cpu *sg_cpu)
+{
+	enum sched_class_util sc;
+
+	/* sum the utilization of all sched classes */
+	sg_cpu->total_util = 0;
+	for (sc = 0; sc < nr_util_types; sc++)
+		sg_cpu->total_util += sg_cpu->util[sc];
+
+	return sg_cpu->total_util;
+}
+
 static void sugov_update_single(struct freq_update_hook *hook,
 				enum sched_class_util sc, u64 time,
 				unsigned long util, unsigned long max)
@@ -113,12 +127,17 @@ static void sugov_update_single(struct freq_update_hook *hook,
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned int max_f, next_f;
+	unsigned long total_util;
 
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
+	/* update per-sched_class utilization for this cpu */
+	sg_cpu->util[sc] = util;
+	total_util = sugov_sum_total_util(sg_cpu);
+
 	max_f = sg_policy->max_freq;
-	next_f = util > max ? max_f : util * max_f / max;
+	next_f = total_util > max ? max_f : total_util * max_f / max;
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
@@ -153,7 +172,7 @@ static unsigned int sugov_next_freq(struct sugov_policy *sg_policy,
 		if ((s64)delta_ns > NSEC_PER_SEC / HZ)
 			continue;
 
-		j_util = j_sg_cpu->util;
+		j_util = j_sg_cpu->total_util;
 		j_max = j_sg_cpu->max;
 		if (j_util > j_max)
 			return max_f;
@@ -174,15 +193,19 @@ static void sugov_update_shared(struct freq_update_hook *hook,
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_hook);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned int next_f;
+	unsigned long total_util;
 
 	raw_spin_lock(&sg_policy->update_lock);
 
-	sg_cpu->util = util;
+	sg_cpu->util[sc] = util;
 	sg_cpu->max = max;
 	sg_cpu->last_update = time;
 
+	/* update per-sched_class utilization for this cpu */
+	total_util = sugov_sum_total_util(sg_cpu);
+
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq(sg_policy, util, max);
+		next_f = sugov_next_freq(sg_policy, total_util, max);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
@@ -423,6 +446,7 @@ static int sugov_start(struct cpufreq_policy *policy)
 {
 	struct sugov_policy *sg_policy = policy->governor_data;
 	unsigned int cpu;
+	enum sched_class_util sc;
 
 	sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
 	sg_policy->last_freq_update_time = 0;
@@ -434,8 +458,11 @@ static int sugov_start(struct cpufreq_policy *policy)
 		struct sugov_cpu *sg_cpu = &per_cpu(sugov_cpu, cpu);
 
 		sg_cpu->sg_policy = sg_policy;
+		for (sc = 0; sc < nr_util_types; sc++) {
+			sg_cpu->util[sc] = ULONG_MAX;
+			sg_cpu->total_util = ULONG_MAX;
+		}
 		if (policy_is_shared(policy)) {
-			sg_cpu->util = ULONG_MAX;
 			sg_cpu->max = 0;
 			sg_cpu->last_update = 0;
 			cpufreq_set_freq_update_hook(cpu, &sg_cpu->update_hook,
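The frequency-selection expression the patch touches is a linear map from utilization to frequency, clamped at the policy's maximum. A userspace sketch of just that arithmetic, with the kernel context stripped away:

```c
/*
 * total_util is the summed per-class utilization, max is the capacity
 * it is measured against, and max_f is the policy's maximum frequency.
 * Utilization at or above capacity requests max_f; anything below
 * scales max_f proportionally.
 */
static unsigned int next_freq(unsigned long total_util, unsigned long max,
			      unsigned int max_f)
{
	return total_util > max ? max_f : total_util * max_f / max;
}
```

For example, utilization at half of capacity requests half the maximum frequency. The shared-policy path works the same way, except it first takes the highest per-CPU total across the policy's CPUs.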
The patch "sched/cpufreq: pass sched class into cpufreq_update_util"
made it possible for calls to cpufreq_update_util() to specify the
scheduler class, specifically cfs, rt and dl. Update the schedutil
governor to store these individual utilizations per cpu and sum them to
create a total utilization contribution.

Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com>
---
 drivers/cpufreq/cpufreq_schedutil.c | 39 +++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)