Message ID | 20180828135324.21976-7-patrick.bellasi@arm.com
---|---
State | Changes Requested, archived
Series | Add utilization clamping support
On Tue, Aug 28, 2018 at 02:53:14PM +0100, Patrick Bellasi wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 3fffad3bc8a8..949082555ee8 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -222,8 +222,13 @@ static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
>  	 * CFS tasks and we use the same metric to track the effective
>  	 * utilization (PELT windows are synchronized) we can directly add them
>  	 * to obtain the CPU's actual utilization.
> +	 *
> +	 * CFS utilization can be boosted or capped, depending on utilization
> +	 * clamp constraints configured for currently RUNNABLE tasks.
>  	 */
>  	util = cpu_util_cfs(rq);
> +	if (util)
> +		util = uclamp_util(rq, util);

Should that not be:

	util = clamp_util(rq, cpu_util_cfs(rq));

Because if !util might we not still want to enforce the min clamp?

>  	util += cpu_util_rt(rq);
>
>  	/*

> @@ -322,11 +328,24 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>  		return;
>  	sg_cpu->iowait_boost_pending = true;
>
> +	/*
> +	 * Boost FAIR tasks only up to the CPU clamped utilization.
> +	 *
> +	 * Since DL tasks have a much more advanced bandwidth control, it's
> +	 * safe to assume that IO boost does not apply to those tasks.
> +	 * Instead, since RT tasks are not utilization clamped, we don't want
> +	 * to apply clamping on IO boost while there is blocked RT
> +	 * utilization.
> +	 */
> +	max_boost = sg_cpu->iowait_boost_max;
> +	if (!cpu_util_rt(cpu_rq(sg_cpu->cpu)))
> +		max_boost = uclamp_util(cpu_rq(sg_cpu->cpu), max_boost);

OK I suppose.

> +
>  	/* Double the boost at each request */
>  	if (sg_cpu->iowait_boost) {
>  		sg_cpu->iowait_boost <<= 1;
> -		if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> -			sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> +		if (sg_cpu->iowait_boost > max_boost)
> +			sg_cpu->iowait_boost = max_boost;
>  		return;
>  	}

> +static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> +{
> +	struct uclamp_cpu *uc_cpu = &rq->uclamp;
> +
> +	if (uc_cpu->value[clamp_id] == UCLAMP_NOT_VALID)
> +		return uclamp_none(clamp_id);
> +
> +	return uc_cpu->value[clamp_id];
> +}

Would that not be more readable as:

static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
{
	unsigned int val = rq->uclamp.value[clamp_id];

	if (unlikely(val == UCLAMP_NOT_VALID))
		val = uclamp_none(clamp_id);

	return val;
}

And how come NOT_VALID is possible? I thought the idea was to always
have all things a valid value.
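To spell out the difference behind the first question: with the hunk as
posted, a CPU whose CFS utilization has fully decayed is never
min-boosted on behalf of FAIR, while the unconditional form would still
enforce util_min. A minimal sketch of the two options (clamp_util()
above presumably refers to the uclamp_util() helper this patch
introduces):

	/* As posted: no clamping at all when cpu_util_cfs() == 0. */
	util = cpu_util_cfs(rq);
	if (util)
		util = uclamp_util(rq, util);

	/* As suggested: util_min is enforced even when cpu_util_cfs() == 0. */
	util = uclamp_util(rq, cpu_util_cfs(rq));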
On 14-Sep 11:32, Peter Zijlstra wrote:
> On Tue, Aug 28, 2018 at 02:53:14PM +0100, Patrick Bellasi wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 3fffad3bc8a8..949082555ee8 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -222,8 +222,13 @@ static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
> >  	 * CFS tasks and we use the same metric to track the effective
> >  	 * utilization (PELT windows are synchronized) we can directly add them
> >  	 * to obtain the CPU's actual utilization.
> > +	 *
> > +	 * CFS utilization can be boosted or capped, depending on utilization
> > +	 * clamp constraints configured for currently RUNNABLE tasks.
> >  	 */
> >  	util = cpu_util_cfs(rq);
> > +	if (util)
> > +		util = uclamp_util(rq, util);
>
> Should that not be:
>
> 	util = clamp_util(rq, cpu_util_cfs(rq));
>
> Because if !util might we not still want to enforce the min clamp?

If !util, CFS tasks have been gone for a long time (proportional to
their estimated utilization) and thus it probably makes sense not to
further affect the energy efficiency of tasks of other classes.

IOW, the blocked utilization of a class gives us a bit of "hysteresis"
in case its tasks have a relatively small period and thus are likely to
wake up soonish.

This "hysteresis" so far is based on the specific PELT decay rate,
which is not very tunable... What I would like instead, but that's for
a future update, is a dedicated (per-task) attribute which defines how
long a clamp has to last after the last task enqueue time.

This would make for a much more flexible mechanism, which completely
decouples the clamp duration from PELT, enabling scenarios quite
similar to the 0-lag we have in DL:

 - a small task with a relatively long period which gets an ensured
   boost up to its next activation

 - a big task which has important things to do just at the beginning
   but can complete at a more energy-efficient, lower OPP

We already have this "boost holding" feature in Android and we found it
quite useful, especially for RT tasks, where it guarantees that an RT
task does not risk waking up at a lower OPP when that behavior is
required (which is not always the case).

Furthermore, based on such a generic "clamp holding" mechanism, we can
also think about replacing the IOWAIT boost with a more tunable,
task-specific boosting based on util_min.

... but again, if the above makes any sense, it's for a future series
once we are happy with at least these bits.

> >  	util += cpu_util_rt(rq);
> >
> >  	/*

> > @@ -322,11 +328,24 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
> >  		return;
> >  	sg_cpu->iowait_boost_pending = true;
> >
> > +	/*
> > +	 * Boost FAIR tasks only up to the CPU clamped utilization.
> > +	 *
> > +	 * Since DL tasks have a much more advanced bandwidth control, it's
> > +	 * safe to assume that IO boost does not apply to those tasks.
> > +	 * Instead, since RT tasks are not utilization clamped, we don't want
> > +	 * to apply clamping on IO boost while there is blocked RT
> > +	 * utilization.
> > +	 */
> > +	max_boost = sg_cpu->iowait_boost_max;
> > +	if (!cpu_util_rt(cpu_rq(sg_cpu->cpu)))
> > +		max_boost = uclamp_util(cpu_rq(sg_cpu->cpu), max_boost);
>
> OK I suppose.

Yes, if we have a task constraint it should apply to the boost too...
> > +
> >  	/* Double the boost at each request */
> >  	if (sg_cpu->iowait_boost) {
> >  		sg_cpu->iowait_boost <<= 1;
> > -		if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
> > -			sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> > +		if (sg_cpu->iowait_boost > max_boost)
> > +			sg_cpu->iowait_boost = max_boost;
> >  		return;
> >  	}

> > +static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> > +{
> > +	struct uclamp_cpu *uc_cpu = &rq->uclamp;
> > +
> > +	if (uc_cpu->value[clamp_id] == UCLAMP_NOT_VALID)
> > +		return uclamp_none(clamp_id);
> > +
> > +	return uc_cpu->value[clamp_id];
> > +}
>
> Would that not be more readable as:
>
> static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> {
> 	unsigned int val = rq->uclamp.value[clamp_id];
>
> 	if (unlikely(val == UCLAMP_NOT_VALID))
> 		val = uclamp_none(clamp_id);
>
> 	return val;
> }

I'm trying to keep consistency in variable name usage by always
accessing the rq's clamps via a *uc_cpu pointer, to make grepping the
code easy. Does this argument make sense?

On the other hand, what you propose above is easier to read by looking
just at that function... so, if you prefer it, I'll update it in v5.

> And how come NOT_VALID is possible? I thought the idea was to always
> have all things a valid value.

When we update the clamp values for a "newly idle" CPU, there are no
tasks refcounting clamps and thus we end up with UCLAMP_NOT_VALID for
that CPU. That's how uclamp_cpu_update() is currently encoded.

Perhaps we can set the value to uclamp_none(clamp_id) from that
function, but I was thinking that it could be useful to track
explicitly that the CPU is now idle.

Cheers, Patrick
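The alternative mentioned above could look something like this
(hypothetical sketch only; the actual uclamp_cpu_update() aggregation
loop is not shown in this thread):

	/* Hypothetical: fall back to uclamp_none() instead of NOT_VALID. */
	static inline void uclamp_cpu_update(struct rq *rq, int clamp_id)
	{
		bool clamped = false;
		unsigned int max_value = 0;

		/*
		 * ... max-aggregate the clamp groups refcounted by RUNNABLE
		 * tasks on this rq, setting 'clamped' if any group is in use ...
		 */

		/* No RUNNABLE task: fall back to the clamp's default value. */
		if (!clamped)
			max_value = uclamp_none(clamp_id);

		rq->uclamp.value[clamp_id] = max_value;
	}

And, going back to the "clamp holding" idea earlier in this mail, a
rough sketch of such a per-task attribute (purely hypothetical: neither
the fields nor the helper exist in this series):

	struct uclamp_hold {
		u64	hold_ns;	/* how long the clamp lasts after enqueue */
		u64	last_enqueue;	/* timestamp of the last task enqueue */
	};

	static inline bool uclamp_hold_active(struct uclamp_hold *uch, u64 now)
	{
		/* Keep the task's clamp in effect for hold_ns after enqueue. */
		return (now - uch->last_enqueue) < uch->hold_ns;
	}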
On Fri, Sep 14, 2018 at 02:19:19PM +0100, Patrick Bellasi wrote:
> On 14-Sep 11:32, Peter Zijlstra wrote:
> > Should that not be:
> >
> > 	util = clamp_util(rq, cpu_util_cfs(rq));
> >
> > Because if !util might we not still want to enforce the min clamp?
>
> If !util, CFS tasks have been gone for a long time (proportional to
> their estimated utilization) and thus it probably makes sense not to
> further affect the energy efficiency of tasks of other classes.

I don't remember what we do for util for new tasks; but weren't we
talking about setting that to 0 recently? IIRC the problem was that if
we start with util at 1 we'll always run new tasks on big cores, or
something along those lines.

So new tasks would still trigger this case until they'd accrued enough
history.

Either way around, I don't much care at this point, except I think it
would be good to have a comment to record the assumptions.

> > Would that not be more readable as:
> >
> > static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> > {
> > 	unsigned int val = rq->uclamp.value[clamp_id];
> >
> > 	if (unlikely(val == UCLAMP_NOT_VALID))
> > 		val = uclamp_none(clamp_id);
> >
> > 	return val;
> > }
>
> I'm trying to keep consistency in variable name usage by always
> accessing the rq's clamps via a *uc_cpu pointer, to make grepping the
> code easy. Does this argument make sense?
>
> On the other hand, what you propose above is easier to read by looking
> just at that function... so, if you prefer it, I'll update it in v5.

I prefer my version, also because it has a single load of the value
(yes, I know about CSE passes). I figure one can always grep for uclamp
or something.

> > And how come NOT_VALID is possible? I thought the idea was to always
> > have all things a valid value.
>
> When we update the clamp values for a "newly idle" CPU, there are no
> tasks refcounting clamps and thus we end up with UCLAMP_NOT_VALID for
> that CPU. That's how uclamp_cpu_update() is currently encoded.
>
> Perhaps we can set the value to uclamp_none(clamp_id) from that
> function, but I was thinking that it could be useful to track
> explicitly that the CPU is now idle.

IIRC you added an explicit flag to track idle somewhere... to keep the
last max clamp in effect or something.

I think, but haven't overly thought about this, that if you always
ensure these things are valid you can avoid a bunch of NOT_VALID
conditions. And fewer conditions is always good, right? :-)
On 14-Sep 15:36, Peter Zijlstra wrote:
> On Fri, Sep 14, 2018 at 02:19:19PM +0100, Patrick Bellasi wrote:
> > On 14-Sep 11:32, Peter Zijlstra wrote:
> > > Should that not be:
> > >
> > > 	util = clamp_util(rq, cpu_util_cfs(rq));
> > >
> > > Because if !util might we not still want to enforce the min clamp?
> >
> > If !util, CFS tasks have been gone for a long time (proportional to
> > their estimated utilization) and thus it probably makes sense not to
> > further affect the energy efficiency of tasks of other classes.
>
> I don't remember what we do for util for new tasks; but weren't we
> talking about setting that to 0 recently? IIRC the problem was that if
> we start with util at 1 we'll always run new tasks on big cores, or
> something along those lines.

Mmm... could have been in a recent discussion with Quentin, but I think
I've missed it. I know we have something similar on Android for similar
reasons.

> So new tasks would still trigger this case until they'd accrued enough
> history.

Well, yes and no. New tasks will be clamped, which means that if they
are generated from a capped parent (or within a cgroup with a suitable
util_max) they can still live on a smaller-capacity CPU despite their
utilization being 1024. Thus, to a certain extent, UtilClamp could be a
fix for the above misbehavior whenever needed.

NOTE: this series does not include the task-biasing bits.

> Either way around, I don't much care at this point, except I think it
> would be good to have a comment to record the assumptions.

Sure, I will add a comment on that and a warning about possible side
effects on task placement.

> > > Would that not be more readable as:
> > >
> > > static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
> > > {
> > > 	unsigned int val = rq->uclamp.value[clamp_id];
> > >
> > > 	if (unlikely(val == UCLAMP_NOT_VALID))
> > > 		val = uclamp_none(clamp_id);
> > >
> > > 	return val;
> > > }
> >
> > I'm trying to keep consistency in variable name usage by always
> > accessing the rq's clamps via a *uc_cpu pointer, to make grepping the
> > code easy. Does this argument make sense?
> >
> > On the other hand, what you propose above is easier to read by looking
> > just at that function... so, if you prefer it, I'll update it in v5.
>
> I prefer my version, also because it has a single load of the value
> (yes, I know about CSE passes). I figure one can always grep for uclamp
> or something.

+1

> > > And how come NOT_VALID is possible? I thought the idea was to always
> > > have all things a valid value.
> >
> > When we update the clamp values for a "newly idle" CPU, there are no
> > tasks refcounting clamps and thus we end up with UCLAMP_NOT_VALID for
> > that CPU. That's how uclamp_cpu_update() is currently encoded.
> >
> > Perhaps we can set the value to uclamp_none(clamp_id) from that
> > function, but I was thinking that it could be useful to track
> > explicitly that the CPU is now idle.
>
> IIRC you added an explicit flag to track idle somewhere... to keep the
> last max clamp in effect or something.

Right... that patch came after this one in v3, but now that I've moved
it earlier, we can probably simplify this path.

> I think, but haven't overly thought about this, that if you always
> ensure these things are valid you can avoid a bunch of NOT_VALID
> conditions. And fewer conditions is always good, right? :-)

Right, I will check all the usages more carefully and remove them when
not strictly required.

Cheers, Patrick
On Friday 14 Sep 2018 at 14:57:12 (+0100), Patrick Bellasi wrote:
> On 14-Sep 15:36, Peter Zijlstra wrote:
> > On Fri, Sep 14, 2018 at 02:19:19PM +0100, Patrick Bellasi wrote:
> > > On 14-Sep 11:32, Peter Zijlstra wrote:
> > > > Should that not be:
> > > >
> > > > 	util = clamp_util(rq, cpu_util_cfs(rq));
> > > >
> > > > Because if !util might we not still want to enforce the min clamp?
> > >
> > > If !util, CFS tasks have been gone for a long time (proportional to
> > > their estimated utilization) and thus it probably makes sense not to
> > > further affect the energy efficiency of tasks of other classes.
> >
> > I don't remember what we do for util for new tasks; but weren't we
> > talking about setting that to 0 recently? IIRC the problem was that if
> > we start with util at 1 we'll always run new tasks on big cores, or
> > something along those lines.

I guess you're referring to this discussion?

https://lore.kernel.org/lkml/CAKfTPtDcoySXK0fBkDNy4wp1vsRxmiuAGT3CDZBh6Vnwyep2BA@mail.gmail.com/

If yes, then the outcome was that we'll see later what we do with new
tasks :-)

Setting the util of new tasks to 0 surely can help power, but that can
also harm performance pretty badly, I think. You'd be stuck at min freq
for a while with sugov in case of a fork bomb, for example.

> Mmm... could have been in a recent discussion with Quentin, but I think
> I've missed it. I know we have something similar on Android for similar
> reasons.

I don't think PELT is different in Android (we still set the initial
util of new tasks as half of the spare capacity of the CPU), but there
are other tweaks that influence the first task placement, though. And
WALT sets the util of new tasks to 0 IIRC (but I'm not sure it's
relevant since its signal ramps up a lot faster than PELT's).

Thanks,
Quentin
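For reference, the "half of the spare capacity" rule mentioned above
amounts to roughly the following (illustrative sketch only, not the
exact post_init_entity_util_avg() code in fair.c):

	/* Illustrative only: initial utilization of a new task under PELT. */
	static unsigned long init_task_util(unsigned long cpu_capacity,
					    unsigned long cfs_util)
	{
		/* New tasks start at half of the CPU's spare capacity. */
		return (cpu_capacity - cfs_util) / 2;
	}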
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 3fffad3bc8a8..949082555ee8 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -222,8 +222,13 @@ static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 	 * CFS tasks and we use the same metric to track the effective
 	 * utilization (PELT windows are synchronized) we can directly add them
 	 * to obtain the CPU's actual utilization.
+	 *
+	 * CFS utilization can be boosted or capped, depending on utilization
+	 * clamp constraints configured for currently RUNNABLE tasks.
 	 */
 	util = cpu_util_cfs(rq);
+	if (util)
+		util = uclamp_util(rq, util);
 	util += cpu_util_rt(rq);
 
 	/*
@@ -307,6 +312,7 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
 			       unsigned int flags)
 {
 	bool set_iowait_boost = flags & SCHED_CPUFREQ_IOWAIT;
+	unsigned int max_boost;
 
 	/* Reset boost if the CPU appears to have been idle enough */
 	if (sg_cpu->iowait_boost &&
@@ -322,11 +328,24 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
 		return;
 	sg_cpu->iowait_boost_pending = true;
 
+	/*
+	 * Boost FAIR tasks only up to the CPU clamped utilization.
+	 *
+	 * Since DL tasks have a much more advanced bandwidth control, it's
+	 * safe to assume that IO boost does not apply to those tasks.
+	 * Instead, since RT tasks are not utilization clamped, we don't want
+	 * to apply clamping on IO boost while there is blocked RT
+	 * utilization.
+	 */
+	max_boost = sg_cpu->iowait_boost_max;
+	if (!cpu_util_rt(cpu_rq(sg_cpu->cpu)))
+		max_boost = uclamp_util(cpu_rq(sg_cpu->cpu), max_boost);
+
 	/* Double the boost at each request */
 	if (sg_cpu->iowait_boost) {
 		sg_cpu->iowait_boost <<= 1;
-		if (sg_cpu->iowait_boost > sg_cpu->iowait_boost_max)
-			sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+		if (sg_cpu->iowait_boost > max_boost)
+			sg_cpu->iowait_boost = max_boost;
 		return;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 411635c4c09a..1b05b38b1081 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2293,6 +2293,56 @@ static inline unsigned int uclamp_none(int clamp_id)
 	return SCHED_CAPACITY_SCALE;
 }
 
+#ifdef CONFIG_UCLAMP_TASK
+/**
+ * uclamp_value: get the current CPU's utilization clamp value
+ * @rq: the CPU's RQ to consider
+ * @clamp_id: the utilization clamp index (i.e. min or max utilization)
+ *
+ * The utilization clamp value for a CPU depends on its set of currently
+ * RUNNABLE tasks and their specific util_{min,max} constraints.
+ * A max aggregated value is tracked for each CPU and returned by this
+ * function.
+ *
+ * Return: the current value for the specified CPU and clamp index
+ */
+static inline unsigned int uclamp_value(struct rq *rq, int clamp_id)
+{
+	struct uclamp_cpu *uc_cpu = &rq->uclamp;
+
+	if (uc_cpu->value[clamp_id] == UCLAMP_NOT_VALID)
+		return uclamp_none(clamp_id);
+
+	return uc_cpu->value[clamp_id];
+}
+
+/**
+ * uclamp_util: clamp a utilization value for a specified CPU
+ * @rq: the CPU's RQ to get the clamp values from
+ * @util: the utilization signal to clamp
+ *
+ * Each CPU tracks util_{min,max} clamp values depending on the set of its
+ * currently RUNNABLE tasks. Given a utilization signal, i.e. a signal in
+ * the [0..SCHED_CAPACITY_SCALE] range, this function returns a clamped
+ * utilization signal considering the current clamp values for the
+ * specified CPU.
+ *
+ * Return: a clamped utilization signal for a given CPU.
+ */
+static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+{
+	unsigned int min_util = uclamp_value(rq, UCLAMP_MIN);
+	unsigned int max_util = uclamp_value(rq, UCLAMP_MAX);
+
+	return clamp(util, min_util, max_util);
+}
+#else /* CONFIG_UCLAMP_TASK */
+static inline unsigned int uclamp_util(struct rq *rq, unsigned int util)
+{
+	return util;
+}
+#endif /* CONFIG_UCLAMP_TASK */
+
 #ifdef arch_scale_freq_capacity
 # ifndef arch_scale_freq_invariant
 # define arch_scale_freq_invariant()	true
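To illustrate the semantics of uclamp_util() above with concrete
numbers, here is a standalone userspace sketch (hypothetical clamp
values; clamp_uint() mirrors the kernel's clamp() for unsigned
scalars):

	#include <stdio.h>

	/* Same behavior as the kernel's clamp() macro, for unsigned scalars. */
	static unsigned int clamp_uint(unsigned int val, unsigned int lo,
				       unsigned int hi)
	{
		return val < lo ? lo : (val > hi ? hi : val);
	}

	int main(void)
	{
		/* Hypothetical CPU clamps: util_min = 512, util_max = 800. */
		unsigned int min_util = 512, max_util = 800;

		printf("%u\n", clamp_uint(100,  min_util, max_util)); /* 512: boosted */
		printf("%u\n", clamp_uint(600,  min_util, max_util)); /* 600: untouched */
		printf("%u\n", clamp_uint(1000, min_util, max_util)); /* 800: capped */
		return 0;
	}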
Each time a frequency update is required via schedutil, a frequency is
selected to (possibly) satisfy the utilization reported by the CFS
class. However, when utilization clamping is in use, the frequency
selection should consider the requirements suggested by userspace, for
example, to:

 - boost tasks which are directly affecting the user experience by
   running them at least at a minimum "required" frequency

 - cap low-priority tasks not directly affecting the user experience by
   running them only up to a maximum "allowed" frequency

These constraints are meant to support per-task tuning of the frequency
selection, thus allowing a fine-grained definition of performance
boosting vs energy saving strategies in kernel space.

Let's add the required support to clamp the utilization generated by
FAIR tasks within the boundaries defined by their aggregated
utilization clamp constraints. On each CPU the aggregated clamp values
are obtained by considering the maximum of the {min,max}_util values
for each task. This max aggregation responds to the goal of not
penalizing, for example, highly boosted (i.e. more important for the
user experience) CFS tasks which happen to be co-scheduled with highly
capped (i.e. less important for the user experience) CFS tasks; see the
worked example after the diffstat below.

For FAIR tasks both the utilization and the IOWait boost values are
clamped according to the CPU's aggregated utilization clamp
constraints.

The default values for boosting and capping are defined to be:

 - util_min: 0
 - util_max: SCHED_CAPACITY_SCALE

which means that by default no boosting/capping is enforced on FAIR
tasks, and thus the frequency will be selected considering the actual
utilization value of each CPU.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v4:
 Message-ID: <CAKfTPtC2adLupg7wy1JU9zxKx1466Sza6fSCcr92wcawm1OYkg@mail.gmail.com>
 - use *rq instead of cpu for both uclamp_util() and uclamp_value()
 Message-ID: <20180816135300.GC2960@e110439-lin>
 - remove uclamp_value() which is never used outside CONFIG_UCLAMP_TASK
 Others:
 - rebased on v4.19-rc1

Changes in v3:
 Message-ID: <CAJuCfpF6=L=0LrmNnJrTNPazT4dWKqNv+thhN0dwpKCgUzs9sg@mail.gmail.com>
 - rename UCLAMP_NONE into UCLAMP_NOT_VALID
 Others:
 - rebased on tip/sched/core

Changes in v2:
 - rebased on v4.18-rc4
---
 kernel/sched/cpufreq_schedutil.c | 23 +++++++++++++--
 kernel/sched/sched.h             | 50 ++++++++++++++++++++++++++++++++
 2 files changed, 71 insertions(+), 2 deletions(-)
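And the worked example of the max aggregation referred to in the
changelog, with hypothetical clamp values for two co-scheduled RUNNABLE
CFS tasks (standalone userspace C):

	#include <stdio.h>

	#define MAX(a, b) ((a) > (b) ? (a) : (b))

	int main(void)
	{
		/* Hypothetical per-task clamps. */
		unsigned int a_min = 200, a_max = 1024;	/* task A: boosted, uncapped */
		unsigned int b_min = 0,   b_max = 400;	/* task B: capped */

		/*
		 * Max aggregation: the CPU-level clamps take the maximum of
		 * the per-task values, so task A keeps its boost and is not
		 * capped by task B's util_max while both are RUNNABLE.
		 */
		printf("cpu util_min = %u\n", MAX(a_min, b_min)); /* 200 */
		printf("cpu util_max = %u\n", MAX(a_max, b_max)); /* 1024 */
		return 0;
	}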