Message ID | 1457932932-28444-3-git-send-email-mturquette+renesas@baylibre.com (mailing list archive) |
---|---|
State | RFC, archived |
Headers | show |
On Sun, Mar 13, 2016 at 10:22:06PM -0700, Michael Turquette wrote: > @@ -2840,6 +2853,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > > if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) { > unsigned long max = rq->cpu_capacity_orig; > + unsigned long cap = cfs_rq->avg.util_avg * > + cfs_capacity_margin / max; > > /* > * There are a few boundary cases this might miss but it should > @@ -2852,8 +2867,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > * thread is a different class (!fair), nor will the utilization > * number include things like RT tasks. > */ > - cpufreq_update_util(rq_clock(rq), > - min(cfs_rq->avg.util_avg, max), max); > + cpufreq_update_util(rq_clock(rq), min(cap, max), max); > } > } I really don't see why that is here, and not inside whatever uses cpufreq_update_util(). -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Mar 15, 2016 at 02:28:48PM -0700, Michael Turquette wrote: > Quoting Peter Zijlstra (2016-03-15 14:16:14) > > On Sun, Mar 13, 2016 at 10:22:06PM -0700, Michael Turquette wrote: > > > @@ -2840,6 +2853,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > > > > > > if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) { > > > unsigned long max = rq->cpu_capacity_orig; > > > + unsigned long cap = cfs_rq->avg.util_avg * > > > + cfs_capacity_margin / max; > > > > > > /* > > > * There are a few boundary cases this might miss but it should > > > @@ -2852,8 +2867,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > > > * thread is a different class (!fair), nor will the utilization > > > * number include things like RT tasks. > > > */ > > > - cpufreq_update_util(rq_clock(rq), > > > - min(cfs_rq->avg.util_avg, max), max); > > > + cpufreq_update_util(rq_clock(rq), min(cap, max), max); > > > } > > > } > > > > I really don't see why that is here, and not inside whatever uses > > cpufreq_update_util(). > > Because I want schedutil to be dumb and not implement any policy of it's > own. The idea is for the scheduler to select frequency after all. > > I want to avoid a weird hybrid solution where we try to be smart about > selecting the right capacity/frequency in fair.c (e.g. Steve and Juri's > patches to fair.c from the sched-freq-v7 series), but then have an > additional layer of "smarts" massaging those values further in the > cpufreq governor. So the problem here is that you add an unconditional division, even if cpufreq_update_util() then decides to not do anything with it. Please, these are scheduler paths, do not add (fancy) instructions if you don't absolutely have to. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 03/13/2016 10:22 PM, Michael Turquette wrote: > +unsigned long cfs_capacity_margin = CAPACITY_MARGIN_DEFAULT; > + > #ifdef CONFIG_CFS_BANDWIDTH > /* > * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool > @@ -2840,6 +2853,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > > if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) { > unsigned long max = rq->cpu_capacity_orig; > + unsigned long cap = cfs_rq->avg.util_avg * > + cfs_capacity_margin / max; Doesn't rq->cpu_capacity_orig get scaled per the microarch invariance? This would mean that the margin we're applying here would differ based on that. I'd expect that the margin would be * (cfs_capacity_margin / SCHED_CAPACITY_SCALE) which would then reduce the division into a shift. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Steve Muckle (2016-03-15 19:52:59) > On 03/13/2016 10:22 PM, Michael Turquette wrote: > > +unsigned long cfs_capacity_margin = CAPACITY_MARGIN_DEFAULT; > > + > > #ifdef CONFIG_CFS_BANDWIDTH > > /* > > * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool > > @@ -2840,6 +2853,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) > > > > if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) { > > unsigned long max = rq->cpu_capacity_orig; > > + unsigned long cap = cfs_rq->avg.util_avg * > > + cfs_capacity_margin / max; > > Doesn't rq->cpu_capacity_orig get scaled per the microarch invariance? > This would mean that the margin we're applying here would differ based > on that. > > I'd expect that the margin would be * (cfs_capacity_margin / > SCHED_CAPACITY_SCALE) which would then reduce the division into a shift. Will fix. Thanks, Mike -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a32f281..29e8bae 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -100,6 +100,19 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL; */ unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL; +/* + * Add a 25% margin globally to all capacity requests from cfs. This is + * equivalent to an 80% up_threshold in legacy governors like ondemand. + * + * This is required as task utilization increases. The frequency-invariant + * utilization will asymptotically approach the current capacity of the cpu and + * the additional margin will cross the threshold into the next capacity state. + * + * XXX someday expand to separate, per-call site margins? e.g. enqueue, fork, + * task_tick, load_balance, etc + */ +unsigned long cfs_capacity_margin = CAPACITY_MARGIN_DEFAULT; + #ifdef CONFIG_CFS_BANDWIDTH /* * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool @@ -2840,6 +2853,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) { unsigned long max = rq->cpu_capacity_orig; + unsigned long cap = cfs_rq->avg.util_avg * + cfs_capacity_margin / max; /* * There are a few boundary cases this might miss but it should @@ -2852,8 +2867,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg) * thread is a different class (!fair), nor will the utilization * number include things like RT tasks. */ - cpufreq_update_util(rq_clock(rq), - min(cfs_rq->avg.util_avg, max), max); + cpufreq_update_util(rq_clock(rq), min(cap, max), max); } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f06dfca..8c93ed2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -27,6 +27,9 @@ extern __read_mostly int scheduler_running; extern unsigned long calc_load_update; extern atomic_long_t calc_load_tasks; +#define CAPACITY_MARGIN_DEFAULT 1280; +extern unsigned long cfs_capacity_margin; + extern void calc_global_load_tick(struct rq *this_rq); extern long calc_load_fold_active(struct rq *this_rq);
Utilization contributions to cfs_rq->avg.util_avg are scaled for both microarchitecture-invariance as well as frequency-invariance. This means that any given utilization contribution will be scaled against the current cpu capacity (cpu frequency). Contributions from long running tasks, whose utilization grows larger over time, will asymptotically approach the current capacity. This causes a problem when using this utilization signal to select a target cpu capacity (cpu frequency), as our signal will never exceed the current capacity, which would otherwise be our signal to increase frequency. Solve this by introducing a default capacity margin that is added to the utilization signal when requesting a change to capacity (cpu frequency). The margin is 1280, or 1.25 x SCHED_CAPACITY_SCALE (1024). This is equivalent to similar margins such as the default 125 value assigned to struct sched_domain.imbalance_pct for load balancing, and to the 80% up_threshold used by the legacy cpufreq ondemand governor. Signed-off-by: Michael Turquette <mturquette+renesas@baylibre.com> --- kernel/sched/fair.c | 18 ++++++++++++++++-- kernel/sched/sched.h | 3 +++ 2 files changed, 19 insertions(+), 2 deletions(-)