From patchwork Tue Jul 7 18:24:08 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Morten Rasmussen
X-Patchwork-Id: 6738381
From: Morten Rasmussen
To: peterz@infradead.org, mingo@redhat.com
Cc: vincent.guittot@linaro.org, daniel.lezcano@linaro.org,
	Dietmar Eggemann, yuyang.du@intel.com, mturquette@baylibre.com,
	rjw@rjwysocki.net, Juri Lelli, sgurrappadi@nvidia.com,
	pang.xunlei@zte.com.cn, linux-kernel@vger.kernel.org,
	linux-pm@vger.kernel.org
Subject: [RFCv5 PATCH 25/46] sched: Add over-utilization/tipping point indicator
Date: Tue, 7 Jul 2015 19:24:08 +0100
Message-Id: <1436293469-25707-26-git-send-email-morten.rasmussen@arm.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1436293469-25707-1-git-send-email-morten.rasmussen@arm.com>
References: <1436293469-25707-1-git-send-email-morten.rasmussen@arm.com>

Energy-aware scheduling is only meant to be active while the system is
_not_ over-utilized. That is, there are spare cycles available to shift
tasks around based on their actual utilization to get a more
energy-efficient task distribution without depriving any tasks. When
above the tipping point, task placement is done the traditional way,
spreading the tasks across as many cpus as possible based on priority
scaled load to preserve smp_nice.

The over-utilization condition is conservatively chosen to indicate
over-utilization as soon as one cpu is fully utilized at its highest
frequency. We don't consider groups, as lumping usage and capacity
together for a group of cpus may hide the fact that one or more cpus in
the group are over-utilized while group-siblings are partially idle.
The tasks could be served better if moved to another group with
completely idle cpus.

This is particularly problematic if some cpus have a significantly
reduced capacity due to RT/IRQ pressure or if the system has cpus of
different capacity (e.g. ARM big.LITTLE).
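For illustration only (a standalone user-space sketch, not kernel code;
the helper name and the capacity/usage numbers below are made up): with
the usual capacity scale of 1024 and the ~20% margin of 1280 added by
this patch, the test "capacity * 1024 < usage * 1280" amounts to usage
exceeding roughly 80% of the cpu's capacity:

/*
 * Standalone sketch of the over-utilization test, not kernel code.
 * A cpu counts as over-utilized once its usage exceeds ~80% of its
 * capacity, i.e. capacity * 1024 < usage * 1280.
 */
#include <stdbool.h>
#include <stdio.h>

#define CAPACITY_SCALE	1024				/* matches SCHED_CAPACITY_SCALE */
static const unsigned long capacity_margin = 1280;	/* ~20% margin */

static bool example_cpu_overutilized(unsigned long capacity, unsigned long usage)
{
	return (capacity * CAPACITY_SCALE) < (usage * capacity_margin);
}

int main(void)
{
	/* A full-capacity cpu just below and just above the tipping point. */
	printf("%d\n", example_cpu_overutilized(1024, 800));	/* 0: ~78% of capacity */
	printf("%d\n", example_cpu_overutilized(1024, 850));	/* 1: ~83% of capacity */
	/* A reduced-capacity cpu (e.g. a little cpu) close to fully utilized. */
	printf("%d\n", example_cpu_overutilized(430, 400));	/* 1: ~93% of capacity */
	return 0;
}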
cc: Ingo Molnar
cc: Peter Zijlstra

Signed-off-by: Morten Rasmussen
---
 kernel/sched/fair.c  | 35 +++++++++++++++++++++++++++++++----
 kernel/sched/sched.h |  3 +++
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf1d34c..99e43ee 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4281,6 +4281,8 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+static bool cpu_overutilized(int cpu);
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -4291,6 +4293,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
+	int task_new = !(flags & ENQUEUE_WAKEUP);
 
 	for_each_sched_entity(se) {
 		if (se->on_rq)
@@ -4325,6 +4328,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!se) {
 		update_rq_runnable_avg(rq, rq->nr_running);
 		add_nr_running(rq, 1);
+		if (!task_new && !rq->rd->overutilized &&
+		    cpu_overutilized(rq->cpu))
+			rq->rd->overutilized = true;
 	}
 	hrtick_update(rq);
 }
@@ -4952,6 +4958,14 @@ static int find_new_capacity(struct energy_env *eenv,
 	return idx;
 }
 
+static unsigned int capacity_margin = 1280; /* ~20% margin */
+
+static bool cpu_overutilized(int cpu)
+{
+	return (capacity_of(cpu) * 1024) <
+				(get_cpu_usage(cpu) * capacity_margin);
+}
+
 /*
  * sched_group_energy(): Returns absolute energy consumption of cpus belonging
  * to the sched_group including shared resources shared only by members of the
@@ -6756,11 +6770,12 @@ static enum group_type group_classify(struct lb_env *env,
  * @local_group: Does group contain this_cpu.
  * @sgs: variable to hold the statistics for this group.
  * @overload: Indicate more than one runnable task for any CPU.
+ * @overutilized: Indicate overutilization for any CPU.
  */
 static inline void update_sg_lb_stats(struct lb_env *env,
 			struct sched_group *group, int load_idx,
 			int local_group, struct sg_lb_stats *sgs,
-			bool *overload)
+			bool *overload, bool *overutilized)
 {
 	unsigned long load;
 	int i;
@@ -6790,6 +6805,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		sgs->sum_weighted_load += weighted_cpuload(i);
 		if (idle_cpu(i))
 			sgs->idle_cpus++;
+
+		if (cpu_overutilized(i))
+			*overutilized = true;
 	}
 
 	/* Adjust by relative CPU capacity of the group */
@@ -6895,7 +6913,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
-	bool overload = false;
+	bool overload = false, overutilized = false;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
@@ -6917,7 +6935,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		}
 
 		update_sg_lb_stats(env, sg, load_idx, local_group, sgs,
-						&overload);
+						&overload, &overutilized);
 
 		if (local_group)
 			goto next_group;
@@ -6959,8 +6977,14 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		/* update overload indicator if we are at root domain */
 		if (env->dst_rq->rd->overload != overload)
 			env->dst_rq->rd->overload = overload;
-	}
 
+		/* Update over-utilization (tipping point, U >= 0) indicator */
+		if (env->dst_rq->rd->overutilized != overutilized)
+			env->dst_rq->rd->overutilized = overutilized;
+	} else {
+		if (!env->dst_rq->rd->overutilized && overutilized)
+			env->dst_rq->rd->overutilized = true;
+	}
 }
 
 /**
@@ -8324,6 +8348,9 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 		task_tick_numa(rq, curr);
 
 	update_rq_runnable_avg(rq, 1);
+
+	if (!rq->rd->overutilized && cpu_overutilized(task_cpu(curr)))
+		rq->rd->overutilized = true;
 }
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8a51692..fbe2da0 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -535,6 +535,9 @@ struct root_domain {
 	/* Indicate more than one runnable task for any CPU */
 	bool overload;
 
+	/* Indicate one or more cpus over-utilized (tipping point) */
+	bool overutilized;
+
 	/*
 	 * The bit corresponding to a CPU gets set here if such CPU has more
 	 * than one runnable -deadline task (as it is below for RT tasks).