From patchwork Tue Sep 23 16:08:02 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 4958261 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.19.201]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 39532BEEA5 for ; Tue, 23 Sep 2014 16:13:05 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id 9012520263 for ; Tue, 23 Sep 2014 16:13:03 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.9]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DE67420131 for ; Tue, 23 Sep 2014 16:13:01 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.80.1 #2 (Red Hat Linux)) id 1XWSg2-00043u-2j; Tue, 23 Sep 2014 16:10:54 +0000 Received: from mail-wi0-f179.google.com ([209.85.212.179]) by bombadil.infradead.org with esmtps (Exim 4.80.1 #2 (Red Hat Linux)) id 1XWSf6-00027D-3m for linux-arm-kernel@lists.infradead.org; Tue, 23 Sep 2014 16:09:57 +0000 Received: by mail-wi0-f179.google.com with SMTP id d1so5433182wiv.6 for ; Tue, 23 Sep 2014 09:09:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=mB8cBVTU0ppMlmyQ8Kr+h5bfgtjHvbSz32XWJqUdLnE=; b=biU4Y4u46tEoem5CRMUP26YbUIXi3c1PmUYe4VLOsMFYCtWISOk6INrcoN/8Il8xoF HtHZ7FMOZMwjO7bPP7e2SFPJu2hi6v75bYioJqmEOsRQBW/oM1XzG9azw5L27hrfs8AB JsSov1l/8ZyX27CYbpMr3uOs8WdNq7abtKx4ZD3MMMCqa+D9yEQE7tJP4uuWiDnDAPbF 2fjsUQ+DmUiuxSAiwf+hA5P0MrbxwDC5Adag4+OoiDwh9cKMMR+s1JGHo5m+TYQB4POj URNDM2GN9AsbUonswZkb5rI5ZwSf9xhdFKhVfVVMEKzQznjQF9G69ELty2F2389shjV+ vwyQ== X-Gm-Message-State: ALoCoQmcUxsyrH5O3yfc0MxUTgGUYjNCLfgkZ7Fz+l3M3gAShloo23yxZ6eszA6smSprreTxwims X-Received: by 10.194.184.166 with SMTP id ev6mr587413wjc.61.1411488573779; Tue, 23 Sep 2014 09:09:33 -0700 (PDT) Received: from lmenx30s.lme.st.com (LPuteaux-656-01-48-212.w82-127.abo.wanadoo.fr. [82.127.83.212]) by mx.google.com with ESMTPSA id hm5sm16427901wjb.2.2014.09.23.09.09.31 for (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 23 Sep 2014 09:09:33 -0700 (PDT) From: Vincent Guittot To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, preeti@linux.vnet.ibm.com, linux@arm.linux.org.uk, linux-arm-kernel@lists.infradead.org Subject: [PATCH v6 3/6] sched: add utilization_avg_contrib Date: Tue, 23 Sep 2014 18:08:02 +0200 Message-Id: <1411488485-10025-4-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1411488485-10025-1-git-send-email-vincent.guittot@linaro.org> References: <1411488485-10025-1-git-send-email-vincent.guittot@linaro.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20140923_090956_453826_33033F47 X-CRM114-Status: GOOD ( 21.83 ) X-Spam-Score: -0.7 (/) Cc: nicolas.pitre@linaro.org, riel@redhat.com, daniel.lezcano@linaro.org, Vincent Guittot , efault@gmx.de, dietmar.eggemann@arm.com, bsegall@google.com, linaro-kernel@lists.linaro.org, pjt@google.com, Morten.Rasmussen@arm.com X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_NONE, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Add new statistics which reflect the average time a task is running on the CPU and the sum of these running time of the tasks on a runqueue. The latter is named utilization_avg_contrib. This patch is based on the usage metric that was proposed in the 1st versions of the per-entity load tracking patchset by Paul Turner but that has be removed afterward. This version differs from the original one in the sense that it's not linked to task_group. The rq's utilization_avg_contrib will be used to check if a rq is overloaded or not instead of trying to compute how many task a group of CPUs can handle Rename runnable_avg_period into avg_period as it is now used with both runnable_avg_sum and running_avg_sum Add some descriptions of the variables to explain their differences Signed-off-by: Vincent Guittot --- include/linux/sched.h | 19 +++++++++++--- kernel/sched/debug.c | 9 ++++--- kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++------------ kernel/sched/sched.h | 8 +++++- 4 files changed, 81 insertions(+), 23 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 48ae6c4..51df220 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1071,15 +1071,26 @@ struct load_weight { }; struct sched_avg { + u64 last_runnable_update; + s64 decay_count; + /* + * utilization_avg_contrib describes the amount of time that a + * sched_entity is running on a CPU. It is based on running_avg_sum + * and is scaled in the range [0..SCHED_LOAD_SCALE]. + * load_avg_contrib described the the amount of time that a + * sched_entity is runnable on a rq. It is based on both + * runnable_avg_sum and the weight of the task. + */ + unsigned long load_avg_contrib, utilization_avg_contrib; /* * These sums represent an infinite geometric series and so are bound * above by 1024/(1-y). Thus we only need a u32 to store them for all * choices of y < 1-2^(-32)*1024. + * runnable_avg_sum represents the amount of time a sched_entity is on + * the runqueue whereas running_avg_sum reflects the time the + * sched_entity is effectively running on the runqueue. */ - u32 runnable_avg_sum, runnable_avg_period; - u64 last_runnable_update; - s64 decay_count; - unsigned long load_avg_contrib; + u32 runnable_avg_sum, avg_period, running_avg_sum; }; #ifdef CONFIG_SCHEDSTATS diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index c7fe1ea0..1761db9 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -71,7 +71,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group if (!se) { struct sched_avg *avg = &cpu_rq(cpu)->avg; P(avg->runnable_avg_sum); - P(avg->runnable_avg_period); + P(avg->avg_period); return; } @@ -94,7 +94,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group P(se->load.weight); #ifdef CONFIG_SMP P(se->avg.runnable_avg_sum); - P(se->avg.runnable_avg_period); + P(se->avg.avg_period); P(se->avg.load_avg_contrib); P(se->avg.decay_count); #endif @@ -215,6 +215,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->runnable_load_avg); SEQ_printf(m, " .%-30s: %ld\n", "blocked_load_avg", cfs_rq->blocked_load_avg); + SEQ_printf(m, " .%-30s: %ld\n", "utilization_load_avg", + cfs_rq->utilization_load_avg); #ifdef CONFIG_FAIR_GROUP_SCHED SEQ_printf(m, " .%-30s: %ld\n", "tg_load_contrib", cfs_rq->tg_load_contrib); @@ -631,7 +633,8 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m) P(se.load.weight); #ifdef CONFIG_SMP P(se.avg.runnable_avg_sum); - P(se.avg.runnable_avg_period); + P(se.avg.running_avg_sum); + P(se.avg.avg_period); P(se.avg.load_avg_contrib); P(se.avg.decay_count); #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7422044..2cf153d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -677,8 +677,8 @@ void init_task_runnable_average(struct task_struct *p) p->se.avg.decay_count = 0; slice = sched_slice(task_cfs_rq(p), &p->se) >> 10; - p->se.avg.runnable_avg_sum = slice; - p->se.avg.runnable_avg_period = slice; + p->se.avg.runnable_avg_sum = p->se.avg.running_avg_sum = slice; + p->se.avg.avg_period = slice; __update_task_entity_contrib(&p->se); } #else @@ -1547,7 +1547,7 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period) *period = now - p->last_task_numa_placement; } else { delta = p->se.avg.runnable_avg_sum; - *period = p->se.avg.runnable_avg_period; + *period = p->se.avg.avg_period; } p->last_sum_exec_runtime = runtime; @@ -2297,7 +2297,8 @@ static u32 __compute_runnable_contrib(u64 n) */ static __always_inline int __update_entity_runnable_avg(u64 now, struct sched_avg *sa, - int runnable) + int runnable, + int running) { u64 delta, periods; u32 runnable_contrib; @@ -2323,7 +2324,7 @@ static __always_inline int __update_entity_runnable_avg(u64 now, sa->last_runnable_update = now; /* delta_w is the amount already accumulated against our next period */ - delta_w = sa->runnable_avg_period % 1024; + delta_w = sa->avg_period % 1024; if (delta + delta_w >= 1024) { /* period roll-over */ decayed = 1; @@ -2336,7 +2337,9 @@ static __always_inline int __update_entity_runnable_avg(u64 now, delta_w = 1024 - delta_w; if (runnable) sa->runnable_avg_sum += delta_w; - sa->runnable_avg_period += delta_w; + if (running) + sa->running_avg_sum += delta_w; + sa->avg_period += delta_w; delta -= delta_w; @@ -2346,20 +2349,26 @@ static __always_inline int __update_entity_runnable_avg(u64 now, sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum, periods + 1); - sa->runnable_avg_period = decay_load(sa->runnable_avg_period, + sa->running_avg_sum = decay_load(sa->running_avg_sum, + periods + 1); + sa->avg_period = decay_load(sa->avg_period, periods + 1); /* Efficiently calculate \sum (1..n_period) 1024*y^i */ runnable_contrib = __compute_runnable_contrib(periods); if (runnable) sa->runnable_avg_sum += runnable_contrib; - sa->runnable_avg_period += runnable_contrib; + if (running) + sa->running_avg_sum += runnable_contrib; + sa->avg_period += runnable_contrib; } /* Remainder of delta accrued against u_0` */ if (runnable) sa->runnable_avg_sum += delta; - sa->runnable_avg_period += delta; + if (running) + sa->running_avg_sum += delta; + sa->avg_period += delta; return decayed; } @@ -2411,7 +2420,7 @@ static inline void __update_tg_runnable_avg(struct sched_avg *sa, /* The fraction of a cpu used by this cfs_rq */ contrib = div_u64((u64)sa->runnable_avg_sum << NICE_0_SHIFT, - sa->runnable_avg_period + 1); + sa->avg_period + 1); contrib -= cfs_rq->tg_runnable_contrib; if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) { @@ -2464,7 +2473,8 @@ static inline void __update_group_entity_contrib(struct sched_entity *se) static inline void update_rq_runnable_avg(struct rq *rq, int runnable) { - __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable); + __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable, + runnable); __update_tg_runnable_avg(&rq->avg, &rq->cfs); } #else /* CONFIG_FAIR_GROUP_SCHED */ @@ -2482,7 +2492,7 @@ static inline void __update_task_entity_contrib(struct sched_entity *se) /* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */ contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight); - contrib /= (se->avg.runnable_avg_period + 1); + contrib /= (se->avg.avg_period + 1); se->avg.load_avg_contrib = scale_load(contrib); } @@ -2501,6 +2511,27 @@ static long __update_entity_load_avg_contrib(struct sched_entity *se) return se->avg.load_avg_contrib - old_contrib; } + +static inline void __update_task_entity_utilization(struct sched_entity *se) +{ + u32 contrib; + + /* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */ + contrib = se->avg.running_avg_sum * scale_load_down(SCHED_LOAD_SCALE); + contrib /= (se->avg.avg_period + 1); + se->avg.utilization_avg_contrib = scale_load(contrib); +} + +static long __update_entity_utilization_avg_contrib(struct sched_entity *se) +{ + long old_contrib = se->avg.utilization_avg_contrib; + + if (entity_is_task(se)) + __update_task_entity_utilization(se); + + return se->avg.utilization_avg_contrib - old_contrib; +} + static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq, long load_contrib) { @@ -2517,7 +2548,7 @@ static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) { struct cfs_rq *cfs_rq = cfs_rq_of(se); - long contrib_delta; + long contrib_delta, utilization_delta; u64 now; /* @@ -2529,16 +2560,20 @@ static inline void update_entity_load_avg(struct sched_entity *se, else now = cfs_rq_clock_task(group_cfs_rq(se)); - if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq)) + if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq, + cfs_rq->curr == se)) return; contrib_delta = __update_entity_load_avg_contrib(se); + utilization_delta = __update_entity_utilization_avg_contrib(se); if (!update_cfs_rq) return; - if (se->on_rq) + if (se->on_rq) { cfs_rq->runnable_load_avg += contrib_delta; + cfs_rq->utilization_load_avg += utilization_delta; + } else subtract_blocked_load_contrib(cfs_rq, -contrib_delta); } @@ -2615,6 +2650,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, } cfs_rq->runnable_load_avg += se->avg.load_avg_contrib; + cfs_rq->utilization_load_avg += se->avg.utilization_avg_contrib; /* we force update consideration on load-balancer moves */ update_cfs_rq_blocked_load(cfs_rq, !wakeup); } @@ -2633,6 +2669,7 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, update_cfs_rq_blocked_load(cfs_rq, !sleep); cfs_rq->runnable_load_avg -= se->avg.load_avg_contrib; + cfs_rq->utilization_load_avg -= se->avg.utilization_avg_contrib; if (sleep) { cfs_rq->blocked_load_avg += se->avg.load_avg_contrib; se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); @@ -2970,6 +3007,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) */ update_stats_wait_end(cfs_rq, se); __dequeue_entity(cfs_rq, se); + update_entity_load_avg(se, 1); } update_stats_curr_start(cfs_rq, se); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f332e45..3ccb136 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -339,8 +339,14 @@ struct cfs_rq { * Under CFS, load is tracked on a per-entity basis and aggregated up. * This allows for the description of both thread and group usage (in * the FAIR_GROUP_SCHED case). + * runnable_load_avg is the sum of the load_avg_contrib of the + * sched_entities on the rq. + * blocked_load_avg is similar to runnable_load_avg except that its + * the blocked sched_entities on the rq. + * utilization_load_avg is the sum of the average running time of the + * sched_entities on the rq. */ - unsigned long runnable_load_avg, blocked_load_avg; + unsigned long runnable_load_avg, blocked_load_avg, utilization_load_avg; atomic64_t decay_counter; u64 last_decay; atomic_long_t removed_load;