From patchwork Sun Oct 7 07:43:55 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 1562101 Return-Path: X-Original-To: patchwork-linux-arm@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) by patchwork2.kernel.org (Postfix) with ESMTP id 498E8DFF71 for ; Sun, 7 Oct 2012 21:06:07 +0000 (UTC) Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.76 #1 (Red Hat Linux)) id 1TKy1B-0001yD-E8; Sun, 07 Oct 2012 21:04:09 +0000 Received: from mail-wi0-f171.google.com ([209.85.212.171]) by merlin.infradead.org with esmtps (Exim 4.76 #1 (Red Hat Linux)) id 1TKlXU-0000dK-6R for linux-arm-kernel@lists.infradead.org; Sun, 07 Oct 2012 07:44:43 +0000 Received: by mail-wi0-f171.google.com with SMTP id hj13so1871008wib.0 for ; Sun, 07 Oct 2012 00:44:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=from:to:cc:subject:date:message-id:x-mailer:in-reply-to:references :x-gm-message-state; bh=092yGWE286bGiBqkZKOtC6fmjefKevY35Kn1iTpb5Ss=; b=GVOupdetAMAne9sOb4mdMkp5qCWepp2c51EshfiXIFYYbwmbxdiVlHeearH/owsw/P G0/A9pmyczq3hjow+XzOsPiEo3LRgSxvi2S+/Qxl4iMq+KQ6TPlewvUKWU8qNFDUiauC Vf4bLTEFK5kHhtZLc+e0hvp9hdfqQogvEBCWrC1/eaDAJgc+KUZ6nrVcRAZUWMLX7e0R GVsGpReuKoxTV8UMDY5TY5Yw+znIY/vCW8JlIfWZeH4dX0Z/ybSEDmZIRfpdy5QOLVss z0SyG/EwwcUgT5EkTBWqnrDQTVaUEcS9epmo46FT8EnFzLkyWPLhOPvYlyJjWcdrhKes xSDQ== Received: by 10.180.100.101 with SMTP id ex5mr13170797wib.16.1349595879057; Sun, 07 Oct 2012 00:44:39 -0700 (PDT) Received: from lmenx30s.lme.st.com (pas72-1-88-161-60-229.fbx.proxad.net. [88.161.60.229]) by mx.google.com with ESMTPS id j8sm12495245wiy.9.2012.10.07.00.44.37 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 07 Oct 2012 00:44:38 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linaro-dev@lists.linaro.org, peterz@infradead.org, mingo@redhat.com, pjt@google.com, linux@arm.linux.org.uk Subject: [RFC 3/6] sched: pack small tasks Date: Sun, 7 Oct 2012 09:43:55 +0200 Message-Id: <1349595838-31274-4-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1349595838-31274-1-git-send-email-vincent.guittot@linaro.org> References: <1349595838-31274-1-git-send-email-vincent.guittot@linaro.org> X-Gm-Message-State: ALoCoQmsWAWK4xbEMkIERYNymbATLH4x9HPTermFbNIgjc2CaqlC/wGq/nPMT9dLyMUv81KXHYzE X-Bad-Reply: References and In-Reply-To but no 'Re:' in Subject. X-Spam-Note: CRM114 invocation failed X-Spam-Score: -2.6 (--) X-Spam-Report: SpamAssassin version 3.3.2 on merlin.infradead.org summary: Content analysis details: (-2.6 points) pts rule name description ---- ---------------------- -------------------------------------------------- -0.7 RCVD_IN_DNSWL_LOW RBL: Sender listed at http://www.dnswl.org/, low trust [209.85.212.171 listed in list.dnswl.org] -1.9 BAYES_00 BODY: Bayes spam probability is 0 to 1% [score: 0.0000] X-Mailman-Approved-At: Sun, 07 Oct 2012 17:03:37 -0400 Cc: Vincent Guittot X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: linux-arm-kernel-bounces@lists.infradead.org Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org During sched_domain creation, we define a pack buddy CPU if available. On a system that share the powerline at all level, the buddy is set to -1 On a dual clusters / dual cores system which can powergate each core and cluster independantly, the buddy configuration will be : | CPU0 | CPU1 | CPU2 | CPU3 | ----------------------------------- buddy | CPU0 | CPU0 | CPU0 | CPU2 | Small tasks tend to slip out of the periodic load balance. The best place to choose to migrate them is at their wake up. Signed-off-by: Vincent Guittot --- kernel/sched/core.c | 1 + kernel/sched/fair.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/sched.h | 1 + 3 files changed, 111 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index dab7908..70cadbe 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6131,6 +6131,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu) rcu_assign_pointer(rq->sd, sd); destroy_sched_domains(tmp, cpu); + update_packing_domain(cpu); update_top_cache_domain(cpu); } diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4f4a4f6..8c9d3ed 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -157,6 +157,63 @@ void sched_init_granularity(void) update_sysctl(); } + +/* + * Save the id of the optimal CPU that should be used to pack small tasks + * The value -1 is used when no buddy has been found + */ +DEFINE_PER_CPU(int, sd_pack_buddy); + +/* Look for the best buddy CPU that can be used to pack small tasks + * We make the assumption that it doesn't wort to pack on CPU that share the + * same powerline. We looks for the 1st sched_domain without the + * SD_SHARE_POWERLINE flag. Then We look for the sched_group witht the lowest + * power per core based on the assumption that their power efficiency is + * better */ +void update_packing_domain(int cpu) +{ + struct sched_domain *sd; + int id = -1; + + sd = highest_flag_domain(cpu, SD_SHARE_POWERLINE); + if (!sd) + sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd); + else + sd = sd->parent; + + while (sd) { + struct sched_group *sg = sd->groups; + struct sched_group *pack = sg; + struct sched_group *tmp = sg->next; + + /* 1st CPU of the sched domain is a good candidate */ + if (id == -1) + id = cpumask_first(sched_domain_span(sd)); + + /* loop the sched groups to find the best one */ + while (tmp != sg) { + if (tmp->sgp->power * sg->group_weight < + sg->sgp->power * tmp->group_weight) + pack = tmp; + tmp = tmp->next; + } + + /* we have found a better group */ + if (pack != sg) + id = cpumask_first(sched_group_cpus(pack)); + + /* Look for another CPU than itself */ + if ((id != cpu) + || ((sd->parent) && !(sd->parent->flags && SD_LOAD_BALANCE))) + break; + + sd = sd->parent; + } + + pr_info(KERN_INFO "CPU%d packing on CPU%d\n", cpu, id); + per_cpu(sd_pack_buddy, cpu) = id; +} + #if BITS_PER_LONG == 32 # define WMULT_CONST (~0UL) #else @@ -3073,6 +3130,55 @@ static int select_idle_sibling(struct task_struct *p, int target) return target; } +static inline bool is_buddy_busy(int cpu) +{ + struct rq *rq = cpu_rq(cpu); + + /* + * A busy buddy is a CPU with a high load or a small load with a lot of + * running tasks. + */ + return ((rq->avg.usage_avg_sum << rq->nr_running) > + rq->avg.runnable_avg_period); +} + +static inline bool is_light_task(struct task_struct *p) +{ + /* A light task runs less than 25% in average */ + return ((p->se.avg.usage_avg_sum << 2) < p->se.avg.runnable_avg_period); +} + +static int check_pack_buddy(int cpu, struct task_struct *p) +{ + int buddy = per_cpu(sd_pack_buddy, cpu); + + /* No pack buddy for this CPU */ + if (buddy == -1) + return false; + + /* + * If a task is waiting for running on the CPU which is its own buddy, + * let the default behavior to look for a better CPU if available + * The threshold has been set to 37.5% + */ + if ((buddy == cpu) + && ((p->se.avg.usage_avg_sum << 3) < (p->se.avg.runnable_avg_sum * 5))) + return false; + + /* buddy is not an allowed CPU */ + if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p))) + return false; + + /* + * If the task is a small one and the buddy is not overloaded, + * we use buddy cpu + */ + if (!is_light_task(p) || is_buddy_busy(buddy)) + return false; + + return true; +} + /* * sched_balance_self: balance the current task (running on cpu) in domains * that have the 'flag' flag set. In practice, this is SD_BALANCE_FORK and @@ -3098,6 +3204,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags) if (p->nr_cpus_allowed == 1) return prev_cpu; + if (check_pack_buddy(cpu, p)) + return per_cpu(sd_pack_buddy, cpu); + if (sd_flag & SD_BALANCE_WAKE) { if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p))) want_affine = 1; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a95d5c1..086d8bf 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -875,6 +875,7 @@ static inline void idle_balance(int cpu, struct rq *rq) extern void sysrq_sched_debug_show(void); extern void sched_init_granularity(void); +extern void update_packing_domain(int cpu); extern void update_max_interval(void); extern void update_group_power(struct sched_domain *sd, int cpu); extern int update_runtime(struct notifier_block *nfb, unsigned long action, void *hcpu);