From patchwork Thu Oct 25 10:25:03 2012
X-Patchwork-Submitter: preeti
X-Patchwork-Id: 1643131
Subject: [RFC PATCH 02/13] sched: Pick the apt busy sched group during load
 balancing
To: svaidy@linux.vnet.ibm.com, linux-kernel@vger.kernel.org
From: Preeti U Murthy
Date: Thu, 25 Oct 2012 15:55:03 +0530
Message-ID: <20121025102503.21022.35291.stgit@preeti.in.ibm.com>
In-Reply-To: <20121025102045.21022.92489.stgit@preeti.in.ibm.com>
References: <20121025102045.21022.92489.stgit@preeti.in.ibm.com>
User-Agent: StGit/0.16-38-g167d
Cc: Morten.Rasmussen@arm.com, venki@google.com, robin.randhawa@arm.com,
 linaro-dev@lists.linaro.org, a.p.zijlstra@chello.nl, mjg59@srcf.ucam.org,
 viresh.kumar@linaro.org, amit.kucheria@linaro.org,
 deepthi@linux.vnet.ibm.com, Arvind.Chauhan@arm.com,
 paul.mckenney@linaro.org, suresh.b.siddha@intel.com, tglx@linutronix.de,
 srivatsa.bhat@linux.vnet.ibm.com, vincent.guittot@linaro.org,
 akpm@linux-foundation.org, paulmck@linux.vnet.ibm.com,
 arjan@linux.intel.com, mingo@kernel.org,
 linux-arm-kernel@lists.infradead.org, pjt@google.com
If a sched group has passed the test for sufficient load in
update_sg_lb_stats() to qualify for load balancing, then PJT's metric
has to be used to qualify the right sched group as the busiest group.

The scenario which led to this patch is shown below:
Consider Task1 and Task2 to be long running tasks and Tasks 3,4,5,6 to
be short running tasks.

            Task3  Task4
Task1       Task5
Task2       Task6
------      ------
SCHED_GRP1  SCHED_GRP2

The normal load calculator would qualify SCHED_GRP2 as the candidate
for sd->busiest due to the following loads that it calculates:

SCHED_GRP1: 2048
SCHED_GRP2: 4096

The load calculator based on PJT's metric would instead qualify
SCHED_GRP1 as the candidate for sd->busiest due to the following loads
that it calculates:

SCHED_GRP1: 3200
SCHED_GRP2: 1156

This patch aims to strike a balance between the load of a group and
the number of tasks running on it to decide the busiest group in the
sched_domain. This means we will need to use PJT's metric, but with an
additional constraint.

Signed-off-by: Preeti U Murthy
---
 kernel/sched/fair.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e02dad4..aafa3c1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -165,7 +165,8 @@ void sched_init_granularity(void)
 #else
 # define WMULT_CONST	(1UL << 32)
 #endif
-
+#define NR_THRESHOLD 2
+#define LOAD_THRESHOLD 1
 #define WMULT_SHIFT	32
 
 /*
@@ -4169,6 +4170,7 @@ struct sd_lb_stats {
 	/* Statistics of the busiest group */
 	unsigned int  busiest_idle_cpus;
 	unsigned long max_load;
+	u64 max_sg_load; /* Equivalent of max_load but calculated using PJT's metric */
 	unsigned long busiest_load_per_task;
 	unsigned long busiest_nr_running;
 	unsigned long busiest_group_capacity;
@@ -4628,8 +4630,24 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 				   struct sched_group *sg,
 				   struct sg_lb_stats *sgs)
 {
-	if (sgs->avg_load <= sds->max_load)
-		return false;
+	/* Use PJT's metric to qualify a sched_group as busy.
+	 *
+	 * But a low load sched group may be queueing up many tasks.
+	 * So before dismissing a sched group with a lesser load, ensure
+	 * that its number of running tasks is also checked when it is
+	 * not too much less loaded than the max load so far.
+	 *
+	 * As of now, since LOAD_THRESHOLD is 1, this check is a nop,
+	 * but LOAD_THRESHOLD could be varied suitably to bring it in.
+	 */
+	if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) {
+		if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) {
+			if (sgs->sum_nr_running <= (NR_THRESHOLD + sds->busiest_nr_running))
+				return false;
+		} else {
+			return false;
+		}
+	}
 
 	if (sgs->sum_nr_running > sgs->group_capacity)
 		return true;
@@ -4708,6 +4726,7 @@ static inline void update_sd_lb_stats(struct lb_env *env,
 			sds->this_idle_cpus = sgs.idle_cpus;
 		} else if (update_sd_pick_busiest(env, sds, sg, &sgs)) {
 			sds->max_load = sgs.avg_load;
+			sds->max_sg_load = sgs.avg_cfs_runnable_load;
 			sds->busiest = sg;
 			sds->busiest_nr_running = sgs.sum_nr_running;
 			sds->busiest_idle_cpus = sgs.idle_cpus;
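
For illustration only (not part of the patch): below is a minimal user-space
sketch of the check introduced in update_sd_pick_busiest(), fed with the
example loads from the changelog. The structures group_stats and domain_stats
are hypothetical stand-ins for the sg_lb_stats and sd_lb_stats fields involved
(avg_cfs_runnable_load, sum_nr_running, max_sg_load, busiest_nr_running); only
the control flow mirrors the patch.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_THRESHOLD   2
#define LOAD_THRESHOLD 1

/* Stand-in for the sg_lb_stats fields used by the check */
struct group_stats {
	uint64_t avg_cfs_runnable_load;   /* PJT-metric load of the group */
	unsigned long sum_nr_running;     /* runnable tasks in the group */
};

/* Stand-in for the sd_lb_stats fields used by the check */
struct domain_stats {
	uint64_t max_sg_load;             /* highest PJT-metric load seen so far */
	unsigned long busiest_nr_running; /* tasks on the current busiest group */
};

/*
 * Mirrors the control flow added to update_sd_pick_busiest(): a group whose
 * PJT-metric load exceeds the current maximum is picked; a group with a lower
 * load is picked only if its load still exceeds LOAD_THRESHOLD times that
 * maximum and it runs more than NR_THRESHOLD tasks beyond the current busiest
 * group.
 */
static bool pick_busiest(struct domain_stats *sds, struct group_stats *sgs)
{
	if (sgs->avg_cfs_runnable_load <= sds->max_sg_load) {
		if (sgs->avg_cfs_runnable_load > LOAD_THRESHOLD * sds->max_sg_load) {
			if (sgs->sum_nr_running <= NR_THRESHOLD + sds->busiest_nr_running)
				return false;
		} else {
			return false;
		}
	}
	return true;
}

int main(void)
{
	/* SCHED_GRP1 already recorded as busiest: load 3200, 2 long running tasks */
	struct domain_stats sds = { .max_sg_load = 3200, .busiest_nr_running = 2 };
	/* SCHED_GRP2 being evaluated: load 1156, 4 short running tasks */
	struct group_stats sgs = { .avg_cfs_runnable_load = 1156, .sum_nr_running = 4 };

	printf("Replace SCHED_GRP1 with SCHED_GRP2 as busiest? %s\n",
	       pick_busiest(&sds, &sgs) ? "yes" : "no");   /* prints "no" */
	return 0;
}

As the changelog notes, with LOAD_THRESHOLD at 1 the nr_running branch can
never be reached for a lower-loaded group, so the sketch prints "no":
SCHED_GRP2's four short running tasks do not displace SCHED_GRP1 as
sd->busiest.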