| Message ID | 1366910611-20048-8-git-send-email-vincent.guittot@linaro.org (mailing list archive) |
|---|---|
| State | New, archived |
On Thu, Apr 25, 2013 at 07:23:23PM +0200, Vincent Guittot wrote:
> According to the packing policy, the scheduler can pack tasks at different
> steps:
> -SCHED_PACKING_NONE level: we don't pack any task.
> -SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the system
> is not busy.
> -SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
> During a fork or an exec, we assume that the new task is a fully running
> one and we look for an idle CPU close to the buddy CPU.

This changelog is very short on explaining how it will go about achieving
these goals.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 98166aa..874f330 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3259,13 +3259,16 @@ static struct sched_group *
>  find_idlest_group(struct sched_domain *sd, struct task_struct *p,

So for packing into power domains, wouldn't you typically pick the busiest
non-full domain to fill from other non-full domains?

Picking the idlest non-full seems like it would generate a ping-pong or not
actually pack anything.
On 26 April 2013 15:08, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Apr 25, 2013 at 07:23:23PM +0200, Vincent Guittot wrote:
>> According to the packing policy, the scheduler can pack tasks at different
>> steps:
>> -SCHED_PACKING_NONE level: we don't pack any task.
>> -SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the system
>> is not busy.
>> -SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
>> During a fork or an exec, we assume that the new task is a fully running
>> one and we look for an idle CPU close to the buddy CPU.
>
> This changelog is very short on explaining how it will go about achieving
> these goals.

I could move some of the explanation from the cover letter into the commit
message:

In this case, the CPUs pack their tasks in their buddy until it becomes
full. Unlike the previous step, we can't keep the same buddy, so we update
it during load balance. During the periodic load balance, the scheduler
computes the activity of the system thanks to the runnable_avg_sum and the
cpu_power of all CPUs, and then it defines the CPUs that will be used to
handle the current activity. Each selected CPU will be its own buddy and
will participate in the default load-balancing mechanism in order to share
the tasks in a fair way, whereas the non-selected CPUs will not; their buddy
will be the last selected CPU.

The behavior can be summarized as: the scheduler defines how many CPUs are
required to handle the current activity, keeps the tasks on these CPUs, and
performs normal load balancing among them.

>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 42 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 98166aa..874f330 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3259,13 +3259,16 @@ static struct sched_group *
>>  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>
> So for packing into power domains, wouldn't you typically pick the busiest
> non-full domain to fill from other non-full domains?
>
> Picking the idlest non-full seems like it would generate a ping-pong or not
> actually pack anything.

A task that wakes up will be caught by check_pack_buddy() in order to stay
on the CPUs that participate in the packing effort. We will use
find_idlest_group() only for fork/exec tasks, which are considered fully
running tasks, so we look for the idlest CPU close to the buddy.
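To picture the "how many CPUs does the current activity need" step described in the reply above, here is a minimal editor's sketch under the patch set's assumptions (the per-rq `avg.runnable_avg_sum` / `avg.runnable_avg_period` fields and `rq->cpu_power` used elsewhere in the series). `sched_nr_cpus_needed()` is a hypothetical helper name, not a function from the patches:

```c
/*
 * Hypothetical helper (not from the series): estimate how many CPUs the
 * current activity needs. Each CPU contributes its busy fraction
 * (runnable_avg_sum / runnable_avg_period) scaled by its cpu_power, and
 * the total is rounded up to a whole number of SCHED_POWER_SCALE units.
 */
static int sched_nr_cpus_needed(void)
{
	unsigned long activity = 0;
	int cpu;

	for_each_online_cpu(cpu) {
		struct rq *rq = cpu_rq(cpu);
		u32 sum = rq->avg.runnable_avg_sum;
		u32 period = rq->avg.runnable_avg_period;

		sum = min(sum, period);
		/* busy fraction of this CPU, scaled by its compute capacity */
		activity += rq->cpu_power * sum / max(period, 1U);
	}

	/* one CPU per SCHED_POWER_SCALE worth of activity, rounding up */
	return DIV_ROUND_UP(activity, SCHED_POWER_SCALE);
}
```

The first N CPUs in the packing order would then keep their own buddy and balance load among themselves, while the remaining CPUs point their buddy at the last selected CPU, as the reply describes.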
```diff
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98166aa..874f330 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3259,13 +3259,16 @@ static struct sched_group *
 find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int load_idx)
 {
-	struct sched_group *idlest = NULL, *group = sd->groups;
+	struct sched_group *idlest = NULL, *group = sd->groups, *buddy = NULL;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
+	int buddy_cpu = per_cpu(sd_pack_buddy, this_cpu);
+	int get_buddy = ((sysctl_sched_packing_mode == SCHED_PACKING_FULL) &&
+			!(sd->flags & SD_SHARE_POWERDOMAIN) && (buddy_cpu != -1));
 
 	do {
 		unsigned long load, avg_load;
-		int local_group;
+		int local_group, buddy_group = 0;
 		int i;
 
 		/* Skip over this group if it has no CPUs allowed */
@@ -3276,6 +3279,11 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		local_group = cpumask_test_cpu(this_cpu,
 					       sched_group_cpus(group));
 
+		if (get_buddy) {
+			buddy_group = cpumask_test_cpu(buddy_cpu,
+					sched_group_cpus(group));
+		}
+
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
@@ -3287,6 +3295,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 				load = target_load(i, load_idx);
 
 			avg_load += load;
+
+			if ((buddy_group) && idle_cpu(i))
+				buddy = group;
 		}
 
 		/* Adjust by relative CPU power of the group */
@@ -3300,6 +3311,9 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		}
 	} while (group = group->next, group != sd->groups);
 
+	if (buddy)
+		return buddy;
+
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
 	return idlest;
@@ -3402,6 +3416,21 @@ static bool is_buddy_busy(int cpu)
 	return (sum > (period / (rq->nr_running + 2)));
 }
 
+static bool is_buddy_full(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u32 sum = rq->avg.runnable_avg_sum;
+	u32 period = rq->avg.runnable_avg_period;
+
+	sum = min(sum, period);
+
+	/*
+	 * A full buddy is a CPU with a sum greater or equal to period
+	 * We keep a margin of 2.4%
+	 */
+	return (sum * 1024 >= period * 1000);
+}
+
 static bool is_light_task(struct task_struct *p)
 {
 	/* A light task runs less than 20% in average */
@@ -3413,6 +3442,9 @@ static int check_pack_buddy(int cpu, struct task_struct *p)
 {
 	int buddy = per_cpu(sd_pack_buddy, cpu);
 
+	if (sysctl_sched_packing_mode == SCHED_PACKING_NONE)
+		return false;
+
 	/* No pack buddy for this CPU */
 	if (buddy == -1)
 		return false;
@@ -3421,14 +3453,19 @@ static int check_pack_buddy(int cpu, struct task_struct *p)
 	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
 		return false;
 
+	/* We agressively pack at wake up */
+	if ((sysctl_sched_packing_mode == SCHED_PACKING_FULL)
+					&& !is_buddy_full(buddy))
+		return true;
 	/*
 	 * If the task is a small one and the buddy is not overloaded,
 	 * we use buddy cpu
 	 */
-	if (!is_light_task(p) || is_buddy_busy(buddy))
-		return false;
+	if (is_light_task(p) && !is_buddy_busy(buddy))
+		return true;
+
+	return false;
 
-	return true;
 }
 
 /*
```
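To see what the 2.4% margin in is_buddy_full() means in practice: `sum * 1024 >= period * 1000` is a fixed-point form of `sum / period >= 1000/1024`, roughly 97.66%, so a CPU counts as full once it is runnable for all but about 2.3-2.4% of its tracking period. The snippet below is an editor's standalone illustration of that comparison, not part of the patch; 47742 is the saturation value of the per-entity load-tracking period in kernels of this era, used here purely as an example input:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same fixed-point test as is_buddy_full(): full means the CPU was
 * runnable for at least 1000/1024 (~97.66%) of its tracking period. */
static bool buddy_full(uint32_t sum, uint32_t period)
{
	if (sum > period)	/* mirrors sum = min(sum, period) */
		sum = period;
	return (uint64_t)sum * 1024 >= (uint64_t)period * 1000;
}

int main(void)
{
	printf("%d\n", buddy_full(47742, 47742)); /* 1: 100% busy */
	printf("%d\n", buddy_full(46700, 47742)); /* 1: ~97.8% busy, inside the margin */
	printf("%d\n", buddy_full(40000, 47742)); /* 0: ~83.8% busy */
	return 0;
}
```

The multiply-by-1024 keeps the comparison in integer arithmetic, which is why the margin comes out as 24/1024 rather than a round 2.5%.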
According to the packing policy, the scheduler can pack tasks at different
steps:
-SCHED_PACKING_NONE level: we don't pack any task.
-SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the system
is not busy.
-SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
During a fork or an exec, we assume that the new task is a fully running one
and we look for an idle CPU close to the buddy CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)
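Putting the three levels side by side, the wake-up decision the patch arrives at reads roughly as below. This is an editor's condensed restatement of the patched check_pack_buddy() from the diff above; `use_buddy_cpu()` is a made-up name, not an identifier from the series:

```c
/* Editor's condensed restatement of check_pack_buddy();
 * use_buddy_cpu() is a hypothetical name, not from the patch. */
static int use_buddy_cpu(int buddy, struct task_struct *p)
{
	if (sysctl_sched_packing_mode == SCHED_PACKING_NONE)
		return false;			/* never pack */

	if (buddy == -1 || !cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
		return false;			/* no usable buddy for this task */

	/* FULL: pack aggressively until the buddy CPU is full */
	if (sysctl_sched_packing_mode == SCHED_PACKING_FULL &&
	    !is_buddy_full(buddy))
		return true;

	/* DEFAULT (and FULL once the buddy is full): only pack a small
	 * task, and only while the buddy is not already busy. */
	return is_light_task(p) && !is_buddy_busy(buddy);
}
```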