
[07/14] sched: aggressively pack at wake/fork/exec

Message ID 1366910611-20048-8-git-send-email-vincent.guittot@linaro.org (mailing list archive)
State: New, archived

Commit Message

Vincent Guittot April 25, 2013, 5:23 p.m. UTC
According to the packing policy, the scheduler can pack tasks at different
levels:
-SCHED_PACKING_NONE: we don't pack any task.
-SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the system
is not busy.
-SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
During a fork or an exec, we assume that the new task is a fully running one
and we look for an idle CPU close to the buddy CPU.
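
For reference, the packing level is selected via a sysctl assumed to be
introduced earlier in the series; a minimal sketch of the levels
(illustrative, not the literal declaration):

enum sched_packing_mode {
	SCHED_PACKING_NONE,	/* never pack a task */
	SCHED_PACKING_DEFAULT,	/* pack only small tasks at wake up */
	SCHED_PACKING_FULL,	/* pack at wake up until a CPU is full */
};

extern int sysctl_sched_packing_mode;	/* selects the level above */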

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c |   47 ++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)

Comments

Peter Zijlstra April 26, 2013, 1:08 p.m. UTC | #1
On Thu, Apr 25, 2013 at 07:23:23PM +0200, Vincent Guittot wrote:
> According to the packing policy, the scheduler can pack tasks at different
> levels:
> -SCHED_PACKING_NONE: we don't pack any task.
> -SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the
> system is not busy.
> -SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
> During a fork or an exec, we assume that the new task is a fully running
> one and we look for an idle CPU close to the buddy CPU.

This changelog is very short on explaining how it will go about achieving these
goals.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c |   47 ++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 42 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 98166aa..874f330 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3259,13 +3259,16 @@ static struct sched_group *
>  find_idlest_group(struct sched_domain *sd, struct task_struct *p,


So for packing into power domains, wouldn't you typically pick the
busiest non-full domain to fill from the other non-full ones?

Picking the idlest non-full domain seems like it would generate
ping-pong or not actually pack anything.
Vincent Guittot April 26, 2013, 2:23 p.m. UTC | #2
On 26 April 2013 15:08, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Apr 25, 2013 at 07:23:23PM +0200, Vincent Guittot wrote:
>> According to the packing policy, the scheduler can pack tasks at different
>> levels:
>> -SCHED_PACKING_NONE: we don't pack any task.
>> -SCHED_PACKING_DEFAULT: we only pack small tasks at wake up when the
>> system is not busy.
>> -SCHED_PACKING_FULL: we pack tasks at wake up until a CPU becomes full.
>> During a fork or an exec, we assume that the new task is a fully running
>> one and we look for an idle CPU close to the buddy CPU.
>
> This changelog is very short on explaining how it will go about achieving these
> goals.

I could move some of the explanation from the cover letter into the
commit message:
In this case, the CPUs pack their tasks on their buddy until it becomes
full. Unlike the previous step, we can't keep the same buddy, so we
update it during load balance. During the periodic load balance, the
scheduler computes the activity of the system thanks to the
runnable_avg_sum and the cpu_power of all CPUs, and then it defines the
CPUs that will be used to handle the current activity. The selected
CPUs will be their own buddy and will participate in the default load
balancing mechanism in order to share the tasks in a fair way, whereas
the non-selected CPUs will not, and their buddy will be the last
selected CPU.
The behavior can be summarized as: the scheduler defines how many CPUs
are required to handle the current activity, keeps the tasks on these
CPUs and performs normal load balancing.
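
Roughly, the activity computation could look like the sketch below (the
helper name and the rounding are illustrative, not from the patch):

static int nr_cpus_needed(struct sched_domain *sd)
{
	unsigned long activity = 0;
	int i;

	for_each_cpu(i, sched_domain_span(sd)) {
		struct rq *rq = cpu_rq(i);
		u32 sum = rq->avg.runnable_avg_sum;
		u32 period = max(rq->avg.runnable_avg_period, 1U);

		/* runnable fraction of this CPU, scaled by its cpu_power */
		activity += (unsigned long)sum * rq->cpu_power / period;
	}

	/* how many SCHED_POWER_SCALE worth of CPUs that activity needs */
	return DIV_ROUND_UP(activity, SCHED_POWER_SCALE);
}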

>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c |   47 ++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 42 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 98166aa..874f330 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3259,13 +3259,16 @@ static struct sched_group *
>>  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
>
>

A task that wakes up will be caught by the function check_pack_buddy
in order to stay on the CPUs that participate in the packing effort.
We use find_idlest_group() only for fork/exec tasks, which are
considered fully running tasks, so we look for the idlest CPU close
to the buddy.
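
In code terms, the resulting selection flow is roughly the following
(simplified sketch, not the literal patch):

	if ((sd_flag & SD_BALANCE_WAKE) && check_pack_buddy(cpu, p))
		return per_cpu(sd_pack_buddy, cpu);	/* pack at wake up */

	/*
	 * SD_BALANCE_FORK and SD_BALANCE_EXEC keep going through the
	 * slow path, where find_idlest_group() now returns the group
	 * that holds an idle CPU close to the buddy when the policy is
	 * SCHED_PACKING_FULL.
	 */
	group = find_idlest_group(sd, p, cpu, load_idx);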

> So for packing into power domains, wouldn't you typically pick the
> busiest non-full domain to fill from the other non-full ones?
>
> Picking the idlest non-full domain seems like it would generate
> ping-pong or not actually pack anything.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98166aa..874f330 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3259,13 +3259,16 @@  static struct sched_group *
 find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		  int this_cpu, int load_idx)
 {
-	struct sched_group *idlest = NULL, *group = sd->groups;
+	struct sched_group *idlest = NULL, *group = sd->groups, *buddy = NULL;
 	unsigned long min_load = ULONG_MAX, this_load = 0;
 	int imbalance = 100 + (sd->imbalance_pct-100)/2;
+	int buddy_cpu = per_cpu(sd_pack_buddy, this_cpu);
+	int get_buddy = ((sysctl_sched_packing_mode == SCHED_PACKING_FULL) &&
+		!(sd->flags & SD_SHARE_POWERDOMAIN) && (buddy_cpu != -1));
 
 	do {
 		unsigned long load, avg_load;
-		int local_group;
+		int local_group, buddy_group = 0;
 		int i;
 
 		/* Skip over this group if it has no CPUs allowed */
@@ -3276,6 +3279,11 @@  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		local_group = cpumask_test_cpu(this_cpu,
 					       sched_group_cpus(group));
 
+		if (get_buddy) {
+			buddy_group = cpumask_test_cpu(buddy_cpu,
+						sched_group_cpus(group));
+		}
+
 		/* Tally up the load of all CPUs in the group */
 		avg_load = 0;
 
@@ -3287,6 +3295,9 @@  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 				load = target_load(i, load_idx);
 
 			avg_load += load;
+
+			if ((buddy_group) && idle_cpu(i))
+				buddy = group;
 		}
 
 		/* Adjust by relative CPU power of the group */
@@ -3300,6 +3311,9 @@  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		}
 	} while (group = group->next, group != sd->groups);
 
+	if (buddy)
+		return buddy;
+
 	if (!idlest || 100*this_load < imbalance*min_load)
 		return NULL;
 	return idlest;
@@ -3402,6 +3416,21 @@  static bool is_buddy_busy(int cpu)
 	return (sum > (period / (rq->nr_running + 2)));
 }
 
+static bool is_buddy_full(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u32 sum = rq->avg.runnable_avg_sum;
+	u32 period = rq->avg.runnable_avg_period;
+
+	sum = min(sum, period);
+
+	/*
+	 * A buddy is considered full when its runnable sum is within
+	 * 2.4% of its period: sum * 1024 >= period * 1000
+	 */
+	return (sum * 1024 >= period * 1000);
+}
+
 static bool is_light_task(struct task_struct *p)
 {
 	/* A light task runs less than 20% in average */
@@ -3413,6 +3442,9 @@  static int check_pack_buddy(int cpu, struct task_struct *p)
 {
 	int buddy = per_cpu(sd_pack_buddy, cpu);
 
+	if (sysctl_sched_packing_mode == SCHED_PACKING_NONE)
+		return false;
+
 	/* No pack buddy for this CPU */
 	if (buddy == -1)
 		return false;
@@ -3421,14 +3453,19 @@  static int check_pack_buddy(int cpu, struct task_struct *p)
 	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
 		return false;
 
+	/* We aggressively pack at wake up */
+	if ((sysctl_sched_packing_mode == SCHED_PACKING_FULL)
+	 && !is_buddy_full(buddy))
+		return true;
 	/*
 	 * If the task is a small one and the buddy is not overloaded,
 	 * we use buddy cpu
 	 */
-	if (!is_light_task(p) || is_buddy_busy(buddy))
-		return false;
+	if (is_light_task(p) && !is_buddy_busy(buddy))
+		return true;
+
+	return false;
 
-	return true;
 }
 
 /*