
[v5,03/12] sched: fix avg_load computation

Message ID 1409051215-16788-4-git-send-email-vincent.guittot@linaro.org (mailing list archive)
State New, archived

Commit Message

Vincent Guittot Aug. 26, 2014, 11:06 a.m. UTC
The computation of avg_load and avg_load_per_task should only take into
account the number of cfs tasks. Non-cfs tasks are already taken into
account by decreasing the cpu's capacity, and they will be tracked in the
CPU's utilization (group_utilization) introduced by the next patches.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
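
For readers outside the scheduler: the two counters this patch swaps
differ as follows (fields abridged from kernel/sched/sched.h of this era;
a sketch, not the full definitions):

	struct rq {
		/* All runnable tasks: stop, deadline, rt and fair. */
		unsigned int nr_running;
		...
		/*
		 * cfs.h_nr_running counts only fair (CFS) tasks, summed
		 * hierarchically across the group cfs_rqs of this cpu.
		 */
		struct cfs_rq cfs;
		...
	};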

Comments

preeti Aug. 30, 2014, noon UTC | #1
Hi Vincent,

On 08/26/2014 04:36 PM, Vincent Guittot wrote:
> The computation of avg_load and avg_load_per_task should only take into
> account the number of cfs tasks. Non-cfs tasks are already taken into
> account by decreasing the cpu's capacity, and they will be tracked in the
> CPU's utilization (group_utilization) introduced by the next patches.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 87b9dc7..b85e9f7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4092,7 +4092,7 @@ static unsigned long capacity_of(int cpu)
>  static unsigned long cpu_avg_load_per_task(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> -	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
> +	unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
>  	unsigned long load_avg = rq->cfs.runnable_load_avg;
> 
>  	if (nr_running)
> @@ -5985,7 +5985,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  			load = source_load(i, load_idx);
> 
>  		sgs->group_load += load;
> -		sgs->sum_nr_running += rq->nr_running;
> +		sgs->sum_nr_running += rq->cfs.h_nr_running;
> 
>  		if (rq->nr_running > 1)
>  			*overload = true;
> 

Why do we probe rq->nr_running while we do load balancing? Should we not
be probing cfs_rq->nr_running instead? We are, after all, interested in
load balancing fair tasks, right? The reason I ask is that I was
wondering whether we need to make a similar change in more places in
load balancing.

To cite examples: the above check says a cpu is overloaded when
rq->nr_running > 1. However, if these tasks happen to be rt tasks, we
would not be able to load balance them anyway. So while I was looking
through this patch, I noticed this and wanted to cross-verify whether we
are checking rq->nr_running on purpose in some places in load balancing;
another example being in nohz_kick_needed().
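
For reference, the nohz_kick_needed() check in question looks roughly
like this in the fair.c of this series' era (a simplified excerpt; the
surrounding conditions are omitted):

	static inline int nohz_kick_needed(struct rq *rq)
	{
		...
		/*
		 * More than one runnable task of *any* class kicks the
		 * nohz idle balancer -- the same rq->nr_running question.
		 */
		if (rq->nr_running >= 2)
			goto need_kick;
		...
	}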


Regards
Preeti U Murthy
Vincent Guittot Sept. 3, 2014, 11:09 a.m. UTC | #2
On 30 August 2014 14:00, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> Hi Vincent,
>
> On 08/26/2014 04:36 PM, Vincent Guittot wrote:
>> [...]
>
> Why do we probe rq->nr_running while we do load balancing? Should not we
> be probing cfs_rq->nr_running instead? We are interested after all in
> load balancing fair tasks right? The reason I ask this is, I was
> wondering if we need to make the above similar change in more places in
> load balancing.

Hi Preeti,

Yes, we should probably add the test rq->cfs.h_nr_running > 0 before
setting overload.

Sorry for the late answer; the email was lost in my messy inbox.

Vincent

Tim Chen Sept. 3, 2014, 11:43 p.m. UTC | #3
On Wed, 2014-09-03 at 13:09 +0200, Vincent Guittot wrote:
> On 30 August 2014 14:00, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> > Hi Vincent,
> >
> > [...]
> >
> > Why do we probe rq->nr_running while we do load balancing? Should not we
> > be probing cfs_rq->nr_running instead? We are interested after all in
> > load balancing fair tasks right? The reason I ask this is, I was
> > wondering if we need to make the above similar change in more places in
> > load balancing.
> 
> Hi Preeti,
> 
> Yes, we should probably add the test rq->cfs.h_nr_running > 0 before
> setting overload.
> 

The overload indicator is used for knowing when we can totally avoid
load balancing to a cpu that is about to go idle. We can avoid load
balancing when no cpu has more than 1 task. So if you have, say, just
one fair task and multiple deadline tasks on a cpu, and another cpu
about to go idle, you should turn on normal load balancing in the idle
path by setting overload to true.

So overload should be set based on rq->nr_running and not on
rq->cfs.h_nr_running.
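
For reference, the consumer of the flag is the newly idle balance path,
which in kernels of this era bails out along roughly these lines (a
simplified excerpt, not verbatim):

	static int idle_balance(struct rq *this_rq)
	{
		...
		/*
		 * Skip the whole pass if this cpu idles too briefly or
		 * if no rq in the root domain has more than one task.
		 */
		if (this_rq->avg_idle < sysctl_sched_migration_cost ||
		    !this_rq->rd->overload)
			goto out;
		...
	}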

Thanks.
 
Tim
Vincent Guittot Sept. 4, 2014, 7:17 a.m. UTC | #4
On 4 September 2014 01:43, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> On Wed, 2014-09-03 at 13:09 +0200, Vincent Guittot wrote:
>> On 30 August 2014 14:00, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> > Hi Vincent,
>> >
>> > [...]
>> >
>> > Why do we probe rq->nr_running while we do load balancing? Should not we
>> > be probing cfs_rq->nr_running instead? We are interested after all in
>> > load balancing fair tasks right? The reason I ask this is, I was
>> > wondering if we need to make the above similar change in more places in
>> > load balancing.
>>
>> Hi Preeti,
>>
>> Yes, we should probably add the test rq->cfs.h_nr_running > 0 before
>> setting overload.
>>
>
> The overload indicator is used for knowing when we can totally avoid
> load balancing to a cpu that is about to go idle. We can avoid load
> balancing when no cpu has more than 1 task. So if you have, say, just
> one fair task and multiple deadline tasks on a cpu, and another cpu
> about to go idle, you should turn on normal load balancing in the idle
> path by setting overload to true.

The newly idle load balancing can only affect CFS tasks, so triggering
a load_balance because a cpu is overloaded only by rt tasks will not
change anything.

>
> So overload should be set based on rq->nr_running and not on
> rq->cfs.h_nr_running.

We should probably use both values, like below:

if ((rq->nr_running > 1) && (rq->cfs.h_nr_running > 0))
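
In the context of update_sg_lb_stats(), that suggestion would make the
hunk read roughly as follows (a sketch of the proposal, not a merged
change):

		sgs->group_load += load;
		sgs->sum_nr_running += rq->cfs.h_nr_running;

		/*
		 * Flag overload only when the rq has more than one task
		 * *and* at least one of them is a CFS task that a newly
		 * idle cpu could actually pull.
		 */
		if ((rq->nr_running > 1) && (rq->cfs.h_nr_running > 0))
			*overload = true;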

Regards,
Vincent
Tim Chen Sept. 4, 2014, 4:26 p.m. UTC | #5
On Thu, 2014-09-04 at 09:17 +0200, Vincent Guittot wrote:
> On 4 September 2014 01:43, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> > On Wed, 2014-09-03 at 13:09 +0200, Vincent Guittot wrote:
> >> On 30 August 2014 14:00, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> >> > Hi Vincent,
> >> >
> >> > [...]
> >> >
> >> > Why do we probe rq->nr_running while we do load balancing? Should not we
> >> > be probing cfs_rq->nr_running instead? We are interested after all in
> >> > load balancing fair tasks right? The reason I ask this is, I was
> >> > wondering if we need to make the above similar change in more places in
> >> > load balancing.
> >>
> >> Hi Preeti,
> >>
> >> Yes, we should probably add the test rq->cfs.h_nr_running > 0 before
> >> setting overload.
> >>
> >
> > The overload indicator is used for knowing when we can totally avoid
> > load balancing to a cpu that is about to go idle. We can avoid load
> > balancing when no cpu has more than 1 task. So if you have, say, just
> > one fair task and multiple deadline tasks on a cpu, and another cpu
> > about to go idle, you should turn on normal load balancing in the idle
> > path by setting overload to true.
> 
> The newly idle load balancing can only affect CFS tasks, so triggering
> a load_balance because a cpu is overloaded only by rt tasks will not
> change anything.
> 
> >
> > So overload should be set based on rq->nr_running and not on
> > rq->cfs.h_nr_running.
> 
> We should probably use both values, like below:
> 
> if ((rq->nr_running > 1) && (rq->cfs.h_nr_running > 0))

Yes, this modification is the correct one; it takes care of the
condition I objected to previously.

Thanks.

Tim
preeti Sept. 5, 2014, 11:10 a.m. UTC | #6
On 08/26/2014 04:36 PM, Vincent Guittot wrote:
> The computation of avg_load and avg_load_per_task should only take into
> account the number of cfs tasks. Non-cfs tasks are already taken into
> account by decreasing the cpu's capacity, and they will be tracked in the
> CPU's utilization (group_utilization) introduced by the next patches.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 87b9dc7..b85e9f7 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4092,7 +4092,7 @@ static unsigned long capacity_of(int cpu)
>  static unsigned long cpu_avg_load_per_task(int cpu)
>  {
>  	struct rq *rq = cpu_rq(cpu);
> -	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
> +	unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
>  	unsigned long load_avg = rq->cfs.runnable_load_avg;
> 
>  	if (nr_running)
> @@ -5985,7 +5985,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  			load = source_load(i, load_idx);
> 
>  		sgs->group_load += load;
> -		sgs->sum_nr_running += rq->nr_running;
> +		sgs->sum_nr_running += rq->cfs.h_nr_running;

Yes, this was one of the concerns I had around the usage of
rq->nr_running. Looks good to me.

> 
>  		if (rq->nr_running > 1)
>  			*overload = true;
> 
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 87b9dc7..b85e9f7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4092,7 +4092,7 @@ static unsigned long capacity_of(int cpu)
 static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	unsigned long nr_running = ACCESS_ONCE(rq->nr_running);
+	unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
 	unsigned long load_avg = rq->cfs.runnable_load_avg;
 
 	if (nr_running)
@@ -5985,7 +5985,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
-		sgs->sum_nr_running += rq->nr_running;
+		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)
 			*overload = true;
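
For completeness, the first hunk leaves the helper looking like this (the
unchanged tail is reconstructed from the surrounding context and may
differ in detail):

	static unsigned long cpu_avg_load_per_task(int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		/*
		 * Average only over fair tasks; rt/deadline pressure is
		 * reflected in the cpu's reduced capacity instead.
		 */
		unsigned long nr_running = ACCESS_ONCE(rq->cfs.h_nr_running);
		unsigned long load_avg = rq->cfs.runnable_load_avg;

		if (nr_running)
			return load_avg / nr_running;

		return 0;
	}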