
[2/4] sched/fair: Move avg_scan_cost calculations under SIS_PROP

Message ID 20201208153501.1467-3-mgorman@techsingularity.net (mailing list archive)
State New, archived
Headers show
Series Reduce scanning of runqueues in select_idle_sibling | expand

Commit Message

Mel Gorman Dec. 8, 2020, 3:34 p.m. UTC
As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
check and while we are at it, exclude the cost of initialising the CPU
mask from the average scan cost.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Comments

Vincent Guittot Dec. 8, 2020, 4:03 p.m. UTC | #1
On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
> even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
> check and while we are at it, exclude the cost of initialising the CPU
> mask from the average scan cost.
>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ac7b34e7372b..5c41875aec23 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>         if (!this_sd)
>                 return -1;

Just noticed while reviewing the patch that the above check related to
this_sd can also go under sched_feat(SIS_PROP).

>
> +       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
>         if (sched_feat(SIS_PROP)) {
>                 u64 avg_cost, avg_idle, span_avg;
>
> @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>                         nr = div_u64(span_avg, avg_cost);
>                 else
>                         nr = 4;
> -       }
> -
> -       time = cpu_clock(this);
>
> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +               time = cpu_clock(this);
> +       }
>
>         for_each_cpu_wrap(cpu, cpus, target) {
>                 if (!--nr)
> @@ -6181,8 +6181,10 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>                         break;
>         }
>
> -       time = cpu_clock(this) - time;
> -       update_avg(&this_sd->avg_scan_cost, time);
> +       if (sched_feat(SIS_PROP)) {
> +               time = cpu_clock(this) - time;
> +               update_avg(&this_sd->avg_scan_cost, time);
> +       }
>
>         return cpu;
>  }
> --
> 2.26.2
>
Mel Gorman Dec. 8, 2020, 4:30 p.m. UTC | #2
On Tue, Dec 08, 2020 at 05:03:21PM +0100, Vincent Guittot wrote:
> On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
> > even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
> > check and while we are at it, exclude the cost of initialising the CPU
> > mask from the average scan cost.
> >
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  kernel/sched/fair.c | 14 ++++++++------
> >  1 file changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ac7b34e7372b..5c41875aec23 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >         if (!this_sd)
> >                 return -1;
> 
> Just noticed while reviewing the patch that the above related to
> this_sd can also go under sched_feat(SIS_PROP)
> 

Technically yes but I also decided against it. It's a functional difference
depending on whether SIS_PROP is set in the highly unlikely case that
this_sd == NULL. I was also thinking in terms of what happens if SIS_PROP
was disabled and enabled while a search is in progress even if it's very
unlikely. In that case, this_sd would be uninitialised. That might be
impossible in practice depending on how static branching is implemented
but I don't think we should rely on the internals of jump labels and play
it safe. I can move it in if you feel strongly about it but I think the
disable/enable race is enough of a concern to leave it alone.
Aubrey Li Dec. 9, 2020, 5:28 a.m. UTC | #3
On 2020/12/9 0:03, Vincent Guittot wrote:
> On Tue, 8 Dec 2020 at 16:35, Mel Gorman <mgorman@techsingularity.net> wrote:
>>
>> As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
>> even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
>> check and while we are at it, exclude the cost of initialising the CPU
>> mask from the average scan cost.
>>
>> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
>> ---
>>  kernel/sched/fair.c | 14 ++++++++------
>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index ac7b34e7372b..5c41875aec23 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>         if (!this_sd)
>>                 return -1;
> 
> Just noticed while reviewing the patch that the above related to
> this_sd can also go under sched_feat(SIS_PROP)
> 
>>
>> +       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>> +
>>         if (sched_feat(SIS_PROP)) {
>>                 u64 avg_cost, avg_idle, span_avg;
>>
>> @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>>                         nr = div_u64(span_avg, avg_cost);
>>                 else
>>                         nr = 4;
>> -       }
>> -
>> -       time = cpu_clock(this);
>>
>> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>> +               time = cpu_clock(this);
>> +       }
>>
>>         for_each_cpu_wrap(cpu, cpus, target) {
>>                 if (!--nr)

nr is the key of this throttling mechanism, so it needs to be placed under sched_feat(SIS_PROP) as well.

Thanks,
-Aubrey
Mel Gorman Dec. 9, 2020, 9:05 a.m. UTC | #4
On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote:
> >>                         nr = div_u64(span_avg, avg_cost);
> >>                 else
> >>                         nr = 4;
> >> -       }
> >> -
> >> -       time = cpu_clock(this);
> >>
> >> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >> +               time = cpu_clock(this);
> >> +       }
> >>
> >>         for_each_cpu_wrap(cpu, cpus, target) {
> >>                 if (!--nr)
> 
> nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well.
> 

It isn't necessary as nr is initialised to INT_MAX if !SIS_PROP.
Aubrey Li Dec. 9, 2020, 11:07 a.m. UTC | #5
On 2020/12/9 17:05, Mel Gorman wrote:
> On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote:
>>>>                         nr = div_u64(span_avg, avg_cost);
>>>>                 else
>>>>                         nr = 4;
>>>> -       }
>>>> -
>>>> -       time = cpu_clock(this);
>>>>
>>>> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
>>>> +               time = cpu_clock(this);
>>>> +       }
>>>>
>>>>         for_each_cpu_wrap(cpu, cpus, target) {
>>>>                 if (!--nr)
>>
>> nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well.
>>
> 
> It isn't necessary as nr is initialised to INT_MAX if !SIS_PROP.

If !SIS_PROP, nr needs to be decremented and then tested in the loop,
instead of being tested directly.
But with SIS_PROP, a test would need to be added in the loop.
Since SIS_PROP is default true, I think it's okay to keep the current way.

Thanks,
-Aubrey
Mel Gorman Dec. 9, 2020, 11:33 a.m. UTC | #6
On Wed, Dec 09, 2020 at 07:07:11PM +0800, Li, Aubrey wrote:
> On 2020/12/9 17:05, Mel Gorman wrote:
> > On Wed, Dec 09, 2020 at 01:28:11PM +0800, Li, Aubrey wrote:
> >>>>                         nr = div_u64(span_avg, avg_cost);
> >>>>                 else
> >>>>                         nr = 4;
> >>>> -       }
> >>>> -
> >>>> -       time = cpu_clock(this);
> >>>>
> >>>> -       cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> >>>> +               time = cpu_clock(this);
> >>>> +       }
> >>>>
> >>>>         for_each_cpu_wrap(cpu, cpus, target) {
> >>>>                 if (!--nr)
> >>
> >> nr is the key of this throttling mechanism, need to be placed under sched_feat(SIS_PROP) as well.
> >>
> > 
> > It isn't necessary as nr is initialised to INT_MAX if !SIS_PROP.
> 
> If !SIS_PROP, nr needs to be decremented and then tested in the loop,
> instead of being tested directly.
> But with SIS_PROP, a test would need to be added in the loop.
> Since SIS_PROP is default true, I think it's okay to keep the current way.
> 

It's because it's default true and the cost is negligible that I'm leaving
it alone. The branch cost and nr accounting cost is negligible and it
avoids peppering select_idle_cpu() with too many SIS_PROP checks.
Aubrey Li Dec. 10, 2020, 5:18 a.m. UTC | #7
On 2020/12/8 23:34, Mel Gorman wrote:
> As noted by Vincent Guittot, avg_scan_costs are calculated for SIS_PROP
> even if SIS_PROP is disabled. Move the time calculations under a SIS_PROP
> check and while we are at it, exclude the cost of initialising the CPU
> mask from the average scan cost.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  kernel/sched/fair.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ac7b34e7372b..5c41875aec23 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  	if (!this_sd)
>  		return -1;
>  
> +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +
>  	if (sched_feat(SIS_PROP)) {
>  		u64 avg_cost, avg_idle, span_avg;
>  
> @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
>  			nr = div_u64(span_avg, avg_cost);
>  		else
>  			nr = 4;
> -	}
> -
> -	time = cpu_clock(this);
>  
> -	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +		time = cpu_clock(this);
> +	}
>  
>  	for_each_cpu_wrap(cpu, cpus, target) {
>  		if (!--nr)
>			return -1;

I thought about this again and there seems to be an inconsistency here:
- even if nr reduces to 0, shouldn't avg_scan_cost be updated as well before returning -1?
- if avg_scan_cost is not updated because nr is throttled, the first
	time = cpu_clock(this);
  can be optimized away, as nr is already calculated and we know whether the weight of the cpumask or nr is greater.

Thanks,
-Aubrey
Mel Gorman Dec. 10, 2020, 9:32 a.m. UTC | #8
On Thu, Dec 10, 2020 at 01:18:05PM +0800, Li, Aubrey wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index ac7b34e7372b..5c41875aec23 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6153,6 +6153,8 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >  	if (!this_sd)
> >  		return -1;
> >  
> > +	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > +
> >  	if (sched_feat(SIS_PROP)) {
> >  		u64 avg_cost, avg_idle, span_avg;
> >  
> > @@ -6168,11 +6170,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >  			nr = div_u64(span_avg, avg_cost);
> >  		else
> >  			nr = 4;
> > -	}
> > -
> > -	time = cpu_clock(this);
> >  
> > -	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> > +		time = cpu_clock(this);
> > +	}
> >  
> >  	for_each_cpu_wrap(cpu, cpus, target) {
> >  		if (!--nr)
> >			return -1;
> 
> I thought about this again and here seems not to be consistent:
> - even if nr reduces to 0, shouldn't avg_scan_cost be updated as well before return -1?

You're right, but it's outside the scope of this patch. I noted that
this was a problem in
https://lore.kernel.org/r/20201203141124.7391-8-mgorman@techsingularity.net
It's neither a consistent win nor a consistent loss to always account
for it, so it was dropped for this series to keep the number of
controversial patches to a minimum.

> - if avg_scan_cost is not updated because nr is throttled, the first 
> 	time = cpu_clock(this);
>   can be optimized. As nr is calculated and we already know which of the weight of cpumask and nr is greater.
> 

That is also outside the scope of this patch. To do that, cpumask_weight()
would have to be calculated but it's likely to be a net loss. Even under
light load, nr will be smaller than the domain weight incurring both the
cost of cpumask_weight and the clock read in the common case.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ac7b34e7372b..5c41875aec23 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6153,6 +6153,8 @@  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	if (!this_sd)
 		return -1;
 
+	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+
 	if (sched_feat(SIS_PROP)) {
 		u64 avg_cost, avg_idle, span_avg;
 
@@ -6168,11 +6170,9 @@  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			nr = div_u64(span_avg, avg_cost);
 		else
 			nr = 4;
-	}
-
-	time = cpu_clock(this);
 
-	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
+		time = cpu_clock(this);
+	}
 
 	for_each_cpu_wrap(cpu, cpus, target) {
 		if (!--nr)
@@ -6181,8 +6181,10 @@  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 			break;
 	}
 
-	time = cpu_clock(this) - time;
-	update_avg(&this_sd->avg_scan_cost, time);
+	if (sched_feat(SIS_PROP)) {
+		time = cpu_clock(this) - time;
+		update_avg(&this_sd->avg_scan_cost, time);
+	}
 
 	return cpu;
 }