[04/10] sched/fair: Return an idle cpu if one is found after a failed search for an idle core

Message ID: 20201203141124.7391-5-mgorman@techsingularity.net
Series: Reduce time complexity of select_idle_sibling

Commit Message

Mel Gorman Dec. 3, 2020, 2:11 p.m. UTC
select_idle_core is called when SMT is active and there is likely a free
core available. It may find idle CPUs but this information is simply
discarded and the scan starts over again with select_idle_cpu.

This patch caches the first idle CPU found during the search for an
idle core and falls back to it if no idle core is found. This is a
tradeoff. There may be a slight impact when utilisation is low and an
idle core can be found quickly. It provides improvements as the number
of busy CPUs approaches 50% of the domain size when SMT is enabled.

With tbench on a 2-socket CascadeLake machine, 80 logical CPUs, HT enabled

                          5.10.0-rc6             5.10.0-rc6
                           schedstat          idlecandidate
Hmean     1        500.06 (   0.00%)      505.67 *   1.12%*
Hmean     2        975.90 (   0.00%)      974.06 *  -0.19%*
Hmean     4       1902.95 (   0.00%)     1904.43 *   0.08%*
Hmean     8       3761.73 (   0.00%)     3721.02 *  -1.08%*
Hmean     16      6713.93 (   0.00%)     6769.17 *   0.82%*
Hmean     32     10435.31 (   0.00%)    10312.58 *  -1.18%*
Hmean     64     12325.51 (   0.00%)    13792.01 *  11.90%*
Hmean     128    21225.21 (   0.00%)    20963.44 *  -1.23%*
Hmean     256    20532.83 (   0.00%)    20335.62 *  -0.96%*
Hmean     320    20334.81 (   0.00%)    20147.25 *  -0.92%*

In this particular test, the cost/benefit is marginal except at 64
clients, the point where the machine was over 50% busy but not fully
utilised.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 kernel/sched/fair.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
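
For context, the relevant caller sequence in select_idle_sibling()
looks roughly like this in v5.10 (paraphrased; other fallbacks
trimmed). With this patch, an idle CPU cached by select_idle_core() is
returned directly, short-circuiting the select_idle_cpu() scan that
would otherwise restart from scratch:

	i = select_idle_core(p, sd, target);	/* may now return a lone idle CPU */
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	i = select_idle_cpu(p, sd, target);	/* skipped when a candidate was cached */
	if ((unsigned)i < nr_cpumask_bits)
		return i;

	i = select_idle_smt(p, sd, target);
	if ((unsigned)i < nr_cpumask_bits)
		return i;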

Comments

Vincent Guittot Dec. 3, 2020, 4:35 p.m. UTC | #1
On Thu, 3 Dec 2020 at 15:11, Mel Gorman <mgorman@techsingularity.net> wrote:
>
> [...]
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index fc48cc99b03d..845bc0cd9158 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
>   */
>  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
>  {
> +       int idle_candidate = -1;
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
>         int core, cpu;
>
> @@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>                         schedstat_inc(this_rq()->sis_scanned);
>                         if (!available_idle_cpu(cpu)) {
>                                 idle = false;
> -                               break;
> +                               if (idle_candidate != -1)
> +                                       break;


If I understand your changes correctly, the code will now continue to
loop over all CPUs of the SMT mask trying to find an idle CPU, whereas
before it broke out as soon as one CPU was not idle. I thought that you
just wanted to be opportunistic and save a candidate, but without
looping over more CPUs than currently.

With the change above you might end up looping over all CPUs of the
LLC if there is only one idle CPU in it, whereas before we were looping
over at most one CPU per core. The bottom change makes sense, but the
one above effectively replaces select_idle_cpu entirely and bypasses
SIS_PROP, and we should avoid that IMO.
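
As a rough illustration of the cost under that reading: on the
CascadeLake machine in the changelog, an LLC spans 40 logical CPUs (20
cores with two SMT siblings each). With a single idle CPU left in the
domain, breaking at the first busy sibling costs on the order of one
probe per core, about 20 probes, before the search gives up, whereas
deferring the break until a candidate has been cached can touch all 40
CPUs, and the multiplier grows with SMT width on wider systems. That is
the same worst case as select_idle_cpu, but without the SIS_PROP
throttle.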

> +                       }
> +
> +                       if (idle_candidate == -1 &&
> +                           cpumask_test_cpu(cpu, p->cpus_ptr)) {
> +                               idle_candidate = cpu;
>                         }
>                 }
>
> @@ -6099,7 +6106,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
>          */
>         set_idle_cores(target, 0);
>
> -       return -1;
> +       return idle_candidate;
>  }
>
>  /*
> --
> 2.26.2
>
Mel Gorman Dec. 3, 2020, 5:50 p.m. UTC | #2
On Thu, Dec 03, 2020 at 05:35:29PM +0100, Vincent Guittot wrote:
> > index fc48cc99b03d..845bc0cd9158 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
> >   */
> >  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
> >  {
> > +       int idle_candidate = -1;
> >         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> >         int core, cpu;
> >
> > @@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
> >                         schedstat_inc(this_rq()->sis_scanned);
> >                         if (!available_idle_cpu(cpu)) {
> >                                 idle = false;
> > -                               break;
> > +                               if (idle_candidate != -1)
> > +                                       break;
> 
> 
> If I understand your changes correctly, the code will now continue to
> loop over all CPUs of the SMT mask trying to find an idle CPU, whereas

That was an oversight; the intent is that the scan of a core's SMT
siblings breaks early but the search for an idle core continues. The
patch was taken from a very different series that unified all the
select_idle_* functions into a single function, and I failed to fix it
up properly. The unification series didn't generate good results when I
tried it 9 months ago and I never finished it off. In the current
context, it would not make sense to try a unification again.

> With the change above you might end up looping over all CPUs of the
> LLC if there is only one idle CPU in it, whereas before we were looping
> over at most one CPU per core. The bottom change makes sense, but the
> one above effectively replaces select_idle_cpu entirely and bypasses
> SIS_PROP, and we should avoid that IMO.
> 

You're right, of course; it was never intended to behave like that.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0a3d338770c4..49b1590e60a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6084,8 +6084,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 		for_each_cpu(cpu, cpu_smt_mask(core)) {
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
-				if (idle_candidate != -1)
-					break;
+				break;
 			}
 
 			if (idle_candidate == -1 &&
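
Folding that fix into the patch, the sibling walk reads as follows
(reconstructed from the two diffs above, shown only for clarity): the
scan of a core still stops at the first busy SMT sibling, and only a
CPU already seen to be idle can be cached as the fallback candidate:

	for_each_cpu(cpu, cpu_smt_mask(core)) {
		schedstat_inc(this_rq()->sis_scanned);
		if (!available_idle_cpu(cpu)) {
			idle = false;
			break;	/* core is not fully idle; stop probing its siblings */
		}

		/* cpu is idle; remember the first one the task may run on */
		if (idle_candidate == -1 &&
		    cpumask_test_cpu(cpu, p->cpus_ptr)) {
			idle_candidate = cpu;
		}
	}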

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fc48cc99b03d..845bc0cd9158 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6066,6 +6066,7 @@ void __update_idle_core(struct rq *rq)
  */
 static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int target)
 {
+	int idle_candidate = -1;
 	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
 	int core, cpu;
 
@@ -6084,7 +6085,13 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 			schedstat_inc(this_rq()->sis_scanned);
 			if (!available_idle_cpu(cpu)) {
 				idle = false;
-				break;
+				if (idle_candidate != -1)
+					break;
+			}
+
+			if (idle_candidate == -1 &&
+			    cpumask_test_cpu(cpu, p->cpus_ptr)) {
+				idle_candidate = cpu;
 			}
 		}
 
@@ -6099,7 +6106,7 @@ static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
 	 */
 	set_idle_cores(target, 0);
 
-	return -1;
+	return idle_candidate;
 }
 
 /*