
[19/19] xen: credit2: use cpumask_first instead of cpumask_any when choosing cpu

Message ID 146620521750.29766.12008171749820205436.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli June 17, 2016, 11:13 p.m. UTC
because it is cheaper, and there is not much point in
randomizing which cpu gets selected anyway, as the
choice will be overridden shortly after, in runq_tickle().

If we really feel the need (e.g., if we prove it
worthwhile with benchmarking), we can record the last
cpu which was used by csched2_cpu_pick() and migrate()
in a per-runq variable, and then use cpumask_cycle()...
but this really does not look necessary.
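
A rough sketch of that alternative, just for illustration (the
last_picked field and where exactly it would be updated are
hypothetical, not existing code):

    /* Hypothetical per-runqueue field remembering the last pick. */
    struct csched2_runqueue_data {
        ...
        unsigned int last_picked;   /* last cpu handed out for this runq */
    };

    /* Start scanning right after the previous pick, wrapping around,
     * so that consecutive picks are spread across the runqueue. */
    new_cpu = cpumask_cycle(rqd->last_picked, cpumask_scratch);
    if ( new_cpu < nr_cpu_ids )
        rqd->last_picked = new_cpu;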

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/sched_credit2.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Jan Beulich June 20, 2016, 8:30 a.m. UTC | #1
>>> On 18.06.16 at 01:13, <dario.faggioli@citrix.com> wrote:
> because it is cheaper, and there is not much point in
> randomizing which cpu gets selected anyway, as the
> choice will be overridden shortly after, in runq_tickle().

If it will always be overridden, why fill it in the first place? And if there
are cases where it won't get overridden, you're re-introducing a
preference towards lower CPU numbers, which I think is not a good
idea. Can the code perhaps be rearranged to avoid the cpumask_any()
when another value will subsequently get stored anyway?

Jan
Dario Faggioli June 20, 2016, 11:28 a.m. UTC | #2
On Mon, 2016-06-20 at 02:30 -0600, Jan Beulich wrote:
> >>> On 18.06.16 at 01:13, <dario.faggioli@citrix.com> wrote:
> > because it is cheaper, and there is not much point in
> > randomizing which cpu gets selected anyway, as the
> > choice will be overridden shortly after, in runq_tickle().
> If it will always be overridden, why fill it in the first place? And if
> there are cases where it won't get overridden, you're re-introducing a
> preference towards lower CPU numbers, which I think is not a good
> idea.
>
It will never be used directly as the actual target CPU --at least
according to my analysis of the code.

runq_tickle() will consider it, but only as a hint, and will actually
use it only if it satisfies all the other load balancing conditions
(being part of a fully idle core, being idle, being in hard affinity,
being preemptible, etc.).

As I said in the rest of the changelog, if we really fear, or start to
observe, that lower CPU numbers are being preferred, we can add
countermeasures (stashing the CPU we chose last time and using
cpumask_cycle(), as Credit1 already does for a different purpose).

My feeling is that they won't, as the load balancing logic in
runq_tickle() will make that unlikely enough.
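
Just to illustrate, here is a much simplified sketch of the kind of
check runq_tickle() applies (not the actual code; find_best_cpu() is a
made-up placeholder for the full load balancing logic):

    /*
     * The cpu we were handed is only a suggestion: use it only if it
     * passes the same checks as any other candidate (e.g., being idle
     * and within the vcpu's hard affinity); otherwise, let the full
     * load balancing logic pick somebody else.
     */
    if ( cpumask_test_cpu(cpu, new->vcpu->cpu_hard_affinity) &&
         cpumask_test_cpu(cpu, &rqd->idle) )
        ipid = cpu;                       /* the hint is good enough */
    else
        ipid = find_best_cpu(rqd, new);   /* made-up placeholder */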

> Can the code perhaps be rearranged to avoid the cpumask_any()
> when another value will subsequently get stored anyway?
> 
I thought about it and, although there certainly are alternatives, none
of the ones I could come up with looked better than the present
situation.

Fact is, when the pick_cpu() hook is called in vcpu_migrate(), what
vcpu_migrate() wants back from it is indeed a CPU number. Then (through
vcpu_move_locked()) it either just sets v->processor to that CPU, or
calls the migrate() hook.

On Credit1, the CPU returned by pick_cpu() is indeed the CPU where we
want the vcpu to run, and setting v->processor to that is all we need
to do for migrating a vcpu (and in fact, migrate() is not even
defined).

On Credit2, we (ab?)use pick_cpu() to actually select not really a CPU,
but a runqueue. The fact that we return an arbitrary CPU from the
runqueue we decided we want is the (pretty clever, IMO) way in which we
avoid having to teach schedule.c about runqueues. Then, in migrate()
(which is defined for Credit2), we go the other way round: we hand a
CPU to Credit2, and it translates that back into a runqueue (the
runqueue where that CPU sits).
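
Schematically, the round trip looks like this (heavily simplified, and
the runqueue lookup helper name is illustrative):

    /* schedule.c: only ever deals in cpu numbers. */
    new_cpu = SCHED_OP(VCPU2OP(v), pick_cpu, v);
    if ( VCPU2OP(v)->migrate )
        SCHED_OP(VCPU2OP(v), migrate, v, new_cpu);  /* Credit2's path */
    else
        v->processor = new_cpu;                     /* Credit1's path */

    /* sched_credit2.c, migrate(): map the cpu back to its runqueue. */
    trqd = RQD(ops, new_cpu);    /* the runqueue where new_cpu sits */
    __runq_assign(svc, trqd);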

Archaeology confirms that the migrate() hook was introduced (in
ff38d3faa7d "credit2: add a callback to migrate to a new cpu")
specifically for Credit2.

The main difference, wrt all the above, between Credit1 and Credit2 is
that in Credit1 there is one runqueue per CPU, while in Credit2
multiple CPUs share the same runqueue. The current pick_cpu()/migrate()
approach lets both schedulers, despite this difference, achieve what
they want. Note also how such an approach targets the simplest case
(<<hey, sched_*.c, give me a CPU!>>), which is good when reading and
trying to understand schedule.c. It's then the responsibility of any
scheduler that wants to play fancy tricks --like Credit2 does with
runqueues-- to take care of that, without making anyone else pay the
price in terms of complexity.

Every alternative I thought of always involved making things less
straightforward in schedule.c, which is something I'd rather avoid. If
anyone has better alternatives, I'm all ears. :-)

I can certainly add more comments in sched_credit2.c to explain the
situation.

Thanks and Regards,
Dario
David Vrabel June 21, 2016, 10:42 a.m. UTC | #3
On 18/06/16 00:13, Dario Faggioli wrote:
> because it is cheaper, and there is not much point in
> randomizing which cpu gets selected anyway, as the
> choice will be overridden shortly after, in runq_tickle().
> 
> If we really feel the need (e.g., if we prove it
> worthwhile with benchmarking), we can record the last
> cpu which was used by csched2_cpu_pick() and migrate()
> in a per-runq variable, and then use cpumask_cycle()...
> but this really does not look necessary.

Isn't this backwards?  Surely you should demonstrate that this change is
beneficial before proposing it?

I don't think any performance-related change should be accepted without
experimental evidence that it makes something better, especially if it
looks like it might have negative consequences (e.g., by favouring low
cpus).

David
Dario Faggioli July 7, 2016, 4:55 p.m. UTC | #4
On Tue, 2016-06-21 at 11:42 +0100, David Vrabel wrote:
> On 18/06/16 00:13, Dario Faggioli wrote:
> > 
> > because it is cheaper, and there is not much point in
> > randomizing which cpu gets selected anyway, as the
> > choice will be overridden shortly after, in runq_tickle().
> > 
> > If we really feel the need (e.g., if we prove it
> > worthwhile with benchmarking), we can record the last
> > cpu which was used by csched2_cpu_pick() and migrate()
> > in a per-runq variable, and then use cpumask_cycle()...
> > but this really does not look necessary.
> Isn't this backwards?  Surely you should demonstrate that this change is
> beneficial before proposing it?
> 
Right. I think it's my fault for having presented things this way.

This patch gets rid of something that is pure overhead, and getting rid
of overhead is, in general, a good thing.

There is only one possible situation under which we may actually end up
favouring lower pCPU IDs, and it is unlikely enough that it is, IMO, of
no concern.

But in any case, let's just drop this patch. I'm rerunning the
benchmarks anyway; I'll consider doing a set of runs with and without
this patch, and check whether it makes any difference.

Thanks and Regards,
Dario

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index a8b3a85..afd432e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1545,7 +1545,7 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         {
             cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                         &svc->migrate_rqd->active);
-            new_cpu = cpumask_any(cpumask_scratch);
+            new_cpu = cpumask_first(cpumask_scratch);
             if ( new_cpu < nr_cpu_ids )
                 goto out_up;
         }
@@ -1604,7 +1604,7 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 
     cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                 &prv->rqd[min_rqi].active);
-    new_cpu = cpumask_any(cpumask_scratch);
+    new_cpu = cpumask_first(cpumask_scratch);
     BUG_ON(new_cpu >= nr_cpu_ids);
 
  out_up:
@@ -1718,7 +1718,7 @@  static void migrate(const struct scheduler *ops,
 
         cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
                     &trqd->active);
-        svc->vcpu->processor = cpumask_any(cpumask_scratch);
+        svc->vcpu->processor = cpumask_first(cpumask_scratch);
         ASSERT(svc->vcpu->processor < nr_cpu_ids);
 
         __runq_assign(svc, trqd);