
[18/24] xen: credit2: soft-affinity awareness fallback_cpu() and cpu_pick()

Message ID 147145438023.25877.4564307455358470316.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli Aug. 17, 2016, 5:19 p.m. UTC
For get_fallback_cpu(), we add soft-affinity awareness
by putting in place the "usual" two-step loop (soft
affinity step first, then hard affinity step). The core
logic of the function just moves inside the body of the
loop itself.

For csched2_cpu_pick(), what is important is to find
the runqueue with the least average load. Currently,
we do that by looping on all runqueues and checking,
well, their load. For soft affinity, we want to know
which one is the runqueue with the least load, among
the ones where the vcpu would prefer to be assigned.

We find both the least loaded runqueue among the soft
affinity "friendly" ones, and the overall least loaded
one, in the same pass.

(Also, kill a spurious ';' when defining MAX_LOAD.)

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c |  136 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 111 insertions(+), 25 deletions(-)
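
For reference, here is a distilled, standalone sketch of the single-pass
search described in the commit message: one scan of the runqueues that
tracks both the least loaded runqueue overall and the least loaded one
among those compatible with the soft affinity. The helpers and arrays
(load_of[], soft_ok[]) are illustrative placeholders, not the actual
sched_credit2.c interfaces.

#include <stdbool.h>
#include <stdint.h>

#define MAX_LOAD INT64_MAX

/* Return the index of the preferred runqueue, or -1 if there is none. */
static int pick_runq(unsigned int nr_runqs, bool has_soft,
                     const int64_t load_of[], const bool soft_ok[])
{
    int64_t min_load = MAX_LOAD, min_s_load = MAX_LOAD;
    int min_rqi = -1, min_s_rqi = -1;
    unsigned int i;

    for ( i = 0; i < nr_runqs; i++ )
    {
        /* Least loaded runqueue among the soft-affinity friendly ones... */
        if ( has_soft && soft_ok[i] && load_of[i] < min_s_load )
        {
            min_s_load = load_of[i];
            min_s_rqi = i;
        }

        /* ...and, in the same pass, the least loaded runqueue overall. */
        if ( load_of[i] < min_load )
        {
            min_load = load_of[i];
            min_rqi = i;
        }
    }

    /* Prefer the soft-affinity candidate, if the scan found one. */
    return (has_soft && min_s_rqi != -1) ? min_s_rqi : min_rqi;
}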

Comments

Anshul Makkar Sept. 1, 2016, 11:08 a.m. UTC | #1
On 17/08/16 18:19, Dario Faggioli wrote:
> For get_fallback_cpu(), we add soft-affinity awareness
> by putting in place the "usual" two-step loop (soft
> affinity step first, then hard affinity step). The core
> logic of the function just moves inside the body of the
> loop itself.
>
> For csched2_cpu_pick(), what is important is to find
> the runqueue with the least average load. Currently,
> we do that by looping on all runqueues and checking,
> well, their load. For soft affinity, we want to know
> which one is the runqueue with the least load, among
> the ones where the vcpu would prefer to be assigned.
>
> We find both the least loaded runqueue among the soft
> affinity "friendly" ones, and the overall least loaded
> one, in the same pass.
>
> (Also, kill a spurious ';' when defining MAX_LOAD.)
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: George Dunlap <george.dunlap@citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>   xen/common/sched_credit2.c |  136 ++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 111 insertions(+), 25 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 3aef1b4..2d7228a 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>   }
>
>   /*
> - * When a hard affinity change occurs, we may not be able to check some
> - * (any!) of the other runqueues, when looking for the best new processor
> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
> - * pick, in order of decreasing preference:
> - *  - svc's current pcpu;
> - *  - another pcpu from svc's current runq;
> - *  - any cpu.
> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
With remote runqueues, do you mean runqs on a remote socket? Can't we
just read their workload, or can we change the lock type to allow reading?
> + * we pick, in order of decreasing preference:
> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
svc's current runqueue. Do you mean the runq in which svc is currently 
queued?
> + *  3) just one valid pcpu from svc's soft affinity;
> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
> + *  6) just one valid pcpu from svc's hard affinity
> + *
> + * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>    */
>   static int get_fallback_cpu(struct csched2_vcpu *svc)
>   {
>       int cpu;
> +    unsigned int bs;
>
> -    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
> -                                 svc->vcpu->cpu_hard_affinity)) )
> -        return svc->vcpu->processor;
> +    for_each_affinity_balance_step( bs )
> +    {
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(svc->vcpu, svc->vcpu->cpu_hard_affinity) )
> +            continue;
>
>
Anshul
Dario Faggioli Sept. 5, 2016, 1:26 p.m. UTC | #2
On Thu, 2016-09-01 at 12:08 +0100, anshul makkar wrote:
> On 17/08/16 18:19, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > 
> > @@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu,
> > cpumask_t *mask)
> >   }
> > 
> >   /*
> > - * When a hard affinity change occurs, we may not be able to check
> > some
> > - * (any!) of the other runqueues, when looking for the best new
> > processor
> > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
> > happens, we
> > - * pick, in order of decreasing preference:
> > - *  - svc's current pcpu;
> > - *  - another pcpu from svc's current runq;
> > - *  - any cpu.
> > + * In csched2_cpu_pick(), it may not be possible to actually look
> > at remote
> > + * runqueues (the trylock-s on their spinlocks can fail!). If that
> > happens,
> With remote runqueues, do you mean runqs on a remote socket?
>
I mean runqueues different from the runq the vcpu is currently assigned
to (as per runq_assign()/runq_deassign()).

If you have runqueues configured to be per-socket, yes, it will try to
lock runqueues in which there are pcpus that are on a different socket
wrt svc->vcpu->processor.

> Can't we 
> just read their workload, or can we change the lock type to allow
> reading?
>
Reading without taking the lock would race against the load value being
updated. And updating the load is done by __update_runq_load(), which,
with all its shifts and maths, is by no means an atomic operation.

So it's not just a matter of risking reading a slightly outdated value
(which, I agree, may not be that bad); it's that we risk reading
something inconsistent. :-/
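
To illustrate the point, here is a minimal, standalone sketch of the kind
of torn read in question. The struct and the update formula are simplified
stand-ins, not the actual __update_runq_load() internals:

#include <stdint.h>

/* Simplified stand-in for the per-runqueue load bookkeeping. */
struct load_state {
    int64_t load_last_update;  /* when the average was last folded in */
    int64_t avgload;           /* decayed average load */
};

/* Writer, normally called with the runqueue lock held: two dependent stores. */
static void update_load(struct load_state *s, int64_t now, unsigned int nr_run)
{
    int64_t delta = now - s->load_last_update;

    /* Decay the old average and fold in the current instantaneous load. */
    s->avgload = s->avgload - (s->avgload >> 4) + ((int64_t)nr_run * delta);
    s->load_last_update = now;
}

/* A lockless reader can run between the two stores above and combine a new
 * avgload with a stale load_last_update (or vice versa): it does not just
 * see an outdated value, it sees a pair that never existed together. */
static int64_t read_load_lockless(const struct load_state *s, int64_t now)
{
    return s->avgload + (now - s->load_last_update);
}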

About "changing the locktype", I guess you mean that we can turn also
the runqueue lock into an rw-lock? If yes, that's indeed interesting,
and I've also thought about it, but, for now, always deferred trying to
actually do that.

It's technically non-trivial, as it would involve changing
schedule_data->schedule_lock and all the {v,p}cpu_schedule_lock*()
functions. Also, it's a lock that will almost all the time be taken
for writing, which usually means what you want is a proper spinlock.

So, IMO, before embarking on something like that, it'd be good to
figure out how frequently we actually fail to take the remote runqueue
lock, and what's the real impact of having to deal with the consequences
of that.

I'm not saying it's not worth a try, but I'm saying that it's
something at high risk of being a lot of work for a very small gain,
and that there are more important things to focus on.
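
As a rough way of getting that number, one could imagine instrumentation
along these lines (purely hypothetical, nothing like this exists in the
tree; the counter name and its placement are made up):

/* Hypothetical per-cpu counter of failed trylocks on remote runqueues. */
static DEFINE_PER_CPU(unsigned long, remote_runq_trylock_fails);

    /* ... inside the runqueue scan in csched2_cpu_pick() ... */
    if ( rqd != svc->rqd && !spin_trylock(&rqd->lock) )
    {
        this_cpu(remote_runq_trylock_fails)++;
        continue;
    }

Dumping that from a debug key, or tracing the event, would then tell
whether the failures are frequent enough to justify reworking the locking
at all.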

> > + * we pick, in order of decreasing preference:
> > + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> > + *  2) a pcpu in svc's current runqueue that is also in svc's soft
> > affinity;
> svc's current runqueue. Do you mean the runq in which svc is
> currently 
> queued?
>
I mean the runqueue to which svc is currently assigned (again, as per
runq_assign()/runq_deassign()), which in turn means that, if svc is
queued in a runqueue, it's queued there (so, I guess the short answer
to your question is "yes" :-D).

Regards,
Dario
Anshul Makkar Sept. 7, 2016, 12:52 p.m. UTC | #3
On 05/09/16 14:26, Dario Faggioli wrote:
> On Thu, 2016-09-01 at 12:08 +0100, anshul makkar wrote:
>> On 17/08/16 18:19, Dario Faggioli wrote:
>
>> Can't we
>> just read their workload, or can we change the lock type to allow
>> reading?
>>
> Reading without taking the lock would race against the load value being
> updated. And updating the load is done by __update_runq_load(), which,
> with all its shifts and maths, is by no means an atomic operation.
>
> So it's not just a matter of risking reading a slightly outdated value
> (which, I agree, may not be that bad); it's that we risk reading
> something inconsistent. :-/
>
Ok. Got it and agree.
> About "changing the locktype", I guess you mean that we can turn also
> the runqueue lock into an rw-lock? If yes, that's indeed interesting,
> and I've also thought about it, but, for now, always deferred trying to
> actually do that.
Yes.
>
> It's technically non-trivial, as it would involve changing
> schedule_data->schedule_lock and all the {v,p}cpu_schedule_lock*()
> functions. Also, it's a lock that will almost all the time be taken
> for writing, which usually means what you want is a proper spinlock.
>
> So, IMO, before embarking on something like that, it'd be good to
> figure out how frequently we actually fail to take the remote runqueue
> lock, and what's the real impact of having to deal with the consequences
> of that.
>
Ok. Let's discuss that to finalize the approach.
> I'm not saying it's not worth a try, but I'm saying that it's
> something at high risk of being a lot of work for a very small gain,
> and that there are more important things to focus on.
>
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft
>>> affinity;
>> svc's current runqueue. Do you mean the runq in which svc is
>> currently
>> queued?
>>
> I mean the runqueue to which svc is currently assigned (again, as per
> runq_assign()/runq_deassign()), which in turn means that, if svc is
> queued in a runqueue, it's queued there (so, I guess the short answer
> to your question is "yes" :-D).
>
Ok.
> Regards,
> Dario
>
Anshul
George Dunlap Sept. 29, 2016, 11:11 a.m. UTC | #4
On 17/08/16 18:19, Dario Faggioli wrote:
> For get_fallback_cpu(), we add soft-affinity awareness
> by putting in place the "usual" two-step loop (soft
> affinity step first, then hard affinity step). The core
> logic of the function just moves inside the body of the
> loop itself.
> 
> For csched2_cpu_pick(), what is important is to find
> the runqueue with the least average load. Currently,
> we do that by looping on all runqueues and checking,
> well, their load. For soft affinity, we want to know
> which one is the runqueue with the least load, among
> the ones where the vcpu would prefer to be assigned.
> 
> We find both the least loaded runqueue among the soft
> affinity "friendly" ones, and the overall least loaded
> one, in the same pass.
> 
> (Also, kill a spurious ';' when defining MAX_LOAD.)
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>

Looks good:

Reviewed-by: George Dunlap <george.dunlap@citrix.com>

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3aef1b4..2d7228a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -506,34 +506,68 @@  void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) just one valid pcpu from svc's soft affinity;
+ *  4) svc's current pcpu, if it is part of svc's hard affinity;
+ *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  6) just one valid pcpu from svc's hard affinity
+ *
+ * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
+ * note that at least 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     int cpu;
+    unsigned int bs;
 
-    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
-                                 svc->vcpu->cpu_hard_affinity)) )
-        return svc->vcpu->processor;
+    for_each_affinity_balance_step( bs )
+    {
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(svc->vcpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    cpu = cpumask_first(cpumask_scratch);
-    if ( likely(cpu < nr_cpu_ids) )
-        return cpu;
+        affinity_balance_cpumask(svc->vcpu, bs, cpumask_scratch);
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+        /*
+         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                     cpumask_scratch)) )
+            return svc->vcpu->processor;
 
-    ASSERT(!cpumask_empty(cpumask_scratch));
+        /*
+         * This is cases 2 or 5 (depending on bs): v->processor isn't there
+         * any longer, check if we at least can stay in our current runq.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &svc->rqd->active);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
 
-    return cpumask_first(cpumask_scratch);
+        /*
+         * This is cases 3 or 6 (depending on bs): last stand, just one valid
+         * pcpu from our soft affinity, if we have one and if there's any. In
+         * fact, if we are doing soft-affinity, it is possible that we fail,
+         * which means we stay in the loop and look for hard affinity. OTOH,
+         * if we are at the hard-affinity balancing step, it's guaranteed that
+         * there is at least one valid cpu, and therefore we are sure that we
+         * return it, and never really exit the loop.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        ASSERT(!cpumask_empty(cpumask_scratch) || bs == BALANCE_SOFT_AFFINITY);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    BUG_ON(1);
+    return -1;
 }
 
 /*
@@ -1561,14 +1595,15 @@  csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
-    int i, min_rqi = -1, new_cpu;
+    int i, min_rqi = -1, min_s_rqi = -1, new_cpu;
     struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool_t has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1613,6 +1648,12 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
         else
         {
+            /*
+             * If we've been asked to move to migrate_rqd, we should just do
+             * that, which we actually do by returning one cpu from that runq.
+             * There is no need to take care of soft affinity, as that will
+             * happen in runq_tickle().
+             */
             cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                         &svc->migrate_rqd->active);
             new_cpu = cpumask_any(cpumask_scratch);
@@ -1622,7 +1663,21 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
+    if ( has_soft )
+        affinity_balance_cpumask(vc, BALANCE_SOFT_AFFINITY, cpumask_scratch);
+
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contain cpus in our hard affinity; this represents the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1656,6 +1711,13 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
             spin_unlock(&rqd->lock);
         }
 
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload &&
+             cpumask_intersects(cpumask_scratch, &rqd->active) )
+        {
+            min_s_avgload = rqd_avgload;
+            min_s_rqi = i;
+        }
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1663,9 +1725,33 @@  csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, since has_soft is true, cpumask_scratch holds the proper
+         * soft-affinity mask.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         */
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(ops, new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;