[v2,3/3] xen: Remove buggy initial placement algorithm

Message ID 1469460493-18842-3-git-send-email-george.dunlap@citrix.com (mailing list archive)
State New, archived

Commit Message

George Dunlap July 25, 2016, 3:28 p.m. UTC
The initial placement algorithm sometimes picks cpus outside of the
mask it's given, does a lot of unnecessary bitmasking, does its own
separate load calculation, and completely ignores vcpu hard and soft
affinities.  Just get rid of it and rely on the schedulers to do
initial placement.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
v2:
- Use cpumask_any() to avoid a bias towards low pcpus on schedulers
that prefer to leave the cpu where it is.

The core problem with default_vcpu0_location() is that it chooses its
initial cpu based on the sibling of pcpu 0, not the first available
sibling in the online mask; so if pcpu 1 ends up being less "busy"
than all the cpus in the pool, then it ends up being chosen even
though it's not in the pool.

Fixing the algorithm would involve starting with the sibling map of
cpumask_first(online) rather than 0, and then having all sibling
checks not only test that the result of cpumask_next() < nr_cpu_ids,
but that the result is in online.

Additionally, as far as I can tell, the cpumask_test_cpu(i,
&cpu_exclude_map) at the top of the for_each_cpu() loop can never
return false; so both this test and the cpumask_or() are unnecessary
and should be removed.

CC: Dario Faggioli <dario.faggioli@citrix.com>
CC: Anshul Makkar <anshul.makkar@citrix.com>
CC: Meng Xu <mengxu@cis.upenn.edu>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
---
 xen/common/domctl.c | 50 +-------------------------------------------------
 1 file changed, 1 insertion(+), 49 deletions(-)

Comments

Dario Faggioli July 26, 2016, 9:14 a.m. UTC | #1
On Mon, 2016-07-25 at 16:28 +0100, George Dunlap wrote:
> The initial placement algorithm sometimes picks cpus outside of the
> mask it's given, does a lot of unnecessary bitmasking, does its own
> separate load calculation, and completely ignores vcpu hard and soft
> affinities.  Just get rid of it and rely on the schedulers to do
> initial placement.
> 
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>

Regards,
Dario
Andrew Cooper July 26, 2016, 9:22 a.m. UTC | #2
On 25/07/16 16:28, George Dunlap wrote:
> The initial placement algorithm sometimes picks cpus outside of the
> mask it's given, does a lot of unnecessary bitmasking, does its own
> separate load calculation, and completely ignores vcpu hard and soft
> affinities.  Just get rid of it and rely on the schedulers to do
> initial placement.
>
> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
> ---
> v2:
> - Use cpumask_any() to avoid a bias towards low pcpus on schedulers
> that prefer to leave the cpu where it is.
>
> The core problem with default_vcpu0_location() is that it chooses its
> initial cpu based on the sibling of pcpu 0, not the first available
> sibling in the online mask; so if pcpu 1 ends up being less "busy"
> than all the cpus in the pool, then it ends up being chosen even
> though it's not in the pool.
>
> Fixing the algorithm would involve starting with the sibling map of
> cpumask_first(online) rather than 0, and then having all sibling
> checks not only test that the result of cpumask_next() < nr_cpu_ids,
> but that the result is in online.
>
> Additionally, as far as I can tell, the cpumask_test_cpu(i,
> &cpu_exclude_map) at the top of the for_each_cpu() loop can never
> return false; so both this test and the cpumask_or() are unnecessary
> and should be removed.
>
> CC: Dario Faggioli <dario.faggioli@citrix.com>
> CC: Anshul Makkar <anshul.makkar@citrix.com>
> CC: Meng Xu <mengxu@cis.upenn.edu>
> CC: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Ian Jackson <ian.jackson@eu.citrix.com>
> CC: Jan Beulich <jbeulich@suse.com>
> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Tim Deegan <tim@xen.org>
> CC: Wei Liu <wei.liu2@citrix.com>

Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> from a REST point of
view.
Patch

diff --git a/xen/common/domctl.c b/xen/common/domctl.c
index b784e6c..87640b6 100644
--- a/xen/common/domctl.c
+++ b/xen/common/domctl.c
@@ -217,54 +217,6 @@  void getdomaininfo(struct domain *d, struct xen_domctl_getdomaininfo *info)
     memcpy(info->handle, d->handle, sizeof(xen_domain_handle_t));
 }
 
-static unsigned int default_vcpu0_location(cpumask_t *online)
-{
-    struct domain *d;
-    struct vcpu   *v;
-    unsigned int   i, cpu, nr_cpus, *cnt;
-    cpumask_t      cpu_exclude_map;
-
-    /* Do an initial CPU placement. Pick the least-populated CPU. */
-    nr_cpus = cpumask_last(&cpu_online_map) + 1;
-    cnt = xzalloc_array(unsigned int, nr_cpus);
-    if ( cnt )
-    {
-        rcu_read_lock(&domlist_read_lock);
-        for_each_domain ( d )
-            for_each_vcpu ( d, v )
-                if ( !(v->pause_flags & VPF_down)
-                     && ((cpu = v->processor) < nr_cpus) )
-                    cnt[cpu]++;
-        rcu_read_unlock(&domlist_read_lock);
-    }
-
-    /*
-     * If we're on a HT system, we only auto-allocate to a non-primary HT. We
-     * favour high numbered CPUs in the event of a tie.
-     */
-    cpumask_copy(&cpu_exclude_map, per_cpu(cpu_sibling_mask, 0));
-    cpu = cpumask_first(&cpu_exclude_map);
-    i = cpumask_next(cpu, &cpu_exclude_map);
-    if ( i < nr_cpu_ids )
-        cpu = i;
-    for_each_cpu(i, online)
-    {
-        if ( cpumask_test_cpu(i, &cpu_exclude_map) )
-            continue;
-        if ( (i == cpumask_first(per_cpu(cpu_sibling_mask, i))) &&
-             (cpumask_next(i, per_cpu(cpu_sibling_mask, i)) < nr_cpu_ids) )
-            continue;
-        cpumask_or(&cpu_exclude_map, &cpu_exclude_map,
-                   per_cpu(cpu_sibling_mask, i));
-        if ( !cnt || cnt[i] <= cnt[cpu] )
-            cpu = i;
-    }
-
-    xfree(cnt);
-
-    return cpu;
-}
-
 bool_t domctl_lock_acquire(void)
 {
     /*
@@ -691,7 +643,7 @@  long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
                 continue;
 
             cpu = (i == 0) ?
-                default_vcpu0_location(online) :
+                cpumask_any(online) :
                 cpumask_cycle(d->vcpu[i-1]->processor, online);
 
             if ( alloc_vcpu(d, i, cpu) == NULL )