From patchwork Wed Aug 17 17:19:40 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9286237
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Anshul Makkar, "Justin T. Weaver", George Dunlap
Date: Wed, 17 Aug 2016 19:19:40 +0200
Message-ID: <147145438023.25877.4564307455358470316.stgit@Solace.fritz.box>
In-Reply-To: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
References: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH 18/24] xen: credit2: soft-affinity awareness in
 fallback_cpu() and cpu_pick()

For get_fallback_cpu(), this is done by putting in place the "usual"
two-step loop (soft-affinity step first, then hard-affinity step):
the core logic of the function just moves inside the body of the loop
itself.

For csched2_cpu_pick(), what matters is finding the runqueue with the
lowest average load. Currently, we do that by looping over all
runqueues and checking their load. For soft affinity, we also want to
know which runqueue has the lowest load among the ones containing
pcpus on which the vcpu would prefer to run. We find both the least
loaded runqueue among the soft-affinity "friendly" ones and the
overall least loaded one in the same pass.

(Also, kill a spurious ';' when defining MAX_LOAD.)
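As an aside, the ordering of the six fallback cases can be illustrated
with a minimal, self-contained sketch. This is not Xen code: cpumasks
are modeled as plain uint64_t bitmasks, and all names (fallback_cpu(),
soft_aff, hard_aff, runq_active, pool_cpus) are invented for
illustration only.

/* Build with: gcc fb.c (uses GCC/Clang's __builtin_ctzll). */
#include <stdint.h>
#include <stdio.h>

#define NO_CPU (-1)

static int first_cpu(uint64_t mask)
{
    return mask ? __builtin_ctzll(mask) : NO_CPU;
}

/*
 * Mirror of the six fallback cases: for each balancing step (0 = soft,
 * 1 = hard), try the current pcpu, then the current runqueue, then any
 * valid pcpu, in this order.
 */
static int fallback_cpu(int cur_cpu, uint64_t soft_aff, uint64_t hard_aff,
                        uint64_t runq_active, uint64_t pool_cpus)
{
    uint64_t step_mask[2] = { soft_aff & hard_aff, hard_aff };
    int bs, cpu;

    for ( bs = 0; bs < 2; bs++ )
    {
        uint64_t mask = step_mask[bs];

        /* Skip the soft step if there is no effective soft affinity. */
        if ( bs == 0 && (mask == 0 || mask == hard_aff) )
            continue;

        /* Cases 1/4: the current pcpu, if it is still in our affinity. */
        if ( mask & (1ULL << cur_cpu) )
            return cur_cpu;

        /* Cases 2/5: any pcpu of our current runqueue. */
        cpu = first_cpu(mask & runq_active);
        if ( cpu != NO_CPU )
            return cpu;

        /* Cases 3/6: just one valid pcpu (can only fail at the soft step). */
        cpu = first_cpu(mask & pool_cpus);
        if ( cpu != NO_CPU )
            return cpu;
    }
    return NO_CPU; /* unreachable as long as hard_aff & pool_cpus != 0 */
}

int main(void)
{
    /*
     * vcpu currently on pcpu 3; soft affinity {0,1}, hard {0-3},
     * current runqueue covers {2,3}, cpupool covers {0-3}.
     * Cases 1 and 2 fail, case 3 picks pcpu 0.
     */
    printf("fallback: %d\n", fallback_cpu(3, 0x3, 0xF, 0xC, 0xF));
    return 0;
}

Note how the hard-affinity step can never fail, which is what lets
get_fallback_cpu() below assert non-emptiness there, and makes its
final BUG_ON(1) unreachable.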
Signed-off-by: Dario Faggioli
Signed-off-by: Justin T. Weaver
Reviewed-by: George Dunlap
---
Cc: George Dunlap
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 136 ++++++++++++++++++++++++++++++++++++--------
 1 file changed, 111 insertions(+), 25 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 3aef1b4..2d7228a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -506,34 +506,68 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) just one valid pcpu from svc's soft affinity;
+ *  4) svc's current pcpu, if it is part of svc's hard affinity;
+ *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  6) just one valid pcpu from svc's hard affinity.
+ *
+ * Of course, 1, 2 and 3 make sense only if svc has a soft affinity. Also
+ * note that case 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     int cpu;
+    unsigned int bs;
 
-    if ( likely(cpumask_test_cpu(svc->vcpu->processor,
-                                 svc->vcpu->cpu_hard_affinity)) )
-        return svc->vcpu->processor;
+    for_each_affinity_balance_step( bs )
+    {
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(svc->vcpu, svc->vcpu->cpu_hard_affinity) )
+            continue;
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                &svc->rqd->active);
-    cpu = cpumask_first(cpumask_scratch);
-    if ( likely(cpu < nr_cpu_ids) )
-        return cpu;
+        affinity_balance_cpumask(svc->vcpu, bs, cpumask_scratch);
 
-    cpumask_and(cpumask_scratch, svc->vcpu->cpu_hard_affinity,
-                cpupool_domain_cpumask(svc->vcpu->domain));
+        /*
+         * This is case 1 or 4 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(svc->vcpu->processor,
+                                     cpumask_scratch)) )
+            return svc->vcpu->processor;
 
-    ASSERT(!cpumask_empty(cpumask_scratch));
+        /*
+         * This is case 2 or 5 (depending on bs): v->processor isn't there
+         * any longer; check whether we can at least stay in our current runq.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &svc->rqd->active);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
 
-    return cpumask_first(cpumask_scratch);
+        /*
+         * This is case 3 or 6 (depending on bs): as a last stand, just pick
+         * one valid pcpu from the current step's affinity, if there is any.
+         * In fact, if we are at the soft-affinity step, it is possible that
+         * we fail, which means we stay in the loop and move on to hard
+         * affinity. OTOH, if we are at the hard-affinity balancing step,
+         * it's guaranteed that there is at least one valid cpu, and
+         * therefore we are sure that we return it, and never exit the loop.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    cpupool_domain_cpumask(svc->vcpu->domain));
+        ASSERT(!cpumask_empty(cpumask_scratch) || bs == BALANCE_SOFT_AFFINITY);
+        cpu = cpumask_first(cpumask_scratch);
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    BUG_ON(1);
+    return -1;
 }
 
 /*
@@ -1561,14 +1595,15 @@ csched2_context_saved(const struct scheduler *ops, struct vcpu *vc)
     vcpu_schedule_unlock_irq(lock, vc);
 }
 
-#define MAX_LOAD (STIME_MAX);
+#define MAX_LOAD (STIME_MAX)
 static int
 csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
-    int i, min_rqi = -1, new_cpu;
+    int i, min_rqi = -1, min_s_rqi = -1, new_cpu;
     struct csched2_vcpu *svc = CSCHED2_VCPU(vc);
-    s_time_t min_avgload = MAX_LOAD;
+    s_time_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
+    bool_t has_soft;
 
     ASSERT(!cpumask_empty(&prv->active_queues));
 
@@ -1613,6 +1648,12 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     }
     else
     {
+        /*
+         * If we've been asked to move to migrate_rqd, we should just do
+         * that, which we actually do by returning one cpu from that runq.
+         * There is no need to take care of soft affinity, as that will
+         * happen in runq_tickle().
+         */
         cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
                     &svc->migrate_rqd->active);
         new_cpu = cpumask_any(cpumask_scratch);
@@ -1622,7 +1663,21 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         /* Fall-through to normal cpu pick */
     }
 
-    /* Find the runqueue with the lowest average load. */
+    has_soft = has_soft_affinity(vc, vc->cpu_hard_affinity);
+    if ( has_soft )
+        affinity_balance_cpumask(vc, BALANCE_SOFT_AFFINITY, cpumask_scratch);
+
+    /*
+     * What we want is:
+     *  - if we have soft affinity, the runqueue with the lowest average
+     *    load, among the ones that contain cpus in our soft affinity; this
+     *    represents the best runq on which we would want to run.
+     *  - the runqueue with the lowest average load among the ones that
+     *    contain cpus in our hard affinity; this represents the best runq
+     *    on which we can run.
+     *
+     * Find both runqueues in one pass.
+     */
     for_each_cpu(i, &prv->active_queues)
     {
         struct csched2_runqueue_data *rqd;
@@ -1656,6 +1711,13 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
             spin_unlock(&rqd->lock);
         }
 
+        if ( has_soft &&
+             rqd_avgload < min_s_avgload &&
+             cpumask_intersects(cpumask_scratch, &rqd->active) )
+        {
+            min_s_avgload = rqd_avgload;
+            min_s_rqi = i;
+        }
         if ( rqd_avgload < min_avgload )
         {
             min_avgload = rqd_avgload;
@@ -1663,9 +1725,33 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
         }
     }
 
-    /* We didn't find anyone (most likely because of spinlock contention). */
-    if ( min_rqi == -1 )
+    if ( has_soft && min_s_rqi != -1 )
+    {
+        /*
+         * We have soft affinity, and we have a candidate runq, so go for it.
+         *
+         * Note that, since has_soft is true, cpumask_scratch holds the
+         * proper soft-affinity mask.
+         */
+        cpumask_and(cpumask_scratch, cpumask_scratch,
+                    &prv->rqd[min_s_rqi].active);
+    }
+    else if ( min_rqi != -1 )
     {
+        /*
+         * Either we don't have soft affinity, or we do, but we did not find
+         * any suitable runq. But we did find one when considering hard
+         * affinity, so go for it.
+         */
+        cpumask_and(cpumask_scratch, vc->cpu_hard_affinity,
+                    &prv->rqd[min_rqi].active);
+    }
+    else
+    {
+        /*
+         * We didn't find anyone at all (most likely because of spinlock
+         * contention).
+         */
         new_cpu = get_fallback_cpu(svc);
         min_rqi = c2r(ops, new_cpu);
         min_avgload = prv->rqd[min_rqi].b_avgload;
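
For reference, the single-pass double-minimum search implemented by the
last hunks can be sketched in isolation as follows. Again, this is not
Xen code: the load[] and soft_ok[] arrays are invented stand-ins for
the per-runqueue b_avgload values and for the cpumask_intersects()
test on the soft-affinity mask.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LOAD INT64_MAX /* no stray ';', per the fix above */

int main(void)
{
    /*
     * Per-runqueue average load, and whether each runqueue intersects
     * the vcpu's soft affinity; values invented for the example.
     */
    int64_t load[]  = { 40, 10, 30, 20 };
    bool soft_ok[]  = { true, false, false, true };
    int64_t min_avgload = MAX_LOAD, min_s_avgload = MAX_LOAD;
    int i, min_rqi = -1, min_s_rqi = -1;

    for ( i = 0; i < 4; i++ )
    {
        /* Least loaded runq we would *like* (soft-affinity friendly)... */
        if ( soft_ok[i] && load[i] < min_s_avgload )
        {
            min_s_avgload = load[i];
            min_s_rqi = i;
        }
        /* ...and least loaded runq we *can* use, found in the same pass. */
        if ( load[i] < min_avgload )
        {
            min_avgload = load[i];
            min_rqi = i;
        }
    }

    /* Prints: soft pick: rq 3 (load 20), hard pick: rq 1 (load 10) */
    printf("soft pick: rq %d (load %" PRId64 "), "
           "hard pick: rq %d (load %" PRId64 ")\n",
           min_s_rqi, min_s_avgload, min_rqi, min_avgload);
    return 0;
}

Preferring the soft candidate only when one was actually found
(min_s_rqi != -1) is exactly the if/else-if/else cascade at the end of
the last hunk, with get_fallback_cpu() as the final resort.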