From patchwork Wed Aug 17 17:19:32 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9286233
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Wed, 17 Aug 2016 19:19:32 +0200
Message-ID: <147145437291.25877.11396888641547651914.stgit@Solace.fritz.box>
In-Reply-To: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
References: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Cc: Anshul Makkar, "Justin T. Weaver", George Dunlap
Subject: [Xen-devel] [PATCH 17/24] xen: credit2: soft-affinity awareness in runq_tickle()

This is done by means of the "usual" two-step loop:
 - soft affinity balance step;
 - hard affinity balance step.

The entire logic implemented in runq_tickle() is applied, during the
first step, considering only the CPUs in the vcpu's soft affinity. In
the second step, we fall back to using all the CPUs from its hard
affinity (as is done now, without this patch).

Signed-off-by: Dario Faggioli
Signed-off-by: Justin T. Weaver
---
Cc: George Dunlap
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 243 ++++++++++++++++++++++++++++----------------
 1 file changed, 157 insertions(+), 86 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 0d83bd7..3aef1b4 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -902,6 +902,42 @@ __runq_remove(struct csched2_vcpu *svc)
     list_del_init(&svc->runq_elem);
 }
 
+/*
+ * During the soft-affinity step, only actually preempt someone if
+ * it does not have soft-affinity with cpu (while we do).
+ *
+ * BEWARE that this uses cpumask_scratch, throwing away what's in there!
+ */
+static inline bool_t soft_aff_check_preempt(unsigned int bs, unsigned int cpu)
+{
+    struct csched2_vcpu * cur = CSCHED2_VCPU(curr_on_cpu(cpu));
+
+    /*
+     * If we're doing hard-affinity, always check whether to preempt cur.
+     * If we're doing soft-affinity, but cur doesn't have one, check as well.
+     */
+    if ( bs == BALANCE_HARD_AFFINITY ||
+         !has_soft_affinity(cur->vcpu, cur->vcpu->cpu_hard_affinity) )
+        return 1;
+
+    /*
+     * We're doing soft-affinity, and we know that the current vcpu on cpu
+     * has a soft affinity. We now want to know whether cpu itself is in
+     * such affinity. In fact, since we know that new (in runq_tickle())
+     * wants to run here:
+     *  - if cpu is not in cur's soft-affinity, we should indeed check to
+     *    see whether new should preempt cur. If that turns out to be the
+     *    case, it would be an improvement wrt respecting soft affinity;
+     *  - if cpu is in cur's soft-affinity, we leave it alone and (in
+     *    runq_tickle()) move on to another cpu. In fact, we don't want to
+     *    be too harsh with someone that is running within its soft-affinity.
+     *    This is safe because later, if we don't find anyone else during the
+     *    soft-affinity step, we will check cpu for preemption anyway, when
+     *    doing hard-affinity.
+     */
+    affinity_balance_cpumask(cur->vcpu, BALANCE_SOFT_AFFINITY, cpumask_scratch);
+    return !cpumask_test_cpu(cpu, cpumask_scratch);
+}
+
 void burn_credits(struct csched2_runqueue_data *rqd, struct csched2_vcpu *, s_time_t);
 
 /*
@@ -925,7 +961,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
 {
     int i, ipid = -1;
     s_time_t lowest = (1<<30);
-    unsigned int cpu = new->vcpu->processor;
+    unsigned int bs, cpu = new->vcpu->processor;
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     cpumask_t mask;
     struct csched2_vcpu * cur;
@@ -947,109 +983,144 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
                     (unsigned char *)&d);
     }
 
-    /*
-     * First of all, consider idle cpus, checking if we can just
-     * re-use the pcpu where we were running before.
-     *
-     * If there are cores where all the siblings are idle, consider
-     * them first, honoring whatever the spreading-vs-consolidation
-     * SMT policy wants us to do.
-     */
-    if ( unlikely(sched_smt_power_savings) )
-        cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
-    else
-        cpumask_copy(&mask, &rqd->smt_idle);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
+    for_each_affinity_balance_step( bs )
     {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
+        /*
+         * First things first: if we are at the first (soft affinity) step,
+         * but new doesn't have a soft affinity, skip this step.
+         */
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(new->vcpu, new->vcpu->cpu_hard_affinity) )
+            continue;
 
-    /*
-     * If there are no fully idle cores, check all idlers, after
-     * having filtered out pcpus that have been tickled but haven't
-     * gone through the scheduler yet.
-     */
-    cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    i = cpumask_test_or_cycle(cpu, &mask);
-    if ( i < nr_cpu_ids )
-    {
-        SCHED_STAT_CRANK(tickled_idle_cpu);
-        ipid = i;
-        goto tickle;
-    }
+        affinity_balance_cpumask(new->vcpu, bs, cpumask_scratch);
 
-    /*
-     * Otherwise, look for the non-idle (and non-tickled) processors with
-     * the lowest credit, among the ones new is allowed to run on. Again,
-     * the cpu were it was running on would be the best candidate.
-     */
-    cpumask_andnot(&mask, &rqd->active, &rqd->idle);
-    cpumask_andnot(&mask, &mask, &rqd->tickled);
-    cpumask_and(&mask, &mask, new->vcpu->cpu_hard_affinity);
-    if ( cpumask_test_cpu(cpu, &mask) )
-    {
-        cur = CSCHED2_VCPU(curr_on_cpu(cpu));
-        burn_credits(rqd, cur, now);
+        /*
+         * First of all, consider idle cpus, checking if we can just
+         * re-use the pcpu where we were running before.
+         *
+         * If there are cores where all the siblings are idle, consider
+         * them first, honoring whatever the spreading-vs-consolidation
+         * SMT policy wants us to do.
+         */
+        if ( unlikely(sched_smt_power_savings) )
+            cpumask_andnot(&mask, &rqd->idle, &rqd->smt_idle);
+        else
+            cpumask_copy(&mask, &rqd->smt_idle);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
+        {
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
+            goto tickle;
+        }
 
-        if ( cur->credit < new->credit )
+        /*
+         * If there are no fully idle cores, check all idlers, after
+         * having filtered out pcpus that have been tickled but haven't
+         * gone through the scheduler yet.
+         */
+        cpumask_andnot(&mask, &rqd->idle, &rqd->tickled);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        i = cpumask_test_or_cycle(cpu, &mask);
+        if ( i < nr_cpu_ids )
         {
-            SCHED_STAT_CRANK(tickled_busy_cpu);
-            ipid = cpu;
+            SCHED_STAT_CRANK(tickled_idle_cpu);
+            ipid = i;
             goto tickle;
         }
-    }
 
-    for_each_cpu(i, &mask)
-    {
-        /* Already looked at this one above */
-        if ( i == cpu )
-            continue;
+        /*
+         * Otherwise, look for the non-idle (and non-tickled) processors
+         * with the lowest credit, among the ones new is allowed to run
+         * on. Again, the cpu where it was running would be the best
+         * candidate.
+         */
+        cpumask_andnot(&mask, &rqd->active, &rqd->idle);
+        cpumask_andnot(&mask, &mask, &rqd->tickled);
+        cpumask_and(&mask, &mask, cpumask_scratch);
+        if ( cpumask_test_cpu(cpu, &mask) )
+        {
+            cur = CSCHED2_VCPU(curr_on_cpu(cpu));
 
-        cur = CSCHED2_VCPU(curr_on_cpu(i));
+            if ( soft_aff_check_preempt(bs, cpu) )
+            {
+                burn_credits(rqd, cur, now);
+
+                if ( unlikely(tb_init_done) )
+                {
+                    struct {
+                        unsigned vcpu:16, dom:16;
+                        unsigned cpu, credit;
+                    } d;
+                    d.dom = cur->vcpu->domain->domain_id;
+                    d.vcpu = cur->vcpu->vcpu_id;
+                    d.credit = cur->credit;
+                    d.cpu = cpu;
+                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                                sizeof(d),
+                                (unsigned char *)&d);
+                }
+
+                if ( cur->credit < new->credit )
+                {
+                    SCHED_STAT_CRANK(tickled_busy_cpu);
+                    ipid = cpu;
+                    goto tickle;
+                }
+            }
+        }
 
-        ASSERT(!is_idle_vcpu(cur->vcpu));
+        for_each_cpu(i, &mask)
+        {
+            /* Already looked at this one above */
+            if ( i == cpu )
+                continue;
 
-        /* Update credits for current to see if we want to preempt. */
-        burn_credits(rqd, cur, now);
+            cur = CSCHED2_VCPU(curr_on_cpu(i));
+            ASSERT(!is_idle_vcpu(cur->vcpu));
 
-        if ( cur->credit < lowest )
-        {
-            ipid = i;
-            lowest = cur->credit;
+            if ( soft_aff_check_preempt(bs, i) )
+            {
+                /* Update credits for current to see if we want to preempt. */
+                burn_credits(rqd, cur, now);
+
+                if ( unlikely(tb_init_done) )
+                {
+                    struct {
+                        unsigned vcpu:16, dom:16;
+                        unsigned cpu, credit;
+                    } d;
+                    d.dom = cur->vcpu->domain->domain_id;
+                    d.vcpu = cur->vcpu->vcpu_id;
+                    d.credit = cur->credit;
+                    d.cpu = i;
+                    __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
+                                sizeof(d),
+                                (unsigned char *)&d);
+                }
+
+                if ( cur->credit < lowest )
+                {
+                    ipid = i;
+                    lowest = cur->credit;
+                }
+            }
         }
 
-        if ( unlikely(tb_init_done) )
+        /*
+         * Only switch to another processor if the credit difference is
+         * greater than the migrate resistance.
+         */
+        if ( ipid != -1 && lowest + CSCHED2_MIGRATE_RESIST <= new->credit )
         {
-            struct {
-                unsigned vcpu:16, dom:16;
-                unsigned cpu, credit;
-            } d;
-            d.dom = cur->vcpu->domain->domain_id;
-            d.vcpu = cur->vcpu->vcpu_id;
-            d.credit = cur->credit;
-            d.cpu = i;
-            __trace_var(TRC_CSCHED2_TICKLE_CHECK, 1,
-                        sizeof(d),
-                        (unsigned char *)&d);
+            SCHED_STAT_CRANK(tickled_busy_cpu);
+            goto tickle;
         }
     }
 
-    /*
-     * Only switch to another processor if the credit difference is
-     * greater than the migrate resistance.
-     */
-    if ( ipid == -1 || lowest + CSCHED2_MIGRATE_RESIST > new->credit )
-    {
-        SCHED_STAT_CRANK(tickled_no_cpu);
-        return;
-    }
-
-    SCHED_STAT_CRANK(tickled_busy_cpu);
+    SCHED_STAT_CRANK(tickled_no_cpu);
+    return;
 
  tickle:
     BUG_ON(ipid == -1);
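
The two-step idiom the changelog describes can also be modelled in
isolation, for readers who want to experiment with it outside of Xen.
What follows is a minimal, self-contained sketch, not Xen code:
BALANCE_SOFT_AFFINITY and BALANCE_HARD_AFFINITY mirror the constants
the patch uses, while struct vcpu, has_soft_affinity() and pick_cpu()
here are hypothetical stand-ins, with plain uint32_t bitmasks in place
of Xen's cpumask_t machinery.

#include <stdio.h>
#include <stdint.h>

#define BALANCE_SOFT_AFFINITY 0
#define BALANCE_HARD_AFFINITY 1

/* A vcpu with a hard affinity and (optionally) a soft affinity. */
struct vcpu {
    uint32_t hard_affinity; /* bitmask of the cpus we may run on */
    uint32_t soft_affinity; /* bitmask of the cpus we prefer (0 = none) */
};

/* Is there an effective soft affinity to honor in the first step? */
static int has_soft_affinity(const struct vcpu *v)
{
    return v->soft_affinity != 0 && v->soft_affinity != v->hard_affinity;
}

/* Pick an idle cpu, trying the soft-affinity step before the hard one. */
static int pick_cpu(const struct vcpu *v, uint32_t idle_cpus)
{
    int bs;

    for ( bs = BALANCE_SOFT_AFFINITY; bs <= BALANCE_HARD_AFFINITY; bs++ )
    {
        uint32_t mask;

        /* Skip the soft-affinity step if there is nothing to honor. */
        if ( bs == BALANCE_SOFT_AFFINITY && !has_soft_affinity(v) )
            continue;

        mask = (bs == BALANCE_SOFT_AFFINITY) ? v->soft_affinity
                                             : v->hard_affinity;
        mask &= idle_cpus;
        if ( mask )
            return __builtin_ctz(mask); /* lowest set bit (gcc/clang) */
    }

    return -1; /* no idle cpu: the real code would scan busy cpus next */
}

int main(void)
{
    struct vcpu v = { .hard_affinity = 0x0f, .soft_affinity = 0x03 };

    /* cpus 2 and 3 are idle: the soft step fails, the hard step picks 2. */
    printf("picked cpu %d\n", pick_cpu(&v, 0x0c));

    return 0;
}

The soft step finding nothing and the hard step then widening the
search to the whole hard affinity is exactly the fallback behaviour
described in the changelog.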
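Similarly, the decision soft_aff_check_preempt() takes can be
summarised as a small decision table. Again a sketch under the same
assumptions: should_check_preempt() is a hypothetical name, its two
flag arguments are plain ints, and the real function derives them from
cur's cpumasks via has_soft_affinity() and affinity_balance_cpumask().

#include <stdio.h>

#define BALANCE_SOFT_AFFINITY 0
#define BALANCE_HARD_AFFINITY 1

static int should_check_preempt(int balance_step,
                                int cur_has_soft_aff,
                                int cpu_in_cur_soft_aff)
{
    /* Hard-affinity step: always consider preempting cur. */
    if ( balance_step == BALANCE_HARD_AFFINITY )
        return 1;

    /* Soft step, but cur has no soft affinity: nothing to protect. */
    if ( !cur_has_soft_aff )
        return 1;

    /*
     * Soft step, and cur does have a soft affinity: only consider
     * preempting it if it is running outside its preferred cpus. If it
     * is inside them, leave it alone; the hard-affinity step will
     * reconsider this cpu anyway if nothing better is found.
     */
    return !cpu_in_cur_soft_aff;
}

int main(void)
{
    /* Soft step, cur inside its soft affinity: leave it alone (0). */
    printf("%d\n", should_check_preempt(BALANCE_SOFT_AFFINITY, 1, 1));

    /* Soft step, cur outside its soft affinity: check preemption (1). */
    printf("%d\n", should_check_preempt(BALANCE_SOFT_AFFINITY, 1, 0));

    return 0;
}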