From patchwork Mon Jul 25 16:12:27 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anshul Makkar <anshul.makkar@citrix.com>
X-Patchwork-Id: 9246093
From: Anshul Makkar <anshul.makkar@citrix.com>
To: <xen-devel@lists.xen.org>
Date: Mon, 25 Jul 2016 17:12:27 +0100
Message-ID: <1469463147-4798-1-git-send-email-anshul.makkar@citrix.com>
X-Mailer: git-send-email 1.9.1
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
	Anshul Makkar <anshul.makkar@citrix.com>
Subject: [Xen-devel] [PATCH -v3 1/1] ratelimit: Implement rate limit for
	credit2 scheduler
X-BeenThere: xen-devel@lists.xen.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: Xen developer discussion <xen-devel.lists.xen.org>
List-Unsubscribe: <https://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <https://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Errors-To: xen-devel-bounces@lists.xen.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>
X-Virus-Scanned: ClamAV using ClamSMTP

Rate limiting ensures that a vcpu executes for a minimum amount of time
before being put at the back of a queue or being preempted by a
higher-priority vcpu.

This patch introduces context-switch rate limiting for credit2. It lets a VM
batch its work and prevents the system from spending most of its time in
context switches because of a VM that is waking and sleeping at a high rate.

The ratelimit can be disabled by setting it to 0.
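
As an illustration of the "0 disables, otherwise range-checked" rule, here
is a minimal standalone sketch. It is not the hypervisor code: the function
name and the SKETCH_* bounds are hypothetical stand-ins for the
XEN_SYSCTL_SCHED_RATELIMIT_MIN/MAX checks done in csched2_sys_cntl(), and no
locking is shown.

```c
#include <errno.h>

/* Hypothetical bounds standing in for XEN_SYSCTL_SCHED_RATELIMIT_MIN/MAX. */
#define SKETCH_RATELIMIT_MIN 100u
#define SKETCH_RATELIMIT_MAX 500000u

/*
 * Store new_us in *ratelimit_us if it is acceptable.
 * 0 is always accepted and means "rate limiting disabled".
 * Returns 0 on success, -EINVAL if the value is out of range.
 */
static int sketch_set_ratelimit(unsigned int *ratelimit_us,
                                unsigned int new_us)
{
    if ( new_us != 0 &&
         (new_us < SKETCH_RATELIMIT_MIN || new_us > SKETCH_RATELIMIT_MAX) )
        return -EINVAL;

    *ratelimit_us = new_us;
    return 0;
}
```

An out-of-range request leaves the stored value untouched, so a failed
update never changes scheduler behaviour.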

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 Changes in v3:
 * General fixes based on the review comments from v1 and v2.
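
 For reviewers, the timeslice decision made in csched2_runtime() can be
 sketched outside the hypervisor as below. This is a simplified model under
 stated assumptions, not the patch itself: sketch_runtime(), the SKETCH_*
 timer bounds and the ran_for parameter (how long the chosen vcpu has already
 been running, 0 if it was just picked) are hypothetical names, and the
 credit-to-time conversion is assumed to have been done already.

```c
#include <stdint.h>

typedef int64_t s_time_t;                    /* nanoseconds */

#define MICROSECS(us) ((s_time_t)(us) * 1000LL)

/* Hypothetical stand-ins for CSCHED2_MIN_TIMER / CSCHED2_MAX_TIMER. */
#define SKETCH_MIN_TIMER MICROSECS(500)
#define SKETCH_MAX_TIMER MICROSECS(2000)

/*
 * credit_time:  time until the vcpu's credit is exhausted (may be <= 0).
 * ratelimit_us: configured rate limit in microseconds, 0 = disabled.
 * ran_for:      time the vcpu has already run in its current stint.
 */
static s_time_t sketch_runtime(s_time_t credit_time,
                               unsigned int ratelimit_us,
                               s_time_t ran_for)
{
    s_time_t min_time = SKETCH_MIN_TIMER;

    if ( ratelimit_us )
    {
        /* Remaining part of the guaranteed minimum run time. */
        s_time_t ratelimit_min = MICROSECS(ratelimit_us) - ran_for;

        if ( ratelimit_min > min_time )
            min_time = ratelimit_min;
    }

    /* The lower bound is checked first, so it wins over the upper bound. */
    if ( credit_time < min_time )
        return min_time;
    if ( credit_time > SKETCH_MAX_TIMER )
        return SKETCH_MAX_TIMER;

    return credit_time;
}
```

 With a 1000us rate limit, a freshly scheduled vcpu whose credit would last
 only 100us is still given 1000us; once it has already run for 700us, the
 remaining 300us is below the minimum timer, so the minimum timer applies.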
---
 xen/common/sched_credit2.c | 111 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index b92226c..d9f39dc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -377,6 +377,7 @@ struct csched2_private {
 
     unsigned int load_precision_shift;
     unsigned int load_window_shift;
+    unsigned ratelimit_us; /* each cpupool can have its own ratelimit */
 };
 
 /*
@@ -2029,6 +2030,34 @@ csched2_dom_cntl(
     return rc;
 }
 
+static int csched2_sys_cntl(const struct scheduler *ops,
+                            struct xen_sysctl_scheduler_op *sc)
+{
+    int rc = -EINVAL;
+    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
+
+    switch ( sc->cmd )
+    {
+        case XEN_SYSCTL_SCHEDOP_putinfo:
+            if ( params->ratelimit_us &&
+                 (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX ||
+                  params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN) )
+                return rc;
+            write_lock_irqsave(&prv->lock, flags);
+            prv->ratelimit_us = params->ratelimit_us;
+            write_unlock_irqrestore(&prv->lock, flags);
+            /* FALLTHRU: report the stored value and return success. */
+
+        case XEN_SYSCTL_SCHEDOP_getinfo:
+            params->ratelimit_us = prv->ratelimit_us;
+            rc = 0;
+            break;
+    }
+    return rc;
+}
+
 static void *
 csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
@@ -2098,12 +2127,14 @@ csched2_dom_destroy(const struct scheduler *ops, struct domain *dom)
 
 /* How long should we let this vcpu run for? */
 static s_time_t
-csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext)
+csched2_runtime(const struct scheduler *ops, int cpu,
+                struct csched2_vcpu *snext, s_time_t now)
 {
-    s_time_t time; 
+    s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     struct list_head *runq = &rqd->runq;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     /*
      * If we're idle, just stay so. Others (or external events)
@@ -2116,9 +2147,22 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
      * 1) Run until snext's credit will be 0
      * 2) But if someone is waiting, run until snext's credit is equal
      * to his
-     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER.
+     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or
+     * the ratelimit time.
      */
 
+    /* Calculate mintime */
+    min_time = CSCHED2_MIN_TIMER;
+    if ( prv->ratelimit_us )
+    {
+        s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
+        if ( snext->vcpu->is_running )
+            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+                            MICROSECS(prv->ratelimit_us) - now;
+        if ( ratelimit_min > min_time )
+            min_time = ratelimit_min;
+    }
+
     /* 1) Basic time: Run until credit is 0. */
     rt_credit = snext->credit;
 
@@ -2135,32 +2179,32 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
         }
     }
 
-    /* The next guy may actually have a higher credit, if we've tried to
-     * avoid migrating him from a different cpu.  DTRT.  */
-    if ( rt_credit <= 0 )
+    /*
+     * The next guy on the runqueue may actually have a higher credit,
+     * if we've tried to avoid migrating him from a different cpu.
+     * Setting time=0 will ensure the minimum timeslice is chosen.
+     *
+     * FIXME: See if we can eliminate this conversion if we know time
+     * will be outside (MIN,MAX).  Probably requires pre-calculating
+     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * at a different rate.
+     */
+    if ( rt_credit > 0 )
+        time = c2t(rqd, rt_credit, snext);
+    else
+        time = 0;
+
+    /* 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or
+     * the ratelimit time. */
+    if ( time < min_time )
     {
-        time = CSCHED2_MIN_TIMER;
+        time = min_time;
         SCHED_STAT_CRANK(runtime_min_timer);
     }
-    else
+    else if ( time > CSCHED2_MAX_TIMER )
     {
-        /* FIXME: See if we can eliminate this conversion if we know time
-         * will be outside (MIN,MAX).  Probably requires pre-calculating
-         * credit values of MIN,MAX per vcpu, since each vcpu burns credit
-         * at a different rate. */
-        time = c2t(rqd, rt_credit, snext);
-
-        /* Check limits */
-        if ( time < CSCHED2_MIN_TIMER )
-        {
-            time = CSCHED2_MIN_TIMER;
-            SCHED_STAT_CRANK(runtime_min_timer);
-        }
-        else if ( time > CSCHED2_MAX_TIMER )
-        {
-            time = CSCHED2_MAX_TIMER;
-            SCHED_STAT_CRANK(runtime_max_timer);
-        }
+        time = CSCHED2_MAX_TIMER;
+        SCHED_STAT_CRANK(runtime_max_timer);
     }
 
     return time;
@@ -2178,6 +2222,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
+    struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
 
     /* Default to current if runnable, idle otherwise */
     if ( vcpu_runnable(scurr->vcpu) )
@@ -2185,6 +2230,17 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
+    /*
+     * Return the current vcpu if it has executed for less than ratelimit.
+     * The adjustment of the selected vcpu's credit and the decision on
+     * how long it will run are made in csched2_runtime().
+     */
+    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+         vcpu_runnable(scurr->vcpu) &&
+         (now - scurr->vcpu->runstate.state_entry_time) <
+          MICROSECS(prv->ratelimit_us) )
+        return scurr;
+
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -2353,7 +2409,7 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, cpu, snext);
+    ret.time = csched2_runtime(ops, cpu, snext, now);
     ret.task = snext->vcpu;
 
     CSCHED2_VCPU_CHECK(ret.task);
@@ -2808,6 +2864,8 @@ csched2_init(struct scheduler *ops)
         prv->runq_map[i] = -1;
         prv->rqd[i].id = -1;
     }
+    /* initialize ratelimit */
+    prv->ratelimit_us = sched_ratelimit_us;
 
     prv->load_precision_shift = opt_load_precision_shift;
     prv->load_window_shift = opt_load_window_shift - LOADAVG_GRANULARITY_SHIFT;
@@ -2842,6 +2900,7 @@ static const struct scheduler sched_credit2_def = {
     .wake           = csched2_vcpu_wake,
 
     .adjust         = csched2_dom_cntl,
+    .adjust_global  = csched2_sys_cntl,
 
     .pick_cpu       = csched2_cpu_pick,
     .migrate        = csched2_vcpu_migrate,