From patchwork Wed Jul 6 17:33:34 2016
X-Patchwork-Submitter: Anshul Makkar <anshul.makkar@citrix.com>
X-Patchwork-Id: 9216775
From: Anshul Makkar <anshul.makkar@citrix.com>
To: xen-devel@lists.xen.org
Date: Wed, 6 Jul 2016 18:33:34 +0100
Message-ID: <1467826414-17337-1-git-send-email-anshul.makkar@citrix.com>
X-Mailer: git-send-email 1.9.1
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
    Anshul Makkar <anshul.makkar@citrix.com>
Subject: [Xen-devel] [PATCH] credit2-ratelimit: Implement rate limit for
 credit2 scheduler
From: Anshul Makkar <anshul.makkar@citrix.com>

The rate limit ensures that a vcpu runs for a minimum amount of time before
it is put back at the tail of the runqueue or preempted by a higher-priority
vcpu.  It introduces a small amount of scheduling latency to enable a VM to
batch its work, and it also ensures that the system does not spend most of
its time in VMEXIT/VMENTRY because of a VM that wakes/sleeps at a high rate.

The rate limit can be disabled by setting it to 0.

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c | 115 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 98 insertions(+), 17 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 1933ff1..6718574 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -171,6 +171,11 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
 /* CPU to runqueue struct macro */
 #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops, _cpu)])
 
+/* Find the max of two time slices, treating negative values as 0. */
+#define MAX_TSLICE(t1, t2)                          \
+    ({ typeof (t1) _t1 = (t1);                      \
+       typeof (t1) _t2 = (t2);                      \
+       _t1 > _t2 ? _t1 < 0 ? 0 : _t1 : _t2 < 0 ? 0 : _t2; })
 
 /*
  * Shifts for load average.
@@ -280,6 +285,7 @@ struct csched2_private {
     struct csched2_runqueue_data rqd[NR_CPUS];
 
     unsigned int load_window_shift;
+    unsigned ratelimit_us; /* each cpupool can have its own ratelimit */
 };
 
 /*
@@ -1588,6 +1594,34 @@ csched2_dom_cntl(
     return rc;
 }
 
+static int csched2_sys_cntl(const struct scheduler *ops,
+                            struct xen_sysctl_scheduler_op *sc)
+{
+    int rc = -EINVAL;
+    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
+
+    switch ( sc->cmd )
+    {
+    case XEN_SYSCTL_SCHEDOP_putinfo:
+        if ( params->ratelimit_us &&
+             ( params->ratelimit_us < CSCHED2_MIN_TIMER ||
+               params->ratelimit_us > MICROSECS(CSCHED2_MAX_TIMER) ))
+            return rc;
+        spin_lock_irqsave(&prv->lock, flags);
+        prv->ratelimit_us = params->ratelimit_us;
+        spin_unlock_irqrestore(&prv->lock, flags);
+        break;
+
+    case XEN_SYSCTL_SCHEDOP_getinfo:
+        params->ratelimit_us = prv->ratelimit_us;
+        rc = 0;
+        break;
+    }
+    return rc;
+}
+
 static void *
 csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
@@ -1657,12 +1691,15 @@ csched2_dom_destroy(const struct scheduler *ops, struct domain *dom)
 
 /* How long should we let this vcpu run for? */
 static s_time_t
-csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext)
+csched2_runtime(const struct scheduler *ops, int cpu,
+                struct csched2_vcpu *snext, s_time_t now)
 {
-    s_time_t time;
+    s_time_t time;
     int rt_credit; /* Proposed runtime measured in credits */
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     struct list_head *runq = &rqd->runq;
+    s_time_t runtime = 0;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     /*
      * If we're idle, just stay so. Others (or external events)
@@ -1680,6 +1717,14 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     /* 1) Basic time: Run until credit is 0. */
     rt_credit = snext->credit;
 
+    if ( snext->vcpu->is_running )
+        runtime = now - snext->vcpu->runstate.state_entry_time;
+    if ( runtime < 0 )
+    {
+        runtime = 0;
+        d2printk("%s: Time went backwards? now %"PRI_stime" state_entry_time %"PRI_stime"\n",
+                 __func__, now, snext->vcpu->runstate.state_entry_time);
+    }
 
     /* 2) If there's someone waiting whose credit is positive,
      * run until your credit ~= his */
@@ -1695,11 +1740,24 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     }
 
     /* The next guy may actually have a higher credit, if we've tried to
-     * avoid migrating him from a different cpu. DTRT. */
+     * avoid migrating him from a different cpu. DTRT.
+     * Even if the next guy has a higher credit, if the current vcpu has
+     * executed for less time than the rate limit, allow it to run for the
+     * minimum amount of time.
+     */
     if ( rt_credit <= 0 )
     {
-        time = CSCHED2_MIN_TIMER;
-        SCHED_STAT_CRANK(runtime_min_timer);
+        if ( snext->vcpu->is_running && prv->ratelimit_us )
+            /* Implies the current vcpu has executed for less than the
+             * ratelimit and has thus been selected in runq_candidate to run
+             * next.  No need to check for this condition again.
+             */
+            time = MAX_TSLICE(CSCHED2_MIN_TIMER,
+                              MICROSECS(prv->ratelimit_us) - runtime);
+        else
+            time = MAX_TSLICE(CSCHED2_MIN_TIMER, MICROSECS(prv->ratelimit_us));
+
+        SCHED_STAT_CRANK(runtime_min_timer);
     }
     else
     {
@@ -1709,17 +1767,22 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
          * at a different rate. */
         time = c2t(rqd, rt_credit, snext);
 
-        /* Check limits */
-        if ( time < CSCHED2_MIN_TIMER )
-        {
-            time = CSCHED2_MIN_TIMER;
-            SCHED_STAT_CRANK(runtime_min_timer);
-        }
-        else if ( time > CSCHED2_MAX_TIMER )
+        /* Check limits.
+         * time > ratelimit --> time > MIN.
+         */
+        if ( time > CSCHED2_MAX_TIMER )
         {
+            time = CSCHED2_MAX_TIMER;
             SCHED_STAT_CRANK(runtime_max_timer);
         }
+        else
+        {
+            time = MAX_TSLICE(MAX_TSLICE(CSCHED2_MIN_TIMER,
+                                         MICROSECS(prv->ratelimit_us)), time);
+            SCHED_STAT_CRANK(runtime_min_timer);
+        }
+
     }
 
     return time;
@@ -1733,7 +1796,7 @@ void __dump_execstate(void *unused);
 static struct csched2_vcpu *
 runq_candidate(struct csched2_runqueue_data *rqd,
                struct csched2_vcpu *scurr,
-               int cpu, s_time_t now)
+               int cpu, s_time_t now, struct csched2_private *prv)
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
@@ -1744,6 +1807,16 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
+    /* Return the current vcpu if it has executed for less than the ratelimit.
+     * Adjustment of the selected vcpu's credit and the decision on how long
+     * it will run are made in csched2_runtime.
+     */
+    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+         vcpu_runnable(scurr->vcpu) &&
+         (now - scurr->vcpu->runstate.state_entry_time) <
+          MICROSECS(prv->ratelimit_us) )
+        return scurr;
+
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -1762,9 +1835,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }
 
         /* If the next one on the list has more credit than current
-         * (or idle, if current is not runnable), choose it. */
+         * (or idle, if current is not runnable) and the current one has
+         * already executed for more than the ratelimit, choose it.
+         * Reaching here means the current vcpu has executed for more than
+         * ratelimit_us or the ratelimit is off, so choose the next one.
+         */
         if ( svc->credit > snext->credit )
-            snext = svc;
+            snext = svc;
 
         /* In any case, if we got this far, break. */
         break;
     }
 
@@ -1787,6 +1864,7 @@ csched2_schedule(
     struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
     struct csched2_vcpu *snext = NULL;
     struct task_slice ret;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_VCPU_CHECK(current);
@@ -1857,7 +1935,7 @@ csched2_schedule(
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
-        snext=runq_candidate(rqd, scurr, cpu, now);
+        snext=runq_candidate(rqd, scurr, cpu, now, prv);
 
     /* If switching from a non-idle runnable vcpu, put it
      * back on the runqueue. */
@@ -1921,7 +1999,7 @@ csched2_schedule(
     /*
      * Return task to run next...
     */
-    ret.time = csched2_runtime(ops, cpu, snext);
+    ret.time = csched2_runtime(ops, cpu, snext, now);
     ret.task = snext->vcpu;
 
     CSCHED2_VCPU_CHECK(ret.task);
@@ -2353,6 +2431,8 @@ csched2_init(struct scheduler *ops)
         prv->runq_map[i] = -1;
         prv->rqd[i].id = -1;
     }
+    /* initialize ratelimit */
+    prv->ratelimit_us = 1000 * CSCHED2_MIN_TIMER;
 
     prv->load_window_shift = opt_load_window_shift;
 
@@ -2385,6 +2465,7 @@ static const struct scheduler sched_credit2_def = {
     .wake           = csched2_vcpu_wake,
 
     .adjust         = csched2_dom_cntl,
+    .adjust_global  = csched2_sys_cntl,
     .pick_cpu       = csched2_cpu_pick,
     .migrate        = csched2_vcpu_migrate,
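
Editor's note (not part of the patch): the slice computation in csched2_runtime()
above can be summarised as "give the vcpu at least CSCHED2_MIN_TIMER, and, if its
credit runs out while it is still inside its ratelimit window, give it the unused
remainder of that window".  Below is a minimal, self-contained sketch of that
behaviour.  The stand-in constants, the helper name slice_when_credit_exhausted()
and the stand-alone main() are assumptions made purely for illustration; in Xen
the real bounds are CSCHED2_MIN_TIMER/CSCHED2_MAX_TIMER and the logic lives inside
the scheduler.

/* Illustrative sketch only -- not part of the patch.  Mimics the slice
 * selection of csched2_runtime() with stand-in constants; times are
 * nanoseconds held in an s_time_t-like signed 64-bit type. */
#include <stdio.h>
#include <stdint.h>

typedef int64_t s_time_t;

#define MICROSECS(us)   ((s_time_t)(us) * 1000)
#define MIN_TIMER       MICROSECS(500)   /* stand-in for CSCHED2_MIN_TIMER */

/* Same semantics as the patch's MAX_TSLICE: the larger of the two values,
 * with negative values clamped to 0 (uses GCC statement expressions, like
 * the patch itself). */
#define MAX_TSLICE(t1, t2)                              \
    ({ typeof (t1) _t1 = (t1);                          \
       typeof (t1) _t2 = (t2);                          \
       _t1 > _t2 ? (_t1 < 0 ? 0 : _t1) : (_t2 < 0 ? 0 : _t2); })

/* How long to let the current vcpu run once its credit is exhausted:
 * at least MIN_TIMER, and at least the unused part of the ratelimit. */
static s_time_t slice_when_credit_exhausted(unsigned ratelimit_us,
                                            s_time_t already_ran)
{
    return MAX_TSLICE(MIN_TIMER, MICROSECS(ratelimit_us) - already_ran);
}

int main(void)
{
    /* Ran 300us of a 1000us ratelimit: gets the remaining ~700us. */
    printf("%lld\n", (long long)slice_when_credit_exhausted(1000, MICROSECS(300)));
    /* Already ran past the ratelimit: falls back to the minimum timer. */
    printf("%lld\n", (long long)slice_when_credit_exhausted(1000, MICROSECS(1500)));
    return 0;
}

Built with gcc, this prints 700000 and 500000, i.e. the remainder of the
ratelimit window in the first case and the bare minimum timer in the second.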
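
Similarly, a minimal sketch (again not part of the patch) of the decision added
to runq_candidate(): keep the currently running vcpu on the CPU if rate limiting
is enabled, the vcpu is still runnable, and it has not yet run for ratelimit_us
microseconds since entering the running state.  The predicate name
keep_current_vcpu() and the test values are illustrative assumptions only.

/* Illustrative sketch only -- not part of the patch. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t s_time_t;
#define MICROSECS(us)   ((s_time_t)(us) * 1000)

/* Mirrors the ratelimit check added to runq_candidate(). */
static bool keep_current_vcpu(unsigned ratelimit_us, bool runnable,
                              s_time_t now, s_time_t state_entry_time)
{
    return ratelimit_us && runnable &&
           (now - state_entry_time) < MICROSECS(ratelimit_us);
}

int main(void)
{
    /* Ran for 200us of a 1000us ratelimit: keep the current vcpu (prints 1). */
    printf("%d\n", keep_current_vcpu(1000, true, MICROSECS(1200), MICROSECS(1000)));
    /* Ratelimit disabled (0): never kept on this basis (prints 0). */
    printf("%d\n", keep_current_vcpu(0, true, MICROSECS(1200), MICROSECS(1000)));
    return 0;
}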