From patchwork Tue Aug 1 18:13:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Meng Xu
X-Patchwork-Id: 9875213
From: Meng Xu <mengxu@cis.upenn.edu>
To: xen-devel@lists.xenproject.org
Date: Tue, 1 Aug 2017 14:13:30 -0400
Message-Id: <1501611210-5232-1-git-send-email-mengxu@cis.upenn.edu>
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com, xumengpanda@gmail.com, Meng Xu
Subject: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

Make the RTDS scheduler work conserving, so that it can utilize idle
resources without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended with a work-conserving flag and a
priority_level field. When a VCPU's budget is depleted in the current
period: if its work-conserving flag is set, its priority_level increases
by 1 and its budget is refilled; otherwise, the VCPU is moved to the
depletedq.

Scheduling policy: modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but a smaller deadline.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 xen/common/sched_rt.c | 71 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..740a712 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,16 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and is_work_conserving flag.
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * A work conserving VCPU has its is_work_conserving flag set to true;
+ * when a VCPU runs out of budget in a period, if it is work conserving,
+ * it increases its priority_level by 1 and refills its budget; otherwise,
+ * it has to wait until the next period.
  *
  * Each VCPU is implemented as a deferable server.
 * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +66,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -191,6 +195,7 @@ struct rt_vcpu {
     /* VCPU parameters, in nanoseconds */
     s_time_t period;
     s_time_t budget;
+    bool_t is_work_conserving;   /* is vcpu work conserving */
 
     /* VCPU current infomation in nanosecond */
     s_time_t cur_budget;         /* current budget */
@@ -201,6 +206,8 @@ struct rt_vcpu {
     struct rt_dom *sdom;
     struct vcpu *vcpu;
 
+    unsigned priority_level;
+
     unsigned flags;              /* mark __RTDS_scheduled, etc.. */
 };
@@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
+static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
+{
+    return svc->is_work_conserving;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
     return !list_empty(&svc->replq_elem);
 }
 
+/* If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static int
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+    if ( v1->priority_level < v2->priority_level ||
+         ( v1->priority_level == v2->priority_level &&
+           v1->cur_deadline <= v2->cur_deadline ) )
+        return 1;
+    else
+        return -1;
+}
+
 /*
  * Debug related code, dump vcpu/cpu information
  */
@@ -303,6 +329,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
     cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+           " \t\t priority_level=%d work_conserving=%d\n"
            " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
            svc->vcpu->domain->domain_id,
            svc->vcpu->vcpu_id,
@@ -312,6 +339,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
            svc->cur_budget,
            svc->cur_deadline,
            svc->last_start,
+           svc->priority_level,
+           is_work_conserving(svc),
            vcpu_on_q(svc),
            vcpu_runnable(svc->vcpu),
            svc->flags,
@@ -423,15 +452,18 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
      */
     svc->last_start = now;
     svc->cur_budget = svc->budget;
+    svc->priority_level = 0;
     /* TRACE */
     {
         struct __packed {
             unsigned vcpu:16, dom:16;
+            unsigned priority_level;
             uint64_t cur_deadline, cur_budget;
         } d;
         d.dom = svc->vcpu->domain->domain_id;
         d.vcpu = svc->vcpu->vcpu_id;
+        d.priority_level = svc->priority_level;
         d.cur_deadline = (uint64_t) svc->cur_deadline;
         d.cur_budget = (uint64_t) svc->cur_budget;
         trace_var(TRC_RTDS_BUDGET_REPLENISH, 1,
                   sizeof(d),
                   (unsigned char *) &d);
@@ -477,7 +509,7 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
     list_for_each ( iter, queue )
     {
         struct rt_vcpu * iter_svc = (*qelem)(iter);
-        if ( svc->cur_deadline <= iter_svc->cur_deadline )
+        if ( compare_vcpu_priority(svc, iter_svc) > 0 )
             break;
         pos++;
     }
@@ -537,8 +569,9 @@ runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
     ASSERT( !vcpu_on_q(svc) );
     ASSERT( vcpu_on_replq(svc) );
 
-    /* add svc to runq if svc still has budget */
-    if ( svc->cur_budget > 0 )
+    /* add svc to runq if svc still has budget or svc is work_conserving */
+    if ( svc->cur_budget > 0 ||
+         is_work_conserving(svc) )
         deadline_runq_insert(svc, &svc->q_elem, runq);
     else
         list_add(&svc->q_elem, &prv->depletedq);
@@ -857,6 +890,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     svc->vcpu = vc;
     svc->last_start = 0;
 
+    svc->is_work_conserving = 1;
+    svc->priority_level = 0;
     svc->period = RTDS_DEFAULT_PERIOD;
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
@@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
 
     if ( svc->cur_budget <= 0 )
     {
-        svc->cur_budget = 0;
-        __set_bit(__RTDS_depleted, &svc->flags);
+        if ( is_work_conserving(svc) )
+        {
+            svc->priority_level++;
+            svc->cur_budget = svc->budget;
+        }
+        else
+        {
+            svc->cur_budget = 0;
+            __set_bit(__RTDS_depleted, &svc->flags);
+        }
     }
 
     /* TRACE */
@@ -976,11 +1019,15 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
             unsigned vcpu:16, dom:16;
             uint64_t cur_budget;
             int delta;
+            unsigned priority_level;
+            bool_t is_work_conserving;
         } d;
         d.dom = svc->vcpu->domain->domain_id;
         d.vcpu = svc->vcpu->vcpu_id;
         d.cur_budget = (uint64_t) svc->cur_budget;
         d.delta = delta;
+        d.priority_level = svc->priority_level;
+        d.is_work_conserving = svc->is_work_conserving;
         trace_var(TRC_RTDS_BUDGET_BURN, 1,
                   sizeof(d),
                   (unsigned char *) &d);
@@ -1088,7 +1135,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
             vcpu_runnable(current) &&
             scurr->cur_budget > 0 &&
             ( is_idle_vcpu(snext->vcpu) ||
-              scurr->cur_deadline <= snext->cur_deadline ) )
+              compare_vcpu_priority(scurr, snext) > 0 ) )
             snext = scurr;
     }
 
@@ -1198,13 +1245,13 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
         }
         iter_svc = rt_vcpu(iter_vc);
         if ( latest_deadline_vcpu == NULL ||
-             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
+             compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
             latest_deadline_vcpu = iter_svc;
     }
 
     /* 3) candicate has higher priority, kick out lowest priority vcpu */
     if ( latest_deadline_vcpu != NULL &&
-         new->cur_deadline < latest_deadline_vcpu->cur_deadline )
+         compare_vcpu_priority(latest_deadline_vcpu, new) < 0 )
     {
         SCHED_STAT_CRANK(tickled_busy_cpu);
         cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
@@ -1493,7 +1540,7 @@ static void repl_timer_handler(void *data){
         {
             struct rt_vcpu *next_on_runq = q_elem(runq->next);
 
-            if ( svc->cur_deadline > next_on_runq->cur_deadline )
+            if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
        }
        else if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&