From patchwork Tue Aug 1 18:13:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Meng Xu
X-Patchwork-Id: 9875213
From: Meng Xu <mengxu@cis.upenn.edu>
To: xen-devel@lists.xenproject.org
Date: Tue, 1 Aug 2017 14:13:30 -0400
Message-Id: <1501611210-5232-1-git-send-email-mengxu@cis.upenn.edu>
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com, xumengpanda@gmail.com, Meng Xu
Subject: [Xen-devel] [PATCH RFC v1] xen:rtds: towards work conserving RTDS

Make the RTDS scheduler work conserving, so that it can utilize idle
resources without breaking the real-time guarantees.

VCPU model:
Each real-time VCPU is extended with a work-conserving flag and a
priority_level field. When a VCPU's budget is depleted in the current
period: if its work-conserving flag is set, its priority_level increases
by 1 and its budget is refilled; otherwise, the VCPU is moved to the
depletedq.

Scheduling policy: modified global EDF:
A VCPU v1 has higher priority than another VCPU v2 if
(i) v1 has a smaller priority_level; or
(ii) v1 has the same priority_level but a smaller deadline.

Signed-off-by: Meng Xu <mengxu@cis.upenn.edu>
---
 xen/common/sched_rt.c | 71 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 59 insertions(+), 12 deletions(-)

diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 39f6bee..740a712 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -49,13 +49,16 @@
  * A PCPU is feasible if the VCPU can run on this PCPU and (the PCPU is idle or
  * has a lower-priority VCPU running on it.)
  *
- * Each VCPU has a dedicated period and budget.
+ * Each VCPU has a dedicated period, budget and is_work_conserving flag.
  * The deadline of a VCPU is at the end of each period;
  * A VCPU has its budget replenished at the beginning of each period;
  * While scheduled, a VCPU burns its budget.
  * The VCPU needs to finish its budget before its deadline in each period;
  * The VCPU discards its unused budget at the end of each period.
- * If a VCPU runs out of budget in a period, it has to wait until next period.
+ * A work conserving VCPU has its is_work_conserving flag set to true;
+ * when a VCPU runs out of budget in a period, if it is work conserving,
+ * it increases its priority_level by 1 and refills its budget; otherwise,
+ * it has to wait until the next period.
  *
  * Each VCPU is implemented as a deferable server.
 * When a VCPU has a task running on it, its budget is continuously burned;
@@ -63,7 +66,8 @@
  *
  * Queue scheme:
  * A global runqueue and a global depletedqueue for each CPU pool.
- * The runqueue holds all runnable VCPUs with budget, sorted by deadline;
+ * The runqueue holds all runnable VCPUs with budget,
+ * sorted by priority_level and deadline;
  * The depletedqueue holds all VCPUs without budget, unsorted;
  *
  * Note: cpumask and cpupool is supported.
@@ -191,6 +195,7 @@ struct rt_vcpu {
     /* VCPU parameters, in nanoseconds */
     s_time_t period;
     s_time_t budget;
+    bool_t is_work_conserving;   /* is vcpu work conserving */
 
     /* VCPU current infomation in nanosecond */
     s_time_t cur_budget;         /* current budget */
@@ -201,6 +206,8 @@ struct rt_vcpu {
     struct rt_dom *sdom;
     struct vcpu *vcpu;
 
+    unsigned priority_level;
+
     unsigned flags;              /* mark __RTDS_scheduled, etc.. */
 };
@@ -245,6 +252,11 @@ static inline struct list_head *rt_replq(const struct scheduler *ops)
     return &rt_priv(ops)->replq;
 }
 
+static inline bool_t is_work_conserving(const struct rt_vcpu *svc)
+{
+    return svc->is_work_conserving;
+}
+
 /*
  * Helper functions for manipulating the runqueue, the depleted queue,
  * and the replenishment events queue.
@@ -273,6 +285,20 @@ vcpu_on_replq(const struct rt_vcpu *svc)
     return !list_empty(&svc->replq_elem);
 }
 
+/* If v1 priority >= v2 priority, return value > 0
+ * Otherwise, return value < 0
+ */
+static int
+compare_vcpu_priority(const struct rt_vcpu *v1, const struct rt_vcpu *v2)
+{
+    if ( v1->priority_level < v2->priority_level ||
+         ( v1->priority_level == v2->priority_level &&
+           v1->cur_deadline <= v2->cur_deadline ) )
+        return 1;
+    else
+        return -1;
+}
+
 /*
  * Debug related code, dump vcpu/cpu information
  */
@@ -303,6 +329,7 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
     cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask);
     printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime"),"
            " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n"
+           " \t\t priority_level=%d work_conserving=%d\n"
            " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n",
            svc->vcpu->domain->domain_id,
            svc->vcpu->vcpu_id,
@@ -312,6 +339,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
            svc->cur_budget,
            svc->cur_deadline,
            svc->last_start,
+           svc->priority_level,
+           is_work_conserving(svc),
            vcpu_on_q(svc),
            vcpu_runnable(svc->vcpu),
            svc->flags,
@@ -423,15 +452,18 @@ rt_update_deadline(s_time_t now, struct rt_vcpu *svc)
      */
     svc->last_start = now;
     svc->cur_budget = svc->budget;
+    svc->priority_level = 0;
     /* TRACE */
     {
         struct __packed {
             unsigned vcpu:16, dom:16;
+            unsigned priority_level;
             uint64_t cur_deadline, cur_budget;
         } d;
         d.dom = svc->vcpu->domain->domain_id;
         d.vcpu = svc->vcpu->vcpu_id;
+        d.priority_level = svc->priority_level;
         d.cur_deadline = (uint64_t) svc->cur_deadline;
         d.cur_budget = (uint64_t) svc->cur_budget;
         trace_var(TRC_RTDS_BUDGET_REPLENISH, 1,
                   sizeof(d),
                   (unsigned char *) &d);
@@ -477,7 +509,7 @@ deadline_queue_insert(struct rt_vcpu * (*qelem)(struct list_head *),
     list_for_each ( iter, queue )
     {
         struct rt_vcpu * iter_svc = (*qelem)(iter);
-        if ( svc->cur_deadline <= iter_svc->cur_deadline )
+        if ( compare_vcpu_priority(svc, iter_svc) > 0 )
             break;
         pos++;
     }
@@ -537,8 +569,9 @@ runq_insert(const struct scheduler *ops, struct rt_vcpu *svc)
     ASSERT( !vcpu_on_q(svc) );
     ASSERT( vcpu_on_replq(svc) );
 
-    /* add svc to runq if svc still has budget */
-    if ( svc->cur_budget > 0 )
+    /* add svc to runq if svc still has budget or svc is work_conserving */
+    if ( svc->cur_budget > 0 ||
+         is_work_conserving(svc) )
         deadline_runq_insert(svc, &svc->q_elem, runq);
     else
         list_add(&svc->q_elem, &prv->depletedq);
@@ -857,6 +890,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     svc->vcpu = vc;
     svc->last_start = 0;
 
+    svc->is_work_conserving = 1;
+    svc->priority_level = 0;
     svc->period = RTDS_DEFAULT_PERIOD;
     if ( !is_idle_vcpu(vc) )
         svc->budget = RTDS_DEFAULT_BUDGET;
@@ -966,8 +1001,16 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
 
     if ( svc->cur_budget <= 0 )
     {
-        svc->cur_budget = 0;
-        __set_bit(__RTDS_depleted, &svc->flags);
+        if ( is_work_conserving(svc) )
+        {
+            svc->priority_level++;
+            svc->cur_budget = svc->budget;
+        }
+        else
+        {
+            svc->cur_budget = 0;
+            __set_bit(__RTDS_depleted, &svc->flags);
+        }
     }
 
     /* TRACE */
@@ -976,11 +1019,15 @@ burn_budget(const struct scheduler *ops, struct rt_vcpu *svc, s_time_t now)
             unsigned vcpu:16, dom:16;
             uint64_t cur_budget;
             int delta;
+            unsigned priority_level;
+            bool_t is_work_conserving;
         } d;
         d.dom = svc->vcpu->domain->domain_id;
         d.vcpu = svc->vcpu->vcpu_id;
         d.cur_budget = (uint64_t) svc->cur_budget;
         d.delta = delta;
+        d.priority_level = svc->priority_level;
+        d.is_work_conserving = svc->is_work_conserving;
         trace_var(TRC_RTDS_BUDGET_BURN, 1,
                   sizeof(d),
                   (unsigned char *) &d);
@@ -1088,7 +1135,7 @@ rt_schedule(const struct scheduler *ops, s_time_t now, bool_t tasklet_work_scheduled)
             vcpu_runnable(current) &&
             scurr->cur_budget > 0 &&
             ( is_idle_vcpu(snext->vcpu) ||
-              scurr->cur_deadline <= snext->cur_deadline ) )
+              compare_vcpu_priority(scurr, snext) > 0 ) )
             snext = scurr;
     }
 
@@ -1198,13 +1245,13 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
         }
         iter_svc = rt_vcpu(iter_vc);
         if ( latest_deadline_vcpu == NULL ||
-             iter_svc->cur_deadline > latest_deadline_vcpu->cur_deadline )
+             compare_vcpu_priority(iter_svc, latest_deadline_vcpu) < 0 )
             latest_deadline_vcpu = iter_svc;
     }
 
     /* 3) candicate has higher priority, kick out lowest priority vcpu */
     if ( latest_deadline_vcpu != NULL &&
-         new->cur_deadline < latest_deadline_vcpu->cur_deadline )
+         compare_vcpu_priority(latest_deadline_vcpu, new) < 0 )
     {
         SCHED_STAT_CRANK(tickled_busy_cpu);
         cpu_to_tickle = latest_deadline_vcpu->vcpu->processor;
@@ -1493,7 +1540,7 @@ static void repl_timer_handler(void *data){
         {
             struct rt_vcpu *next_on_runq = q_elem(runq->next);
 
-            if ( svc->cur_deadline > next_on_runq->cur_deadline )
+            if ( compare_vcpu_priority(svc, next_on_runq) < 0 )
                 runq_tickle(ops, next_on_runq);
        }
        else if ( __test_and_clear_bit(__RTDS_depleted, &svc->flags) &&