From patchwork Wed Jul 6 17:33:34 2016
X-Patchwork-Submitter: Anshul Makkar <anshul.makkar@citrix.com>
X-Patchwork-Id: 9216775
From: Anshul Makkar <anshul.makkar@citrix.com>
To: xen-devel@lists.xen.org
Date: Wed, 6 Jul 2016 18:33:34 +0100
Message-ID: <1467826414-17337-1-git-send-email-anshul.makkar@citrix.com>
X-Mailer: git-send-email 1.9.1
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com,
    Anshul Makkar <anshul.makkar@citrix.com>
Subject: [Xen-devel] [PATCH] credit2-ratelimit: Implement rate limit for
 credit2 scheduler
From: Anshul Makkar <anshul.makkar@citrix.com>

The rate limit ensures that a vcpu runs for a minimum amount of time before
it is put back at the tail of the runqueue or preempted by a higher-priority
vcpu.  It introduces a small amount of scheduling latency to enable a VM to
batch its work, and it also ensures that the system does not spend most of
its time in VMEXIT/VMENTRY because of a VM that wakes/sleeps at a high rate.

The rate limit can be disabled by setting it to 0.

Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c | 115 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 98 insertions(+), 17 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 1933ff1..6718574 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -171,6 +171,11 @@ integer_param("sched_credit2_migrate_resist", opt_migrate_resist);
 #define c2r(_ops, _cpu)     (CSCHED2_PRIV(_ops)->runq_map[(_cpu)])
 /* CPU to runqueue struct macro */
 #define RQD(_ops, _cpu)     (&CSCHED2_PRIV(_ops)->rqd[c2r(_ops, _cpu)])
 
+/* Find the max of two time slices, treating negative values as 0. */
+#define MAX_TSLICE(t1, t2)                          \
+    ({ typeof (t1) _t1 = (t1);                      \
+       typeof (t1) _t2 = (t2);                      \
+       _t1 > _t2 ? _t1 < 0 ? 0 : _t1 : _t2 < 0 ? 0 : _t2; })
 
 /*
  * Shifts for load average.
@@ -280,6 +285,7 @@ struct csched2_private {
     struct csched2_runqueue_data rqd[NR_CPUS];
 
     unsigned int load_window_shift;
+    unsigned ratelimit_us; /* each cpupool can have its own ratelimit */
 };
 
 /*
@@ -1588,6 +1594,34 @@ csched2_dom_cntl(
     return rc;
 }
 
+static int csched2_sys_cntl(const struct scheduler *ops,
+                            struct xen_sysctl_scheduler_op *sc)
+{
+    int rc = -EINVAL;
+    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
+
+    switch ( sc->cmd )
+    {
+    case XEN_SYSCTL_SCHEDOP_putinfo:
+        if ( params->ratelimit_us &&
+             ( params->ratelimit_us < CSCHED2_MIN_TIMER ||
+               params->ratelimit_us > MICROSECS(CSCHED2_MAX_TIMER) ))
+            return rc;
+        spin_lock_irqsave(&prv->lock, flags);
+        prv->ratelimit_us = params->ratelimit_us;
+        spin_unlock_irqrestore(&prv->lock, flags);
+        break;
+
+    case XEN_SYSCTL_SCHEDOP_getinfo:
+        params->ratelimit_us = prv->ratelimit_us;
+        rc = 0;
+        break;
+    }
+    return rc;
+}
+
 static void *
 csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
@@ -1657,12 +1691,15 @@ csched2_dom_destroy(const struct scheduler *ops, struct domain *dom)
 
 /* How long should we let this vcpu run for? */
 static s_time_t
-csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext)
+csched2_runtime(const struct scheduler *ops, int cpu,
+                struct csched2_vcpu *snext, s_time_t now)
 {
-    s_time_t time;
+    s_time_t time;
     int rt_credit; /* Proposed runtime measured in credits */
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     struct list_head *runq = &rqd->runq;
+    s_time_t runtime = 0;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     /*
      * If we're idle, just stay so. Others (or external events)
@@ -1680,6 +1717,14 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     /* 1) Basic time: Run until credit is 0. */
     rt_credit = snext->credit;
 
+    if ( snext->vcpu->is_running )
+        runtime = now - snext->vcpu->runstate.state_entry_time;
+    if ( runtime < 0 )
+    {
+        runtime = 0;
+        d2printk("%s: Time went backwards? now %"PRI_stime" state_entry_time %"PRI_stime"\n",
+                 __func__, now, snext->vcpu->runstate.state_entry_time);
+    }
 
     /* 2) If there's someone waiting whose credit is positive,
      * run until your credit ~= his */
@@ -1695,11 +1740,24 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
     }
 
     /* The next guy may actually have a higher credit, if we've tried to
-     * avoid migrating him from a different cpu. DTRT. */
+     * avoid migrating him from a different cpu. DTRT.
+     * Even if the next guy has a higher credit, if the current vcpu has
+     * executed for less time than the rate limit, allow it to run for the
+     * minimum amount of time.
+     */
     if ( rt_credit <= 0 )
     {
-        time = CSCHED2_MIN_TIMER;
-        SCHED_STAT_CRANK(runtime_min_timer);
+        if ( snext->vcpu->is_running && prv->ratelimit_us )
+            /* Implies the current vcpu has executed for less than the
+             * ratelimit and has thus been selected in runq_candidate to run
+             * next.  No need to check for this condition again.
+             */
+            time = MAX_TSLICE(CSCHED2_MIN_TIMER,
+                              MICROSECS(prv->ratelimit_us) - runtime);
+        else
+            time = MAX_TSLICE(CSCHED2_MIN_TIMER, MICROSECS(prv->ratelimit_us));
+
+        SCHED_STAT_CRANK(runtime_min_timer);
     }
     else
     {
@@ -1709,17 +1767,22 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
          * at a different rate. */
         time = c2t(rqd, rt_credit, snext);
 
-        /* Check limits */
-        if ( time < CSCHED2_MIN_TIMER )
-        {
-            time = CSCHED2_MIN_TIMER;
-            SCHED_STAT_CRANK(runtime_min_timer);
-        }
-        else if ( time > CSCHED2_MAX_TIMER )
+        /* Check limits.
+         * time > ratelimit --> time > MIN.
+         */
+        if ( time > CSCHED2_MAX_TIMER )
         {
+            time = CSCHED2_MAX_TIMER;
             SCHED_STAT_CRANK(runtime_max_timer);
         }
+        else
+        {
+            time = MAX_TSLICE(MAX_TSLICE(CSCHED2_MIN_TIMER,
+                                         MICROSECS(prv->ratelimit_us)), time);
+            SCHED_STAT_CRANK(runtime_min_timer);
+        }
+
     }
 
     return time;
@@ -1733,7 +1796,7 @@ void __dump_execstate(void *unused);
 static struct csched2_vcpu *
 runq_candidate(struct csched2_runqueue_data *rqd,
                struct csched2_vcpu *scurr,
-               int cpu, s_time_t now)
+               int cpu, s_time_t now, struct csched2_private *prv)
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
@@ -1744,6 +1807,16 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
+    /* Return the current vcpu if it has executed for less than the ratelimit.
+     * Adjustment of the selected vcpu's credit and the decision on how long
+     * it will run are made in csched2_runtime.
+     */
+    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+         vcpu_runnable(scurr->vcpu) &&
+         (now - scurr->vcpu->runstate.state_entry_time) <
+          MICROSECS(prv->ratelimit_us) )
+        return scurr;
+
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -1762,9 +1835,13 @@ runq_candidate(struct csched2_runqueue_data *rqd,
         }
 
         /* If the next one on the list has more credit than current
-         * (or idle, if current is not runnable), choose it. */
+         * (or idle, if current is not runnable) and the current one has
+         * already executed for more than the ratelimit, choose it.
+         * Reaching here means the current vcpu has executed for more than
+         * ratelimit_us or the ratelimit is off, so choose the next one.
+         */
         if ( svc->credit > snext->credit )
-            snext = svc;
+            snext = svc;
 
         /* In any case, if we got this far, break. */
         break;
     }
 
@@ -1787,6 +1864,7 @@ csched2_schedule(
     struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
     struct csched2_vcpu *snext = NULL;
     struct task_slice ret;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     SCHED_STAT_CRANK(schedule);
     CSCHED2_VCPU_CHECK(current);
@@ -1857,7 +1935,7 @@ csched2_schedule(
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
-        snext=runq_candidate(rqd, scurr, cpu, now);
+        snext=runq_candidate(rqd, scurr, cpu, now, prv);
 
     /* If switching from a non-idle runnable vcpu, put it
      * back on the runqueue. */
@@ -1921,7 +1999,7 @@ csched2_schedule(
     /*
      * Return task to run next...
     */
-    ret.time = csched2_runtime(ops, cpu, snext);
+    ret.time = csched2_runtime(ops, cpu, snext, now);
     ret.task = snext->vcpu;
 
     CSCHED2_VCPU_CHECK(ret.task);
@@ -2353,6 +2431,8 @@ csched2_init(struct scheduler *ops)
         prv->runq_map[i] = -1;
         prv->rqd[i].id = -1;
     }
+    /* initialize ratelimit */
+    prv->ratelimit_us = 1000 * CSCHED2_MIN_TIMER;
 
     prv->load_window_shift = opt_load_window_shift;
 
@@ -2385,6 +2465,7 @@ static const struct scheduler sched_credit2_def = {
     .wake           = csched2_vcpu_wake,
 
     .adjust         = csched2_dom_cntl,
+    .adjust_global  = csched2_sys_cntl,
     .pick_cpu       = csched2_cpu_pick,
     .migrate        = csched2_vcpu_migrate,
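
Editor's note (not part of the patch): the slice computation in csched2_runtime()
above can be summarised as "give the vcpu at least CSCHED2_MIN_TIMER, and, if its
credit runs out while it is still inside its ratelimit window, give it the unused
remainder of that window".  Below is a minimal, self-contained sketch of that
behaviour.  The stand-in constants, the helper name slice_when_credit_exhausted()
and the stand-alone main() are assumptions made purely for illustration; in Xen
the real bounds are CSCHED2_MIN_TIMER/CSCHED2_MAX_TIMER and the logic lives inside
the scheduler.

/* Illustrative sketch only -- not part of the patch.  Mimics the slice
 * selection of csched2_runtime() with stand-in constants; times are
 * nanoseconds held in an s_time_t-like signed 64-bit type. */
#include <stdio.h>
#include <stdint.h>

typedef int64_t s_time_t;

#define MICROSECS(us)   ((s_time_t)(us) * 1000)
#define MIN_TIMER       MICROSECS(500)   /* stand-in for CSCHED2_MIN_TIMER */

/* Same semantics as the patch's MAX_TSLICE: the larger of the two values,
 * with negative values clamped to 0 (uses GCC statement expressions, like
 * the patch itself). */
#define MAX_TSLICE(t1, t2)                              \
    ({ typeof (t1) _t1 = (t1);                          \
       typeof (t1) _t2 = (t2);                          \
       _t1 > _t2 ? (_t1 < 0 ? 0 : _t1) : (_t2 < 0 ? 0 : _t2); })

/* How long to let the current vcpu run once its credit is exhausted:
 * at least MIN_TIMER, and at least the unused part of the ratelimit. */
static s_time_t slice_when_credit_exhausted(unsigned ratelimit_us,
                                            s_time_t already_ran)
{
    return MAX_TSLICE(MIN_TIMER, MICROSECS(ratelimit_us) - already_ran);
}

int main(void)
{
    /* Ran 300us of a 1000us ratelimit: gets the remaining ~700us. */
    printf("%lld\n", (long long)slice_when_credit_exhausted(1000, MICROSECS(300)));
    /* Already ran past the ratelimit: falls back to the minimum timer. */
    printf("%lld\n", (long long)slice_when_credit_exhausted(1000, MICROSECS(1500)));
    return 0;
}

Built with gcc, this prints 700000 and 500000, i.e. the remainder of the
ratelimit window in the first case and the bare minimum timer in the second.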
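
Similarly, a minimal sketch (again not part of the patch) of the decision added
to runq_candidate(): keep the currently running vcpu on the CPU if rate limiting
is enabled, the vcpu is still runnable, and it has not yet run for ratelimit_us
microseconds since entering the running state.  The predicate name
keep_current_vcpu() and the test values are illustrative assumptions only.

/* Illustrative sketch only -- not part of the patch. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t s_time_t;
#define MICROSECS(us)   ((s_time_t)(us) * 1000)

/* Mirrors the ratelimit check added to runq_candidate(). */
static bool keep_current_vcpu(unsigned ratelimit_us, bool runnable,
                              s_time_t now, s_time_t state_entry_time)
{
    return ratelimit_us && runnable &&
           (now - state_entry_time) < MICROSECS(ratelimit_us);
}

int main(void)
{
    /* Ran for 200us of a 1000us ratelimit: keep the current vcpu (prints 1). */
    printf("%d\n", keep_current_vcpu(1000, true, MICROSECS(1200), MICROSECS(1000)));
    /* Ratelimit disabled (0): never kept on this basis (prints 0). */
    printf("%d\n", keep_current_vcpu(0, true, MICROSECS(1200), MICROSECS(1000)));
    return 0;
}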