From patchwork Mon Jul 25 16:12:27 2016
X-Patchwork-Submitter: Anshul Makkar
X-Patchwork-Id: 9246093
From: Anshul Makkar
Date: Mon, 25 Jul 2016 17:12:27 +0100
Message-ID: <1469463147-4798-1-git-send-email-anshul.makkar@citrix.com>
Cc: george.dunlap@eu.citrix.com, dario.faggioli@citrix.com, Anshul Makkar
Subject: [Xen-devel] [PATCH -v3 1/1] ratelimit: Implement rate limit for credit2 scheduler

Rate limiting ensures that a vcpu executes for a minimum amount of
time before it is put at the back of a runqueue or preempted by a
higher-priority vcpu.
This patch introduces context-switch rate limiting for the credit2
scheduler. It lets a VM batch its work and prevents the system from
spending most of its time context switching because of a VM that is
waking and sleeping at a high rate. Rate limiting can be disabled by
setting the rate limit to 0.

Signed-off-by: Anshul Makkar
Reviewed-by: George Dunlap
Reviewed-by: Dario Faggioli
---
Changes in v3:
 * General fixes based on the review comments from v1 and v2.
---
 xen/common/sched_credit2.c | 111 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 26 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index b92226c..d9f39dc 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -377,6 +377,7 @@ struct csched2_private {
 
     unsigned int load_precision_shift;
     unsigned int load_window_shift;
+    unsigned int ratelimit_us; /* each cpupool can have its own ratelimit */
 };
 
 /*
@@ -2029,6 +2030,34 @@ csched2_dom_cntl(
     return rc;
 }
 
+static int csched2_sys_cntl(const struct scheduler *ops,
+                            struct xen_sysctl_scheduler_op *sc)
+{
+    int rc = -EINVAL;
+    xen_sysctl_credit_schedule_t *params = &sc->u.sched_credit;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
+    unsigned long flags;
+
+    switch ( sc->cmd )
+    {
+    case XEN_SYSCTL_SCHEDOP_putinfo:
+        if ( params->ratelimit_us &&
+             (params->ratelimit_us > XEN_SYSCTL_SCHED_RATELIMIT_MAX ||
+              params->ratelimit_us < XEN_SYSCTL_SCHED_RATELIMIT_MIN) )
+            return rc;
+        write_lock_irqsave(&prv->lock, flags);
+        prv->ratelimit_us = params->ratelimit_us;
+        write_unlock_irqrestore(&prv->lock, flags);
+        /* Fall through: echo the new value back and return 0. */
+
+    case XEN_SYSCTL_SCHEDOP_getinfo:
+        params->ratelimit_us = prv->ratelimit_us;
+        rc = 0;
+        break;
+    }
+    return rc;
+}
+
 static void *
 csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
@@ -2098,12 +2127,14 @@ csched2_dom_destroy(const struct scheduler *ops, struct domain *dom)
 
 /* How long should we let this vcpu run for? */
 static s_time_t
-csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext)
+csched2_runtime(const struct scheduler *ops, int cpu,
+                struct csched2_vcpu *snext, s_time_t now)
 {
-    s_time_t time;
+    s_time_t time, min_time;
     int rt_credit; /* Proposed runtime measured in credits */
     struct csched2_runqueue_data *rqd = RQD(ops, cpu);
     struct list_head *runq = &rqd->runq;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
     /*
      * If we're idle, just stay so. Others (or external events)
@@ -2116,9 +2147,22 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
      * 1) Run until snext's credit will be 0
      * 2) But if someone is waiting, run until snext's credit is equal
      *    to his
-     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER.
+     * 3) But never run longer than MAX_TIMER or shorter than MIN_TIMER or
+     *    the ratelimit time.
      */
 
+    /* Calculate min_time */
+    min_time = CSCHED2_MIN_TIMER;
+    if ( prv->ratelimit_us )
+    {
+        s_time_t ratelimit_min = MICROSECS(prv->ratelimit_us);
+        if ( snext->vcpu->is_running )
+            ratelimit_min = snext->vcpu->runstate.state_entry_time +
+                            MICROSECS(prv->ratelimit_us) - now;
+        if ( ratelimit_min > min_time )
+            min_time = ratelimit_min;
+    }
+
     /* 1) Basic time: Run until credit is 0. */
     rt_credit = snext->credit;
 
@@ -2135,32 +2179,32 @@ csched2_runtime(const struct scheduler *ops, int cpu, struct csched2_vcpu *snext
         }
     }
 
-    /* The next guy may actually have a higher credit, if we've tried to
-     * avoid migrating him from a different cpu. DTRT. */
-    if ( rt_credit <= 0 )
+    /*
+     * The next guy on the runqueue may actually have a higher credit,
+     * if we've tried to avoid migrating him from a different cpu.
+     * Setting time=0 will ensure the minimum timeslice is chosen.
+     *
+     * FIXME: See if we can eliminate this conversion if we know time
+     * will be outside (MIN,MAX). Probably requires pre-calculating
+     * credit values of MIN,MAX per vcpu, since each vcpu burns credit
+     * at a different rate.
+     */
+    if ( rt_credit > 0 )
+        time = c2t(rqd, rt_credit, snext);
+    else
+        time = 0;
+
+    /* 3) But never run longer than MAX_TIMER or less than MIN_TIMER or
+     * the ratelimit time. */
+    if ( time < min_time )
     {
-        time = CSCHED2_MIN_TIMER;
+        time = min_time;
         SCHED_STAT_CRANK(runtime_min_timer);
     }
-    else
+    else if ( time > CSCHED2_MAX_TIMER )
     {
-        /* FIXME: See if we can eliminate this conversion if we know time
-         * will be outside (MIN,MAX). Probably requires pre-calculating
-         * credit values of MIN,MAX per vcpu, since each vcpu burns credit
-         * at a different rate. */
-        time = c2t(rqd, rt_credit, snext);
-
-        /* Check limits */
-        if ( time < CSCHED2_MIN_TIMER )
-        {
-            time = CSCHED2_MIN_TIMER;
-            SCHED_STAT_CRANK(runtime_min_timer);
-        }
-        else if ( time > CSCHED2_MAX_TIMER )
-        {
-            time = CSCHED2_MAX_TIMER;
-            SCHED_STAT_CRANK(runtime_max_timer);
-        }
+        time = CSCHED2_MAX_TIMER;
+        SCHED_STAT_CRANK(runtime_max_timer);
     }
 
     return time;
@@ -2178,6 +2222,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
+    struct csched2_private *prv = CSCHED2_PRIV(per_cpu(scheduler, cpu));
 
     /* Default to current if runnable, idle otherwise */
     if ( vcpu_runnable(scurr->vcpu) )
@@ -2185,6 +2230,17 @@ runq_candidate(struct csched2_runqueue_data *rqd,
     else
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
 
+    /*
+     * Return the current vcpu if it has executed for less than ratelimit.
+     * Adjustment of the selected vcpu's credit, and the decision on
+     * how long it will run, are taken care of in csched2_runtime.
+     */
+    if ( prv->ratelimit_us && !is_idle_vcpu(scurr->vcpu) &&
+         vcpu_runnable(scurr->vcpu) &&
+         (now - scurr->vcpu->runstate.state_entry_time) <
+          MICROSECS(prv->ratelimit_us) )
+        return scurr;
+
     list_for_each( iter, &rqd->runq )
     {
         struct csched2_vcpu * svc = list_entry(iter, struct csched2_vcpu, runq_elem);
@@ -2353,7 +2409,7 @@ csched2_schedule(
     /*
      * Return task to run next...
      */
-    ret.time = csched2_runtime(ops, cpu, snext);
+    ret.time = csched2_runtime(ops, cpu, snext, now);
     ret.task = snext->vcpu;
 
     CSCHED2_VCPU_CHECK(ret.task);
@@ -2808,6 +2864,8 @@ csched2_init(struct scheduler *ops)
         prv->runq_map[i] = -1;
         prv->rqd[i].id = -1;
     }
+    /* initialize ratelimit */
+    prv->ratelimit_us = sched_ratelimit_us;
 
     prv->load_precision_shift = opt_load_precision_shift;
     prv->load_window_shift = opt_load_window_shift - LOADAVG_GRANULARITY_SHIFT;
@@ -2842,6 +2900,7 @@ static const struct scheduler sched_credit2_def = {
     .wake           = csched2_vcpu_wake,
 
     .adjust         = csched2_dom_cntl,
+    .adjust_global  = csched2_sys_cntl,
 
     .pick_cpu       = csched2_cpu_pick,
     .migrate        = csched2_vcpu_migrate,