From patchwork Thu Jul 27 12:06:14 2017
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: George Dunlap, Anshul Makkar
Date: Thu, 27 Jul 2017 14:06:14 +0200
Message-ID: <150115717494.6767.14536203038593245612.stgit@Solace>
Subject: [Xen-devel] [PATCH v2 6/6] xen: credit2: try to avoid tickling cpus subject to ratelimiting

With context switching ratelimiting enabled, the following pattern is
quite common in a scheduling trace:

     0.000845622 |||||||||||.x||| d32768v12 csched2:runq_insert d0v13, position 0
     0.000845831 |||||||||||.x||| d32768v12 csched2:runq_tickle_new d0v13, processor = 12, credit = 10135529
     0.000846546 |||||||||||.x||| d32768v12 csched2:burn_credits d2v7, credit = 2619231, delta = 255937
 [1] 0.000846739 |||||||||||.x||| d32768v12 csched2:runq_tickle cpu 12
     [...]
 [2] 0.000850597 ||||||||||||x||| d32768v12 csched2:schedule cpu 12, rq# 1, busy, SMT busy, tickled
     0.000850760 ||||||||||||x||| d32768v12 csched2:burn_credits d2v7, credit = 2614028, delta = 5203
 [3] 0.000851022 ||||||||||||x||| d32768v12 csched2:ratelimit triggered
 [4] 0.000851614 ||||||||||||x||| d32768v12 runstate_continue d2v7 running->running

Basically, what happens is that runq_tickle() realizes that d0v13 should
preempt d2v7, which is running on cpu 12, because d0v13 has more credit
(10135529 vs. 2619231). It therefore tickles cpu 12 [1], which, in turn,
schedules [2].

But, surprise surprise, d2v7 has run for less than the ratelimit
interval [3], so it is _not_ preempted and continues to run [4]. This
is fine: it is exactly what ratelimiting is there for. Note, however,
that:
 1) we interrupted cpu 12 for nothing;
 2) what if, say on cpu 8, there is a vcpu that has:
    + less credit than d0v13 (so d0v13 can well preempt it),
    + more credit than d2v7 (which is why it was not selected for
      preemption),
    + run for more than the ratelimiting interval (so it really can be
      scheduled out)?

With this patch, in case 2) we realize that tickling cpu 12 would be
pointless, so we continue looking, eventually finding and tickling
cpu 8.
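To make case 2) concrete, here is a small standalone C model of the pcpu selection. It is not Xen code: the struct model_pcpu, the function model_pick_cpu() and the credit value used for cpu 8 are hypothetical. The past_ratelimit flag stands in for the outcome of the is_preemptable() check, and the score is reduced to the credit difference (the real tickle_score() first ranks pcpus by class, and only compares credit within a class).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of runq_tickle() picking a pcpu to tickle.
 * Only the "highest difference of credit" criterion is modelled.
 */
struct model_pcpu {
    unsigned int cpu;     /* pcpu number */
    int64_t cur_credit;   /* credit of the vcpu currently running there */
    bool past_ratelimit;  /* has that vcpu run longer than the ratelimit? */
};

/*
 * Return the pcpu whose running vcpu the new vcpu beats by the most
 * credit, or -1 if there is none. With check_ratelimit == false this
 * models the behaviour before the patch; with true, pcpus whose vcpu
 * is still protected by ratelimiting are skipped, as tickle_score()
 * now does via is_preemptable().
 */
static int model_pick_cpu(const struct model_pcpu *pcpus, int n,
                          int64_t new_credit, bool check_ratelimit)
{
    int best = -1;
    int64_t max = 0;

    assert(n >= 0);
    for ( int i = 0; i < n; i++ )
    {
        int64_t score = new_credit - pcpus[i].cur_credit;

        if ( check_ratelimit && !pcpus[i].past_ratelimit )
            continue;               /* tickling it would be pointless */
        if ( score > max )          /* only positive scores qualify */
        {
            max = score;
            best = (int)pcpus[i].cpu;
        }
    }
    return best;
}
```

With cpu 12 running d2v7 (credit 2619231, still within the ratelimit) and a hypothetical vcpu on cpu 8 with credit 5000000 that has run past the ratelimit, a waking d0v13 with credit 10135529 picks cpu 12 when the check is off (it has the larger credit gap) and cpu 8 when it is on. Skipping the protected pcpu, rather than scoring it, is what lets the search fall through to cpu 8.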
Signed-off-by: Dario Faggioli
Reviewed-by: George Dunlap
---
Cc: Anshul Makkar
---
 xen/common/sched_credit2.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 30d9f55..fab7f2e 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -160,6 +160,8 @@
 #define CSCHED2_MIGRATE_RESIST ((opt_migrate_resist)*MICROSECS(1))
 /* How much to "compensate" a vcpu for L2 migration. */
 #define CSCHED2_MIGRATE_COMPENSATION MICROSECS(50)
+/* How tolerant we should be when peeking at runtime of vcpus on other cpus */
+#define CSCHED2_RATELIMIT_TICKLE_TOLERANCE MICROSECS(50)
 /* Reset: Value below which credit will be reset. */
 #define CSCHED2_CREDIT_RESET 0
 /* Max timer: Maximum time a guest can be run for. */
@@ -1203,6 +1205,23 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
 }
 
 /*
+ * What we want to know is whether svc, which we assume to be running on some
+ * pcpu, can be interrupted and preempted (which, so far, basically means
+ * whether or not it has already run for more than the ratelimit, to which we
+ * apply some tolerance).
+ */
+static inline bool is_preemptable(const struct csched2_vcpu *svc,
+                                  s_time_t now, s_time_t ratelimit)
+{
+    if ( ratelimit <= CSCHED2_RATELIMIT_TICKLE_TOLERANCE )
+        return true;
+
+    ASSERT(svc->vcpu->is_running);
+    return now - svc->vcpu->runstate.state_entry_time >
+           ratelimit - CSCHED2_RATELIMIT_TICKLE_TOLERANCE;
+}
+
+/*
  * Score to preempt the target cpu. Return a negative number if the
  * credit isn't high enough; if it is, favor a preemption on cpu in
  * this order:
@@ -1216,10 +1235,12 @@ tickle_cpu(unsigned int cpu, struct csched2_runqueue_data *rqd)
  *
  * Within the same class, the highest difference of credit.
  */
-static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
+static s_time_t tickle_score(const struct scheduler *ops, s_time_t now,
                              struct csched2_vcpu *new, unsigned int cpu)
 {
+    struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
     struct csched2_vcpu * cur = csched2_vcpu(curr_on_cpu(cpu));
+    struct csched2_private *prv = csched2_priv(ops);
     s_time_t score;
 
     /*
@@ -1227,7 +1248,8 @@ static s_time_t tickle_score(struct csched2_runqueue_data *rqd, s_time_t now,
      * in rqd->idle). However, some of them may be running their idle vcpu,
      * if taking care of tasklets. In that case, we want to leave it alone.
      */
-    if ( unlikely(is_idle_vcpu(cur->vcpu)) )
+    if ( unlikely(is_idle_vcpu(cur->vcpu) ||
+                  !is_preemptable(cur, now, MICROSECS(prv->ratelimit_us))) )
         return -1;
 
     burn_credits(rqd, cur, now);
@@ -1384,7 +1406,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     cpumask_and(&mask, &mask, cpumask_scratch_cpu(cpu));
     if ( __cpumask_test_and_clear_cpu(cpu, &mask) )
     {
-        s_time_t score = tickle_score(rqd, now, new, cpu);
+        s_time_t score = tickle_score(ops, now, new, cpu);
 
         if ( score > max )
         {
@@ -1407,7 +1429,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
         /* Already looked at this one above */
         ASSERT(i != cpu);
 
-        score = tickle_score(rqd, now, new, i);
+        score = tickle_score(ops, now, new, i);
 
         if ( score > max )
         {
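The tolerance arithmetic in is_preemptable() above can be checked in isolation. The sketch below restates it outside Xen; sketch_is_preemptable(), its ran argument (standing for now - runstate.state_entry_time) and the local s_time_t, MICROSECS() and SKETCH_TICKLE_TOLERANCE definitions are assumptions made for illustration, with times in nanoseconds as in Xen. The 1000us ratelimit used in the checks is only an example value: with it, a vcpu counts as preemptable once it has run for more than 950us, while any ratelimit at or below the 50us tolerance (0 included) makes every vcpu preemptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;                       /* nanoseconds */
#define MICROSECS(us) ((s_time_t)(us) * 1000)
#define SKETCH_TICKLE_TOLERANCE MICROSECS(50)   /* same value as the patch */

/*
 * Restatement of is_preemptable(): 'ran' is how long the vcpu has been
 * running, i.e. now - svc->vcpu->runstate.state_entry_time.
 */
static bool sketch_is_preemptable(bool is_running, s_time_t ran,
                                  s_time_t ratelimit)
{
    /* A ratelimit within the tolerance never protects a vcpu. */
    if ( ratelimit <= SKETCH_TICKLE_TOLERANCE )
        return true;

    assert(is_running);      /* mirrors ASSERT(svc->vcpu->is_running) */

    /* Preemptable once it has run for more than ratelimit - 50us. */
    return ran > ratelimit - SKETCH_TICKLE_TOLERANCE;
}
```

Note that the comparison is strict: a vcpu that has run for exactly 950us under a 1000us ratelimit is still considered protected.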