From patchwork Wed Aug 17 17:18:08 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9286203
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Andrew Cooper, Anshul Makkar, George Dunlap, Jan Beulich
Date: Wed, 17 Aug 2016 19:18:08 +0200
Subject: [Xen-devel] [PATCH 05/24] xen: credit2: make tickling more deterministic
Message-ID: <147145428827.25877.8612560340138019986.stgit@Solace.fritz.box>
In-Reply-To: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
References: <147145358844.25877.7490417583264534196.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
List-Id: Xen developer discussion

Right now, the following scenario can occur:
 - upon vcpu v wakeup, v itself is put in the runqueue, and pcpu X
   is tickled;
 - pcpu Y schedules (for whatever reason), sees v in the runqueue
   and picks it up.

This may seem ok (or even a good thing), but it's not. In fact, if
runq_tickle() decided X is where v should run, it did so for a reason
(load distribution, SMT support, cache hotness, affinity, etc.), and
we really should try as hard as possible to stick to that.

Of course, we can't be too strict, or we risk leaving vcpus in the
runqueue while there is available CPU capacity. So, we only leave v
in the runqueue (for X to pick it up) if we see that X has been
tickled and has not scheduled yet, i.e., it has a real chance of
actually selecting and scheduling v. If that is not the case, we
schedule it on Y (or, at least, we consider that), as running
somewhere non-ideal is better than not running at all.

The commit also adds performance counters for each of the possible
situations.

Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Anshul Makkar
Cc: Jan Beulich
Cc: Andrew Cooper
---
 xen/common/sched_credit2.c   | 65 +++++++++++++++++++++++++++++++++++++++---
 xen/include/xen/perfc_defn.h |  3 ++
 2 files changed, 64 insertions(+), 4 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 12dfd20..a3d7beb 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -54,6 +54,7 @@
 #define TRC_CSCHED2_LOAD_CHECK       TRC_SCHED_CLASS_EVT(CSCHED2, 16)
 #define TRC_CSCHED2_LOAD_BALANCE     TRC_SCHED_CLASS_EVT(CSCHED2, 17)
 #define TRC_CSCHED2_PICKED_CPU       TRC_SCHED_CLASS_EVT(CSCHED2, 19)
+#define TRC_CSCHED2_RUNQ_CANDIDATE   TRC_SCHED_CLASS_EVT(CSCHED2, 20)
 
 /*
  * WARNING: This is still in an experimental phase.  Status and work can be found at the
@@ -398,6 +399,7 @@ struct csched2_vcpu {
     int credit;
     s_time_t start_time; /* When we were scheduled (used for credit) */
     unsigned flags;      /* 16 bits doesn't seem to play well with clear_bit() */
+    int tickled_cpu;     /* cpu tickled for picking us up (-1 if none) */
 
     /* Individual contribution to load */
     s_time_t load_last_update;  /* Last time average was updated */
@@ -1049,6 +1051,10 @@ runq_tickle(const struct scheduler *ops, struct csched2_vcpu *new, s_time_t now)
     __cpumask_set_cpu(ipid, &rqd->tickled);
     smt_idle_mask_clear(ipid, &rqd->smt_idle);
     cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
+
+    if ( unlikely(new->tickled_cpu != -1) )
+        SCHED_STAT_CRANK(tickled_cpu_overwritten);
+    new->tickled_cpu = ipid;
 }
 
 /*
@@ -1266,6 +1272,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
         ASSERT(svc->sdom != NULL);
         svc->credit = CSCHED2_CREDIT_INIT;
         svc->weight = svc->sdom->weight;
+        svc->tickled_cpu = -1;
         /* Starting load of 50% */
         svc->avgload = 1ULL << (CSCHED2_PRIV(ops)->load_precision_shift - 1);
         svc->load_last_update = NOW() >> LOADAVG_GRANULARITY_SHIFT;
@@ -1273,6 +1280,7 @@ csched2_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
     else
     {
         ASSERT(svc->sdom == NULL);
+        svc->tickled_cpu = svc->vcpu->vcpu_id;
         svc->credit = CSCHED2_IDLE_CREDIT;
         svc->weight = 0;
     }
@@ -2233,7 +2241,8 @@ void __dump_execstate(void *unused);
 static struct csched2_vcpu *
 runq_candidate(struct csched2_runqueue_data *rqd,
                struct csched2_vcpu *scurr,
-               int cpu, s_time_t now)
+               int cpu, s_time_t now,
+               unsigned int *pos)
 {
     struct list_head *iter;
     struct csched2_vcpu *snext = NULL;
@@ -2262,13 +2271,29 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
         /* Only consider vcpus that are allowed to run on this processor. */
         if ( !cpumask_test_cpu(cpu, svc->vcpu->cpu_hard_affinity) )
+        {
+            (*pos)++;
             continue;
+        }
+
+        /*
+         * If a vcpu is meant to be picked up by another processor, and such
+         * processor has not scheduled yet, leave it in the runqueue for it.
+         */
+        if ( svc->tickled_cpu != -1 && svc->tickled_cpu != cpu &&
+             cpumask_test_cpu(svc->tickled_cpu, &rqd->tickled) )
+        {
+            (*pos)++;
+            SCHED_STAT_CRANK(deferred_to_tickled_cpu);
+            continue;
+        }
 
         /* If this is on a different processor, don't pull it unless
          * its credit is at least CSCHED2_MIGRATE_RESIST higher. */
         if ( svc->vcpu->processor != cpu
              && snext->credit + CSCHED2_MIGRATE_RESIST > svc->credit )
         {
+            (*pos)++;
             SCHED_STAT_CRANK(migrate_resisted);
             continue;
         }
@@ -2280,9 +2305,26 @@ runq_candidate(struct csched2_runqueue_data *rqd,
 
         /* In any case, if we got this far, break. */
         break;
+    }
 
+    if ( unlikely(tb_init_done) )
+    {
+        struct {
+            unsigned vcpu:16, dom:16;
+            unsigned tickled_cpu, position;
+        } d;
+        d.dom = snext->vcpu->domain->domain_id;
+        d.vcpu = snext->vcpu->vcpu_id;
+        d.tickled_cpu = snext->tickled_cpu;
+        d.position = *pos;
+        __trace_var(TRC_CSCHED2_RUNQ_CANDIDATE, 1,
+                    sizeof(d),
+                    (unsigned char *)&d);
     }
 
+    if ( unlikely(snext->tickled_cpu != -1 && snext->tickled_cpu != cpu) )
+        SCHED_STAT_CRANK(tickled_cpu_overridden);
+
     return snext;
 }
 
@@ -2298,6 +2340,7 @@ csched2_schedule(
     struct csched2_runqueue_data *rqd;
     struct csched2_vcpu * const scurr = CSCHED2_VCPU(current);
     struct csched2_vcpu *snext = NULL;
+    unsigned int snext_pos = 0;
     struct task_slice ret;
 
     SCHED_STAT_CRANK(schedule);
@@ -2347,7 +2390,7 @@ csched2_schedule(
         snext = CSCHED2_VCPU(idle_vcpu[cpu]);
     }
     else
-        snext=runq_candidate(rqd, scurr, cpu, now);
+        snext = runq_candidate(rqd, scurr, cpu, now, &snext_pos);
 
     /* If switching from a non-idle runnable vcpu, put it
      * back on the runqueue. */
@@ -2371,8 +2414,21 @@ csched2_schedule(
         __set_bit(__CSFLAG_scheduled, &snext->flags);
     }
 
-    /* Check for the reset condition */
-    if ( snext->credit <= CSCHED2_CREDIT_RESET )
+    /*
+     * The reset condition is "has a scheduler epoch come to an end?".
+     * The way this is enforced is checking whether the vcpu at the top
+     * of the runqueue has negative credits. This means the epochs have
+     * variable length, as one epoch expires when:
+     *  1) the vcpu at the top of the runqueue has executed for
+     *     around 10 ms (with default parameters);
+     *  2) no other vcpu with higher credits wants to run.
+     *
+     * Here, where we want to check for reset, we need to make sure the
+     * proper vcpu is being used. In fact, runq_candidate() may not have
+     * returned the first vcpu in the runqueue, for various reasons
+     * (e.g., affinity). Only trigger a reset when it does.
+     */
+    if ( snext_pos == 0 && snext->credit <= CSCHED2_CREDIT_RESET )
     {
         reset_credit(ops, cpu, now, snext);
         balance_load(ops, cpu, now);
@@ -2386,6 +2442,7 @@ csched2_schedule(
     }
 
     snext->start_time = now;
+    snext->tickled_cpu = -1;
 
     /* Safe because lock for old processor is held */
     if ( snext->vcpu->processor != cpu )
diff --git a/xen/include/xen/perfc_defn.h b/xen/include/xen/perfc_defn.h
index a336c71..4a835b8 100644
--- a/xen/include/xen/perfc_defn.h
+++ b/xen/include/xen/perfc_defn.h
@@ -66,6 +66,9 @@ PERFCOUNTER(runtime_max_timer, "csched2: runtime_max_timer")
 PERFCOUNTER(migrated,               "csched2: migrated")
 PERFCOUNTER(migrate_resisted,       "csched2: migrate_resisted")
 PERFCOUNTER(credit_reset,           "csched2: credit_reset")
+PERFCOUNTER(deferred_to_tickled_cpu,"csched2: deferred_to_tickled_cpu")
+PERFCOUNTER(tickled_cpu_overwritten,"csched2: tickled_cpu_overwritten")
+PERFCOUNTER(tickled_cpu_overridden, "csched2: tickled_cpu_overridden")
 
 PERFCOUNTER(need_flush_tlb_flush, "PG_need_flush tlb flushes")
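
For readers who want to play with the selection rule outside the hypervisor, here is a minimal standalone sketch of what runq_candidate() does after this patch. It is not the Xen code. Everything prefixed toy_ or TOY_ is hypothetical: plain 64-bit masks stand in for cpumask_t, an array ordered by credit stands in for the runqueue list, and the value of TOY_MIGRATE_RESIST is an assumption made only for illustration. The function returns the index of the vcpu to run, or -1 if the current vcpu should keep running, and counts in *pos how many runqueue entries were skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct csched2_vcpu. */
struct toy_vcpu {
    int credit;
    int processor;          /* pcpu the vcpu last ran on */
    int tickled_cpu;        /* pcpu tickled to pick this vcpu up, -1 if none */
    uint64_t hard_affinity; /* bit i set => vcpu may run on pcpu i */
};

#define TOY_MIGRATE_RESIST 500 /* assumed value, for illustration only */

/*
 * Walk a runqueue sorted by decreasing credit on behalf of pcpu `cpu`.
 * `tickled_mask` has bit i set if pcpu i was tickled and has not
 * scheduled yet; `cur_credit` is the credit of the vcpu currently
 * running on `cpu`. All pcpu ids must be below 64 in this toy model.
 */
int toy_runq_candidate(const struct toy_vcpu *rq, size_t n, int cpu,
                       uint64_t tickled_mask, int cur_credit,
                       unsigned int *pos)
{
    assert(cpu >= 0 && cpu < 64);

    for (size_t i = 0; i < n; i++) {
        const struct toy_vcpu *v = &rq[i];

        /* Skip vcpus that are not allowed to run on this pcpu. */
        if (!(v->hard_affinity & (UINT64_C(1) << cpu))) {
            (*pos)++;
            continue;
        }

        /* Leave the vcpu for the pcpu that was tickled for it, as long
         * as that pcpu still has a pending tickle. */
        if (v->tickled_cpu != -1 && v->tickled_cpu != cpu &&
            (tickled_mask & (UINT64_C(1) << v->tickled_cpu))) {
            (*pos)++;
            continue;
        }

        /* Migration resistance: only pull from another pcpu if the
         * credit advantage is large enough. */
        if (v->processor != cpu &&
            cur_credit + TOY_MIGRATE_RESIST > v->credit) {
            (*pos)++;
            continue;
        }

        /* First eligible vcpu: take it only if it beats the current one. */
        return v->credit > cur_credit ? (int)i : -1;
    }
    return -1;
}
```

With a two-entry runqueue where the first vcpu was tickled for pcpu 0, a call made on behalf of pcpu 1 skips that entry while bit 0 is still set in tickled_mask, and takes it once the bit is clear. A caller modelled on csched2_schedule() would then only consider a credit reset when *pos is 0, i.e., when the returned vcpu really was at the head of the runqueue.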