From patchwork Fri Jul 15 14:50:18 2016
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9232173
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Cc: Anshul Makkar, George Dunlap, David Vrabel
Date: Fri, 15 Jul 2016 16:50:18 +0200
Message-ID: <146859421885.10217.4343438485432953690.stgit@Solace.fritz.box>
In-Reply-To: <146859397891.10217.10155969474613302167.stgit@Solace.fritz.box>
References: <146859397891.10217.10155969474613302167.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Subject: [Xen-devel] [PATCH v2 10/11] xen: credit2: the private scheduler lock can be an rwlock.

In fact, the data it protects only changes either at init-time, during
cpupool manipulation, or when changing domains' weights. In all other
cases (namely, load balancing, reading weights and status dumping), the
information is only read.

Therefore, let the lock be a read/write one. This means there no longer
is a full serialization point for the whole scheduler and for all the
pCPUs of the host, which is particularly good for scalability
(especially when doing load balancing).

Also, update the high level description of the locking discipline, and
take the chance to reword it a little bit (as well as to add a couple
of locking related ASSERT()-s).
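For readers who want to see the pattern in isolation, here is a minimal
standalone sketch of the reader/writer discipline this patch adopts. It
uses POSIX rwlocks and made-up names (toy_private, toy_dom,
total_weight_reader, set_weight_writer), not Xen's actual rwlock_t or
csched2_private, so it only illustrates the idea, not the hypervisor
code itself: frequent read-only paths (such as load balancing) run
concurrently, while the rare writers (such as weight changes) exclude
everybody.

#include <pthread.h>
#include <stdio.h>

struct toy_dom {
    int weight;
};

struct toy_private {
    pthread_rwlock_t lock;    /* plays the role of csched2_private's lock */
    struct toy_dom doms[4];   /* plays the role of the active domain list */
};

/*
 * Hot path (think load balancing): it only reads the shared data, so
 * many cpus can be in here at the same time, where a plain spinlock
 * would have serialized them all.
 */
static int total_weight_reader(struct toy_private *prv)
{
    int i, sum = 0;

    pthread_rwlock_rdlock(&prv->lock);
    for ( i = 0; i < 4; i++ )
        sum += prv->doms[i].weight;
    pthread_rwlock_unlock(&prv->lock);

    return sum;
}

/*
 * Rare path (think XEN_DOMCTL_SCHEDOP_putinfo): it modifies the shared
 * data, so it takes the lock for writing and excludes all readers.
 */
static void set_weight_writer(struct toy_private *prv, int dom, int weight)
{
    pthread_rwlock_wrlock(&prv->lock);
    prv->doms[dom].weight = weight;
    pthread_rwlock_unlock(&prv->lock);
}

int main(void)
{
    struct toy_private prv = { .doms = { {256}, {256}, {256}, {256} } };

    pthread_rwlock_init(&prv.lock, NULL);
    set_weight_writer(&prv, 2, 512);
    printf("total weight: %d\n", total_weight_reader(&prv));
    pthread_rwlock_destroy(&prv.lock);

    return 0;
}

(Compile with, e.g., gcc -pthread; again, this is a toy model, not the
patch's code.)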
Signed-off-by: Dario Faggioli
Reviewed-by: George Dunlap
---
Cc: Anshul Makkar
Cc: David Vrabel
---
 xen/common/sched_credit2.c | 133 ++++++++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 53 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index be27ba3..b33ba7a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -90,17 +90,37 @@
 /*
  * Locking:
- * - Schedule-lock is per-runqueue
- *  + Protects runqueue data, runqueue insertion, &c
- *  + Also protects updates to private sched vcpu structure
- *  + Must be grabbed using vcpu_schedule_lock_irq() to make sure vcpu->processr
- *    doesn't change under our feet.
- * - Private data lock
- *  + Protects access to global domain list
- *  + All other private data is written at init and only read afterwards.
+ *
+ * - runqueue lock
+ *  + it is per-runqueue, so:
+ *   * cpus in a runqueue take the runqueue lock, when using
+ *     pcpu_schedule_lock() / vcpu_schedule_lock() (and friends),
+ *   * a cpu may (try to) take a "remote" runqueue lock, e.g., for
+ *     load balancing;
+ *  + serializes runqueue operations (removing and inserting vcpus);
+ *  + protects runqueue-wide data in csched2_runqueue_data;
+ *  + protects vcpu parameters in csched2_vcpu for the vcpu in the
+ *    runqueue.
+ *
+ * - Private scheduler lock
+ *  + protects scheduler-wide data in csched2_private, such as:
+ *   * the list of domains active in this scheduler,
+ *   * what cpus and what runqueues are active and in what
+ *     runqueue each cpu is;
+ *  + serializes the operation of changing the weights of domains;
+ *
+ * - Type:
+ *  + runqueue locks are 'regular' spinlocks;
+ *  + the private scheduler lock can be an rwlock. In fact, data
+ *    it protects is modified only during initialization, cpupool
+ *    manipulation and when changing weights, and read in all
+ *    other cases (e.g., during load balancing).
+ *
  * Ordering:
- * - We grab private->schedule when updating domain weight; so we
- *   must never grab private if a schedule lock is held.
+ *  + trylock must be used when wanting to take a runqueue lock,
+ *    if we already hold another one;
+ *  + if taking both a runqueue lock and the private scheduler
+ *    lock is necessary, the latter must always be taken first.
  */
 
 /*
@@ -345,7 +365,7 @@ struct csched2_runqueue_data {
  * System-wide private data
  */
 struct csched2_private {
-    spinlock_t lock;
+    rwlock_t lock;
     cpumask_t initialized; /* CPU is initialized for this pool */
 
     struct list_head sdom; /* Used mostly for dump keyhandler. */
@@ -1302,13 +1322,14 @@ static void
 csched2_vcpu_wake(const struct scheduler *ops, struct vcpu *vc)
 {
     struct csched2_vcpu * const svc = CSCHED2_VCPU(vc);
+    unsigned int cpu = vc->processor;
     s_time_t now;
 
-    /* Schedule lock should be held at this point. */
+    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
     ASSERT(!is_idle_vcpu(vc));
 
-    if ( unlikely(curr_on_cpu(vc->processor) == vc) )
+    if ( unlikely(curr_on_cpu(cpu) == vc) )
     {
         SCHED_STAT_CRANK(vcpu_wake_running);
         goto out;
@@ -1399,7 +1420,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     ASSERT(!cpumask_empty(&prv->active_queues));
 
     /* Locking:
-     * - vc->processor is already locked
+     * - Runqueue lock of vc->processor is already locked
      * - Need to grab prv lock to make sure active runqueues don't
      *   change
      * - Need to grab locks for other runqueues while checking
@@ -1412,8 +1433,9 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
      * just grab the prv lock. Instead, we'll have to trylock, and
      * do something else reasonable if we fail.
      */
+    ASSERT(spin_is_locked(per_cpu(schedule_data, vc->processor).schedule_lock));
 
-    if ( !spin_trylock(&prv->lock) )
+    if ( !read_trylock(&prv->lock) )
     {
         /* We may be here because someone requested us to migrate. */
         __clear_bit(__CSFLAG_runq_migrate_request, &svc->flags);
@@ -1495,7 +1517,7 @@ csched2_cpu_pick(const struct scheduler *ops, struct vcpu *vc)
     }
 
 out_up:
-    spin_unlock(&prv->lock);
+    read_unlock(&prv->lock);
 
     if ( unlikely(tb_init_done) )
     {
@@ -1647,15 +1669,13 @@ static void balance_load(const struct scheduler *ops, int cpu, s_time_t now)
      * on either side may be empty).
      */
 
-    /* Locking:
-     * - pcpu schedule lock should be already locked
-     */
+    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
     st.lrqd = RQD(ops, cpu);
 
     __update_runq_load(ops, st.lrqd, 0, now);
 
 retry:
-    if ( !spin_trylock(&prv->lock) )
+    if ( !read_trylock(&prv->lock) )
         return;
 
     st.load_delta = 0;
@@ -1685,8 +1705,8 @@ retry:
         spin_unlock(&st.orqd->lock);
     }
 
-    /* Minimize holding the big lock */
-    spin_unlock(&prv->lock);
+    /* Minimize holding the private scheduler lock. */
+    read_unlock(&prv->lock);
 
     if ( max_delta_rqi == -1 )
         goto out;
@@ -1854,14 +1874,19 @@ csched2_dom_cntl(
     unsigned long flags;
     int rc = 0;
 
-    /* Must hold csched2_priv lock to read and update sdom,
-     * runq lock to update csvcs. */
-    spin_lock_irqsave(&prv->lock, flags);
-
+    /*
+     * Locking:
+     *  - we must take the private lock for accessing the weights of the
+     *    vcpus of d,
+     *  - in the putinfo case, we also need the runqueue lock(s), for
+     *    updating the max weight of the runqueue(s).
+     */
     switch ( op->cmd )
     {
     case XEN_DOMCTL_SCHEDOP_getinfo:
+        read_lock_irqsave(&prv->lock, flags);
         op->u.credit2.weight = sdom->weight;
+        read_unlock_irqrestore(&prv->lock, flags);
         break;
     case XEN_DOMCTL_SCHEDOP_putinfo:
         if ( op->u.credit2.weight != 0 )
@@ -1869,6 +1894,8 @@ csched2_dom_cntl(
             struct vcpu *v;
             int old_weight;
 
+            write_lock_irqsave(&prv->lock, flags);
+
             old_weight = sdom->weight;
 
             sdom->weight = op->u.credit2.weight;
@@ -1877,11 +1904,6 @@ csched2_dom_cntl(
             for_each_vcpu ( d, v )
             {
                 struct csched2_vcpu *svc = CSCHED2_VCPU(v);
-
-                /* NB: Locking order is important here. Because we grab this lock here, we
-                 * must never lock csched2_priv.lock if we're holding a runqueue lock.
-                 * Also, calling vcpu_schedule_lock() is enough, since IRQs have already
-                 * been disabled. */
                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
 
                 ASSERT(svc->rqd == RQD(ops, svc->vcpu->processor));
@@ -1891,6 +1913,8 @@ csched2_dom_cntl(
 
                 vcpu_schedule_unlock(lock, svc->vcpu);
             }
+
+            write_unlock_irqrestore(&prv->lock, flags);
         }
         break;
     default:
@@ -1898,7 +1922,6 @@ csched2_dom_cntl(
         break;
     }
 
-    spin_unlock_irqrestore(&prv->lock, flags);
 
     return rc;
 }
@@ -1906,6 +1929,7 @@ csched2_dom_cntl(
 static void *
 csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
 {
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
     struct csched2_dom *sdom;
     unsigned long flags;
 
@@ -1919,11 +1943,11 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom)
     sdom->weight = CSCHED2_DEFAULT_WEIGHT;
     sdom->nr_vcpus = 0;
 
-    spin_lock_irqsave(&CSCHED2_PRIV(ops)->lock, flags);
+    write_lock_irqsave(&prv->lock, flags);
 
     list_add_tail(&sdom->sdom_elem, &CSCHED2_PRIV(ops)->sdom);
 
-    spin_unlock_irqrestore(&CSCHED2_PRIV(ops)->lock, flags);
+    write_unlock_irqrestore(&prv->lock, flags);
 
     return (void *)sdom;
 }
@@ -1950,12 +1974,13 @@ csched2_free_domdata(const struct scheduler *ops, void *data)
 {
     unsigned long flags;
     struct csched2_dom *sdom = data;
+    struct csched2_private *prv = CSCHED2_PRIV(ops);
 
-    spin_lock_irqsave(&CSCHED2_PRIV(ops)->lock, flags);
+    write_lock_irqsave(&prv->lock, flags);
 
     list_del_init(&sdom->sdom_elem);
 
-    spin_unlock_irqrestore(&CSCHED2_PRIV(ops)->lock, flags);
+    write_unlock_irqrestore(&prv->lock, flags);
 
     xfree(data);
 }
@@ -2109,7 +2134,7 @@ csched2_schedule(
     rqd = RQD(ops, cpu);
     BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));
 
-    /* Protected by runqueue lock */
+    ASSERT(spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock));
 
     BUG_ON(!is_idle_vcpu(scurr->vcpu) && scurr->rqd != rqd);
@@ -2248,12 +2273,12 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
 
     /*
      * We need both locks:
-     *  - csched2_dump_vcpu() wants to access domains' scheduling
-     *    parameters, which are protected by the private scheduler lock;
+     *  - csched2_dump_vcpu() wants to access domains' weights,
+     *    which are protected by the private scheduler lock;
      *  - we scan through the runqueue, so we need the proper runqueue
      *    lock (the one of the runqueue this cpu is associated to).
      */
-    spin_lock_irqsave(&prv->lock, flags);
+    read_lock_irqsave(&prv->lock, flags);
     lock = per_cpu(schedule_data, cpu).schedule_lock;
     spin_lock(lock);
@@ -2284,7 +2309,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu)
     }
 
     spin_unlock(lock);
-    spin_unlock_irqrestore(&prv->lock, flags);
+    read_unlock_irqrestore(&prv->lock, flags);
 #undef cpustr
 }
 
@@ -2297,9 +2322,11 @@ csched2_dump(const struct scheduler *ops)
     int i, loop;
 #define cpustr keyhandler_scratch
 
-    /* We need the private lock as we access global scheduler data
-     * and (below) the list of active domains. */
-    spin_lock_irqsave(&prv->lock, flags);
+    /*
+     * We need the private scheduler lock as we access global
+     * scheduler data and (below) the list of active domains.
+     */
+    read_lock_irqsave(&prv->lock, flags);
 
     printk("Active queues: %d\n"
            "\tdefault-weight = %d\n",
@@ -2360,7 +2387,7 @@ csched2_dump(const struct scheduler *ops)
         }
     }
 
-    spin_unlock_irqrestore(&prv->lock, flags);
+    read_unlock_irqrestore(&prv->lock, flags);
 #undef cpustr
 }
 
@@ -2462,7 +2489,7 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
     unsigned rqi;
     struct csched2_runqueue_data *rqd;
 
-    ASSERT(spin_is_locked(&prv->lock));
+    ASSERT(rw_is_write_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
 
     /* Figure out which runqueue to put it in */
@@ -2501,7 +2528,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
      */
     ASSERT(!pdata);
 
-    spin_lock_irqsave(&prv->lock, flags);
+    write_lock_irqsave(&prv->lock, flags);
     old_lock = pcpu_schedule_lock(cpu);
 
     rqi = init_pdata(prv, cpu);
@@ -2510,7 +2537,7 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock(old_lock);
-    spin_unlock_irqrestore(&prv->lock, flags);
+    write_unlock_irqrestore(&prv->lock, flags);
 }
 
 /* Change the scheduler of cpu to us (Credit2). */
@@ -2533,7 +2560,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * cpu) is what is necessary to prevent races.
      */
     ASSERT(!local_irq_is_enabled());
-    spin_lock(&prv->lock);
+    write_lock(&prv->lock);
 
     idle_vcpu[cpu]->sched_priv = vdata;
 
@@ -2558,7 +2585,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
     smp_mb();
     per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
 
-    spin_unlock(&prv->lock);
+    write_unlock(&prv->lock);
 }
 
 static void
@@ -2569,7 +2596,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     struct csched2_runqueue_data *rqd;
     int rqi;
 
-    spin_lock_irqsave(&prv->lock, flags);
+    write_lock_irqsave(&prv->lock, flags);
 
     /*
     * alloc_pdata is not implemented, so pcpu must be NULL. On the other
@@ -2600,7 +2627,7 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 
     __cpumask_clear_cpu(cpu, &prv->initialized);
 
-    spin_unlock_irqrestore(&prv->lock, flags);
+    write_unlock_irqrestore(&prv->lock, flags);
 
     return;
 }
@@ -2651,7 +2678,7 @@ csched2_init(struct scheduler *ops)
         return -ENOMEM;
     ops->sched_data = prv;
 
-    spin_lock_init(&prv->lock);
+    rwlock_init(&prv->lock);
     INIT_LIST_HEAD(&prv->sdom);
 
     /* But un-initialize all runqueues */
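
As a closing illustration, the ordering rules documented in the new
comment block can also be seen in a small standalone sketch, again
POSIX-based with hypothetical names (toy_sched, toy_rq, try_balance),
not Xen code: a path that already holds a runqueue lock, as
balance_load() does, may only trylock the private lock and any remote
runqueue lock, backing off on failure, since blocking there would
invert the documented order and risk deadlock.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_rq {
    pthread_mutex_t lock;    /* plays the role of a runqueue spinlock */
    int load;
};

struct toy_sched {
    pthread_rwlock_t lock;   /* plays the role of the private rwlock */
    struct toy_rq rq[2];
};

/*
 * Called with s->rq[local].lock already held, as balance_load() is.
 * Since a runqueue lock is held, blocking on the private lock (or on
 * a remote runqueue lock) could deadlock against a cpu taking the
 * locks in the documented order, so both are acquired via trylock.
 */
static bool try_balance(struct toy_sched *s, int local, int remote)
{
    if ( pthread_rwlock_tryrdlock(&s->lock) != 0 )
        return false;                     /* back off, retry later */

    if ( pthread_mutex_trylock(&s->rq[remote].lock) != 0 )
    {
        pthread_rwlock_unlock(&s->lock);
        return false;                     /* ditto */
    }

    /* ... move some load from rq[local] to rq[remote] ... */
    s->rq[local].load--;
    s->rq[remote].load++;

    pthread_mutex_unlock(&s->rq[remote].lock);
    pthread_rwlock_unlock(&s->lock);
    return true;
}

int main(void)
{
    struct toy_sched s = {
        .lock = PTHREAD_RWLOCK_INITIALIZER,
        .rq = { { PTHREAD_MUTEX_INITIALIZER, 4 },
                { PTHREAD_MUTEX_INITIALIZER, 0 } },
    };

    pthread_mutex_lock(&s.rq[0].lock);    /* the caller's own runqueue lock */
    printf("balanced: %d\n", (int)try_balance(&s, 0, 1));
    pthread_mutex_unlock(&s.rq[0].lock);

    return 0;
}

This mirrors the read_trylock(&prv->lock) calls in csched2_cpu_pick()
and balance_load() above, only as a toy model for exposition.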