From patchwork Fri Mar 18 19:05:05 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 8623291
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Fri, 18 Mar 2016 20:05:05 +0100
Message-ID: <20160318190505.8117.89778.stgit@Solace.station>
In-Reply-To: <20160318185524.8117.74837.stgit@Solace.station>
References: <20160318185524.8117.74837.stgit@Solace.station>
User-Agent: StGit/0.17.1-dirty
Cc: George Dunlap, Tianyang Chen, Meng Xu
Subject: [Xen-devel] [PATCH 09/16] xen: sched: close potential races when switching scheduler to CPUs
List-Id: Xen developer discussion

by using the sched_switch hook that we have introduced in
the various schedulers.

The key is to let the actual switch of scheduler and the remapping
of the scheduler lock for the CPU (if necessary) happen together
(in the same critical section) protected (at least) by the old
scheduler lock for the CPU.

This also means that, in Credit2 and RTDS, we can get rid of the
code that was doing the scheduler lock remapping in
csched2_free_pdata() and rt_free_pdata(), and of their triggering
ASSERT-s.
Signed-off-by: Dario Faggioli
Reviewed-by: Meng Xu
---
Cc: George Dunlap
Cc: Meng Xu
Cc: Tianyang Chen
---
 xen/common/sched_credit.c  |    9 +++++++++
 xen/common/sched_credit2.c |   28 ++++++++++------------------
 xen/common/sched_rt.c      |   13 -------------
 xen/common/schedule.c      |   30 +++++++++++++++++++++---------
 4 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 929ba9c..903a704 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -577,6 +577,15 @@ csched_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     unsigned long flags;
     struct csched_private *prv = CSCHED_PRIV(ops);
+    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
+
+    /*
+     * This is called either during boot, resume or hotplug, in
+     * case Credit1 is the scheduler chosen at boot. In such cases, the
+     * scheduler lock for cpu is already pointing to the default per-cpu
+     * spinlock, as Credit1 needs it, so there is no remapping to be done.
+     */
+    ASSERT(sd->schedule_lock == &sd->_lock && !spin_is_locked(&sd->_lock));
 
     spin_lock_irqsave(&prv->lock, flags);
     init_pdata(prv, pdata, cpu);
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 25d8e85..64fb028 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -1974,7 +1974,6 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
 {
     unsigned rqi;
     struct csched2_runqueue_data *rqd;
-    spinlock_t *old_lock;
 
     ASSERT(spin_is_locked(&prv->lock));
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
@@ -2005,21 +2004,11 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
         activate_runqueue(prv, rqi);
     }
 
-    /* IRQs already disabled */
-    old_lock = pcpu_schedule_lock(cpu);
-
-    /* Move spinlock to new runq lock. */
-    per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;
-
     /* Set the runqueue map */
     prv->runq_map[cpu] = rqi;
 
     cpumask_set_cpu(cpu, &rqd->idle);
     cpumask_set_cpu(cpu, &rqd->active);
-
-    /* _Not_ pcpu_schedule_unlock(): per_cpu().schedule_lock changed! */
-    spin_unlock(old_lock);
-
     cpumask_set_cpu(cpu, &prv->initialized);
 
     return rqi;
@@ -2029,10 +2018,19 @@ static void
 csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
 {
     struct csched2_private *prv = CSCHED2_PRIV(ops);
+    spinlock_t *old_lock;
     unsigned long flags;
+    unsigned rqi;
 
     spin_lock_irqsave(&prv->lock, flags);
-    init_pdata(prv, cpu);
+    old_lock = pcpu_schedule_lock(cpu);
+
+    rqi = init_pdata(prv, cpu);
+    /* Move the scheduler lock to the new runq lock. */
+    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+
+    /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
+    spin_unlock(old_lock);
     spin_unlock_irqrestore(&prv->lock, flags);
 }
 
@@ -2079,7 +2077,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     unsigned long flags;
     struct csched2_private *prv = CSCHED2_PRIV(ops);
    struct csched2_runqueue_data *rqd;
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
     int rqi;
 
     spin_lock_irqsave(&prv->lock, flags);
@@ -2107,11 +2104,6 @@ csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
         deactivate_runqueue(prv, rqi);
     }
 
-    /* Move spinlock to the original lock. */
-    ASSERT(sd->schedule_lock == &rqd->lock);
-    ASSERT(!spin_is_locked(&sd->_lock));
-    sd->schedule_lock = &sd->_lock;
-
     spin_unlock(&rqd->lock);
 
     cpumask_clear_cpu(cpu, &prv->initialized);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index 92be248..0564b1d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -718,19 +718,6 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
 
 static void
 rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
 {
-    struct rt_private *prv = rt_priv(ops);
-    struct schedule_data *sd = &per_cpu(schedule_data, cpu);
-    unsigned long flags;
-
-    spin_lock_irqsave(&prv->lock, flags);
-
-    /* Move spinlock back to the default lock */
-    ASSERT(sd->schedule_lock == &prv->lock);
-    ASSERT(!spin_is_locked(&sd->_lock));
-    sd->schedule_lock = &sd->_lock;
-
-    spin_unlock_irqrestore(&prv->lock, flags);
-
     free_cpumask_var(_cpumask_scratch[cpu]);
 }
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1adc0e2..29582a6 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -1617,7 +1617,6 @@ void __init scheduler_init(void)
 int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
 {
     struct vcpu *idle;
-    spinlock_t *lock;
     void *ppriv, *ppriv_old, *vpriv, *vpriv_old;
     struct scheduler *old_ops = per_cpu(scheduler, cpu);
     struct scheduler *new_ops = (c == NULL) ? &ops : c->sched;
@@ -1640,11 +1639,21 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
     if ( old_ops == new_ops )
         goto out;
 
+    /*
+     * To setup the cpu for the new scheduler we need:
+     *  - a valid instance of per-CPU scheduler specific data, as it is
+     *    allocated by SCHED_OP(alloc_pdata). Note that we do not want to
+     *    initialize it yet (i.e., we are not calling SCHED_OP(init_pdata)).
+     *    That will be done by the target scheduler, in SCHED_OP(switch_sched),
+     *    in proper ordering and with locking.
+     *  - a valid instance of per-vCPU scheduler specific data, for the idle
+     *    vCPU of cpu. That is what the target scheduler will use for the
+     *    sched_priv field of the per-vCPU info of the idle domain.
+     */
     idle = idle_vcpu[cpu];
     ppriv = SCHED_OP(new_ops, alloc_pdata, cpu);
     if ( IS_ERR(ppriv) )
         return PTR_ERR(ppriv);
-    SCHED_OP(new_ops, init_pdata, ppriv, cpu);
     vpriv = SCHED_OP(new_ops, alloc_vdata, idle, idle->domain->sched_priv);
     if ( vpriv == NULL )
     {
@@ -1652,17 +1661,20 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c)
         return -ENOMEM;
     }
 
-    lock = pcpu_schedule_lock_irq(cpu);
-
     SCHED_OP(old_ops, tick_suspend, cpu);
+
+    /*
+     * The actual switch, including (if necessary) the rerouting of the
+     * scheduler lock to whatever new_ops prefers, needs to happen in one
+     * critical section, protected by old_ops' lock, or races are possible.
+     * Since each scheduler has its own constraints and locking scheme, do
+     * that inside specific scheduler code, rather than here.
+     */
     vpriv_old = idle->sched_priv;
-    idle->sched_priv = vpriv;
-    per_cpu(scheduler, cpu) = new_ops;
     ppriv_old = per_cpu(schedule_data, cpu).sched_priv;
-    per_cpu(schedule_data, cpu).sched_priv = ppriv;
-    SCHED_OP(new_ops, tick_resume, cpu);
+    SCHED_OP(new_ops, switch_sched, cpu, ppriv, vpriv);
 
-    pcpu_schedule_unlock_irq(lock, cpu);
+    SCHED_OP(new_ops, tick_resume, cpu);
 
     SCHED_OP(old_ops, free_vdata, vpriv_old);
     SCHED_OP(old_ops, free_pdata, ppriv_old, cpu);