From patchwork Thu Aug 18 10:00:52 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dario Faggioli X-Patchwork-Id: 9287195 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 2333E607FF for ; Thu, 18 Aug 2016 10:03:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1521E290C3 for ; Thu, 18 Aug 2016 10:03:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 09B42290D4; Thu, 18 Aug 2016 10:03:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.1 required=2.0 tests=BAYES_00,DKIM_SIGNED, RCVD_IN_DNSWL_MED,T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 76036290CF for ; Thu, 18 Aug 2016 10:03:36 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1baK89-0006F1-BM; Thu, 18 Aug 2016 10:00:57 +0000 Received: from mail6.bemta3.messagelabs.com ([195.245.230.39]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1baK87-0006Ep-VH for xen-devel@lists.xenproject.org; Thu, 18 Aug 2016 10:00:56 +0000 Received: from [85.158.137.68] by server-1.bemta-3.messagelabs.com id 36/86-17152-75785B75; Thu, 18 Aug 2016 10:00:55 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFvrNIsWRWlGSWpSXmKPExsXiVRvkpBvWvjX coGGWgsX3LZOZHBg9Dn+4whLAGMWamZeUX5HAmnG6o7Dgo1rF6iltLA2Mm+S7GLk4hASmMUqs mXiTGcRhEVjDKnG8cy1rFyMnh4TAJVaJC8uLIOwYiYtNnVDxSoknl2+xgdhCAioSN7evYoKYN IdJ4vezB2BFwgJ6EkeO/mDvYuQAssMkjv41AQmzCRhIvNmxF6xEREBJ4t6qyUwgNrNAjcS80z dZQGwWAVWJnnvbwGp4Bbwknh18BGZzCvhINH08xwSx11vi0s4TzCC2qICcxMrLLVD1ghInZz5 hAVnLLKApsX6XPsR4eYntb+cwT2AUmYWkahZC1SwkVQsYmVcxqhenFpWlFuma6SUVZaZnlOQm ZuboGhoY6+WmFhcnpqfmJCYV6yXn525iBAZ+PQMD4w7GK23OhxglOZiURHmt+LaGC/El5adUZ iQWZ8QXleakFh9i1ODgEJhwdu50JimWvPy8VCUJ3jutQHWCRanpqRVpmTnA2IQpleDgURLhtW gDSvMWFyTmFmemQ6ROMepybJl6by2TENgMKXFeA5AiAZCijNI8uBGwNHGJUVZKmJeRgYFBiKc gtSg3swRV/hWjOAejkjCvLMgUnsy8ErhNr4COYAI6gpd/C8gRJYkIKakGxpNvDJ4dEms28qus 0AjT1jhybN2WvH/ffl/KCtnS4/3pnYDl9XnCu0PMtu+SrCxNdNxq7Vf7VXCmgZmttI+Zzz13c Z1fkcqT505KYAv82tEjU2+hPOOvwgH9edoXt1uofDpl8ft95au3nA8Dlm398iGuNrmsstf/iu itPYFhN1hbsqYreykXKrEUZyQaajEXFScCAHfDoVQOAwAA X-Env-Sender: raistlin.df@gmail.com X-Msg-Ref: server-16.tower-31.messagelabs.com!1471514454!48593518!1 X-Originating-IP: [74.125.82.66] X-SpamReason: No, hits=0.2 required=7.0 tests=RCVD_ILLEGAL_IP X-StarScan-Received: X-StarScan-Version: 8.84; banners=-,-,- X-VirusChecked: Checked Received: (qmail 7964 invoked from network); 18 Aug 2016 10:00:54 -0000 Received: from mail-wm0-f66.google.com (HELO mail-wm0-f66.google.com) (74.125.82.66) by server-16.tower-31.messagelabs.com with AES128-GCM-SHA256 encrypted SMTP; 18 Aug 2016 10:00:54 -0000 Received: by mail-wm0-f66.google.com with SMTP id i5so4590537wmg.2 for ; Thu, 18 Aug 2016 03:00:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:subject:from:to:cc:date:message-id:in-reply-to:references :user-agent:mime-version:content-transfer-encoding; bh=4i4ZAX9hCJaRKRLA7Y2+RIlSyd7ZyFoYcmsgjizhRpc=; b=Ia2oV5QEy1ip+q4j1YfG3ErhB18BUNMVnO5bh/Q1iZp4CWzq+SiZXFVwjbjoAhA0gj amZYJhI+JEcmsNchIiIDLJwO9nPFBvsFIH7A2Jz6YxNxJYXkUtp7tp+bJw+nuQIEdC7b rmR17923HVsufe9xAXV8qpXGkiTROrqW7SIphhetkTGj3cGP2NfQ/bOqxCI/4usOkzUc 9EkWtF69SMg26cWHwTlnKl45Kn805Tkk+2IpOXUENKZU1c4ZJo6fKfqciuvYm/moGLro m7H1teh6nfZsVcQKNUxWtX5iOaJGY8fbUx+305zVEVHWsmUzhRCSNIiAeqOvfEwU1PXY QLSw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:sender:subject:from:to:cc:date:message-id :in-reply-to:references:user-agent:mime-version :content-transfer-encoding; bh=4i4ZAX9hCJaRKRLA7Y2+RIlSyd7ZyFoYcmsgjizhRpc=; b=KLQpDoSsOiFEko8xNxID1yKcBJTH4LZbpfW8poUHZXKtWqb/lYtry2lgk081ks/XD1 1bwZRUnw/o0/TJJheQfeprxZSClg9GIszc3Wz4tT2CN9AEmGdeJOYVPbitFt+x3SbHwH r1r8aLBW0yLssJLlXrC6yy1y8dzh6idUoczqfMVY0gNo0H3oa6m9DvNpksYpNPiKruz2 4ADZfM6kUzHp3WDIRT/zR3CAP6iwf19tOcHPUWlQBVb001OTv85Rb5rZeSOzAvNBkjUU fk94vq8C0gzQ3ieYWO/Jz470FVjfDBrVt2bYI3RiZ1U33MqC1XJazf841jF3cjtMBka1 symQ== X-Gm-Message-State: AEkoouvbb0wi22nYK/zksiFKYOsaKyuuHrgA3R0atWT04xOBd39LBsLCul4NWDpkxjyBgQ== X-Received: by 10.28.100.70 with SMTP id y67mr32064452wmb.23.1471514454070; Thu, 18 Aug 2016 03:00:54 -0700 (PDT) Received: from Solace.fritz.box (net-2-32-14-104.cust.vodafonedsl.it. [2.32.14.104]) by smtp.gmail.com with ESMTPSA id vh6sm1484569wjb.0.2016.08.18.03.00.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2016 03:00:53 -0700 (PDT) From: Dario Faggioli To: xen-devel@lists.xenproject.org Date: Thu, 18 Aug 2016 12:00:52 +0200 Message-ID: <147151445223.29674.955105994328843699.stgit@Solace.fritz.box> In-Reply-To: <147151337343.29674.1081345215393715232.stgit@Solace.fritz.box> References: <147151337343.29674.1081345215393715232.stgit@Solace.fritz.box> User-Agent: StGit/0.17.1-dirty MIME-Version: 1.0 Cc: George Dunlap , Andrew Cooper , Jan Beulich Subject: [Xen-devel] [PATCH v2 1/2] xen: credit1: fix a race when picking initial pCPU for a vCPU X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP In the Credit1 hunk of 9f358ddd69463 ("xen: Have schedulers revise initial placement") csched_cpu_pick() is called without taking the runqueue lock of the (temporary) pCPU that the vCPU has been assigned to (e.g., in XEN_DOMCTL_max_vcpus). However, although 'hidden' in the IS_RUNQ_IDLE() macro, that function does access the runq (for doing load balancing calculations). Two scenarios are possible: 1) we are on cpu X, and IS_RUNQ_IDLE() peeks at cpu's X own runq; 2) we are on cpu X, but IS_RUNQ_IDLE() peeks at some other cpu's runq. Scenario 2) absolutely requies that the appropriate runq lock is taken. Scenario 1) works even without taking the cpu's own runq lock, and this is important for the case when _csched_pick_cpu() is called from csched_vcpu_acct() which in turn is called by csched_tick(). Races have been observed and reported (by both XenServer own testing and OSSTest [1]), in the form of IS_RUNQ_IDLE() falling over LIST_POISON, because we're not currently holding the proper lock, in csched_vcpu_insert(), when scenario 1) occurs. Since this is all very tricky, in addition to fix things, add both an ASSERT() and a comment in IS_RUNQ_IDLE() (which is also becoming static inline instead of macro). [1] https://lists.xen.org/archives/html/xen-devel/2016-08/msg02144.html Reported-by: Andrew Cooper Signed-off-by: Dario Faggioli --- Cc: George Dunlap Cc: Andrew Cooper Cc: Jan Beulich --- Changes from v1: - macro IS_RUNQ_IDLE() to static inline is_runq_idle(), as suggested during review; - add an ASSERT() and a comment, as suggested during review; - take into account what's described in the changelog as "scenario 1)", which wasn't being considered in v1. --- xen/common/sched_credit.c | 38 +++++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 220ff0d..daace81 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -84,9 +84,6 @@ #define CSCHED_VCPU(_vcpu) ((struct csched_vcpu *) (_vcpu)->sched_priv) #define CSCHED_DOM(_dom) ((struct csched_dom *) (_dom)->sched_priv) #define RUNQ(_cpu) (&(CSCHED_PCPU(_cpu)->runq)) -/* Is the first element of _cpu's runq its idle vcpu? */ -#define IS_RUNQ_IDLE(_cpu) (list_empty(RUNQ(_cpu)) || \ - is_idle_vcpu(__runq_elem(RUNQ(_cpu)->next)->vcpu)) /* @@ -248,6 +245,33 @@ __runq_elem(struct list_head *elem) return list_entry(elem, struct csched_vcpu, runq_elem); } +/* Is the first element of cpu's runq (if any) cpu's idle vcpu? */ +static inline bool_t is_runq_idle(unsigned int cpu) +{ + /* + * If we are on cpu, and we are peeking at our own runq while cpu itself + * is not idle, that's fine even if we don't hold the runq lock. In fact, + * the fact that there is a (non idle!) vcpu running means that at least + * the idle vcpu is in the runq. And since only cpu itself (via work + * stealing) can add stuff to the runq, and no other cpu will ever steal + * our idle vcpu, that maks the runq manipulations done below safe, even + * without locks. + * + * On the other hand, if we're peeking at another cpu's runq, we must hold + * its the proper runq lock. + * + * As a matter of fact, the former scenario describes what happens when + * _cshced_cpu_pick() (which then calls us) is called from csched_tick(), + * the latter one describes what actually happen when it is called from + * csched_vcpu_insert(). + */ + ASSERT((!is_idle_vcpu(curr_on_cpu(cpu)) && cpu == smp_processor_id()) || + spin_is_locked(per_cpu(schedule_data, cpu).schedule_lock)); + + return list_empty(RUNQ(cpu)) || + is_idle_vcpu(__runq_elem(RUNQ(cpu)->next)->vcpu); +} + static inline void __runq_insert(struct csched_vcpu *svc) { @@ -771,7 +795,7 @@ _csched_cpu_pick(const struct scheduler *ops, struct vcpu *vc, bool_t commit) * runnable vcpu on cpu, we add cpu to the idlers. */ cpumask_and(&idlers, &cpu_online_map, CSCHED_PRIV(ops)->idlers); - if ( vc->processor == cpu && IS_RUNQ_IDLE(cpu) ) + if ( vc->processor == cpu && is_runq_idle(cpu) ) __cpumask_set_cpu(cpu, &idlers); cpumask_and(&cpus, &cpus, &idlers); @@ -998,9 +1022,13 @@ csched_vcpu_insert(const struct scheduler *ops, struct vcpu *vc) BUG_ON( is_idle_vcpu(vc) ); - /* This is safe because vc isn't yet being scheduled */ + /* csched_cpu_pick() looks in vc->processor's runq, so we need the lock. */ + lock = vcpu_schedule_lock_irq(vc); + vc->processor = csched_cpu_pick(ops, vc); + spin_unlock_irq(lock); + lock = vcpu_schedule_lock_irq(vc); if ( !__vcpu_on_runq(svc) && vcpu_runnable(vc) && !vc->is_running )