From patchwork Thu Jul 27 08:01:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9866407
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Thu, 27 Jul 2017 10:01:24 +0200
Message-ID: <150114248433.22910.16140726025093688678.stgit@Solace>
In-Reply-To: <150114201043.22910.12807057883146318803.stgit@Solace>
References: <150114201043.22910.12807057883146318803.stgit@Solace>
User-Agent: StGit/0.17.1-dirty
Cc: Andrew Cooper, Julien Grall, Stefano Stabellini, Jan Beulich
Subject: [Xen-devel] [PATCH 3/5] xen: RCU/x86/ARM: discount CPUs that were idle when grace period started.

Xen is a tickless (micro-)kernel: when a CPU becomes idle, we stop all
activity on it, including any periodic tick or timer.

When we imported RCU from Linux, Linux (x86) was a ticking kernel, i.e.,
there was a periodic timer tick always running, even on totally idle CPUs.
This was bad from a power efficiency perspective, but it is what made it
possible to monitor the quiescent states of all the CPUs, and hence tell
when an RCU grace period ends.

In Xen, that is impossible, which is particularly problematic on idle (or
lightly loaded) systems: CPUs that are idle may never get a chance to tell
RCU about their quiescence, and grace periods could extend indefinitely!

This has led, on x86, to long (and unpredictable) delays between the
queueing of RCU callbacks and their invocation. On ARM, we actually see
infinite grace periods (e.g., complete_domain_destroy() may never actually
be invoked on an idle system). See here:

 https://lists.xenproject.org/archives/html/xen-devel/2017-01/msg02454.html

The first step in fixing this is for RCU to record, at the beginning of a
grace period, which CPUs are already idle. In fact, being idle, they cannot
be in the middle of any read-side critical section, and we do not have to
wait for them before declaring the grace period finished.
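In a nutshell, the approach amounts to the following (a condensed,
illustrative sketch only; the real code is in the rcupdate.c hunk below,
and locking, the rcu_ctrlblk plumbing and the per-architecture call sites
are omitted here):

    /* Sketch: each CPU sets/clears its own bit around going idle. */
    static cpumask_t idle_cpumask;         /* CPUs currently idle */

    void rcu_idle_enter(unsigned int cpu)  /* CPU is going idle */
    {
        cpumask_set_cpu(cpu, &idle_cpumask);
    }

    void rcu_idle_exit(unsigned int cpu)   /* CPU is waking up */
    {
        cpumask_clear_cpu(cpu, &idle_cpumask);
    }

    /* At grace period start: only wait for CPUs that are not idle. */
    smp_mb();
    cpumask_andnot(&rcp->cpumask, &cpu_online_map, &idle_cpumask);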
This is tracked in a cpumask, in a way that is very similar to how Linux
achieved the same on s390 --which indeed was tickless already, even back
then-- and to what it started doing for x86, from 2.6.21 onward (see commit
79bf2bb3 "tick-management: dyntick / highres functionality").

While there, also adopt the memory barrier introduced by Linux commit
c3f59023 ("Fix RCU race in access of nohz_cpu_mask").

Signed-off-by: Dario Faggioli
---
Cc: Stefano Stabellini
Cc: Julien Grall
Cc: Jan Beulich
Cc: Andrew Cooper
---
 xen/arch/arm/domain.c         |    2 ++
 xen/arch/x86/acpi/cpu_idle.c  |   25 +++++++++++++++++--------
 xen/arch/x86/cpu/mwait-idle.c |    9 ++++++++-
 xen/arch/x86/domain.c         |    8 +++++++-
 xen/common/rcupdate.c         |   28 ++++++++++++++++++++++++++--
 xen/include/xen/rcupdate.h    |    3 +++
 6 files changed, 63 insertions(+), 12 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index fce29cb..666b7ef 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -50,8 +50,10 @@ static void do_idle(void)
     local_irq_disable();
     if ( cpu_is_haltable(cpu) )
     {
+        rcu_idle_enter(cpu);
         dsb(sy);
         wfi();
+        rcu_idle_exit(cpu);
     }
 
     local_irq_enable();
diff --git a/xen/arch/x86/acpi/cpu_idle.c b/xen/arch/x86/acpi/cpu_idle.c
index 482b8a7..04c52e8 100644
--- a/xen/arch/x86/acpi/cpu_idle.c
+++ b/xen/arch/x86/acpi/cpu_idle.c
@@ -418,14 +418,16 @@ static void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
     mwait_idle_with_hints(cx->address, MWAIT_ECX_INTERRUPT_BREAK);
 }
 
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+static void acpi_idle_do_entry(unsigned int cpu, struct acpi_processor_cx *cx)
 {
+    rcu_idle_enter(cpu);
+
     switch ( cx->entry_method )
     {
     case ACPI_CSTATE_EM_FFH:
         /* Call into architectural FFH based C-state */
         acpi_processor_ffh_cstate_enter(cx);
-        return;
+        break;
     case ACPI_CSTATE_EM_SYSIO:
         /* IO port based C-state */
         inb(cx->address);
@@ -433,12 +435,14 @@ static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
            because chipsets cannot guarantee that STPCLK# signal
            gets asserted in time to freeze execution properly. */
         inl(pmtmr_ioport);
-        return;
+        break;
     case ACPI_CSTATE_EM_HALT:
         safe_halt();
         local_irq_disable();
-        return;
+        break;
     }
+
+    rcu_idle_exit(cpu);
 }
 
 static int acpi_idle_bm_check(void)
@@ -540,7 +544,8 @@ void update_idle_stats(struct acpi_processor_power *power,
 
 static void acpi_processor_idle(void)
 {
-    struct acpi_processor_power *power = processor_powers[smp_processor_id()];
+    unsigned int cpu = smp_processor_id();
+    struct acpi_processor_power *power = processor_powers[cpu];
     struct acpi_processor_cx *cx = NULL;
     int next_state;
     uint64_t t1, t2 = 0;
@@ -563,7 +568,11 @@ static void acpi_processor_idle(void)
         if ( pm_idle_save )
             pm_idle_save();
         else
+        {
+            rcu_idle_enter(cpu);
             safe_halt();
+            rcu_idle_exit(cpu);
+        }
         return;
     }
@@ -579,7 +588,7 @@ static void acpi_processor_idle(void)
      */
     local_irq_disable();
 
-    if ( !cpu_is_haltable(smp_processor_id()) )
+    if ( !cpu_is_haltable(cpu) )
     {
         local_irq_enable();
         sched_tick_resume();
@@ -610,7 +619,7 @@ static void acpi_processor_idle(void)
         update_last_cx_stat(power, cx, t1);
 
         /* Invoke C2 */
-        acpi_idle_do_entry(cx);
+        acpi_idle_do_entry(cpu, cx);
 
         /* Get end time (ticks) */
         t2 = cpuidle_get_tick();
         trace_exit_reason(irq_traced);
@@ -672,7 +681,7 @@ static void acpi_processor_idle(void)
         }
 
         /* Invoke C3 */
-        acpi_idle_do_entry(cx);
+        acpi_idle_do_entry(cpu, cx);
 
         if ( (cx->type == ACPI_STATE_C3) &&
              power->flags.bm_check && power->flags.bm_control )
diff --git a/xen/arch/x86/cpu/mwait-idle.c b/xen/arch/x86/cpu/mwait-idle.c
index 762dff1..ae9e92b 100644
--- a/xen/arch/x86/cpu/mwait-idle.c
+++ b/xen/arch/x86/cpu/mwait-idle.c
@@ -735,8 +735,11 @@ static void mwait_idle(void)
 	if (!cx) {
 		if (pm_idle_save)
 			pm_idle_save();
-		else
+		else {
+			rcu_idle_enter(cpu);
 			safe_halt();
+			rcu_idle_exit(cpu);
+		}
 		return;
 	}
@@ -756,6 +759,8 @@ static void mwait_idle(void)
 		return;
 	}
 
+	rcu_idle_enter(cpu);
+
 	eax = cx->address;
 	cstate = ((eax >> MWAIT_SUBSTATE_SIZE) & MWAIT_CSTATE_MASK) + 1;
@@ -787,6 +792,8 @@ static void mwait_idle(void)
 		irq_traced[0], irq_traced[1], irq_traced[2], irq_traced[3]);
 
 	/* Now back in C0. */
+	rcu_idle_exit(cpu);
+
 	update_idle_stats(power, cx, before, after);
 	local_irq_enable();
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index dd8bf13..a6c0f66 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -73,9 +73,15 @@ void (*dead_idle) (void) __read_mostly = default_dead_idle;
 
 static void default_idle(void)
 {
+    unsigned int cpu = smp_processor_id();
+
     local_irq_disable();
-    if ( cpu_is_haltable(smp_processor_id()) )
+    if ( cpu_is_haltable(cpu) )
+    {
+        rcu_idle_enter(cpu);
         safe_halt();
+        rcu_idle_exit(cpu);
+    }
     else
         local_irq_enable();
 }
diff --git a/xen/common/rcupdate.c b/xen/common/rcupdate.c
index 8cc5a82..f0fdc87 100644
--- a/xen/common/rcupdate.c
+++ b/xen/common/rcupdate.c
@@ -52,7 +52,8 @@ static struct rcu_ctrlblk {
     int  next_pending;  /* Is the next batch already waiting? */
 
     spinlock_t  lock __cacheline_aligned;
-    cpumask_t   cpumask; /* CPUs that need to switch in order    */
+    cpumask_t   cpumask; /* CPUs that need to switch in order ... */
+    cpumask_t   idle_cpumask; /* ... unless they are already idle */
                          /* for current batch to proceed.        */
 } __cacheline_aligned rcu_ctrlblk = {
     .cur = -300,
@@ -248,7 +249,14 @@ static void rcu_start_batch(struct rcu_ctrlblk *rcp)
         smp_wmb();
         rcp->cur++;
 
-        cpumask_copy(&rcp->cpumask, &cpu_online_map);
+        /*
+         * Accessing idle_cpumask before incrementing rcp->cur needs a
+         * barrier. Otherwise it can cause tickless idle CPUs to be
+         * included in rcp->cpumask, which will extend grace periods
+         * unnecessarily.
+         */
+        smp_mb();
+        cpumask_andnot(&rcp->cpumask, &cpu_online_map, &rcp->idle_cpumask);
     }
 }
@@ -474,7 +482,23 @@ static struct notifier_block cpu_nfb = {
 void __init rcu_init(void)
 {
     void *cpu = (void *)(long)smp_processor_id();
+
+    cpumask_setall(&rcu_ctrlblk.idle_cpumask);
     cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
     register_cpu_notifier(&cpu_nfb);
     open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
 }
+
+/*
+ * The CPU is becoming idle, so no more read side critical
+ * sections, and one more step toward grace period.
+ */
+void rcu_idle_enter(unsigned int cpu)
+{
+    cpumask_set_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+}
+
+void rcu_idle_exit(unsigned int cpu)
+{
+    cpumask_clear_cpu(cpu, &rcu_ctrlblk.idle_cpumask);
+}
diff --git a/xen/include/xen/rcupdate.h b/xen/include/xen/rcupdate.h
index 557a7b1..561ac43 100644
--- a/xen/include/xen/rcupdate.h
+++ b/xen/include/xen/rcupdate.h
@@ -146,4 +146,7 @@ void call_rcu(struct rcu_head *head,
 
 int rcu_barrier(void);
 
+void rcu_idle_enter(unsigned int cpu);
+void rcu_idle_exit(unsigned int cpu);
+
 #endif /* __XEN_RCUPDATE_H */
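
For reference, every idle-loop call site touched above follows the same
pattern; schematically (a recap of the hunks, not an additional change;
the actual sleeping primitive differs per architecture):

    local_irq_disable();
    if ( cpu_is_haltable(cpu) )
    {
        rcu_idle_enter(cpu);  /* no read-side critical sections from here */
        /* arch-specific sleep: dsb(sy); wfi(); on ARM, safe_halt() or
         * mwait on x86 */
        rcu_idle_exit(cpu);   /* CPU must once again be waited for by RCU */
    }
    local_irq_enable();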