From patchwork Tue Mar 29 13:44:11 2016
X-Patchwork-Submitter: Joao Martins
X-Patchwork-Id: 8687321
From: Joao Martins
To: xen-devel@lists.xen.org
Cc: Andrew Cooper, Joao Martins, Keir Fraser, Jan Beulich
Date: Tue, 29 Mar 2016 14:44:11 +0100
Message-Id: <1459259051-4943-7-git-send-email-joao.m.martins@oracle.com>
In-Reply-To: <1459259051-4943-1-git-send-email-joao.m.martins@oracle.com>
References: <1459259051-4943-1-git-send-email-joao.m.martins@oracle.com>
Subject: [Xen-devel] [PATCH v2 6/6] x86/time: implement PVCLOCK_TSC_STABLE_BIT

When using TSC as clocksource we will solely rely on TSC for updating vcpu
time infos (pvti).
Right now, each vCPU takes the tsc_timestamp at a different instant,
namely at every EPOCH + delta. This delta varies with the time it takes each
CPU to calibrate with CPU 0 (the master), and will likely differ across
vCPUs. This means that each vCPU's pvti doesn't account for its calibration
error, which could lead to time going backwards: a time read on vCPU B
immediately after a read on vCPU A can be smaller. While this doesn't happen
often, I was able to observe (for clocksource=tsc) around 50 warps of
< 100 ns in an hour.

This patch proposes relying on host TSC synchronization and passing it
through to the guest when running on a TSC-safe platform. On
time_calibration we retrieve the platform time in ns and the counter read by
the clocksource that was used to compute system time. We introduce a new
rendezvous function which doesn't require synchronization between master and
slave CPUs: it just reads the calibration_rendezvous struct and writes the
stime and stamp to the cpu_calibration struct, to be used later on. On a
platform with a constant and reliable TSC, we can guarantee that the time
read on vCPU B right after A is bigger, independently of the CPU calibration
error. Since pvclock time infos are then monotonic as seen by any vCPU, set
PVCLOCK_TSC_STABLE_BIT, which in turn enables usage of the VDSO on Linux.
IIUC, this is similar to how it's implemented on KVM.

Note that PVCLOCK_TSC_STABLE_BIT is set only when CPU hotplug isn't meant to
be performed on the host, which is either when the maximum number of CPUs
(nr_cpu_ids) equals num_present_cpus() or when the "nocpuhotplug" command
line parameter is used. This is because a newly hotplugged CPU may not
satisfy the condition of having all TSCs synchronized.

Signed-off-by: Joao Martins
---
Cc: Keir Fraser
Cc: Jan Beulich
Cc: Andrew Cooper

Perhaps "cpuhotplugsafe" would be a better name, since potentially hardware
could guarantee TSCs are synchronized on hotplug?
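
For illustration only, the backwards warp described in the commit message can be reproduced with a small standalone sketch. This is not Xen code: pvti_sketch, pvti_read and warps_backwards are hypothetical names, the structure is a reduced stand-in for the real per-vCPU time info, and a 1 GHz TSC (one tick per ns) is assumed so that no mul/shift scaling is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for a per-vCPU time info record. */
struct pvti_sketch {
    uint64_t tsc_timestamp; /* TSC value at which the stamp was taken */
    uint64_t system_time;   /* system time in ns recorded for that stamp */
};

/*
 * Extrapolate the current time from a stamp.
 * Assumption: 1 TSC tick == 1 ns, so the delta needs no scaling.
 */
static uint64_t pvti_read(const struct pvti_sketch *p, uint64_t tsc_now)
{
    return p->system_time + (tsc_now - p->tsc_timestamp);
}

/*
 * Return 1 if a read on vCPU B at tsc_b yields a smaller time than a
 * read on vCPU A at tsc_a, 0 otherwise. Callers pass tsc_b >= tsc_a.
 */
static int warps_backwards(const struct pvti_sketch *a, uint64_t tsc_a,
                           const struct pvti_sketch *b, uint64_t tsc_b)
{
    return pvti_read(b, tsc_b) < pvti_read(a, tsc_a);
}
```

If vCPU A's stamp carries a calibration error in system_time and vCPU B's does not, a read on B shortly after a read on A can come out smaller. When both vCPUs extrapolate from the same (tsc_timestamp, system_time) pair, as the nop rendezvous arranges for the TSC stamp, a later TSC read always yields a later time in this model.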
Changes since v1:
 - Change approach to follow Andrew's guideline to skip
   std_rendezvous. And doing so by introducing a
   nop_rendezvous
 - Change commit message reflecting the change above.
 - Use TSC_STABLE_BIT only if cpu hotplug isn't possible.
 - Add command line option to override it if no cpu hotplug is
   intended.
---
 xen/arch/x86/time.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 123aa42..1dcd4af 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -43,6 +43,10 @@
 static char __initdata opt_clocksource[10];
 string_param("clocksource", opt_clocksource);
 
+/* opt_nocpuhotplug: Set if CPU hotplug isn't meant to be used */
+static bool_t __initdata opt_nocpuhotplug;
+boolean_param("nocpuhotplug", opt_nocpuhotplug);
+
 unsigned long __read_mostly cpu_khz; /* CPU clock frequency in kHz. */
 DEFINE_SPINLOCK(rtc_lock);
 unsigned long pit0_ticks;
@@ -435,6 +439,7 @@ uint64_t ns_to_acpi_pm_tick(uint64_t ns)
  * PLATFORM TIMER 4: TSC
  */
 static bool_t clocksource_is_tsc;
+static bool_t use_tsc_stable_bit;
 static u64 tsc_freq;
 static unsigned long tsc_max_warp;
 static void tsc_check_reliability(void);
@@ -468,6 +473,11 @@ static int __init init_tsctimer(struct platform_timesource *pts)
 
     pts->frequency = tsc_freq;
     clocksource_is_tsc = tsc_reliable;
+    use_tsc_stable_bit = clocksource_is_tsc &&
+        ((nr_cpu_ids == num_present_cpus()) || opt_nocpuhotplug);
+
+    if ( clocksource_is_tsc && !use_tsc_stable_bit )
+        printk(XENLOG_INFO "TSC: CPU Hotplug intended, not setting stable bit\n");
 
     return tsc_reliable;
 }
@@ -950,6 +960,8 @@ static void __update_vcpu_system_time(struct vcpu *v, int force)
 
     _u.tsc_timestamp = tsc_stamp;
     _u.system_time = t->stime_local_stamp;
+    if ( use_tsc_stable_bit )
+        _u.flags |= PVCLOCK_TSC_STABLE_BIT;
 
     if ( is_hvm_domain(d) )
         _u.tsc_timestamp += v->arch.hvm_vcpu.cache_tsc_offset;
@@ -1431,6 +1443,22 @@ static void time_calibration_std_rendezvous(void *_r)
     raise_softirq(TIME_CALIBRATE_SOFTIRQ);
 }
 
+/*
+ * Rendezvous function used when clocksource is TSC and
+ * no CPU hotplug will be performed.
+ */
+static void time_calibration_nop_rendezvous(void *_r)
+{
+    struct cpu_calibration *c = &this_cpu(cpu_calibration);
+    struct calibration_rendezvous *r = _r;
+
+    c->local_tsc_stamp = r->master_tsc_stamp;
+    c->stime_local_stamp = get_s_time();
+    c->stime_master_stamp = r->master_stime;
+
+    raise_softirq(TIME_CALIBRATE_SOFTIRQ);
+}
+
 static void (*time_calibration_rendezvous_fn)(void *) =
     time_calibration_std_rendezvous;
 
@@ -1440,6 +1468,13 @@ static void time_calibration(void *unused)
         .semaphore = ATOMIC_INIT(0)
     };
 
+    if ( use_tsc_stable_bit )
+    {
+        local_irq_disable();
+        r.master_stime = read_platform_stime(&r.master_tsc_stamp);
+        local_irq_enable();
+    }
+
     cpumask_copy(&r.cpu_calibration_map, &cpu_online_map);
 
     /* @wait=1 because we must wait for all cpus before freeing @r. */
@@ -1555,6 +1590,14 @@ static int __init verify_tsc_reliability(void)
 
     init_percpu_time();
 
+    /*
+     * We won't do CPU Hotplug and TSC clocksource is being used which
+     * means we have a reliable TSC, plus we don't sync with any other
+     * clocksource so no need for rendezvous.
+     */
+    if ( use_tsc_stable_bit )
+        time_calibration_rendezvous_fn = time_calibration_nop_rendezvous;
+
     init_timer(&calibration_timer, time_calibration, NULL, 0);
     set_timer(&calibration_timer, NOW() + EPOCH);
 }
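
As a rough guest-side illustration of what the flag set in __update_vcpu_system_time() buys, the sketch below shows how a consumer might treat a pvclock reading depending on PVCLOCK_TSC_STABLE_BIT. It is an assumption-laden simplification, not guest kernel code: guest_clock_read and last_value are hypothetical names, the raw reading is passed in rather than computed from the time info, and the fallback is single-threaded where a real guest would need an atomic update of the shared floor. The bit value (1 << 0) is the one defined by the pvclock ABI.

```c
#include <assert.h>
#include <stdint.h>

#define PVCLOCK_TSC_STABLE_BIT (1 << 0) /* flag bit from the pvclock ABI */

/*
 * Hypothetical global floor shared by all vCPUs. Only needed when the
 * hypervisor does not advertise the stable bit.
 */
static uint64_t last_value;

/*
 * Hypothetical helper: given a raw pvclock reading in ns and the flags
 * byte from the time info, return a value that never goes backwards.
 * Single-threaded sketch; a real guest would update last_value atomically.
 */
static uint64_t guest_clock_read(uint64_t raw_ns, uint8_t flags)
{
    /* Stable bit set: readings are monotonic across vCPUs, use as-is. */
    if (flags & PVCLOCK_TSC_STABLE_BIT)
        return raw_ns;

    /* Otherwise clamp against the last value handed out on any vCPU. */
    if (raw_ns < last_value)
        return last_value;

    last_value = raw_ns;
    return raw_ns;
}
```

Without the stable bit every read has to consult and update shared state, which is the kind of cross-vCPU dependency that keeps a purely user-space (VDSO) clock read from being usable. With the bit set the reading depends only on the local time info and the TSC.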