From patchwork Wed Jun 15 10:26:16 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 9178071
Message-Id: <5761496802000078000F5395@prv-mh.provo.novell.com>
Date: Wed, 15 Jun 2016 04:26:16 -0600
From: "Jan Beulich"
To: "xen-devel"
References: <576140F302000078000F52FE@prv-mh.provo.novell.com>
In-Reply-To: <576140F302000078000F52FE@prv-mh.provo.novell.com>
Cc: Andrew Cooper, Dario Faggioli, Joao Martins
Subject: [Xen-devel] [PATCH 1/8] x86/time: improve cross-CPU clock monotonicity (and more)
List-Id: Xen developer discussion

x86/time: adjust local system time initialization

Using the bare return value from read_platform_stime() is not suitable
when local_time_calibration() is going to use its fast path: when
platform and local time don't stay in close sync, NOW() return values
on different CPUs diverge by several dozen microseconds.

Latch local and platform time on the CPU initiating AP bringup, such
that the AP can use these values to seed its stime_local_stamp with as
little error as possible. The boot CPU, on the other hand, can simply
calculate the correct initial value (other CPUs could do so too, with
even greater accuracy than the approach being introduced, but that can
work only if all CPUs' TSCs start ticking at the same time, which
generally can't be assumed to be the case on multi-socket systems).

This slightly defers init_percpu_time() (moved ahead by commit
dd2658f966 ["x86/time: initialise time earlier during
start_secondary()"]) in order to reduce as much as possible the gap
between populating the stamps and consuming them.

Signed-off-by: Jan Beulich

--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -328,12 +328,12 @@ void start_secondary(void *unused)
 
     percpu_traps_init();
 
-    init_percpu_time();
-
     cpu_init();
 
     smp_callin();
 
+    init_percpu_time();
+
     setup_secondary_APIC_clock();
 
     /*
@@ -996,6 +996,8 @@ int __cpu_up(unsigned int cpu)
     if ( (ret = do_boot_cpu(apicid, cpu)) != 0 )
         return ret;
 
+    time_latch_stamps();
+
     set_cpu_state(CPU_STATE_ONLINE);
     while ( !cpu_online(cpu) )
     {
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -1328,21 +1328,51 @@ static void time_calibration(void *unuse
                      &r, 1);
 }
 
+static struct {
+    s_time_t local_stime, master_stime;
+} ap_bringup_ref;
+
+void time_latch_stamps(void) {
+    unsigned long flags;
+    u64 tsc;
+
+    local_irq_save(flags);
+    ap_bringup_ref.master_stime = read_platform_stime();
+    tsc = rdtsc();
+    local_irq_restore(flags);
+
+    ap_bringup_ref.local_stime = get_s_time_fixed(tsc);
+}
+
 void init_percpu_time(void)
 {
     struct cpu_time *t = &this_cpu(cpu_time);
     unsigned long flags;
+    u64 tsc;
     s_time_t now;
 
     /* Initial estimate for TSC rate. */
     t->tsc_scale = per_cpu(cpu_time, 0).tsc_scale;
 
     local_irq_save(flags);
-    t->local_tsc_stamp = rdtsc();
     now = read_platform_stime();
+    tsc = rdtsc();
     local_irq_restore(flags);
 
     t->stime_master_stamp = now;
+    /*
+     * To avoid a discontinuity (TSC and platform clock can't be expected
+     * to be in perfect sync), initialization here needs to match up with
+     * local_time_calibration()'s decision whether to use its fast path.
+     */
+    if ( boot_cpu_has(X86_FEATURE_CONSTANT_TSC) )
+    {
+        if ( system_state < SYS_STATE_smp_boot )
+            now = get_s_time_fixed(tsc);
+        else
+            now += ap_bringup_ref.local_stime - ap_bringup_ref.master_stime;
+    }
+    t->local_tsc_stamp = tsc;
     t->stime_local_stamp = now;
 }
 
--- a/xen/include/asm-x86/time.h
+++ b/xen/include/asm-x86/time.h
@@ -40,6 +40,7 @@ int time_suspend(void);
 int time_resume(void);
 
 void init_percpu_time(void);
+void time_latch_stamps(void);
 
 struct ioreq;
 int hwdom_pit_access(struct ioreq *ioreq);

Tested-by: Joao Martins
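
For illustration only, the offset arithmetic behind the constant-TSC branch can be sketched outside of Xen. This is a standalone model, not hypervisor code: struct bringup_ref and the helpers seed_bare(), seed_adjusted() and seed_error() are hypothetical names, and the model assumes that platform time and local time advance at the same rate between the latch on the initiating CPU and the seeding on the AP.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* nanoseconds; mirrors the Xen type name only */

/* Hypothetical snapshot taken on the CPU that initiates AP bringup. */
struct bringup_ref {
    s_time_t local_stime;   /* TSC-derived system time at the latch point */
    s_time_t master_stime;  /* platform timer reading at the same point */
};

/* Behaviour before the patch: seed with the bare platform reading. */
static s_time_t seed_bare(s_time_t platform_now)
{
    return platform_now;
}

/*
 * Behaviour of the "else" branch in the patch: shift the platform
 * reading by the latched local-vs-platform offset.
 */
static s_time_t seed_adjusted(s_time_t platform_now,
                              const struct bringup_ref *ref)
{
    return platform_now + (ref->local_stime - ref->master_stime);
}

/*
 * Distance between a seed and the initiating CPU's local time once
 * `elapsed` ns have passed since the latch (equal clock rates assumed).
 */
static s_time_t seed_error(s_time_t seed, const struct bringup_ref *ref,
                           s_time_t elapsed)
{
    return seed - (ref->local_stime + elapsed);
}
```

In this model, if local time is 40 microseconds ahead of platform time at the latch point, the bare seed leaves the AP 40 microseconds behind the initiating CPU, while the adjusted seed cancels the offset. Any real drift between the two clocks during the gap is not modelled, which is why the patch also keeps that gap as short as it can.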