From patchwork Wed Jun 15 10:27:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 9178089
Message-Id: <576149AB02000078000F539D@prv-mh.provo.novell.com>
Date: Wed, 15 Jun 2016 04:27:23 -0600
From: "Jan Beulich"
To: "xen-devel"
References: <576140F302000078000F52FE@prv-mh.provo.novell.com>
In-Reply-To: <576140F302000078000F52FE@prv-mh.provo.novell.com>
Cc: Andrew Cooper, Dario Faggioli, Joao Martins
Subject: [Xen-devel] [PATCH 3/8] x86/time: introduce and use rdtsc_ordered()
List-Id: Xen developer discussion

Matching Linux commit 03b9730b76 ("x86/asm/tsc: Add rdtsc_ordered() and
use it in trivial call sites") and earlier ones it builds upon, let's
make sure timing loops don't have their rdtsc()-s re-ordered, as that
would harm precision of the result (values were observed to be several
hundred clocks off without this adjustment).

Signed-off-by: Jan Beulich
Reviewed-by: Andrew Cooper
Reviewed-by: Dario Faggioli
Tested-by: Dario Faggioli

--- a/xen/arch/x86/apic.c
+++ b/xen/arch/x86/apic.c
@@ -1137,7 +1137,7 @@ static int __init calibrate_APIC_clock(v
     /*
      * We wrapped around just now. Let's start:
      */
-    t1 = rdtsc();
+    t1 = rdtsc_ordered();
     tt1 = apic_read(APIC_TMCCT);
 
     /*
@@ -1147,7 +1147,7 @@ static int __init calibrate_APIC_clock(v
         wait_8254_wraparound();
 
     tt2 = apic_read(APIC_TMCCT);
-    t2 = rdtsc();
+    t2 = rdtsc_ordered();
 
     /*
      * The APIC bus clock counter is 32 bits only, it
--- a/xen/arch/x86/cpu/amd.c
+++ b/xen/arch/x86/cpu/amd.c
@@ -541,6 +541,9 @@ static void init_amd(struct cpuinfo_x86
 		wrmsr_amd_safe(0xc001100d, l, h & ~1);
 	}
 
+	/* MFENCE stops RDTSC speculation */
+	__set_bit(X86_FEATURE_MFENCE_RDTSC, c->x86_capability);
+
 	switch(c->x86)
 	{
 	case 0xf ... 0x17:
--- a/xen/arch/x86/delay.c
+++ b/xen/arch/x86/delay.c
@@ -21,10 +21,10 @@ void __udelay(unsigned long usecs)
     unsigned long ticks = usecs * (cpu_khz / 1000);
     unsigned long s, e;
 
-    s = rdtsc();
+    s = rdtsc_ordered();
     do
     {
         rep_nop();
-        e = rdtsc();
+        e = rdtsc_ordered();
     } while ((e-s) < ticks);
 }
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -123,7 +123,7 @@ static void synchronize_tsc_master(unsig
 
     for ( i = 1; i <= 5; i++ )
     {
-        tsc_value = rdtsc();
+        tsc_value = rdtsc_ordered();
         wmb();
         atomic_inc(&tsc_count);
         while ( atomic_read(&tsc_count) != (i<<1) )
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -257,10 +257,10 @@ static u64 init_pit_and_calibrate_tsc(vo
     outb(CALIBRATE_LATCH & 0xff, PIT_CH2); /* LSB of count */
     outb(CALIBRATE_LATCH >> 8, PIT_CH2); /* MSB of count */
 
-    start = rdtsc();
+    start = rdtsc_ordered();
     for ( count = 0; (inb(0x61) & 0x20) == 0; count++ )
         continue;
-    end = rdtsc();
+    end = rdtsc_ordered();
 
     /* Error if the CTC doesn't behave itself. */
     if ( count == 0 )
@@ -760,7 +760,7 @@ s_time_t get_s_time_fixed(u64 at_tsc)
     if ( at_tsc )
         tsc = at_tsc;
     else
-        tsc = rdtsc();
+        tsc = rdtsc_ordered();
     delta = tsc - t->local_tsc_stamp;
     now = t->stime_local_stamp + scale_delta(delta, &t->tsc_scale);
 
@@ -933,7 +933,7 @@ int cpu_frequency_change(u64 freq)
     /* TSC-extrapolated time may be bogus after frequency change. */
     /*t->stime_local_stamp = get_s_time();*/
     t->stime_local_stamp = t->stime_master_stamp;
-    curr_tsc = rdtsc();
+    curr_tsc = rdtsc_ordered();
     t->local_tsc_stamp = curr_tsc;
     set_time_scale(&t->tsc_scale, freq);
     local_irq_enable();
@@ -1248,7 +1248,7 @@ static void time_calibration_tsc_rendezv
             if ( r->master_stime == 0 )
             {
                 r->master_stime = read_platform_stime();
-                r->master_tsc_stamp = rdtsc();
+                r->master_tsc_stamp = rdtsc_ordered();
             }
             atomic_inc(&r->semaphore);
 
@@ -1274,7 +1274,7 @@ static void time_calibration_tsc_rendezv
         }
     }
 
-    c->local_tsc_stamp = rdtsc();
+    c->local_tsc_stamp = rdtsc_ordered();
     c->stime_local_stamp = get_s_time_fixed(c->local_tsc_stamp);
     c->stime_master_stamp = r->master_stime;
 
@@ -1304,7 +1304,7 @@ static void time_calibration_std_rendezv
         mb(); /* receive signal /then/ read r->master_stime */
     }
 
-    c->local_tsc_stamp = rdtsc();
+    c->local_tsc_stamp = rdtsc_ordered();
     c->stime_local_stamp = get_s_time_fixed(c->local_tsc_stamp);
     c->stime_master_stamp = r->master_stime;
 
@@ -1338,7 +1338,7 @@ void time_latch_stamps(void) {
 
     local_irq_save(flags);
     ap_bringup_ref.master_stime = read_platform_stime();
-    tsc = rdtsc();
+    tsc = rdtsc_ordered();
     local_irq_restore(flags);
 
     ap_bringup_ref.local_stime = get_s_time_fixed(tsc);
@@ -1356,7 +1356,7 @@ void init_percpu_time(void)
 
     local_irq_save(flags);
     now = read_platform_stime();
-    tsc = rdtsc();
+    tsc = rdtsc_ordered();
     local_irq_restore(flags);
 
     t->stime_master_stamp = now;
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -16,6 +16,7 @@ XEN_CPUFEATURE(XTOPOLOGY, (FSCAPIN
 XEN_CPUFEATURE(CPUID_FAULTING, (FSCAPINTS+0)*32+ 6) /* cpuid faulting */
 XEN_CPUFEATURE(CLFLUSH_MONITOR, (FSCAPINTS+0)*32+ 7) /* clflush reqd with monitor */
 XEN_CPUFEATURE(APERFMPERF, (FSCAPINTS+0)*32+ 8) /* APERFMPERF */
+XEN_CPUFEATURE(MFENCE_RDTSC, (FSCAPINTS+0)*32+ 9) /* MFENCE synchronizes RDTSC */
 
 #define NCAPINTS (FSCAPINTS + 1) /* N 32-bit words worth of info */
 
--- a/xen/include/asm-x86/msr.h
+++ b/xen/include/asm-x86/msr.h
@@ -80,6 +80,22 @@ static inline uint64_t rdtsc(void)
     return ((uint64_t)high << 32) | low;
 }
 
+static inline uint64_t rdtsc_ordered(void)
+{
+    /*
+     * The RDTSC instruction is not ordered relative to memory access.
+     * The Intel SDM and the AMD APM are both vague on this point, but
+     * empirically an RDTSC instruction can be speculatively executed
+     * before prior loads. An RDTSC immediately after an appropriate
+     * barrier appears to be ordered as a normal load, that is, it
+     * provides the same ordering guarantees as reading from a global
+     * memory location that some other imaginary CPU is updating
+     * continuously with a time stamp.
+     */
+    alternative("lfence", "mfence", X86_FEATURE_MFENCE_RDTSC);
+    return rdtsc();
+}
+
 #define __write_tsc(val) wrmsrl(MSR_IA32_TSC, val)
 #define write_tsc(val) ({ \
     /* Reliable TSCs are in lockstep across all CPUs. We should \
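
For anyone wanting to try the fence-then-read pattern outside the hypervisor, here is a minimal user-space sketch. It is not part of the patch and rests on several assumptions: the names tsc_read_ordered(), usecs_to_ticks() and spin_for_ticks() are made up for illustration, GCC/Clang intrinsics from <x86intrin.h> stand in for Xen's rdtsc() and rep_nop(), and LFENCE is always used, since user space has no equivalent of the alternative() patching that selects MFENCE on AMD.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(), _mm_lfence(), _mm_pause(); GCC/Clang, x86 only */

/*
 * Hypothetical user-space stand-in for rdtsc_ordered(): issue LFENCE so the
 * following RDTSC is not executed ahead of earlier loads, then read the TSC.
 * Unlike the patch, there is no runtime switch to MFENCE here.
 */
static inline uint64_t tsc_read_ordered(void)
{
    _mm_lfence();
    return __rdtsc();
}

/* Tick budget computed the same way as in __udelay(): usecs * (cpu_khz / 1000). */
static inline uint64_t usecs_to_ticks(uint64_t usecs, uint64_t cpu_khz)
{
    return usecs * (cpu_khz / 1000);
}

/*
 * Busy-wait until at least `ticks` TSC ticks have elapsed, mirroring the
 * loop shape of __udelay(). Returns the elapsed tick count observed.
 */
static inline uint64_t spin_for_ticks(uint64_t ticks)
{
    uint64_t s = tsc_read_ordered(), e;

    do {
        _mm_pause();            /* plays the role of rep_nop() */
        e = tsc_read_ordered();
    } while ((e - s) < ticks);

    return e - s;
}
```

A caller that knows its TSC frequency could pass usecs_to_ticks(10, cpu_khz) to spin_for_ticks() for a roughly 10us delay; cpu_khz would have to come from calibration or the OS, which this sketch leaves out.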