Message ID | 20180809145307.066923033@amt.cnet (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | switch kvmclock base to CLOCK_MONOTONIC_RAW (v2) | expand |
On 09/08/2018 16:52, Marcelo Tosatti wrote: > Commit 0bc48bea36d1 ("KVM: x86: update master clock before computing kvmclock_offset") > switches the order of operations to avoid the conversion > > TSC (without frequency correction) -> > system_timestamp (with frequency correction), > > which might cause a time jump. > > However, it leaves any other masterclock update unsafe, which includes, > at the moment: > > * HV_X64_MSR_REFERENCE_TSC MSR write. > * TSC writes. > * Host suspend/resume. > > Avoid the time jump issue by using frequency uncorrected > CLOCK_MONOTONIC_RAW clock. > > Its the guests time keeping software responsability > to track and correct a reference clock such as UTC. > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> What happens across migration? Paolo
On Fri, Aug 10, 2018 at 09:15:51AM +0200, Paolo Bonzini wrote: > On 09/08/2018 16:52, Marcelo Tosatti wrote: > > Commit 0bc48bea36d1 ("KVM: x86: update master clock before computing kvmclock_offset") > > switches the order of operations to avoid the conversion > > > > TSC (without frequency correction) -> > > system_timestamp (with frequency correction), > > > > which might cause a time jump. > > > > However, it leaves any other masterclock update unsafe, which includes, > > at the moment: > > > > * HV_X64_MSR_REFERENCE_TSC MSR write. > > * TSC writes. > > * Host suspend/resume. > > > > Avoid the time jump issue by using frequency uncorrected > > CLOCK_MONOTONIC_RAW clock. > > > > Its the guests time keeping software responsability > > to track and correct a reference clock such as UTC. > > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > What happens across migration? > > Paolo You mean between frequency corrected host -> frequency uncorrected host And vice-versa? I'll check NTP's/Chrony's behaviour and let you know. Thanks
On 13/08/2018 14:52, Marcelo Tosatti wrote: >> What happens across migration? >> >> Paolo > You mean between > > frequency corrected host -> frequency uncorrected host > > And vice-versa? Yes. Worst case we can use KVM_SET_CLOCK's flags argument to switch. Paolo > I'll check NTP's/Chrony's behaviour and let you know.
On Mon, Aug 13, 2018 at 09:52:55AM -0300, Marcelo Tosatti wrote: > On Fri, Aug 10, 2018 at 09:15:51AM +0200, Paolo Bonzini wrote: > > On 09/08/2018 16:52, Marcelo Tosatti wrote: > > > Commit 0bc48bea36d1 ("KVM: x86: update master clock before computing kvmclock_offset") > > > switches the order of operations to avoid the conversion > > > > > > TSC (without frequency correction) -> > > > system_timestamp (with frequency correction), > > > > > > which might cause a time jump. > > > > > > However, it leaves any other masterclock update unsafe, which includes, > > > at the moment: > > > > > > * HV_X64_MSR_REFERENCE_TSC MSR write. > > > * TSC writes. > > > * Host suspend/resume. > > > > > > Avoid the time jump issue by using frequency uncorrected > > > CLOCK_MONOTONIC_RAW clock. > > > > > > Its the guests time keeping software responsability > > > to track and correct a reference clock such as UTC. > > > > > > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> > > > > What happens across migration? > > > > Paolo > > You mean between > > frequency corrected host -> frequency uncorrected host > > And vice-versa? > > I'll check NTP's/Chrony's behaviour and let you know. > > Thanks They will measure and adjust their frequency corrections (which is necessary anyway due to temperature variations for example). http://doc.ntp.org/4.1.0/ntpd.htm Frequency Discipline The ntpd behavior at startup depends on whether the frequency file, usually ntp.drift, exists. This file contains the latest estimate of clock frequency error. When the ntpd is started and the file does not exist, the ntpd enters a special mode designed to quickly adapt to the particular system clock oscillator time and frequency error. This takes approximately 15 minutes, after which the time and frequency are set to nominal values and the ntpd enters normal mode, where the time and frequency are continuously tracked relative to the server. After one hour the frequency file is created and the current frequency offset written to it. When the ntpd is started and the file does exist, the ntpd frequency is initialized from the file and enters normal mode immediately. After that the current frequency offset is written to the file at hourly intervals.
Index: linux-2.6.git/arch/x86/kvm/x86.c =================================================================== --- linux-2.6.git.orig/arch/x86/kvm/x86.c 2018-08-09 10:32:51.151942374 -0300 +++ linux-2.6.git/arch/x86/kvm/x86.c 2018-08-09 11:49:47.111293686 -0300 @@ -1240,20 +1240,25 @@ } #ifdef CONFIG_X86_64 +struct pvclock_clock { + int vclock_mode; + u64 cycle_last; + u64 mask; + u32 mult; + u32 shift; +}; + struct pvclock_gtod_data { seqcount_t seq; - struct { /* extract of a clocksource struct */ - int vclock_mode; - u64 cycle_last; - u64 mask; - u32 mult; - u32 shift; - } clock; + struct pvclock_clock clock; /* extract of a clocksource struct */ + struct pvclock_clock raw_clock; /* extract of a clocksource struct */ + u64 boot_ns_raw; u64 boot_ns; u64 nsec_base; u64 wall_time_sec; + u64 monotonic_raw_nsec; }; static struct pvclock_gtod_data pvclock_gtod_data; @@ -1261,10 +1266,20 @@ static void update_pvclock_gtod(struct timekeeper *tk) { struct pvclock_gtod_data *vdata = &pvclock_gtod_data; - u64 boot_ns; + u64 boot_ns, boot_ns_raw; boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot)); + /* + * FIXME: tk->offs_boot should be converted to CLOCK_MONOTONIC_RAW + * interval (that is, without frequency adjustment for that interval). + * + * Lack of this fix can cause system_timestamp to not be equal to + * CLOCK_MONOTONIC_RAW (which happen if the host uses + * suspend/resume). + */ + boot_ns_raw = ktime_to_ns(ktime_add(tk->tkr_raw.base, tk->offs_boot)); + write_seqcount_begin(&vdata->seq); /* copy pvclock gtod data */ @@ -1274,11 +1289,20 @@ vdata->clock.mult = tk->tkr_mono.mult; vdata->clock.shift = tk->tkr_mono.shift; + vdata->raw_clock.vclock_mode = tk->tkr_raw.clock->archdata.vclock_mode; + vdata->raw_clock.cycle_last = tk->tkr_raw.cycle_last; + vdata->raw_clock.mask = tk->tkr_raw.mask; + vdata->raw_clock.mult = tk->tkr_raw.mult; + vdata->raw_clock.shift = tk->tkr_raw.shift; + vdata->boot_ns = boot_ns; vdata->nsec_base = tk->tkr_mono.xtime_nsec; vdata->wall_time_sec = tk->xtime_sec; + vdata->boot_ns_raw = boot_ns_raw; + vdata->monotonic_raw_nsec = tk->tkr_raw.xtime_nsec; + write_seqcount_end(&vdata->seq); } #endif @@ -1713,21 +1737,21 @@ return last; } -static inline u64 vgettsc(u64 *tsc_timestamp, int *mode) +static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp, + int *mode) { long v; - struct pvclock_gtod_data *gtod = &pvclock_gtod_data; u64 tsc_pg_val; - switch (gtod->clock.vclock_mode) { + switch (clock->vclock_mode) { case VCLOCK_HVCLOCK: tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(), tsc_timestamp); if (tsc_pg_val != U64_MAX) { /* TSC page valid */ *mode = VCLOCK_HVCLOCK; - v = (tsc_pg_val - gtod->clock.cycle_last) & - gtod->clock.mask; + v = (tsc_pg_val - clock->cycle_last) & + clock->mask; } else { /* TSC page invalid */ *mode = VCLOCK_NONE; @@ -1736,8 +1760,8 @@ case VCLOCK_TSC: *mode = VCLOCK_TSC; *tsc_timestamp = read_tsc(); - v = (*tsc_timestamp - gtod->clock.cycle_last) & - gtod->clock.mask; + v = (*tsc_timestamp - clock->cycle_last) & + clock->mask; break; default: *mode = VCLOCK_NONE; @@ -1746,10 +1770,10 @@ if (*mode == VCLOCK_NONE) *tsc_timestamp = v = 0; - return v * gtod->clock.mult; + return v * clock->mult; } -static int do_monotonic_boot(s64 *t, u64 *tsc_timestamp) +static int do_monotonic_raw(s64 *t, u64 *tsc_timestamp) { struct pvclock_gtod_data *gtod = &pvclock_gtod_data; unsigned long seq; @@ -1758,10 +1782,10 @@ do { seq = read_seqcount_begin(>od->seq); - ns = gtod->nsec_base; - ns += vgettsc(tsc_timestamp, &mode); + ns = gtod->monotonic_raw_nsec; + ns += vgettsc(>od->raw_clock, tsc_timestamp, &mode); ns >>= gtod->clock.shift; - ns += gtod->boot_ns; + ns += gtod->boot_ns_raw; } while (unlikely(read_seqcount_retry(>od->seq, seq))); *t = ns; @@ -1779,7 +1803,7 @@ seq = read_seqcount_begin(>od->seq); ts->tv_sec = gtod->wall_time_sec; ns = gtod->nsec_base; - ns += vgettsc(tsc_timestamp, &mode); + ns += vgettsc(>od->clock, tsc_timestamp, &mode); ns >>= gtod->clock.shift; } while (unlikely(read_seqcount_retry(>od->seq, seq))); @@ -1796,7 +1820,7 @@ if (!gtod_is_based_on_tsc(pvclock_gtod_data.clock.vclock_mode)) return false; - return gtod_is_based_on_tsc(do_monotonic_boot(kernel_ns, + return gtod_is_based_on_tsc(do_monotonic_raw(kernel_ns, tsc_timestamp)); }
Commit 0bc48bea36d1 ("KVM: x86: update master clock before computing kvmclock_offset") switches the order of operations to avoid the conversion TSC (without frequency correction) -> system_timestamp (with frequency correction), which might cause a time jump. However, it leaves any other masterclock update unsafe, which includes, at the moment: * HV_X64_MSR_REFERENCE_TSC MSR write. * TSC writes. * Host suspend/resume. Avoid the time jump issue by using frequency uncorrected CLOCK_MONOTONIC_RAW clock. Its the guests time keeping software responsability to track and correct a reference clock such as UTC. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- arch/x86/kvm/x86.c | 68 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 46 insertions(+), 22 deletions(-)