Message ID | 1501331711-12961-3-git-send-email-dplotnikov@virtuozzo.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 29/07/2017 14:35, Denis Plotnikov wrote: > arch/x86/kernel/kvmclock.c | 14 ++++++++++++-- > arch/x86/kernel/tsc.c | 6 ++++++ > arch/x86/kvm/x86.c | 26 ++++++++++++++++++-------- > include/linux/clocksource.h | 3 +++ > include/linux/timekeeping.h | 2 ++ > kernel/time/timekeeping.c | 21 +++++++++++++++++++-- > 6 files changed, 60 insertions(+), 12 deletions(-) This is pretty clean, thanks. Only it should be split in several patches: - introducing read_with_cycles and using it in ktime_get_snapshot. Looking at the code, the meaning of "cycles" is overloaded, so perhaps rename it to read_clock_and_systime or something similar? - introducing boot time in ktime_get_snapshot - implementing kvm_clock_read_with_cycles (can be merged with patch 6) - adding the cs_stable field to struct system_time_snapshot (see below, maybe this can be merged with read_with_cycles) - using ktime_get_snapshot in KVM (can be merged with patch 4?) so that the timekeeping maintainer can comment on each new feature you add to their code. cs_stable is the part that I'm still a bit wary of; here are the doubts I have: - if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a new callback shouldn't be needed (it's certainly not needed for TSC). - the meaning of "stable" for kvmclock is not exactly the same as clocksource_mark_unstable. Maybe what we want is some kind of "bool cycles_valid", and then read_clock_and_systime can return it: if (clock->read_clock_and_systime) { systime_snapshot->cycles_valid = clock->read_clock_and_systime( &now, &systime_snapshot->cycles); } else { now = tk_clock_read(&tk->tkr_mono); systime_snapshot->cycles_valid = true; systime_snapshot->cycles = now; } ? (This is honestly just a suggestion, which may be wrong depedning on the answer to the two questions above). Paolo > systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; > systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; > base_real = ktime_add(tk->tkr_mono.base, > tk_core.timekeeper.offs_real); > base_raw = tk->tkr_raw.base; > + base_boot = ktime_add(tk->tkr_mono.base, > + tk_core.timekeeper.offs_boot); > nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); > nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); > } while (read_seqcount_retry(&tk_core.seq, seq)); > > - systime_snapshot->cycles = now; > systime_snapshot->real = ktime_add_ns(base_real, nsec_real); > systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); > + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); > } > EXPORT_SYMBOL_GPL(ktime_get_snapshot); > >
On 31.07.2017 17:20, Paolo Bonzini wrote: > On 29/07/2017 14:35, Denis Plotnikov wrote: >> arch/x86/kernel/kvmclock.c | 14 ++++++++++++-- >> arch/x86/kernel/tsc.c | 6 ++++++ >> arch/x86/kvm/x86.c | 26 ++++++++++++++++++-------- >> include/linux/clocksource.h | 3 +++ >> include/linux/timekeeping.h | 2 ++ >> kernel/time/timekeeping.c | 21 +++++++++++++++++++-- >> 6 files changed, 60 insertions(+), 12 deletions(-) > > This is pretty clean, thanks. Only it should be split in several patches: > > - introducing read_with_cycles and using it in ktime_get_snapshot. > Looking at the code, the meaning of "cycles" is overloaded, so perhaps > rename it to read_clock_and_systime or something similar? > agree > - introducing boot time in ktime_get_snapshot agree > > - implementing kvm_clock_read_with_cycles (can be merged with patch 6) Having a stable clocksource for kvmklock means making kvmclock stable. The patch enables this functionality that's why I'd prefer to keep patch 6 separate > > - adding the cs_stable field to struct system_time_snapshot (see below, > maybe this can be merged with read_with_cycles) agree > > - using ktime_get_snapshot in KVM (can be merged with patch 4?) agree, but want to keep 4 separate. Just to make the changes done logically consecutive in git tree. > > so that the timekeeping maintainer can comment on each new feature you > add to their code. > > cs_stable is the part that I'm still a bit wary of; here are the doubts > I have: > > - if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a > new callback shouldn't be needed (it's certainly not needed for TSC). > > - the meaning of "stable" for kvmclock is not exactly the same as > clocksource_mark_unstable. > > Maybe what we want is some kind of "bool cycles_valid", and then > read_clock_and_systime can return it: > > > if (clock->read_clock_and_systime) { > systime_snapshot->cycles_valid = clock->read_clock_and_systime( > &now, &systime_snapshot->cycles); > } else { > now = tk_clock_read(&tk->tkr_mono); > systime_snapshot->cycles_valid = true; > systime_snapshot->cycles = now; > } > > ? (This is honestly just a suggestion, which may be wrong depedning > on the answer to the two questions above). cs_stable means "there is no unexpected time jumps". As you mentioned it's irrelevant for tsc but makes sense to kvmclock (and possibly some other clocksources). I think it's reasonable to ask a clocksource directly whether it's stable or not. That's why I made the callback. I don't mind merging this "check stability" functionality with read_clock_and_systime. Actually, I thought about having it there but eventually split it to make the code more explicit (detailed and understandable). I'd like to use your approach but keep the variable name the same. Thanks for reviewing! Denis > > Paolo > >> systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; >> systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; >> base_real = ktime_add(tk->tkr_mono.base, >> tk_core.timekeeper.offs_real); >> base_raw = tk->tkr_raw.base; >> + base_boot = ktime_add(tk->tkr_mono.base, >> + tk_core.timekeeper.offs_boot); >> nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); >> nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); >> } while (read_seqcount_retry(&tk_core.seq, seq)); >> >> - systime_snapshot->cycles = now; >> systime_snapshot->real = ktime_add_ns(base_real, nsec_real); >> systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); >> + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); >> } >> EXPORT_SYMBOL_GPL(ktime_get_snapshot); >> >> >
On 01/08/2017 11:30, Denis Plotnikov wrote: >> - implementing kvm_clock_read_with_cycles (can be merged with patch 6) > > Having a stable clocksource for kvmklock means making kvmclock stable. > The patch enables this functionality that's why I'd prefer to keep patch > 6 separate Ok, though it depends on how you deal with cs_stable. >> - using ktime_get_snapshot in KVM (can be merged with patch 4?) > > agree, but want to keep 4 separate. Just to make the changes done > logically consecutive in git tree. Fine by me. >> Maybe what we want is some kind of "bool cycles_valid", and then >> read_clock_and_systime can return it: >> >> >> if (clock->read_clock_and_systime) { >> systime_snapshot->cycles_valid = >> clock->read_clock_and_systime( >> &now, &systime_snapshot->cycles); >> } else { >> now = tk_clock_read(&tk->tkr_mono); >> systime_snapshot->cycles_valid = true; >> systime_snapshot->cycles = now; >> } >> >> ? (This is honestly just a suggestion, which may be wrong depedning >> on the answer to the two questions above). > > cs_stable means "there is no unexpected time jumps". But even for kvmclock there are no unexpected time jumps, the global accumulator in pvclock_clocksource_read ensures that. And the concept is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist the clocksource for hrtimers. It seems a different concept to me, somewhat specific to ktime_get_snapshot. More precisely, the question is "is there a 1:1 mapping from cycles to nanoseconds?"---but if there is no such mapping cycles is useless, hence my proposal of "cycles_valid". Thanks, Paolo > I don't mind merging this "check stability" functionality with > read_clock_and_systime. Actually, I thought about having it there but > eventually split it to make the code more explicit > (detailed and understandable). > I'd like to use your approach but keep the variable name the same. > > Thanks for reviewing! > > Denis >> >> Paolo >> >>> systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; >>> systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; >>> base_real = ktime_add(tk->tkr_mono.base, >>> tk_core.timekeeper.offs_real); >>> base_raw = tk->tkr_raw.base; >>> + base_boot = ktime_add(tk->tkr_mono.base, >>> + tk_core.timekeeper.offs_boot); >>> nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); >>> nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); >>> } while (read_seqcount_retry(&tk_core.seq, seq)); >>> - systime_snapshot->cycles = now; >>> systime_snapshot->real = ktime_add_ns(base_real, nsec_real); >>> systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); >>> + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); >>> } >>> EXPORT_SYMBOL_GPL(ktime_get_snapshot); >>> >> >
A couple more observation now that we've agreed on the approach. On 01/08/2017 12:03, Paolo Bonzini wrote: > >>> Maybe what we want is some kind of "bool cycles_valid", and then >>> read_clock_and_systime can return it: >>> >>> >>> if (clock->read_clock_and_systime) { >>> systime_snapshot->cycles_valid = >>> clock->read_clock_and_systime( >>> &now, &systime_snapshot->cycles); Please pass the clocksource to the new function. However this: > + clock = tk->tkr_mono.clock; should then be READ_ONCE for the same reason mentioned in tk_clock_read's comments. Then, >>> } else { >>> now = tk_clock_read(&tk->tkr_mono); this can call clock->read(clock) directly, without going through tk_clock_read. Thanks, Paolo >>> systime_snapshot->cycles_valid = true; >>> systime_snapshot->cycles = now; >>> }
On 01.08.2017 13:03, Paolo Bonzini wrote: > On 01/08/2017 11:30, Denis Plotnikov wrote: >>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6) >> >> Having a stable clocksource for kvmklock means making kvmclock stable. >> The patch enables this functionality that's why I'd prefer to keep patch >> 6 separate > > Ok, though it depends on how you deal with cs_stable. > >>> - using ktime_get_snapshot in KVM (can be merged with patch 4?) >> >> agree, but want to keep 4 separate. Just to make the changes done >> logically consecutive in git tree. > > Fine by me. > >>> Maybe what we want is some kind of "bool cycles_valid", and then >>> read_clock_and_systime can return it: >>> >>> >>> if (clock->read_clock_and_systime) { >>> systime_snapshot->cycles_valid = >>> clock->read_clock_and_systime( >>> &now, &systime_snapshot->cycles); >>> } else { >>> now = tk_clock_read(&tk->tkr_mono); >>> systime_snapshot->cycles_valid = true; >>> systime_snapshot->cycles = now; >>> } >>> >>> ? (This is honestly just a suggestion, which may be wrong depedning >>> on the answer to the two questions above). >> >> cs_stable means "there is no unexpected time jumps". > > But even for kvmclock there are no unexpected time jumps, the global > accumulator in pvclock_clocksource_read ensures that. And the concept > is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist > the clocksource for hrtimers. > > It seems a different concept to me, somewhat specific to > ktime_get_snapshot. More precisely, the question is "is there a 1:1 > mapping from cycles to nanoseconds?"---but if there is no such mapping > cycles is useless, hence my proposal of "cycles_valid". > > Thanks, > > Paolo In fact, this "cycles_valid" is going to be used for deciding whether to use KVM masterclock or not. And if it's not we still want to know cycles_stamp value to use it in KVM. So the cycles is valid, but clocksource is not reliable (why? let decide to a clocksource, by default assume they are all not stable), thus we must calculate time values for a guest each time its needed. So, my proposal is to name the variable sightly differently: cs_reliable and go like: if (clock->read_clock_with_stamp) { systime_snapshot->cs_reliable = clock->read_clock_with_stamp( &now, &systime_snapshot->cycles); } else { now = tk_clock_read(&tk->tkr_mono); systime_snapshot->cs_reliable = false; systime_snapshot->cycles = now; } What do you think? Thanks! Denis > >> I don't mind merging this "check stability" functionality with >> read_clock_and_systime. Actually, I thought about having it there but >> eventually split it to make the code more explicit >> (detailed and understandable). >> I'd like to use your approach but keep the variable name the same. >> >> Thanks for reviewing! >> >> Denis >>> >>> Paolo >>> >>>> systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; >>>> systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; >>>> base_real = ktime_add(tk->tkr_mono.base, >>>> tk_core.timekeeper.offs_real); >>>> base_raw = tk->tkr_raw.base; >>>> + base_boot = ktime_add(tk->tkr_mono.base, >>>> + tk_core.timekeeper.offs_boot); >>>> nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); >>>> nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); >>>> } while (read_seqcount_retry(&tk_core.seq, seq)); >>>> - systime_snapshot->cycles = now; >>>> systime_snapshot->real = ktime_add_ns(base_real, nsec_real); >>>> systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); >>>> + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); >>>> } >>>> EXPORT_SYMBOL_GPL(ktime_get_snapshot); >>>> >>> >> >
On 01.08.2017 15:11, Denis Plotnikov wrote: > > > On 01.08.2017 13:03, Paolo Bonzini wrote: >> On 01/08/2017 11:30, Denis Plotnikov wrote: >>>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6) >>> >>> Having a stable clocksource for kvmklock means making kvmclock stable. >>> The patch enables this functionality that's why I'd prefer to keep patch >>> 6 separate >> >> Ok, though it depends on how you deal with cs_stable. >> >>>> - using ktime_get_snapshot in KVM (can be merged with patch 4?) >>> >>> agree, but want to keep 4 separate. Just to make the changes done >>> logically consecutive in git tree. >> >> Fine by me. >> >>>> Maybe what we want is some kind of "bool cycles_valid", and then >>>> read_clock_and_systime can return it: >>>> >>>> >>>> if (clock->read_clock_and_systime) { >>>> systime_snapshot->cycles_valid = >>>> clock->read_clock_and_systime( >>>> &now, &systime_snapshot->cycles); >>>> } else { >>>> now = tk_clock_read(&tk->tkr_mono); >>>> systime_snapshot->cycles_valid = true; >>>> systime_snapshot->cycles = now; >>>> } >>>> >>>> ? (This is honestly just a suggestion, which may be wrong depedning >>>> on the answer to the two questions above). >>> >>> cs_stable means "there is no unexpected time jumps". >> >> But even for kvmclock there are no unexpected time jumps, the global >> accumulator in pvclock_clocksource_read ensures that. And the concept >> is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist >> the clocksource for hrtimers. >> >> It seems a different concept to me, somewhat specific to >> ktime_get_snapshot. More precisely, the question is "is there a 1:1 >> mapping from cycles to nanoseconds?"---but if there is no such mapping >> cycles is useless, hence my proposal of "cycles_valid". >> >> Thanks, >> >> Paolo > In fact, this "cycles_valid" is going to be used for deciding whether to > use KVM masterclock or not. And if it's not we still want to know > cycles_stamp value to use it in KVM. > So the cycles is valid, but clocksource is not reliable (why? let decide > to a clocksource, by default assume they are all not stable), thus we > must calculate time values for a guest each time its needed. > So, my proposal is to name the variable sightly differently: cs_reliable > and go like: > if (clock->read_clock_with_stamp) { > systime_snapshot->cs_reliable = > clock->read_clock_with_stamp( > &now, &systime_snapshot->cycles); > } else { > now = tk_clock_read(&tk->tkr_mono); > systime_snapshot->cs_reliable = false; > systime_snapshot->cycles = now; > } > What do you think? > > Thanks! > > Denis How about this: Let's name the variable cycles_reliable. if it's true then what systime_snapshot->cycles holds, is truly cycles, if false then it's something else -- some special care needed Thanks! Denis >> >>> I don't mind merging this "check stability" functionality with >>> read_clock_and_systime. Actually, I thought about having it there but >>> eventually split it to make the code more explicit >>> (detailed and understandable). >>> I'd like to use your approach but keep the variable name the same. >>> >>> Thanks for reviewing! >>> >>> Denis >>>> >>>> Paolo >>>> >>>>> systime_snapshot->cs_was_changed_seq = >>>>> tk->cs_was_changed_seq; >>>>> systime_snapshot->clock_was_set_seq = >>>>> tk->clock_was_set_seq; >>>>> base_real = ktime_add(tk->tkr_mono.base, >>>>> tk_core.timekeeper.offs_real); >>>>> base_raw = tk->tkr_raw.base; >>>>> + base_boot = ktime_add(tk->tkr_mono.base, >>>>> + tk_core.timekeeper.offs_boot); >>>>> nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); >>>>> nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); >>>>> } while (read_seqcount_retry(&tk_core.seq, seq)); >>>>> - systime_snapshot->cycles = now; >>>>> systime_snapshot->real = ktime_add_ns(base_real, nsec_real); >>>>> systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); >>>>> + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); >>>>> } >>>>> EXPORT_SYMBOL_GPL(ktime_get_snapshot); >>>> >>> >> >
On 01/08/2017 14:11, Denis Plotnikov wrote: > In fact, this "cycles_valid" is going to be used for deciding whether to > use KVM masterclock or not. And if it's not we still want to know > cycles_stamp value to use it in KVM. Why? Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do anything with the two variables that are passed by reference, if the read returns false. Hence my suggestion of calling it cycles_valid. > So the cycles is valid, but clocksource is not reliable (why? let decide > to a clocksource, by default assume they are all not stable), thus we > must calculate time values for a guest each time its needed. > So, my proposal is to name the variable sightly differently: cs_reliable > and go like: > if (clock->read_clock_with_stamp) { > systime_snapshot->cs_reliable = > clock->read_clock_with_stamp( > &now, &systime_snapshot->cycles); > } else { > now = tk_clock_read(&tk->tkr_mono); > systime_snapshot->cs_reliable = false; > systime_snapshot->cycles = now; > } > What do you think? I'm afraid you still have to define the meaning of "reliable". (Though I agree that the right default is false, that was a thinko on my side. This also means that you need to define read_clock_with_stamp for the TSC clocksource too). Paolo
On 01.08.2017 15:41, Paolo Bonzini wrote: > On 01/08/2017 14:11, Denis Plotnikov wrote: >> In fact, this "cycles_valid" is going to be used for deciding whether to >> use KVM masterclock or not. And if it's not we still want to know >> cycles_stamp value to use it in KVM. > > Why? Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do > anything with the two variables that are passed by reference, if the > read returns false. Hence my suggestion of calling it cycles_valid. giving up > >> So the cycles is valid, but clocksource is not reliable (why? let decide >> to a clocksource, by default assume they are all not stable), thus we >> must calculate time values for a guest each time its needed. >> So, my proposal is to name the variable sightly differently: cs_reliable >> and go like: >> if (clock->read_clock_with_stamp) { >> systime_snapshot->cs_reliable = >> clock->read_clock_with_stamp( >> &now, &systime_snapshot->cycles); >> } else { >> now = tk_clock_read(&tk->tkr_mono); >> systime_snapshot->cs_reliable = false; >> systime_snapshot->cycles = now; >> } >> What do you think? > > I'm afraid you still have to define the meaning of "reliable". (Though > I agree that the right default is false, that was a thinko on my side. > This also means that you need to define read_clock_with_stamp for the > TSC clocksource too). > > Paolo > agree name: cycles_valid default: false read_clock_with_stamp: defined for tsc returning true Thanks! Patches are in progress. They well be sent soon, nice and shiny! Denis
2017-08-01 15:46+0300, Denis Plotnikov: > Patches are in progress. > They well be sent soon, nice and shiny! Please Cc timekeeper and x86 people too, thanks!
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index f5cfc5d..52156d9 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -82,7 +82,7 @@ static int kvm_set_wallclock(const struct timespec *now) return -1; } -static u64 kvm_clock_read(void) +static inline u64 __kvm_clock_read(u64 *cycles) { struct pvclock_vcpu_time_info *src; u64 ret; @@ -91,10 +91,14 @@ static u64 kvm_clock_read(void) preempt_disable_notrace(); cpu = smp_processor_id(); src = &hv_clock[cpu].pvti; - ret = pvclock_clocksource_read(src, NULL); + ret = pvclock_clocksource_read(src, cycles); preempt_enable_notrace(); return ret; } +static u64 kvm_clock_read(void) +{ + return __kvm_clock_read(NULL); +} static u64 kvm_clock_get_cycles(struct clocksource *cs) { @@ -177,9 +181,15 @@ bool kvm_check_and_clear_guest_paused(void) return ret; } +static void kvm_clock_read_with_cycles(u64 *cycles, u64 *cycles_stamp) +{ + *cycles = __kvm_clock_read(cycles_stamp); +} + struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_get_cycles, + .read_with_cycles = kvm_clock_read_with_cycles, .rating = 400, .mask = CLOCKSOURCE_MASK(64), .flags = CLOCK_SOURCE_IS_CONTINUOUS, diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index 796d96b..5d655af 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1015,6 +1015,11 @@ static u64 read_tsc(struct clocksource *cs) return (u64)rdtsc_ordered(); } +static bool is_tsc_stable(void) +{ + return !tsc_unstable; +} + static void tsc_cs_mark_unstable(struct clocksource *cs) { if (tsc_unstable) @@ -1043,6 +1048,7 @@ static struct clocksource clocksource_tsc = { .name = "tsc", .rating = 300, .read = read_tsc, + .is_stable = is_tsc_stable, .mask = CLOCKSOURCE_MASK(64), .flags = CLOCK_SOURCE_IS_CONTINUOUS | CLOCK_SOURCE_MUST_VERIFY, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 6c97c82..496e731 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1643,22 +1643,32 @@ static int do_realtime(struct timespec *ts, u64 *cycle_now) /* returns true if host is using tsc clocksource */ static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now) { - /* checked again under seqlock below */ - if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC) - return false; + struct system_time_snapshot systime_snapshot; + + ktime_get_snapshot(&systime_snapshot); + + if (systime_snapshot.cs_stable) { + *kernel_ns = ktime_to_ns(systime_snapshot.boot); + *cycle_now = systime_snapshot.cycles; + } - return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC; + return systime_snapshot.cs_stable; } /* returns true if host is using tsc clocksource */ static bool kvm_get_walltime_and_clockread(struct timespec *ts, u64 *cycle_now) { - /* checked again under seqlock below */ - if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC) - return false; + struct system_time_snapshot systime_snapshot; + + ktime_get_snapshot(&systime_snapshot); + + if (systime_snapshot.cs_stable) { + *ts = ktime_to_timespec(systime_snapshot.real); + *cycle_now = systime_snapshot.cycles; + } - return do_realtime(ts, cycle_now) == VCLOCK_TSC; + return systime_snapshot.cs_stable; } #endif diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index a78cb18..f849b91 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -49,6 +49,7 @@ struct module; * The ideal clocksource. A must-use where * available. * @read: returns a cycle value, passes clocksource as argument + * @read_with_cycles * @enable: optional function to enable the clocksource * @disable: optional function to disable the clocksource * @mask: bitmask for two's complement @@ -78,6 +79,8 @@ struct module; */ struct clocksource { u64 (*read)(struct clocksource *cs); + void (*read_with_cycles)(u64 *cycles, u64 *cycles_stamp); + bool (*is_stable)(void); u64 mask; u32 mult; u32 shift; diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index ddc229f..21917fa 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -290,8 +290,10 @@ struct system_time_snapshot { u64 cycles; ktime_t real; ktime_t raw; + ktime_t boot; unsigned int clock_was_set_seq; u8 cs_was_changed_seq; + bool cs_stable; }; /* diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cedafa0..a2bfc12 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -953,27 +953,44 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot) unsigned long seq; ktime_t base_raw; ktime_t base_real; + ktime_t base_boot; u64 nsec_raw; u64 nsec_real; u64 now; + struct clocksource *clock; WARN_ON_ONCE(timekeeping_suspended); do { seq = read_seqcount_begin(&tk_core.seq); - now = tk_clock_read(&tk->tkr_mono); + clock = tk->tkr_mono.clock; + + if (clock->is_stable) + systime_snapshot->cs_stable = clock->is_stable(); + else + systime_snapshot->cs_stable = false; + + if (clock->read_with_cycles) { + clock->read_with_cycles( + &now, &systime_snapshot->cycles); + } else { + now = tk_clock_read(&tk->tkr_mono); + systime_snapshot->cycles = now; + } systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq; systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq; base_real = ktime_add(tk->tkr_mono.base, tk_core.timekeeper.offs_real); base_raw = tk->tkr_raw.base; + base_boot = ktime_add(tk->tkr_mono.base, + tk_core.timekeeper.offs_boot); nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now); nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw, now); } while (read_seqcount_retry(&tk_core.seq, seq)); - systime_snapshot->cycles = now; systime_snapshot->real = ktime_add_ns(base_real, nsec_real); systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw); + systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real); } EXPORT_SYMBOL_GPL(ktime_get_snapshot);
It is reasonable to switch KVM to using a more simple, effective and conceptually correct scheme of dealing with the data needed for kvm masterclock values calculation. With the current scheme the kvm needs to have an up-to-date copy of some timekeeper data to provide a guest using kvmclock with necessary information. This is not: - simple KVM has to have a lot of code to do that, instead KVM could use a timekeeper function to get all the data it needs - effective the copy of the data used for time data calculation is updated every time it changed although this is not necessary since the updated piece of time data is needed in certain moments only (e.g masterclock updating), instead KVM can request this data directly form the timekeeper at the moments when it's really needed - conceptually correct to do the work (calculate the time data) which the other part of the system (timekeeper) has been designed and is able to do is not the proper way, instead deligate the work to the proper part This patch switches KVM to using the improved timekeeper function for the kvm masterclock time data. Removing the leftovers of the old scheme is the matter of the next patches. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> --- arch/x86/kernel/kvmclock.c | 14 ++++++++++++-- arch/x86/kernel/tsc.c | 6 ++++++ arch/x86/kvm/x86.c | 26 ++++++++++++++++++-------- include/linux/clocksource.h | 3 +++ include/linux/timekeeping.h | 2 ++ kernel/time/timekeeping.c | 21 +++++++++++++++++++-- 6 files changed, 60 insertions(+), 12 deletions(-)