diff mbox

[v3,2/6] KVM: x86: switch to masterclock update using timekeeper functionality

Message ID 1501331711-12961-3-git-send-email-dplotnikov@virtuozzo.com (mailing list archive)
State New, archived
Headers show

Commit Message

Denis Plotnikov July 29, 2017, 12:35 p.m. UTC
It is reasonable to switch KVM to using a more simple, effective
and conceptually correct scheme of dealing with the data needed
for kvm masterclock values calculation.

With the current scheme the kvm needs to have an up-to-date copy of
some timekeeper data to provide a guest using kvmclock with necessary
information.

This is not:
    - simple
        KVM has to have a lot of code to do that, instead KVM could use
        a timekeeper function to get all the data it needs
    - effective
        the copy of the data used for time data calculation is updated
        every time it changed although this is not necessary since
	the updated piece of time data is needed in certain moments only
        (e.g masterclock updating), instead KVM can request this data
        directly form the timekeeper at the moments when it's really needed
    - conceptually correct
        to do the work (calculate the time data) which the other part
	of the system (timekeeper) has been designed and is able to do
        is not the proper way, instead deligate the work to the proper part

This patch switches KVM to using the improved timekeeper function for
the kvm masterclock time data.

Removing the leftovers of the old scheme is the matter of the next patches.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
 arch/x86/kernel/tsc.c       |  6 ++++++
 arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
 include/linux/clocksource.h |  3 +++
 include/linux/timekeeping.h |  2 ++
 kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
 6 files changed, 60 insertions(+), 12 deletions(-)

Comments

Paolo Bonzini July 31, 2017, 2:20 p.m. UTC | #1
On 29/07/2017 14:35, Denis Plotnikov wrote:
>  arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
>  arch/x86/kernel/tsc.c       |  6 ++++++
>  arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
>  include/linux/clocksource.h |  3 +++
>  include/linux/timekeeping.h |  2 ++
>  kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
>  6 files changed, 60 insertions(+), 12 deletions(-)

This is pretty clean, thanks.  Only it should be split in several patches:

- introducing read_with_cycles and using it in ktime_get_snapshot.  
Looking at the code, the meaning of "cycles" is overloaded, so perhaps 
rename it to read_clock_and_systime or something similar?

- introducing boot time in ktime_get_snapshot

- implementing kvm_clock_read_with_cycles (can be merged with patch 6)

- adding the cs_stable field to struct system_time_snapshot (see below,
maybe this can be merged with read_with_cycles)

- using ktime_get_snapshot in KVM (can be merged with patch 4?)

so that the timekeeping maintainer can comment on each new feature you 
add to their code.

cs_stable is the part that I'm still a bit wary of; here are the doubts
I have:

- if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a 
new callback shouldn't be needed (it's certainly not needed for TSC).

- the meaning of "stable" for kvmclock is not exactly the same as 
clocksource_mark_unstable.

Maybe what we want is some kind of "bool cycles_valid", and then 
read_clock_and_systime can return it:


		if (clock->read_clock_and_systime) {
			systime_snapshot->cycles_valid = clock->read_clock_and_systime(
				&now, &systime_snapshot->cycles);
		} else {
			now = tk_clock_read(&tk->tkr_mono);
			systime_snapshot->cycles_valid = true;
			systime_snapshot->cycles = now;
		}

?  (This is honestly just a suggestion, which may be wrong depedning
on the answer to the two questions above).

Paolo

>  		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>  		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>  		base_real = ktime_add(tk->tkr_mono.base,
>  				      tk_core.timekeeper.offs_real);
>  		base_raw = tk->tkr_raw.base;
> +		base_boot = ktime_add(tk->tkr_mono.base,
> +				      tk_core.timekeeper.offs_boot);
>  		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>  		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>  	} while (read_seqcount_retry(&tk_core.seq, seq));
>  
> -	systime_snapshot->cycles = now;
>  	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>  	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>  
>
Denis Plotnikov Aug. 1, 2017, 9:30 a.m. UTC | #2
On 31.07.2017 17:20, Paolo Bonzini wrote:
> On 29/07/2017 14:35, Denis Plotnikov wrote:
>>   arch/x86/kernel/kvmclock.c  | 14 ++++++++++++--
>>   arch/x86/kernel/tsc.c       |  6 ++++++
>>   arch/x86/kvm/x86.c          | 26 ++++++++++++++++++--------
>>   include/linux/clocksource.h |  3 +++
>>   include/linux/timekeeping.h |  2 ++
>>   kernel/time/timekeeping.c   | 21 +++++++++++++++++++--
>>   6 files changed, 60 insertions(+), 12 deletions(-)
> 
> This is pretty clean, thanks.  Only it should be split in several patches:
> 
> - introducing read_with_cycles and using it in ktime_get_snapshot.
> Looking at the code, the meaning of "cycles" is overloaded, so perhaps
> rename it to read_clock_and_systime or something similar?
>
agree
> - introducing boot time in ktime_get_snapshot
agree
> 
> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
Having a stable clocksource for kvmklock means making kvmclock stable. 
The patch enables this functionality that's why I'd prefer to keep patch 
6 separate
> 
> - adding the cs_stable field to struct system_time_snapshot (see below,
> maybe this can be merged with read_with_cycles)
agree
> 
> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
agree, but want to keep 4 separate. Just to make the changes done 
logically consecutive in git tree.
> 
> so that the timekeeping maintainer can comment on each new feature you
> add to their code.
> 
> cs_stable is the part that I'm still a bit wary of; here are the doubts
> I have:
> 
> - if you want stability, you can use the CLOCK_SOURCE_UNSTABLE flag; a
> new callback shouldn't be needed (it's certainly not needed for TSC).
> 
> - the meaning of "stable" for kvmclock is not exactly the same as
> clocksource_mark_unstable.
> 
> Maybe what we want is some kind of "bool cycles_valid", and then
> read_clock_and_systime can return it:
> 
> 
> 		if (clock->read_clock_and_systime) {
> 			systime_snapshot->cycles_valid = clock->read_clock_and_systime(
> 				&now, &systime_snapshot->cycles);
> 		} else {
> 			now = tk_clock_read(&tk->tkr_mono);
> 			systime_snapshot->cycles_valid = true;
> 			systime_snapshot->cycles = now;
> 		}
> 
> ?  (This is honestly just a suggestion, which may be wrong depedning
> on the answer to the two questions above).
cs_stable means "there is no unexpected time jumps". As you mentioned 
it's irrelevant for tsc but makes sense to kvmclock (and possibly some 
other clocksources). I think it's reasonable to ask a clocksource 
directly whether it's stable or not. That's why I made the callback.
I don't mind merging this "check stability" functionality with 
read_clock_and_systime. Actually, I thought about having it there but 
eventually split it to make the code more explicit
(detailed and understandable).
I'd like to use your approach but keep the variable name the same.

Thanks for reviewing!

Denis
> 
> Paolo
> 
>>   		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>   		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>   		base_real = ktime_add(tk->tkr_mono.base,
>>   				      tk_core.timekeeper.offs_real);
>>   		base_raw = tk->tkr_raw.base;
>> +		base_boot = ktime_add(tk->tkr_mono.base,
>> +				      tk_core.timekeeper.offs_boot);
>>   		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>   		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>   	} while (read_seqcount_retry(&tk_core.seq, seq));
>>   
>> -	systime_snapshot->cycles = now;
>>   	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>   	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>> +	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>   }
>>   EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>   
>>
>
Paolo Bonzini Aug. 1, 2017, 10:03 a.m. UTC | #3
On 01/08/2017 11:30, Denis Plotnikov wrote:
>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>
> Having a stable clocksource for kvmklock means making kvmclock stable.
> The patch enables this functionality that's why I'd prefer to keep patch
> 6 separate

Ok, though it depends on how you deal with cs_stable.

>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>
> agree, but want to keep 4 separate. Just to make the changes done
> logically consecutive in git tree.

Fine by me.

>> Maybe what we want is some kind of "bool cycles_valid", and then
>> read_clock_and_systime can return it:
>>
>>
>>         if (clock->read_clock_and_systime) {
>>             systime_snapshot->cycles_valid =
>>              clock->read_clock_and_systime(
>>                 &now, &systime_snapshot->cycles);
>>         } else {
>>             now = tk_clock_read(&tk->tkr_mono);
>>             systime_snapshot->cycles_valid = true;
>>             systime_snapshot->cycles = now;
>>         }
>>
>> ?  (This is honestly just a suggestion, which may be wrong depedning
>> on the answer to the two questions above).
>
> cs_stable means "there is no unexpected time jumps".

But even for kvmclock there are no unexpected time jumps, the global
accumulator in pvclock_clocksource_read ensures that.  And the concept
is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist
the clocksource for hrtimers.

It seems a different concept to me, somewhat specific to
ktime_get_snapshot.  More precisely, the question is "is there a 1:1
mapping from cycles to nanoseconds?"---but if there is no such mapping
cycles is useless, hence my proposal of "cycles_valid".

Thanks,

Paolo

> I don't mind merging this "check stability" functionality with
> read_clock_and_systime. Actually, I thought about having it there but
> eventually split it to make the code more explicit
> (detailed and understandable).
> I'd like to use your approach but keep the variable name the same.
> 
> Thanks for reviewing!
> 
> Denis
>>
>> Paolo
>>
>>>           systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>>           systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>>           base_real = ktime_add(tk->tkr_mono.base,
>>>                         tk_core.timekeeper.offs_real);
>>>           base_raw = tk->tkr_raw.base;
>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>> +                      tk_core.timekeeper.offs_boot);
>>>           nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>           nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>       } while (read_seqcount_retry(&tk_core.seq, seq));
>>>   -    systime_snapshot->cycles = now;
>>>       systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>       systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>   }
>>>   EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>  
>>
>
Paolo Bonzini Aug. 1, 2017, 10:16 a.m. UTC | #4
A couple more observation now that we've agreed on the approach.

On 01/08/2017 12:03, Paolo Bonzini wrote:
> 
>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>> read_clock_and_systime can return it:
>>>
>>>
>>>         if (clock->read_clock_and_systime) {
>>>             systime_snapshot->cycles_valid =
>>>              clock->read_clock_and_systime(
>>>                 &now, &systime_snapshot->cycles);

Please pass the clocksource to the new function.  However this:

> +		clock = tk->tkr_mono.clock;

should then be READ_ONCE for the same reason mentioned in
tk_clock_read's comments.  Then,

>>>         } else {
>>>             now = tk_clock_read(&tk->tkr_mono);

this can call clock->read(clock) directly, without going through
tk_clock_read.

Thanks,

Paolo

>>>             systime_snapshot->cycles_valid = true;
>>>             systime_snapshot->cycles = now;
>>>         }
Denis Plotnikov Aug. 1, 2017, 12:11 p.m. UTC | #5
On 01.08.2017 13:03, Paolo Bonzini wrote:
> On 01/08/2017 11:30, Denis Plotnikov wrote:
>>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>>
>> Having a stable clocksource for kvmklock means making kvmclock stable.
>> The patch enables this functionality that's why I'd prefer to keep patch
>> 6 separate
> 
> Ok, though it depends on how you deal with cs_stable.
> 
>>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>>
>> agree, but want to keep 4 separate. Just to make the changes done
>> logically consecutive in git tree.
> 
> Fine by me.
> 
>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>> read_clock_and_systime can return it:
>>>
>>>
>>>          if (clock->read_clock_and_systime) {
>>>              systime_snapshot->cycles_valid =
>>>               clock->read_clock_and_systime(
>>>                  &now, &systime_snapshot->cycles);
>>>          } else {
>>>              now = tk_clock_read(&tk->tkr_mono);
>>>              systime_snapshot->cycles_valid = true;
>>>              systime_snapshot->cycles = now;
>>>          }
>>>
>>> ?  (This is honestly just a suggestion, which may be wrong depedning
>>> on the answer to the two questions above).
>>
>> cs_stable means "there is no unexpected time jumps".
> 
> But even for kvmclock there are no unexpected time jumps, the global
> accumulator in pvclock_clocksource_read ensures that.  And the concept
> is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist
> the clocksource for hrtimers.
> 
> It seems a different concept to me, somewhat specific to
> ktime_get_snapshot.  More precisely, the question is "is there a 1:1
> mapping from cycles to nanoseconds?"---but if there is no such mapping
> cycles is useless, hence my proposal of "cycles_valid".
> 
> Thanks,
> 
> Paolo
In fact, this "cycles_valid" is going to be used for deciding whether to 
use KVM masterclock or not. And if it's not we still want to know 
cycles_stamp value to use it in KVM.
So the cycles is valid, but clocksource is not reliable (why? let decide 
to a clocksource, by default assume they are all not stable), thus we 
must calculate time values for a guest each time its needed.
So, my proposal is to name the variable sightly differently: cs_reliable
and go like:
	if (clock->read_clock_with_stamp) {
		systime_snapshot->cs_reliable =
			clock->read_clock_with_stamp(
				&now, &systime_snapshot->cycles);
	} else {
		now = tk_clock_read(&tk->tkr_mono);
		systime_snapshot->cs_reliable = false;
		systime_snapshot->cycles = now;
	}
What do you think?

Thanks!

Denis
> 
>> I don't mind merging this "check stability" functionality with
>> read_clock_and_systime. Actually, I thought about having it there but
>> eventually split it to make the code more explicit
>> (detailed and understandable).
>> I'd like to use your approach but keep the variable name the same.
>>
>> Thanks for reviewing!
>>
>> Denis
>>>
>>> Paolo
>>>
>>>>            systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
>>>>            systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
>>>>            base_real = ktime_add(tk->tkr_mono.base,
>>>>                          tk_core.timekeeper.offs_real);
>>>>            base_raw = tk->tkr_raw.base;
>>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>>> +                      tk_core.timekeeper.offs_boot);
>>>>            nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>>            nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>>        } while (read_seqcount_retry(&tk_core.seq, seq));
>>>>    -    systime_snapshot->cycles = now;
>>>>        systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>>        systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>>    }
>>>>    EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>>   
>>>
>>
>
Denis Plotnikov Aug. 1, 2017, 12:28 p.m. UTC | #6
On 01.08.2017 15:11, Denis Plotnikov wrote:
> 
> 
> On 01.08.2017 13:03, Paolo Bonzini wrote:
>> On 01/08/2017 11:30, Denis Plotnikov wrote:
>>>> - implementing kvm_clock_read_with_cycles (can be merged with patch 6)
>>>
>>> Having a stable clocksource for kvmklock means making kvmclock stable.
>>> The patch enables this functionality that's why I'd prefer to keep patch
>>> 6 separate
>>
>> Ok, though it depends on how you deal with cs_stable.
>>
>>>> - using ktime_get_snapshot in KVM (can be merged with patch 4?)
>>>
>>> agree, but want to keep 4 separate. Just to make the changes done
>>> logically consecutive in git tree.
>>
>> Fine by me.
>>
>>>> Maybe what we want is some kind of "bool cycles_valid", and then
>>>> read_clock_and_systime can return it:
>>>>
>>>>
>>>>          if (clock->read_clock_and_systime) {
>>>>              systime_snapshot->cycles_valid =
>>>>               clock->read_clock_and_systime(
>>>>                  &now, &systime_snapshot->cycles);
>>>>          } else {
>>>>              now = tk_clock_read(&tk->tkr_mono);
>>>>              systime_snapshot->cycles_valid = true;
>>>>              systime_snapshot->cycles = now;
>>>>          }
>>>>
>>>> ?  (This is honestly just a suggestion, which may be wrong depedning
>>>> on the answer to the two questions above).
>>>
>>> cs_stable means "there is no unexpected time jumps".
>>
>> But even for kvmclock there are no unexpected time jumps, the global
>> accumulator in pvclock_clocksource_read ensures that.  And the concept
>> is different from CLOCK_SOURCE_UNSTABLE which will basically blacklist
>> the clocksource for hrtimers.
>>
>> It seems a different concept to me, somewhat specific to
>> ktime_get_snapshot.  More precisely, the question is "is there a 1:1
>> mapping from cycles to nanoseconds?"---but if there is no such mapping
>> cycles is useless, hence my proposal of "cycles_valid".
>>
>> Thanks,
>>
>> Paolo
> In fact, this "cycles_valid" is going to be used for deciding whether to 
> use KVM masterclock or not. And if it's not we still want to know 
> cycles_stamp value to use it in KVM.
> So the cycles is valid, but clocksource is not reliable (why? let decide 
> to a clocksource, by default assume they are all not stable), thus we 
> must calculate time values for a guest each time its needed.
> So, my proposal is to name the variable sightly differently: cs_reliable
> and go like:
>      if (clock->read_clock_with_stamp) {
>          systime_snapshot->cs_reliable =
>              clock->read_clock_with_stamp(
>                  &now, &systime_snapshot->cycles);
>      } else {
>          now = tk_clock_read(&tk->tkr_mono);
>          systime_snapshot->cs_reliable = false;
>          systime_snapshot->cycles = now;
>      }
> What do you think?
> 
> Thanks!
> 
> Denis

How about this:
Let's name the variable cycles_reliable. if it's true then what 
systime_snapshot->cycles holds, is truly cycles, if false then it's 
something else -- some special care needed

Thanks!

Denis
>>
>>> I don't mind merging this "check stability" functionality with
>>> read_clock_and_systime. Actually, I thought about having it there but
>>> eventually split it to make the code more explicit
>>> (detailed and understandable).
>>> I'd like to use your approach but keep the variable name the same.
>>>
>>> Thanks for reviewing!
>>>
>>> Denis
>>>>
>>>> Paolo
>>>>
>>>>>            systime_snapshot->cs_was_changed_seq = 
>>>>> tk->cs_was_changed_seq;
>>>>>            systime_snapshot->clock_was_set_seq = 
>>>>> tk->clock_was_set_seq;
>>>>>            base_real = ktime_add(tk->tkr_mono.base,
>>>>>                          tk_core.timekeeper.offs_real);
>>>>>            base_raw = tk->tkr_raw.base;
>>>>> +        base_boot = ktime_add(tk->tkr_mono.base,
>>>>> +                      tk_core.timekeeper.offs_boot);
>>>>>            nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
>>>>>            nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
>>>>>        } while (read_seqcount_retry(&tk_core.seq, seq));
>>>>>    -    systime_snapshot->cycles = now;
>>>>>        systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
>>>>>        systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
>>>>> +    systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
>>>>>    }
>>>>>    EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>>>>
>>>
>>
>
Paolo Bonzini Aug. 1, 2017, 12:41 p.m. UTC | #7
On 01/08/2017 14:11, Denis Plotnikov wrote:
> In fact, this "cycles_valid" is going to be used for deciding whether to
> use KVM masterclock or not. And if it's not we still want to know
> cycles_stamp value to use it in KVM.

Why?  Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do
anything with the two variables that are passed by reference, if the
read returns false.  Hence my suggestion of calling it cycles_valid.

> So the cycles is valid, but clocksource is not reliable (why? let decide
> to a clocksource, by default assume they are all not stable), thus we
> must calculate time values for a guest each time its needed.
> So, my proposal is to name the variable sightly differently: cs_reliable
> and go like:
>     if (clock->read_clock_with_stamp) {
>         systime_snapshot->cs_reliable =
>             clock->read_clock_with_stamp(
>                 &now, &systime_snapshot->cycles);
>     } else {
>         now = tk_clock_read(&tk->tkr_mono);
>         systime_snapshot->cs_reliable = false;
>         systime_snapshot->cycles = now;
>     }
> What do you think?

I'm afraid you still have to define the meaning of "reliable".  (Though
I agree that the right default is false, that was a thinko on my side.
This also means that you need to define read_clock_with_stamp for the
TSC clocksource too).

Paolo
Denis Plotnikov Aug. 1, 2017, 12:46 p.m. UTC | #8
On 01.08.2017 15:41, Paolo Bonzini wrote:
> On 01/08/2017 14:11, Denis Plotnikov wrote:
>> In fact, this "cycles_valid" is going to be used for deciding whether to
>> use KVM masterclock or not. And if it's not we still want to know
>> cycles_stamp value to use it in KVM.
> 
> Why?  Neither pvclock_update_vm_gtod_copy nor kvm_pv_clock_pairing do
> anything with the two variables that are passed by reference, if the
> read returns false.  Hence my suggestion of calling it cycles_valid.
giving up
> 
>> So the cycles is valid, but clocksource is not reliable (why? let decide
>> to a clocksource, by default assume they are all not stable), thus we
>> must calculate time values for a guest each time its needed.
>> So, my proposal is to name the variable sightly differently: cs_reliable
>> and go like:
>>      if (clock->read_clock_with_stamp) {
>>          systime_snapshot->cs_reliable =
>>              clock->read_clock_with_stamp(
>>                  &now, &systime_snapshot->cycles);
>>      } else {
>>          now = tk_clock_read(&tk->tkr_mono);
>>          systime_snapshot->cs_reliable = false;
>>          systime_snapshot->cycles = now;
>>      }
>> What do you think?
> 
> I'm afraid you still have to define the meaning of "reliable".  (Though
> I agree that the right default is false, that was a thinko on my side.
> This also means that you need to define read_clock_with_stamp for the
> TSC clocksource too).
> 
> Paolo
> 
agree
   name: cycles_valid
   default: false
   read_clock_with_stamp: defined for tsc returning true

Thanks!

Patches are in progress.
They well be sent soon, nice and shiny!

Denis
Radim Krčmář Aug. 1, 2017, 5:47 p.m. UTC | #9
2017-08-01 15:46+0300, Denis Plotnikov:
> Patches are in progress.
> They well be sent soon, nice and shiny!

Please Cc timekeeper and x86 people too, thanks!
diff mbox

Patch

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index f5cfc5d..52156d9 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -82,7 +82,7 @@  static int kvm_set_wallclock(const struct timespec *now)
 	return -1;
 }
 
-static u64 kvm_clock_read(void)
+static inline u64 __kvm_clock_read(u64 *cycles)
 {
 	struct pvclock_vcpu_time_info *src;
 	u64 ret;
@@ -91,10 +91,14 @@  static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src, NULL);
+	ret = pvclock_clocksource_read(src, cycles);
 	preempt_enable_notrace();
 	return ret;
 }
+static u64 kvm_clock_read(void)
+{
+	return __kvm_clock_read(NULL);
+}
 
 static u64 kvm_clock_get_cycles(struct clocksource *cs)
 {
@@ -177,9 +181,15 @@  bool kvm_check_and_clear_guest_paused(void)
 	return ret;
 }
 
+static void kvm_clock_read_with_cycles(u64 *cycles, u64 *cycles_stamp)
+{
+	*cycles = __kvm_clock_read(cycles_stamp);
+}
+
 struct clocksource kvm_clock = {
 	.name = "kvm-clock",
 	.read = kvm_clock_get_cycles,
+	.read_with_cycles = kvm_clock_read_with_cycles,
 	.rating = 400,
 	.mask = CLOCKSOURCE_MASK(64),
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 796d96b..5d655af 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1015,6 +1015,11 @@  static u64 read_tsc(struct clocksource *cs)
 	return (u64)rdtsc_ordered();
 }
 
+static bool is_tsc_stable(void)
+{
+	return !tsc_unstable;
+}
+
 static void tsc_cs_mark_unstable(struct clocksource *cs)
 {
 	if (tsc_unstable)
@@ -1043,6 +1048,7 @@  static struct clocksource clocksource_tsc = {
 	.name                   = "tsc",
 	.rating                 = 300,
 	.read                   = read_tsc,
+	.is_stable		= is_tsc_stable,
 	.mask                   = CLOCKSOURCE_MASK(64),
 	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_MUST_VERIFY,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6c97c82..496e731 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1643,22 +1643,32 @@  static int do_realtime(struct timespec *ts, u64 *cycle_now)
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cs_stable) {
+		*kernel_ns = ktime_to_ns(systime_snapshot.boot);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cs_stable;
 }
 
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_walltime_and_clockread(struct timespec *ts,
 					   u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	struct system_time_snapshot systime_snapshot;
+
+	ktime_get_snapshot(&systime_snapshot);
+
+	if (systime_snapshot.cs_stable) {
+		*ts = ktime_to_timespec(systime_snapshot.real);
+		*cycle_now = systime_snapshot.cycles;
+	}
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	return systime_snapshot.cs_stable;
 }
 #endif
 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index a78cb18..f849b91 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -49,6 +49,7 @@  struct module;
  *				The ideal clocksource. A must-use where
  *				available.
  * @read:		returns a cycle value, passes clocksource as argument
+ * @read_with_cycles
  * @enable:		optional function to enable the clocksource
  * @disable:		optional function to disable the clocksource
  * @mask:		bitmask for two's complement
@@ -78,6 +79,8 @@  struct module;
  */
 struct clocksource {
 	u64 (*read)(struct clocksource *cs);
+	void (*read_with_cycles)(u64 *cycles, u64 *cycles_stamp);
+	bool (*is_stable)(void);
 	u64 mask;
 	u32 mult;
 	u32 shift;
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229f..21917fa 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -290,8 +290,10 @@  struct system_time_snapshot {
 	u64		cycles;
 	ktime_t		real;
 	ktime_t		raw;
+	ktime_t		boot;
 	unsigned int	clock_was_set_seq;
 	u8		cs_was_changed_seq;
+	bool		cs_stable;
 };
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..a2bfc12 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -953,27 +953,44 @@  void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 	unsigned long seq;
 	ktime_t base_raw;
 	ktime_t base_real;
+	ktime_t base_boot;
 	u64 nsec_raw;
 	u64 nsec_real;
 	u64 now;
+	struct clocksource *clock;
 
 	WARN_ON_ONCE(timekeeping_suspended);
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(&tk->tkr_mono);
+		clock = tk->tkr_mono.clock;
+
+		if (clock->is_stable)
+			systime_snapshot->cs_stable = clock->is_stable();
+		else
+			systime_snapshot->cs_stable = false;
+
+		if (clock->read_with_cycles) {
+			clock->read_with_cycles(
+				&now, &systime_snapshot->cycles);
+		} else {
+			now = tk_clock_read(&tk->tkr_mono);
+			systime_snapshot->cycles = now;
+		}
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
 				      tk_core.timekeeper.offs_real);
 		base_raw = tk->tkr_raw.base;
+		base_boot = ktime_add(tk->tkr_mono.base,
+				      tk_core.timekeeper.offs_boot);
 		nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
 		nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	systime_snapshot->cycles = now;
 	systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
 	systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+	systime_snapshot->boot = ktime_add_ns(base_boot, nsec_real);
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);