
[v3,2/6] x86/time: implement tsc as clocksource

Message ID 1472042632-12883-3-git-send-email-joao.m.martins@oracle.com (mailing list archive)
State New, archived

Commit Message

Joao Martins Aug. 24, 2016, 12:43 p.m. UTC
Recent x86/time changes greatly improved monotonicity in Xen timekeeping,
making it much harder to observe time going backwards.  However, the platform
timer can't be expected to be perfectly in sync with the TSC, so get_s_time
isn't guaranteed to always return monotonically increasing values across CPUs.
This is the case on some of the boxes I am testing with, where I sometimes
observe ~100 warps (of very few nanoseconds each) after a few hours.

This patch introduces support for using the TSC as platform time source,
which is the highest-resolution time source and the most performant to
read (~20 nsecs). There are, however, several problems associated with
its usage, and there isn't a complete (and architecturally defined)
guarantee that all machines will provide a reliable and monotonic TSC in
all cases (I believe Intel to be the only vendor that can guarantee
that?). For this reason it's given lower priority than the HPET unless
the administrator changes the "clocksource" boot option to "tsc".
Initializing the TSC clocksource requires all CPUs to be up so that the
TSC reliability checks can be performed. init_xen_time is called before
all CPUs are up, so we start with the HPET at boot time and switch to
the TSC later. The switch happens in the verify_tsc_reliability initcall,
which is invoked once all CPUs are up. When attempting to initialize the
TSC we also check for time warps and for an invariant TSC, and in case
any of these conditions is not met we keep the clocksource that was
previously initialized in init_xen_time. Note that while we deem a
CONSTANT_TSC with no deep C-states reliable, that might not always be
the case, so we're conservative and allow the TSC to be used as platform
timer only with an invariant TSC.

Since b64438c7c ("x86/time: use correct (local) time stamp in
constant-TSC calibration fast path"), updates to CPU time use local
stamps, which means the platform timer is only used to seed the initial
CPU time. With clocksource=tsc there is no need to be in sync with
another clocksource, so we reseed the local/master stamps with TSC
values and update the platform time stamps accordingly. Time calibration
is set to 1 sec after we switch to the TSC, so reseeding these stamps
also ensures monotonically increasing values right after the point we
switch to TSC, and avoids the possibility of inconsistent readings in
this short period (i.e. until calibration fires).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v2:
 - Suggest "HPET switching to TSC" only as an example as otherwise it
 would be misleading on platforms not having one.
 - Change init_tsctimer to skip all the tests and assume it's called
 only on reliable TSC conditions and no warps observed. Tidy
 initialization on verify_tsc_reliability as suggested by Konrad.
 - CONSTANT_TSC and max_cstate <= 2 case removed and only allow tsc
   clocksource in invariant TSC boxes.
 - Omit the != 0 comparison in init_platform_timer for the tsc case.
 - Change comment on init_platform_timer.
 - Add comment on plt_tsc declaration.
 - Reinit CPU time for all online cpus instead of just CPU 0.
 - Use rdtsc_ordered() as opposed to rdtsc()
 - Remove tsc_freq variable and set plt_tsc clocksource frequency
 with the refined tsc calibration.
 - Rework a bit the commit message.

Changes since v1:
 - s/printk/printk(XENLOG_INFO
 - Remove extra space on inner brackets
 - Add missing space around brackets
 - Defer TSC initialization when all CPUs are up.

Changes since RFC:
 - Spelling fixes in the commit message.
 - Remove unused clocksource_is_tsc variable and introduce it instead
 on the patch that uses it.
 - Move plt_tsc from second to last in the available clocksources.
---
 xen/arch/x86/time.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

Comments

Jan Beulich Aug. 25, 2016, 10:06 a.m. UTC | #1
>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
> This patch introduces support for using TSC as platform time source
> which is the highest resolution time and most performant to get (~20
> nsecs).

Is this indeed still the case with the recently added fences?

> Though there are also several problems associated with its
> usage, and there isn't a complete (and architecturally defined)
> guarantee that all machines will provide reliable and monotonic TSC in
> all cases (I believe Intel to be the only that can guarantee that?) For
> this reason it's set with less priority when compared to HPET unless
> adminstrator changes "clocksource" boot option to "tsc". Initializing
> TSC clocksource requires all CPUs up to have the tsc reliability checks
> performed. init_xen_time is called before all CPUs are up, and so we
> would start with HPET at boot time, and switch later to TSC. The switch
> then happens on verify_tsc_reliability initcall that is invoked when all
> CPUs are up. When attempting to initialize TSC we also check for time
> warps and if it has invariant TSC. And in case none of these conditions
> are met,

DYM "in case any of these conditions is not met"?

> we keep the clocksource that was previously initialized on
> init_xen_time. Note that while we deem reliable a CONSTANT_TSC with no
> deep C-states, it might not always be the case, so we're conservative
> and allow TSC to be used as platform timer only with invariant TSC.
> 
> Since b64438c7c ("x86/time: use correct (local) time stamp in
> constant-TSC calibration fast path") updates to cpu time use local
> stamps, which means platform timer is only used to seed the initial cpu
> time. With clocksource=tsc there is no need to be in sync with another
> clocksource, so we reseed the local/master stamps to be values of TSC
> and update the platform time stamps accordingly. Time calibration is set
> to 1sec after we switch to TSC, thus these stamps are reseeded to also
> ensure monotonic returning values right after the point we switch to
> TSC. This is also to avoid the possibility of having inconsistent
> readings in this short period (i.e. until calibration fires).
> 
> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> 
> Changes since v2:
>  - Suggest "HPET switching to TSC" only as an example as otherwise it
>  would be misleading on platforms not having one.

What does this refer to? The description still unconditionally talks
about HPET.

>  - Change init_tsctimer to skip all the tests and assume it's called
>  only on reliable TSC conditions and no warps observed. Tidy
>  initialization on verify_tsc_reliability as suggested by Konrad.

Which means that contrary to what you say in the cover letter
there's nothing to prevent it from being used when CPU hotplug
is possible. I don't think you should skip basic sanity checks, and
place entirely on the admin the burden of knowing whether the
option is safe to use.

> +static struct platform_timesource __initdata plt_tsc =
> +{
> +    .id = "tsc",
> +    .name = "TSC",
> +    .read_counter = read_tsc,
> +    .counter_bits = 64,
> +    /* Not called by init_platform_timer as it is not on the plt_timers array */
> +    .init = init_tsctimer,

I consider this comment misleading: It took me several rounds of
looking until I understood that you don't mean to say the pointer
gets filled here only to not leave a NULL one around. I'd prefer if
you removed it.

> @@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
>      unsigned int i;
>      s64 rc = -1;
>  
> -    if ( opt_clocksource[0] != '\0' )
> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
> +    if ( (opt_clocksource[0] != '\0') &&
> +         (strcmp(opt_clocksource, "tsc")) )

Stray parentheses around a function call.

> @@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
>                     __func__);
>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>          }
> +        else if ( !strcmp(opt_clocksource, "tsc") )
> +        {
> +            int cpu;

unsigned int

> +            if ( try_platform_timer(&plt_tsc) <= 0 )
> +                return 0;

If you merged this into the if() above you could avoid this extra
return path, which in turn lowers the risk of something getting
added to the tail of the function and then not being taken care of
here.

> +            /*
> +             * Platform timer has changed and CPU time will only be updated
> +             * after we set again the calibration timer, which means we need to
> +             * seed again each local CPU time. At this stage TSC is known to be
> +             * reliable i.e. monotonically increasing across all CPUs so this
> +             * lets us remove the skew between platform timer and TSC, since
> +             * these are now effectively the same.
> +             */
> +            for_each_online_cpu( cpu )

Please decide whether you consider for_each_online_cpu a
control keyword like item. If you do, a blank is missing prior to the
opening parenthesis. If you don't, the blanks inside the parentheses
need to be dropped.

> +            {
> +                struct cpu_time *t = &per_cpu(cpu_time, cpu);
> +
> +                t->stamp.local_tsc = boot_tsc_stamp;
> +                t->stamp.local_stime = 0;
> +                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
> +                t->stamp.master_stime = t->stamp.local_stime;
> +            }

Without any synchronization, how "good" are the chances that
this updating would race with something the other CPUs do?

> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>  
>      preinit_pit();
>      tmp = init_platform_timer();
> +    plt_tsc.frequency = tmp;

How can this be the TSC frequency? init_platform_timer()
specifically skips the TSC. And I can see no other place where
plt_tsc.frequency gets initialized. Am I overlooking something?

Jan
Joao Martins Aug. 26, 2016, 3:11 p.m. UTC | #2
On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>> This patch introduces support for using TSC as platform time source
>> which is the highest resolution time and most performant to get (~20
>> nsecs).
> 
> Is this indeed still the case with the recently added fences?
Hmm, maybe not as fast with the added fences, but definitely the fastest time
source. If you prefer I can remove the mention of the time taken.

> 
>> Though there are also several problems associated with its
>> usage, and there isn't a complete (and architecturally defined)
>> guarantee that all machines will provide reliable and monotonic TSC in
>> all cases (I believe Intel to be the only that can guarantee that?) For
>> this reason it's set with less priority when compared to HPET unless
>> adminstrator changes "clocksource" boot option to "tsc". Initializing
>> TSC clocksource requires all CPUs up to have the tsc reliability checks
>> performed. init_xen_time is called before all CPUs are up, and so we
>> would start with HPET at boot time, and switch later to TSC. The switch
>> then happens on verify_tsc_reliability initcall that is invoked when all
>> CPUs are up. When attempting to initialize TSC we also check for time
>> warps and if it has invariant TSC. And in case none of these conditions
>> are met,
> 
> DYM "in case any of these conditions is not met"?
Yeah, I will change it. My English isn't at its best these days.

> 
>> we keep the clocksource that was previously initialized on
>> init_xen_time. Note that while we deem reliable a CONSTANT_TSC with no
>> deep C-states, it might not always be the case, so we're conservative
>> and allow TSC to be used as platform timer only with invariant TSC.
>>
>> Since b64438c7c ("x86/time: use correct (local) time stamp in
>> constant-TSC calibration fast path") updates to cpu time use local
>> stamps, which means platform timer is only used to seed the initial cpu
>> time. With clocksource=tsc there is no need to be in sync with another
>> clocksource, so we reseed the local/master stamps to be values of TSC
>> and update the platform time stamps accordingly. Time calibration is set
>> to 1sec after we switch to TSC, thus these stamps are reseeded to also
>> ensure monotonic returning values right after the point we switch to
>> TSC. This is also to avoid the possibility of having inconsistent
>> readings in this short period (i.e. until calibration fires).
>>
>> Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
>> ---
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>
>> Changes since v2:
>>  - Suggest "HPET switching to TSC" only as an example as otherwise it
>>  would be misleading on platforms not having one.
> 
> What does this refer to? The description still unconditionally talks
> about HPET.
I remember adding "For example" at the beginning of the sentence. This was clearly a
distraction on my end (you also found another small mistake in the changelog of patch
3, despite my having addressed the comment).

> 
>>  - Change init_tsctimer to skip all the tests and assume it's called
>>  only on reliable TSC conditions and no warps observed. Tidy
>>  initialization on verify_tsc_reliability as suggested by Konrad.
> 
> Which means that contrary to what you say in the cover letter
> there's nothing to prevent it from being used when CPU hotplug
> is possible.
Probably the cover letter wasn't completely clear on this. I mention that I split it
between patches 2 and 5 (the intent was easier review), and you can see that I add the
check in patch 5, plus prevent onlining of CPUs too.

> I don't think you should skip basic sanity checks, and
> place entirely on the admin the burden of knowing whether the
> option is safe to use.
Neither do I think they should be skipped; we merely don't duplicate these checks. In
the end I am just running init_tsctimer in the TSC_RELIABLE block, and only if no
warps are observed. So putting the checks in init_tsctimer would just duplicate the
previously done ones. Or would you still prefer it as done in the previous version,
i.e. all checks in both init_tsctimer and verify_tsc_reliability?
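Condensed, the gating shape being described is roughly the following. This is a hypothetical sketch, not the literal patch: `tsc_check_warps()` is a made-up stand-in for the actual warp test done across CPUs.

```c
/* Sketch only: the reliability/warp checks live in this initcall, and
 * init_tsctimer (reached via try_platform_timer) runs only when they pass. */
static int __init verify_tsc_reliability(void)
{
    if ( boot_cpu_has(X86_FEATURE_TSC_RELIABLE) )
    {
        if ( tsc_check_warps() )   /* hypothetical stand-in for the warp test */
            setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
        else if ( !strcmp(opt_clocksource, "tsc") )
            try_platform_timer(&plt_tsc);   /* ends up calling init_tsctimer */
    }
    return 0;
}
```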


>> +static struct platform_timesource __initdata plt_tsc =
>> +{
>> +    .id = "tsc",
>> +    .name = "TSC",
>> +    .read_counter = read_tsc,
>> +    .counter_bits = 64,
>> +    /* Not called by init_platform_timer as it is not on the plt_timers array */
>> +    .init = init_tsctimer,
> 
> I consider this comment misleading: It took me several rounds of
> looking until I understood that you don't mean to say the pointer
> gets filled here only to not leave a NULL one around. I'd prefer if
> you removed it.
> 
OK.

>> @@ -604,7 +647,9 @@ static u64 __init init_platform_timer(void)
>>      unsigned int i;
>>      s64 rc = -1;
>>  
>> -    if ( opt_clocksource[0] != '\0' )
>> +    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
>> +    if ( (opt_clocksource[0] != '\0') &&
>> +         (strcmp(opt_clocksource, "tsc")) )
> 
> Stray parentheses around a function call.
Fixed.

> 
>> @@ -1481,6 +1526,40 @@ static int __init verify_tsc_reliability(void)
>>                     __func__);
>>              setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>>          }
>> +        else if ( !strcmp(opt_clocksource, "tsc") )
>> +        {
>> +            int cpu;
> 
> unsigned int
Fixed.

> 
>> +            if ( try_platform_timer(&plt_tsc) <= 0 )
>> +                return 0;
> 
> If you merged this into the if() above you could avoid this extra
> return path, which in turn lowers the risk of something getting
> added to the tail of the function and then not being taken care of
> here.
Good point, fixed.

> 
>> +            /*
>> +             * Platform timer has changed and CPU time will only be updated
>> +             * after we set again the calibration timer, which means we need to
>> +             * seed again each local CPU time. At this stage TSC is known to be
>> +             * reliable i.e. monotonically increasing across all CPUs so this
>> +             * lets us remove the skew between platform timer and TSC, since
>> +             * these are now effectively the same.
>> +             */
>> +            for_each_online_cpu( cpu )
> 
> Please decide whether you consider for_each_online_cpu a
> control keyword like item. If you do, a blank is missing prior to the
> opening parenthesis. If you don't, the blanks inside the parentheses
> need to be dropped.
I will go with the control keyword like style.

>> +            {
>> +                struct cpu_time *t = &per_cpu(cpu_time, cpu);
>> +
>> +                t->stamp.local_tsc = boot_tsc_stamp;
>> +                t->stamp.local_stime = 0;
>> +                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>> +                t->stamp.master_stime = t->stamp.local_stime;
>> +            }
> 
> Without any synchronization, how "good" are the chances that
> this updating would race with something the other CPUs do?

I assumed that at this stage, while initcalls are still being invoked, it was safe to
update all CPUs' time info, but maybe that's a misconception. I can put this part in
one function, run it via on_selected_cpus(), and wait until all CPU time structures
get updated. Would that perhaps be better?
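A sketch of that alternative might look as follows. The helper name is hypothetical (on_selected_cpus() and this_cpu() are existing Xen primitives, but this is not the actual patch):

```c
/* Run on each online CPU so every CPU writes its own stamps; with
 * wait=1 the caller blocks until all CPUs have updated (sketch only). */
static void __init reset_percpu_time(void *unused)
{
    struct cpu_time *t = &this_cpu(cpu_time);

    t->stamp.local_tsc = boot_tsc_stamp;
    t->stamp.local_stime = 0;
    t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
    t->stamp.master_stime = t->stamp.local_stime;
}

/* In verify_tsc_reliability(), after switching to plt_tsc: */
on_selected_cpus(&cpu_online_map, reset_percpu_time, NULL, 1);
```

Having each CPU write its own per-CPU structure also sidesteps the cross-CPU write race Jan raised above.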

> 
>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>  
>>      preinit_pit();
>>      tmp = init_platform_timer();
>> +    plt_tsc.frequency = tmp;
> 
> How can this be the TSC frequency? init_platform_timer()
> specifically skips the TSC. And I can see no other place where
> plt_tsc.frequency gets initialized. Am I overlooking something?
> 
That's the calibrated TSC frequency. Before, I was setting pts->frequency in
init_tsctimer through a temporary variable called tsc_freq, so I thought I could just
drop the variable and set plt_tsc directly. The difference from previous versions is
that since commit 9334029 this value is returned from the platform time source's
init() and calibrated against the platform timer, instead of always against the PIT.

Joao
Jan Beulich Aug. 29, 2016, 9:36 a.m. UTC | #3
>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>> This patch introduces support for using TSC as platform time source
>>> which is the highest resolution time and most performant to get (~20
>>> nsecs).
>> 
>> Is this indeed still the case with the recently added fences?
> Hmm, may be not as fast with the added fences, But definitely the fastest 
> time
> source. If you prefer I can remove the mention to time taken.

I'd say either you re-measure it with the added fences, or remove it
as potentially being stale/wrong.

>>>  - Change init_tsctimer to skip all the tests and assume it's called
>>>  only on reliable TSC conditions and no warps observed. Tidy
>>>  initialization on verify_tsc_reliability as suggested by Konrad.
>> 
>> Which means that contrary to what you say in the cover letter
>> there's nothing to prevent it from being used when CPU hotplug
>> is possible.
> Probably the cover letter wasn't completely clear on this. I mention that I split it
> between Patch 2 and 5 (intent was for easier review), and you can see that I add
> check in patch 5, plus preventing online of CPUs too.
> 
>> I don't think you should skip basic sanity checks, and
>> place entirely on the admin the burden of knowing whether the
>> option is safe to use.
> Neither do I think it should be skipped. We mainly don't duplicate these checks. In
> the end I am just running init_tsctimer, in the TSC_RELIABLE block and if no warps
> are observed. So putting that in init_tsctimer would just duplicate the previously
> done checks. Or would you still prefer as done in previous version i.e. all checks in
> both init_tsctimer and verify_tsc_reliability?

I'm not sure they're needed in both places; what you need to make
certain is that they're reliably gone through, and that this happens
early enough.

As to breaking this off into the later patch - please don't forget
that this (as any) series may get applied in pieces, so deferring a
crucial check to a later patch is not acceptable. If you mean things
to be split for easier review you may check whether you can find
a way to add the check in a prereq patch.

>>> +            {
>>> +                struct cpu_time *t = &per_cpu(cpu_time, cpu);
>>> +
>>> +                t->stamp.local_tsc = boot_tsc_stamp;
>>> +                t->stamp.local_stime = 0;
>>> +                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>>> +                t->stamp.master_stime = t->stamp.local_stime;
>>> +            }
>> 
>> Without any synchronization, how "good" are the chances that
>> this updating would race with something the other CPUs do?
> 
> I assumed that at this stage init calls are still being invoked that we update all
> cpus time infos, but maybe it's a misconception. I can have this part in one function
> and have it done on_selected_cpus() and wait until all cpu time structures get
> updated. That perhaps would be better?

I think so - even if the risk of there being a race right now is rather
low, that way you'd avoid this becoming a problem if secondary
CPUs are made to do something other than idling at this point in time.

>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>  
>>>      preinit_pit();
>>>      tmp = init_platform_timer();
>>> +    plt_tsc.frequency = tmp;
>> 
>> How can this be the TSC frequency? init_platform_timer()
>> specifically skips the TSC. And I can see no other place where
>> plt_tsc.frequency gets initialized. Am I overlooking something?
>> 
> That's the calibrated TSC frequency. Before I was setting pts->frequency in
> init_tsctimer through a temporary variable called tsc_freq. So I thought I could just
> drop the variable and set plt_tsc directly. The difference though from previous
> versions is that since commit 9334029 this value is returned from platform time
> source init() and calibrated against platform timer, instead of always against PIT.

This doesn't seem to answer my primary question: Where does
plt_tsc.frequency get initialized now? try_platform_timer() and
init_platform_timer() don't - PIT and ACPI timer use static
initializers, and HPET gets taken care of in init_hpet(), and hence so
should init_tsctimer() do (instead of just returning this apparently
never initialized value). And that's unrelated to what ->init() returns.

Jan
Joao Martins Aug. 30, 2016, 12:08 p.m. UTC | #4
On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>> This patch introduces support for using TSC as platform time source
>>>> which is the highest resolution time and most performant to get (~20
>>>> nsecs).
>>>
>>> Is this indeed still the case with the recently added fences?
>> Hmm, may be not as fast with the added fences, But definitely the fastest 
>> time
>> source. If you prefer I can remove the mention to time taken.
> 
> I'd say either you re-measure it with the added fences, or remove it
> as potentially being stale/wrong.
OK.

>>>>  - Change init_tsctimer to skip all the tests and assume it's called
>>>>  only on reliable TSC conditions and no warps observed. Tidy
>>>>  initialization on verify_tsc_reliability as suggested by Konrad.
>>>
>>> Which means that contrary to what you say in the cover letter
>>> there's nothing to prevent it from being used when CPU hotplug
>>> is possible.
>> Probably the cover letter wasn't completely clear on this. I mention that I split it
>> between Patch 2 and 5 (intent was for easier review), and you can see that I add
>> check in patch 5, plus preventing online of CPUs too.
>>
>>> I don't think you should skip basic sanity checks, and
>>> place entirely on the admin the burden of knowing whether the
>>> option is safe to use.
>> Neither do I think it should be skipped. We mainly don't duplicate these checks. In
>> the end I am just running init_tsctimer, in the TSC_RELIABLE block and if no warps
>> are observed. So putting that in init_tsctimer would just duplicate the previously
>> done checks. Or would you still prefer as done in previous version i.e. all checks in
>> both init_tsctimer and verify_tsc_reliability?
> 
> I'm not sure they're needed in both places; what you need to make
> certain is that they're reliably gone through, and that this happens
> early enough.
They are reliably gone through, and we get to avoid duplicating the checks. Unless
there's a preference to re-add these checks in init_tsctimer, I'll keep them as is.
verify_tsc_reliability(...) needs to perform these checks, and init_tsctimer is only
called under these reliable-TSC conditions.

> As to breaking this off into the later patch - please don't forget
> that this (as any) series may get applied in pieces, so deferring a
> crucial check to a later patch is not acceptable. If you mean things
> to be split for easier review you may check whether you can find
> a way to add the check in q prereq patch.
OK, note taken. I'll move this check from patch 5 to here, as it's the place
where it should be.


>>>> +            {
>>>> +                struct cpu_time *t = &per_cpu(cpu_time, cpu);
>>>> +
>>>> +                t->stamp.local_tsc = boot_tsc_stamp;
>>>> +                t->stamp.local_stime = 0;
>>>> +                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
>>>> +                t->stamp.master_stime = t->stamp.local_stime;
>>>> +            }
>>>
>>> Without any synchronization, how "good" are the chances that
>>> this updating would race with something the other CPUs do?
>>
>> I assumed that at this stage init calls are still being invoked that we update all
>> cpus time infos, but maybe it's a misconception. I can have this part in one function
>> and have it done on_selected_cpus() and wait until all cpu time structures get
>> updated. That perhaps would be better?
> 
> I think so - even if the risk of thee being a race right now is rather
> low, that way you'd avoid this becoming a problem if secondary
> CPUs get made do something other than idling at this point in time.
Indeed, I'll do it that way then.

>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>  
>>>>      preinit_pit();
>>>>      tmp = init_platform_timer();
>>>> +    plt_tsc.frequency = tmp;
>>>
>>> How can this be the TSC frequency? init_platform_timer()
>>> specifically skips the TSC. And I can see no other place where
>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>
>> That's the calibrated TSC frequency. Before I was setting pts->frequency in
>> init_tsctimer through a temporary variable called tsc_freq. So I thought I could just
>> drop the variable and set plt_tsc directly. The difference though from previous
>> versions is that since commit 9334029 this value is returned from platform time
>> source init() and calibrated against platform timer, instead of always against PIT.
> 
> This doesn't seem to answer my primary question: Where does
> plt_tsc.frequency get initialized now? try_platform_timer() and
> init_platform_timer() don't - PIT and ACPI timer use static
> initializers, and HEPT gets taken care of in init_hpet(), and hence so
> should init_tsctimer() do (instead of just returning this apparently
> never initialized value). And that's unrelated to what ->init() returns.

plt_tsc.frequency is certainly initialized in early_time_init(). And then in
try_platform_timer() we have plt_src = *pts (pts would be a pointer to plt_tsc when
called from verify_tsc_reliability()).

IOW, effectively I changed from this:

#v2

static u64 tsc_freq;

static s64 __init init_tsctimer(struct platform_timesource *pts)
{
   ...
   pts->frequency = tsc_freq;
   return 1;
}

...

void __init early_time_init(void)
{
   u64 tmp = init_pit_and_calibrate_tsc();

   tsc_freq = tmp;
}

*To:*

#v3

static s64 __init init_tsctimer(struct platform_timesource *pts)
{
    return pts->frequency;
}


void __init early_time_init(void)
{
    ...
    tmp = init_platform_timer();
    plt_tsc.frequency = tmp;
}

Does this answer your question? Note that my purpose with the change was to remove
the tsc_freq temporary variable. If it makes things less clear (as in doing things
differently from the other platform timers) I can go back to v2 in this aspect.

Joao
Jan Beulich Aug. 30, 2016, 12:30 p.m. UTC | #5
>>> On 30.08.16 at 14:08, <joao.m.martins@oracle.com> wrote:
> On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>>>  - Change init_tsctimer to skip all the tests and assume it's called
>>>>>  only on reliable TSC conditions and no warps observed. Tidy
>>>>>  initialization on verify_tsc_reliability as suggested by Konrad.
>>>>
>>>> Which means that contrary to what you say in the cover letter
>>>> there's nothing to prevent it from being used when CPU hotplug
>>>> is possible.
>>> Probably the cover letter wasn't completely clear on this. I mention that I split it
>>> between Patch 2 and 5 (intent was for easier review), and you can see that I add
>>> check in patch 5, plus preventing online of CPUs too.
>>>
>>>> I don't think you should skip basic sanity checks, and
>>>> place entirely on the admin the burden of knowing whether the
>>>> option is safe to use.
>>> Neither do I think it should be skipped. We mainly don't duplicate these checks. In
>>> the end I am just running init_tsctimer, in the TSC_RELIABLE block and if no warps
>>> are observed. So putting that in init_tsctimer would just duplicate the previously
>>> done checks. Or would you still prefer as done in previous version i.e. all checks in
>>> both init_tsctimer and verify_tsc_reliability?
>> 
>> I'm not sure they're needed in both places; what you need to make
>> certain is that they're reliably gone through, and that this happens
>> early enough.
> They are reliably gone through and we get to avoid duplication of checks. Unless
> there's a preference to re-add these checks in init_tsctimer, I'll keep these as is.
> verify_tsc_reliability(...) needs to perform this checks and init_tsctimer is only
> called these reliable TSC conditions.

But please make sure there's a comment in (or ahead of)
init_tsctimer() pointing out where the apparently missing checks
are. This will help both review and future readers.

>>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>>  
>>>>>      preinit_pit();
>>>>>      tmp = init_platform_timer();
>>>>> +    plt_tsc.frequency = tmp;
>>>>
>>>> How can this be the TSC frequency? init_platform_timer()
>>>> specifically skips the TSC. And I can see no other place where
>>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>>
>>> That's the calibrated TSC frequency. Before I was setting pts->frequency in
>>> init_tsctimer through a temporary variable called tsc_freq. So I thought I could just
>>> drop the variable and set plt_tsc directly. The difference though from previous
>>> versions is that since commit 9334029 this value is returned from platform time
>>> source init() and calibrated against platform timer, instead of always against PIT.
>> 
>> This doesn't seem to answer my primary question: Where does
>> plt_tsc.frequency get initialized now? try_platform_timer() and
>> init_platform_timer() don't - PIT and ACPI timer use static
>> initializers, and HEPT gets taken care of in init_hpet(), and hence so
>> should init_tsctimer() do (instead of just returning this apparently
>> never initialized value). And that's unrelated to what ->init() returns.
> 
> plt_tsc.frequency is certainly initialized in early_time_init(). And then on
> try_platform_timer we have plt_src = *pts (pts would be a pointer to plt_tsc 
> when
> called from verify_tsc_reliability()).
> 
> IOW, effectively I changed from this:
> 
> #v2
> 
> static u64 tsc_freq;
> 
> static s64 __init init_tsctimer(struct platform_timesource *pts)
> {
>    ...
>    pts->frequency = tsc_freq;
>    return 1;
> }
> 
> ...
> 
> void __init early_time_init(void)
> {
>    u64 tmp = init_pit_and_calibrate_tsc();
> 
>    tsc_freq = tmp;
> }
> 
> *To:*
> 
> #v3
> 
> static s64 __init init_tsctimer(struct platform_timesource *pts)
> {
>     return pts->frequency;
> }
> 
> 
> void __init early_time_init(void)
> {
>     ...
>     tmp = init_platform_timer();
>     plt_tsc.frequency = tmp;
> }
> 
> Does this answer your question? Note that my purpose with the change, was to remove
> the tsc_freq temporary variable. If it makes things less clear (as in doing things
> differently from other platform timers) I can go back to v2 in this aspect.

Ah, I see now how I got confused. This once again depends on TSC
to possibly become the platform timer only much later than when
early_time_init() runs.

Jan
Joao Martins Aug. 30, 2016, 1:59 p.m. UTC | #6
On 08/30/2016 01:30 PM, Jan Beulich wrote:
>>>> On 30.08.16 at 14:08, <joao.m.martins@oracle.com> wrote:
>> On 08/29/2016 10:36 AM, Jan Beulich wrote:
>>>>>> On 26.08.16 at 17:11, <joao.m.martins@oracle.com> wrote:
>>>> On 08/25/2016 11:06 AM, Jan Beulich wrote:
>>>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>>>>  - Change init_tsctimer to skip all the tests and assume it's called
>>>>>>  only on reliable TSC conditions and no warps observed. Tidy
>>>>>>  initialization on verify_tsc_reliability as suggested by Konrad.
>>>>>
>>>>> Which means that contrary to what you say in the cover letter
>>>>> there's nothing to prevent it from being used when CPU hotplug
>>>>> is possible.
>>>> Probably the cover letter wasn't completely clear on this. I mention that I split it
>>>> between Patch 2 and 5 (intent was for easier review), and you can see that I add
>>>> check in patch 5, plus preventing online of CPUs too.
>>>>
>>>>> I don't think you should skip basic sanity checks, and
>>>>> place entirely on the admin the burden of knowing whether the
>>>>> option is safe to use.
>>>> Neither do I think it should be skipped. We mainly don't duplicate these checks. In
>>>> the end I am just running init_tsctimer, in the TSC_RELIABLE block and if no warps
>>>> are observed. So putting that in init_tsctimer would just duplicate the previously
>>>> done checks. Or would you still prefer as done in previous version i.e. all checks in
>>>> both init_tsctimer and verify_tsc_reliability?
>>>
>>> I'm not sure they're needed in both places; what you need to make
>>> certain is that they're reliably gone through, and that this happens
>>> early enough.
>> They are reliably gone through and we get to avoid duplication of checks. Unless
>> there's a preference to re-add these checks in init_tsctimer, I'll keep these as is.
>> verify_tsc_reliability(...) needs to perform these checks and init_tsctimer is only
>> called under these reliable TSC conditions.
> 
> But please make sure there's a comment in (or ahead of)
> init_tsctimer() pointing out where the apparently missing checks
> are. This will help both review and future readers.
Okay, I'll add a comment in init_tsctimer().

>>>>>> @@ -1528,6 +1607,7 @@ void __init early_time_init(void)
>>>>>>  
>>>>>>      preinit_pit();
>>>>>>      tmp = init_platform_timer();
>>>>>> +    plt_tsc.frequency = tmp;
>>>>>
>>>>> How can this be the TSC frequency? init_platform_timer()
>>>>> specifically skips the TSC. And I can see no other place where
>>>>> plt_tsc.frequency gets initialized. Am I overlooking something?
>>>>>
>>>> That's the calibrated TSC frequency. Before I was setting pts->frequency in
>>>> init_tsctimer through a temporary variable called tsc_freq. So I thought I could just
>>>> drop the variable and set plt_tsc directly. The difference though from previous
>>>> versions is that since commit 9334029 this value is returned from platform time
>>>> source init() and calibrated against platform timer, instead of always against PIT.
>>>
>>> This doesn't seem to answer my primary question: Where does
>>> plt_tsc.frequency get initialized now? try_platform_timer() and
>>> init_platform_timer() don't - PIT and ACPI timer use static
>>> initializers, and HPET gets taken care of in init_hpet(), and hence so
>>> should init_tsctimer() do (instead of just returning this apparently
>>> never initialized value). And that's unrelated to what ->init() returns.
>>
>> plt_tsc.frequency is certainly initialized in early_time_init(). And then on
>> try_platform_timer we have plt_src = *pts (pts would be a pointer to plt_tsc 
>> when
>> called from verify_tsc_reliability()).
>>
>> IOW, effectively I changed from this:
>>
>> #v2
>>
>> static u64 tsc_freq;
>>
>> static s64 __init init_tsctimer(struct platform_timesource *pts)
>> {
>>    ...
>>    pts->frequency = tsc_freq;
>>    return 1;
>> }
>>
>> ...
>>
>> void __init early_time_init(void)
>> {
>>    u64 tmp = init_pit_and_calibrate_tsc();
>>
>>    tsc_freq = tmp;
>> }
>>
>> *To:*
>>
>> #v3
>>
>> static s64 __init init_tsctimer(struct platform_timesource *pts)
>> {
>>     return pts->frequency;
>> }
>>
>>
>> void __init early_time_init(void)
>> {
>>     ...
>>     tmp = init_platform_timer();
>>     plt_tsc.frequency = tmp;
>> }
>>
>> Does this answer your question? Note that my purpose with the change, was to remove
>> the tsc_freq temporary variable. If it makes things less clear (as in doing things
>> differently from other platform timers) I can go back to v2 in this aspect.
> 
> Ah, I see now how I got confused. This once again depends on TSC
> to possibly become the platform timer only much later than when
> early_time_init() runs.
>
True, but we're only initializing the frequency at this point; it's only used if the
reliable TSC conditions are verified.

Joao

Patch

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index 6750e46..b2a11a8 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -475,6 +475,30 @@  uint64_t ns_to_acpi_pm_tick(uint64_t ns)
 }
 
 /************************************************************
+ * PLATFORM TIMER 4: TSC
+ */
+
+static s64 __init init_tsctimer(struct platform_timesource *pts)
+{
+    return pts->frequency;
+}
+
+static u64 read_tsc(void)
+{
+    return rdtsc_ordered();
+}
+
+static struct platform_timesource __initdata plt_tsc =
+{
+    .id = "tsc",
+    .name = "TSC",
+    .read_counter = read_tsc,
+    .counter_bits = 64,
+    /* Not called by init_platform_timer as it is not on the plt_timers array */
+    .init = init_tsctimer,
+};
+
+/************************************************************
  * GENERIC PLATFORM TIMER INFRASTRUCTURE
  */
 
@@ -576,6 +600,21 @@  static void resume_platform_timer(void)
     plt_stamp = plt_src.read_counter();
 }
 
+static void __init reset_platform_timer(void)
+{
+    /* Deactivate any timers running */
+    kill_timer(&plt_overflow_timer);
+    kill_timer(&calibration_timer);
+
+    /* Reset counters and stamps */
+    spin_lock_irq(&platform_timer_lock);
+    plt_stamp = 0;
+    plt_stamp64 = 0;
+    platform_timer_stamp = 0;
+    stime_platform_stamp = 0;
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static s64 __init try_platform_timer(struct platform_timesource *pts)
 {
     s64 rc = pts->init(pts);
@@ -583,6 +622,10 @@  static s64 __init try_platform_timer(struct platform_timesource *pts)
     if ( rc <= 0 )
         return rc;
 
+    /* We have a platform timesource already so reset it */
+    if ( plt_src.counter_bits != 0 )
+        reset_platform_timer();
+
     plt_mask = (u64)~0ull >> (64 - pts->counter_bits);
 
     set_time_scale(&plt_scale, pts->frequency);
@@ -604,7 +647,9 @@  static u64 __init init_platform_timer(void)
     unsigned int i;
     s64 rc = -1;
 
-    if ( opt_clocksource[0] != '\0' )
+    /* clocksource=tsc is initialized via __initcalls (when CPUs are up). */
+    if ( (opt_clocksource[0] != '\0') &&
+         (strcmp(opt_clocksource, "tsc")) )
     {
         for ( i = 0; i < ARRAY_SIZE(plt_timers); i++ )
         {
@@ -1481,6 +1526,40 @@  static int __init verify_tsc_reliability(void)
                    __func__);
             setup_clear_cpu_cap(X86_FEATURE_TSC_RELIABLE);
         }
+        else if ( !strcmp(opt_clocksource, "tsc") )
+        {
+            int cpu;
+
+            if ( try_platform_timer(&plt_tsc) <= 0 )
+                return 0;
+
+            /*
+             * Platform timer has changed and CPU time will only be updated
+             * after we set again the calibration timer, which means we need to
+             * seed again each local CPU time. At this stage TSC is known to be
+             * reliable i.e. monotonically increasing across all CPUs so this
+             * lets us remove the skew between platform timer and TSC, since
+             * these are now effectively the same.
+             */
+            for_each_online_cpu( cpu )
+            {
+                struct cpu_time *t = &per_cpu(cpu_time, cpu);
+
+                t->stamp.local_tsc = boot_tsc_stamp;
+                t->stamp.local_stime = 0;
+                t->stamp.local_stime = get_s_time_fixed(boot_tsc_stamp);
+                t->stamp.master_stime = t->stamp.local_stime;
+            }
+
+            platform_timer_stamp = plt_stamp64;
+            stime_platform_stamp = get_s_time_fixed(plt_stamp64);
+
+            printk(XENLOG_INFO "Switched to Platform timer %s TSC\n",
+                   freq_string(plt_src.frequency));
+
+            init_timer(&calibration_timer, time_calibration, NULL, 0);
+            set_timer(&calibration_timer, NOW() + EPOCH);
+        }
     }
 
     return 0;
@@ -1528,6 +1607,7 @@  void __init early_time_init(void)
 
     preinit_pit();
     tmp = init_platform_timer();
+    plt_tsc.frequency = tmp;
 
     set_time_scale(&t->tsc_scale, tmp);
     t->stamp.local_tsc = boot_tsc_stamp;