diff mbox

[v3,3/6] x86/time: streamline platform time init on plt_update()

Message ID 1472042632-12883-4-git-send-email-joao.m.martins@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Joao Martins Aug. 24, 2016, 12:43 p.m. UTC
And use to initialize platform time solely for clocksource=tsc,
as opposed to initializing platform overflow timer, which would
only fire in ~180 years (on 2.2 Ghz Broadwell processor).

Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>

Changes since v2:
 - Remove pointless intializer and replace it with the
 platform_time init return.
 - s/plt_init/plt_update/g
---
 xen/arch/x86/time.c | 37 +++++++++++++++++++++++++++++++------
 1 file changed, 31 insertions(+), 6 deletions(-)

Comments

Jan Beulich Aug. 25, 2016, 10:13 a.m. UTC | #1
>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
> And use to initialize platform time solely for clocksource=tsc,
> as opposed to initializing platform overflow timer, which would
> only fire in ~180 years (on 2.2 Ghz Broadwell processor).

Do we really want to risk this time period going down by two orders
of magnitude? Is there anything that's really expensive in setting the
overflow timer in the far distant future?

> Changes since v2:
>  - Remove pointless intializer and replace it with the
>  platform_time init return.

Does this really apply to this patch?

> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -526,17 +526,31 @@ static s_time_t __read_platform_stime(u64 
> platform_time)
>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>  }
>  
> +static void __plt_update(void)

A single leading underscore only, please.

> @@ -630,10 +644,21 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
>  
>      set_time_scale(&plt_scale, pts->frequency);
>  
> -    plt_overflow_period = scale_delta(
> -        1ull << (pts->counter_bits - 1), &plt_scale);
>      plt_src = *pts;
>  
> +    if ( pts == &plt_tsc )
> +    {
> +        plt_update();
> +    }

Unnecessary braces.

Jan
Joao Martins Aug. 26, 2016, 3:12 p.m. UTC | #2
On 08/25/2016 11:13 AM, Jan Beulich wrote:
>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>> And use to initialize platform time solely for clocksource=tsc,
>> as opposed to initializing platform overflow timer, which would
>> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
> 
> Do we really want to risk this time period going down by two orders
> of magnitude? Is there anything that's really expensive in setting the
> overflow timer in the far distant future?
It wasn't about cost but rather setting the timer in a so distant future. I could
decrease to an year time, month or day. But do you think we really need that overflow
handling for TSC?

>> Changes since v2:
>>  - Remove pointless intializer and replace it with the
>>  platform_time init return.
> 
> Does this really apply to this patch?
Oh no, The comment should have been something like:

"Remove clocksource_is_tsc in favor of comparing pts against plt_tsc"

> 
>> --- a/xen/arch/x86/time.c
>> +++ b/xen/arch/x86/time.c
>> @@ -526,17 +526,31 @@ static s_time_t __read_platform_stime(u64 
>> platform_time)
>>      return (stime_platform_stamp + scale_delta(diff, &plt_scale));
>>  }
>>  
>> +static void __plt_update(void)
> 
> A single leading underscore only, please.
Fixed.

> 
>> @@ -630,10 +644,21 @@ static s64 __init try_platform_timer(struct platform_timesource *pts)
>>  
>>      set_time_scale(&plt_scale, pts->frequency);
>>  
>> -    plt_overflow_period = scale_delta(
>> -        1ull << (pts->counter_bits - 1), &plt_scale);
>>      plt_src = *pts;
>>  
>> +    if ( pts == &plt_tsc )
>> +    {
>> +        plt_update();
>> +    }
> 
> Unnecessary braces.
Fixed.
Jan Beulich Aug. 29, 2016, 9:41 a.m. UTC | #3
>>> On 26.08.16 at 17:12, <joao.m.martins@oracle.com> wrote:
> On 08/25/2016 11:13 AM, Jan Beulich wrote:
>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>> And use to initialize platform time solely for clocksource=tsc,
>>> as opposed to initializing platform overflow timer, which would
>>> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
>> 
>> Do we really want to risk this time period going down by two orders
>> of magnitude? Is there anything that's really expensive in setting the
>> overflow timer in the far distant future?
> It wasn't about cost but rather setting the timer in a so distant future. I could
> decrease to an year time, month or day. But do you think we really need that overflow
> handling for TSC?

I think we shouldn't introduce latent issues, no matter how unlikely
we may deem it for them to actually become active ones. Just
consider how unlikely it would have seemed to someone in the
i586 days (when the TSC got introduced) that clock speeds might
ever cross the 4GHz boundary. As a consequence long ago Linux
did use a 32-bit variable for it. It got noticed early enough before
processors got really close to that boundary, but it demonstrates
the fundamental issue. And yes, processor speeds have stopped
to always grow (at least in the x86 space), but that's not a rule
set in stone.

Jan
Joao Martins Aug. 30, 2016, 12:10 p.m. UTC | #4
On 08/29/2016 10:41 AM, Jan Beulich wrote:
>>>> On 26.08.16 at 17:12, <joao.m.martins@oracle.com> wrote:
>> On 08/25/2016 11:13 AM, Jan Beulich wrote:
>>>>>> On 24.08.16 at 14:43, <joao.m.martins@oracle.com> wrote:
>>>> And use to initialize platform time solely for clocksource=tsc,
>>>> as opposed to initializing platform overflow timer, which would
>>>> only fire in ~180 years (on 2.2 Ghz Broadwell processor).
>>>
>>> Do we really want to risk this time period going down by two orders
>>> of magnitude? Is there anything that's really expensive in setting the
>>> overflow timer in the far distant future?
>> It wasn't about cost but rather setting the timer in a so distant future. I could
>> decrease to an year time, month or day. But do you think we really need that overflow
>> handling for TSC?
> 
> I think we shouldn't introduce latent issues, no matter how unlikely
> we may deem it for them to actually become active ones. Just
> consider how unlikely it would have seemed to someone in the
> i586 days (when the TSC got introduced) that clock speeds might
> ever cross the 4GHz boundary. As a consequence long ago Linux
> did use a 32-bit variable for it. It got noticed early enough before
> processors got really close to that boundary, but it demonstrates
> the fundamental issue. And yes, processor speeds have stopped
> to always grow (at least in the x86 space), but that's not a rule
> set in stone.

Thanks for the insight.

What would be a preferred period - probably capping it to a day/hour/minutes? Or
keeping the current calculated value? Maximum right now in current platform timers
are few minutes depending on the HPET frequency.

Joao
Jan Beulich Aug. 30, 2016, 12:31 p.m. UTC | #5
>>> On 30.08.16 at 14:10, <joao.m.martins@oracle.com> wrote:
> What would be a preferred period - probably capping it to a day/hour/minutes? Or
> keeping the current calculated value? Maximum right now in current platform timers
> are few minutes depending on the HPET frequency.

Ideally I would see you just use the calculated value, unless that
causes problems elsewhere. Depending on such possible problems
would be what cap to alternatively use.

Jan
Joao Martins Sept. 9, 2016, 4:32 p.m. UTC | #6
On 08/30/2016 01:31 PM, Jan Beulich wrote:
>>>> On 30.08.16 at 14:10, <joao.m.martins@oracle.com> wrote:
>> What would be a preferred period - probably capping it to a day/hour/minutes? Or
>> keeping the current calculated value? Maximum right now in current platform timers
>> are few minutes depending on the HPET frequency.
> 
> Ideally I would see you just use the calculated value, unless that
> causes problems elsewhere. Depending on such possible problems
> would be what cap to alternatively use.

While sending v4 out, I spotted some issues with platform timer overflow
handling when we get close to u64 boundary. Specific to clocksource=tsc, and
setting up the plt_overflow, one would see at boot:

"Platform timer appears to have unexpectedly wrapped 10 or more times"

The counter doesn't wrap or stop at all.
But current overflowing checking goes like:

	count = plt_src.read_counter();
        plt_stamp64 += (count - plt_stamp) & plt_mask;

	now = NOW();
	plt_wrap = __read_platform_stime(plt_stamp64);
	plt_stamp = count;

	for (i=0;...)
	{

	plt_now = plt_wrap;
	plt_wrap = __read_platform_stime(plt_stamp64 + plt_mask + 1);

@@	if ( ABS(plt_wrap - now) > ABS(plt_now - now) )
            break; // didn't wrap around
	// did wrap
	plt_stamp64 += plt_mask + 1;

	}

	if (i!=0)
		"Platform timer appears to have unexpectedly wrapped "

Effectively the calculation in @@ doesn't handle the fact that for
clocksource=tsc, "ABS(plt_wrap - now)" would be == to "ABS(plt_now -
now)". Not that is right to be so, but TSC is a 64-bit counter and "plt_mask +
1" overflows the u64, turning the above condition into a comparison of same
value. For <= 32-bit platform timers the current math works fine, but reaching
the 64-bit boundary not so much. And then handling the wraparound further below
with "plt_stamp64 += plt_mask + 1" is effectively adding zeroes, which is
assuming that plt_stamp64/plt_stamp is alone enough to not represent any
discontinuity.

I am not sure we shouldn't handle this way at least for clocksource=tsc: we only
see issues when the delta of the two stamps overflows a u64 (computed with:
plt_stamp64 + (read_counter() - plt_stamp)). Keeping those two stamps updated
more often and we won't see issues. When the wraparound happens (in lots lots of
years away) we could not only update plt_stamp64 and plt_stamp, but also
increment stime_platform_stamp and platform_timer_stamp. This lets us compensate
when the stamps wraparound since for stime_platform_stamp (in ns) that would
only happen after STIME_MAX. Or as a simpler alternative, keeping this patch and
update plt_stamp64 (zeroing this one out) plus plt_stamp on
platform_time_calibration() as additional change.

Would that sound reasonable - am I overlooking something? To some extent this
might also applicable to the general case, although platform timer is now only
used for initial seeding so probably a non-visible issue.

Joao
Jan Beulich Sept. 12, 2016, 7:26 a.m. UTC | #7
>>> On 09.09.16 at 18:32, <joao.m.martins@oracle.com> wrote:
> Would that sound reasonable - am I overlooking something? To some extent this
> might also applicable to the general case, although platform timer is now only
> used for initial seeding so probably a non-visible issue.

Wouldn't it already help to simply make TSC a 62- or 63-bit counter,
masking off the high bit(s) during reads?

Jan
Joao Martins Sept. 12, 2016, 10:35 a.m. UTC | #8
On 09/12/2016 08:26 AM, Jan Beulich wrote:
>>>> On 09.09.16 at 18:32, <joao.m.martins@oracle.com> wrote:
>> Would that sound reasonable - am I overlooking something? To some extent this
>> might also applicable to the general case, although platform timer is now only
>> used for initial seeding so probably a non-visible issue.
> 
> Wouldn't it already help to simply make TSC a 62- or 63-bit counter,
> masking off the high bit(s) during reads?

Yeap - That indeed would be the simplest solution, as we currently mask out the
difference between stamps. Tested it and changed counter_bits to be 63 (amended
in patch 2).

Joao
diff mbox

Patch

diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
index b2a11a8..a03127e 100644
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -526,17 +526,31 @@  static s_time_t __read_platform_stime(u64 platform_time)
     return (stime_platform_stamp + scale_delta(diff, &plt_scale));
 }
 
+static void __plt_update(void)
+{
+    u64 count;
+
+    ASSERT(spin_is_locked(&platform_timer_lock));
+    count = plt_src.read_counter();
+    plt_stamp64 += (count - plt_stamp) & plt_mask;
+    plt_stamp = count;
+}
+
+static void plt_update(void)
+{
+    spin_lock_irq(&platform_timer_lock);
+    __plt_update();
+    spin_unlock_irq(&platform_timer_lock);
+}
+
 static void plt_overflow(void *unused)
 {
     int i;
-    u64 count;
     s_time_t now, plt_now, plt_wrap;
 
     spin_lock_irq(&platform_timer_lock);
 
-    count = plt_src.read_counter();
-    plt_stamp64 += (count - plt_stamp) & plt_mask;
-    plt_stamp = count;
+    __plt_update();
 
     now = NOW();
     plt_wrap = __read_platform_stime(plt_stamp64);
@@ -630,10 +644,21 @@  static s64 __init try_platform_timer(struct platform_timesource *pts)
 
     set_time_scale(&plt_scale, pts->frequency);
 
-    plt_overflow_period = scale_delta(
-        1ull << (pts->counter_bits - 1), &plt_scale);
     plt_src = *pts;
 
+    if ( pts == &plt_tsc )
+    {
+        plt_update();
+    }
+    else
+    {
+        plt_overflow_period = scale_delta(
+            1ull << (pts->counter_bits - 1), &plt_scale);
+
+        printk(XENLOG_INFO "Platform timer overflow period is %lu msecs\n",
+               plt_overflow_period / MILLISECS(1));
+    }
+
     return rc;
 }