diff mbox

[4/5] PTP: add PTP_SYS_OFFSET emulation via cross timestamps infrastructure

Message ID 20170120122503.746158230@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marcelo Tosatti Jan. 20, 2017, 12:20 p.m. UTC
Emulate PTP_SYS_OFFSET by using an arithmetic mean of the
realtime samples from ->getcrosststamp callback.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 drivers/ptp/ptp_chardev.c        |    6 ++
 include/linux/ptp_clock_kernel.h |    5 ++
 include/linux/timekeeping_ptp.h  |   10 ++++
 kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+)



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paolo Bonzini Jan. 20, 2017, 12:55 p.m. UTC | #1
On 20/01/2017 13:20, Marcelo Tosatti wrote:
>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++

Why not leave this in drivers/ptp/ptp_chardev.c?

> +		if (ptp->info->emulate_ptp_sys_offset_mean) {
> +			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
> +			break;
> +		}

I think this should be simply "if (!ptp->info->gettime64)" and,
likewise, there should be an emulation based getcrosststamp in
ptp_clock_gettime.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti Jan. 20, 2017, 1:07 p.m. UTC | #2
On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
> 
> 
> On 20/01/2017 13:20, Marcelo Tosatti wrote:
> >  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
> 
> Why not leave this in drivers/ptp/ptp_chardev.c?

timekeeper_lock

> > +		if (ptp->info->emulate_ptp_sys_offset_mean) {
> > +			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
> > +			break;
> > +		}
> 
> I think this should be simply "if (!ptp->info->gettime64)" and,
> likewise, there should be an emulation based getcrosststamp in
> ptp_clock_gettime.
> 
> Paolo

gettime64 is called directly via ptp_clock_gettime.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Jan. 20, 2017, 1:36 p.m. UTC | #3
On 20/01/2017 14:07, Marcelo Tosatti wrote:
> On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 20/01/2017 13:20, Marcelo Tosatti wrote:
>>>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
>>
>> Why not leave this in drivers/ptp/ptp_chardev.c?
> 
> timekeeper_lock

Why does emulate_ptp_sys_offset need it, if the current PTP_SYS_OFFSET
code doesn't?  Is the latency acceptable (considering this is a raw spin
lock) or is there a seqlock that we can use instead (such as tk_core.seq
like in get_device_system_crosststamp)?

>>> +		if (ptp->info->emulate_ptp_sys_offset_mean) {
>>> +			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
>>> +			break;
>>> +		}
>>
>> I think this should be simply "if (!ptp->info->gettime64)" and,
>> likewise, there should be an emulation based getcrosststamp in
>> ptp_clock_gettime.
>>
>> Paolo
> 
> gettime64 is called directly via ptp_clock_gettime.

Yes, but ptp_clock_gettime can be taught to use getcrosststamp instead.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti Jan. 20, 2017, 1:52 p.m. UTC | #4
On Fri, Jan 20, 2017 at 02:36:40PM +0100, Paolo Bonzini wrote:
> 
> 
> On 20/01/2017 14:07, Marcelo Tosatti wrote:
> > On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 20/01/2017 13:20, Marcelo Tosatti wrote:
> >>>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
> >>
> >> Why not leave this in drivers/ptp/ptp_chardev.c?
> > 
> > timekeeper_lock
> 
> Why does emulate_ptp_sys_offset need it, if the current PTP_SYS_OFFSET
> code doesn't? 

Because if time is adjusted while you are taking the samples, 
the mean can return non existant values:

1) take sample1  (realtime = 2000)
2) userspace changes realtime (realtime = 100)
3) 2100/2 = 1050

However that 1050 value never existed, before or after
userspace changed realtime.
Such behaviour does not exist with PTP_SYS_OFFSET, because 
taking getnstimeofday64 is serialized against time changes.

I am not sure whether returning such bizzare values is fine, to 
drop the lock.

Hum... i think it must be because userspace will consider
the new values after realtime is changed as correct.

>  Is the latency acceptable (considering this is a raw spin
> lock) or is there a seqlock that we can use instead (such as tk_core.seq
> like in get_device_system_crosststamp)?

Well can move it after the ->getcrosststamp loop. 
I'll just drop the spinlock and document the behaviour.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář Jan. 20, 2017, 2:02 p.m. UTC | #5
2017-01-20 14:36+0100, Paolo Bonzini:
> On 20/01/2017 14:07, Marcelo Tosatti wrote:
>> On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 20/01/2017 13:20, Marcelo Tosatti wrote:
>>>>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
>>>
>>> Why not leave this in drivers/ptp/ptp_chardev.c?
>> 
>> timekeeper_lock
> 
> Why does emulate_ptp_sys_offset need it, if the current PTP_SYS_OFFSET
> code doesn't?  Is the latency acceptable (considering this is a raw spin
> lock) or is there a seqlock that we can use instead (such as tk_core.seq
> like in get_device_system_crosststamp)?

The spinlock prevents writers to take the tk_core.seq, which means that
time cannot be changed during that.

The simplest alternative would be to use tk_core.seq for all our reads,
but that would increse the chance of re-reading, even infinitely.

But we don't need to read everything with the same time base -- if the
time is changed (by NTP/user/...) between our reads, then the value will
be off, but if writer took tk_core.seq just to accumulate current time,
then the time after accumulation stays the same and it would work as if
we had the tk_core.seq for the whole time.

Another solution would be to do just one one read and set it to all
saples -- the difference between t[i] and t[i+2] would be 0.  We are
quite sure the just one read is enough, this hack could be even better.

>>>> +		if (ptp->info->emulate_ptp_sys_offset_mean) {
>>>> +			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
>>>> +			break;
>>>> +		}
>>>
>>> I think this should be simply "if (!ptp->info->gettime64)" and,
>>> likewise, there should be an emulation based getcrosststamp in
>>> ptp_clock_gettime.
>>>
>>> Paolo
>> 
>> gettime64 is called directly via ptp_clock_gettime.
> 
> Yes, but ptp_clock_gettime can be taught to use getcrosststamp instead.

I agree,

  if (!gettime64)
      use getcrosststamp

and KVM PTP device will not implement gettime64().
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini Jan. 20, 2017, 2:23 p.m. UTC | #6
On 20/01/2017 15:02, Radim Krcmar wrote:
> 2017-01-20 14:36+0100, Paolo Bonzini:
>> On 20/01/2017 14:07, Marcelo Tosatti wrote:
>>> On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 20/01/2017 13:20, Marcelo Tosatti wrote:
>>>>>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
>>>>
>>>> Why not leave this in drivers/ptp/ptp_chardev.c?
>>>
>>> timekeeper_lock
>>
>> Why does emulate_ptp_sys_offset need it, if the current PTP_SYS_OFFSET
>> code doesn't?  Is the latency acceptable (considering this is a raw spin
>> lock) or is there a seqlock that we can use instead (such as tk_core.seq
>> like in get_device_system_crosststamp)?
> 
> The spinlock prevents writers to take the tk_core.seq, which means that
> time cannot be changed during that.
> 
> The simplest alternative would be to use tk_core.seq for all our reads,
> but that would increse the chance of re-reading, even infinitely.

How much?  If a hypercall takes 1 microsecond, and PTP_MAX_SAMPLES is
25, we should be done in less than 50 microseconds.  If update_wall-time
is called with 250 Hz frequency (sounds like a lot), that's still 4000
microseconds so the probability of even 3-4 consecutive retries should
be very low.

> But we don't need to read everything with the same time base -- if the
> time is changed (by NTP/user/...) between our reads, then the value will
> be off, but if writer took tk_core.seq just to accumulate current time,
> then the time after accumulation stays the same and it would work as if
> we had the tk_core.seq for the whole time.

You mean only check seqlock separately for each sample, but restart the
entire loop upon changes to cs_was_changed_seq or clock_was_set_seq?
That would work too.

> Another solution would be to do just one one read and set it to all
> saples -- the difference between t[i] and t[i+2] would be 0.  We are
> quite sure the just one read is enough, this hack could be even better.

I'd be afraid of messing up chrony's stats...

Paolo

>>>>> +		if (ptp->info->emulate_ptp_sys_offset_mean) {
>>>>> +			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
>>>>> +			break;
>>>>> +		}
>>>>
>>>> I think this should be simply "if (!ptp->info->gettime64)" and,
>>>> likewise, there should be an emulation based getcrosststamp in
>>>> ptp_clock_gettime.
>>>>
>>>> Paolo
>>>
>>> gettime64 is called directly via ptp_clock_gettime.
>>
>> Yes, but ptp_clock_gettime can be taught to use getcrosststamp instead.
> 
> I agree,
> 
>   if (!gettime64)
>       use getcrosststamp
> 
> and KVM PTP device will not implement gettime64().
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Miroslav Lichvar Jan. 20, 2017, 2:31 p.m. UTC | #7
On Fri, Jan 20, 2017 at 03:23:08PM +0100, Paolo Bonzini wrote:
> On 20/01/2017 15:02, Radim Krcmar wrote:
> > Another solution would be to do just one one read and set it to all
> > saples -- the difference between t[i] and t[i+2] would be 0.  We are
> > quite sure the just one read is enough, this hack could be even better.
> 
> I'd be afraid of messing up chrony's stats...

This should be ok. For the application it just looks as if the CPU and
everything else in the path of the measurements was infinitely fast. :)
Radim Krčmář Jan. 20, 2017, 6:30 p.m. UTC | #8
2017-01-20 15:23+0100, Paolo Bonzini:
> On 20/01/2017 15:02, Radim Krcmar wrote:
>> 2017-01-20 14:36+0100, Paolo Bonzini:
>>> On 20/01/2017 14:07, Marcelo Tosatti wrote:
>>>> On Fri, Jan 20, 2017 at 01:55:27PM +0100, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 20/01/2017 13:20, Marcelo Tosatti wrote:
>>>>>>  kernel/time/timekeeping.c        |   79 +++++++++++++++++++++++++++++++++++++++
>>>>>
>>>>> Why not leave this in drivers/ptp/ptp_chardev.c?
>>>>
>>>> timekeeper_lock
>>>
>>> Why does emulate_ptp_sys_offset need it, if the current PTP_SYS_OFFSET
>>> code doesn't?  Is the latency acceptable (considering this is a raw spin
>>> lock) or is there a seqlock that we can use instead (such as tk_core.seq
>>> like in get_device_system_crosststamp)?
>> 
>> The spinlock prevents writers to take the tk_core.seq, which means that
>> time cannot be changed during that.
>> 
>> The simplest alternative would be to use tk_core.seq for all our reads,
>> but that would increse the chance of re-reading, even infinitely.
> 
> How much?  If a hypercall takes 1 microsecond, and PTP_MAX_SAMPLES is
> 25, we should be done in less than 50 microseconds.  If update_wall-time
> is called with 250 Hz frequency (sounds like a lot), that's still 4000
> microseconds so the probability of even 3-4 consecutive retries should
> be very low.

You are right, I was overestimating the worst case.
Host/guest preemption (1000 Hz) will also force a re-read, but both of
these diminishing probabilities and a tendency to align.

>> But we don't need to read everything with the same time base -- if the
>> time is changed (by NTP/user/...) between our reads, then the value will
>> be off, but if writer took tk_core.seq just to accumulate current time,
>> then the time after accumulation stays the same and it would work as if
>> we had the tk_core.seq for the whole time.
> 
> You mean only check seqlock separately for each sample, but restart the
> entire loop upon changes to cs_was_changed_seq or clock_was_set_seq?
> That would work too.

I wanted to accept that our measuerements can be imprecise and just have
the seqlock for each sample.  It should not make a difference without
misconfiguration and we can't do anything about a malicious root anyway.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Cochran Jan. 20, 2017, 8:25 p.m. UTC | #9
On Fri, Jan 20, 2017 at 10:20:29AM -0200, Marcelo Tosatti wrote:
> Emulate PTP_SYS_OFFSET by using an arithmetic mean of the
> realtime samples from ->getcrosststamp callback.

This change log is not very informative.

Yes, I can see that the new code calculates a mean.

But WHY is this needed?

And why is this kvm case so special, that it requires its own new flag
within the generic PHC subsystem?

Thanks,
Richard
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcelo Tosatti Jan. 23, 2017, 1:19 p.m. UTC | #10
On Fri, Jan 20, 2017 at 09:25:02PM +0100, Richard Cochran wrote:
> On Fri, Jan 20, 2017 at 10:20:29AM -0200, Marcelo Tosatti wrote:
> > Emulate PTP_SYS_OFFSET by using an arithmetic mean of the
> > realtime samples from ->getcrosststamp callback.
> 
> This change log is not very informative.
> 
> Yes, I can see that the new code calculates a mean.
> 
> But WHY is this needed?

This is needed to generate the PTP_SYS_OFFSET data: a table with read
from realtime clock, read from device clock, read from realtime clock,
... :

time ->
device clock	|	|sample2|	|sample4|	|sample6| ...
-------------------------------------------------------------
realtime clock  |sample1|       |sample3|	|sample5|


Where sampleN is the read of the respective clock.

From the following PTP_SYS_OFFSET_PRECISE data:

device clock	|sample1P,deviceclock|       |sample2P,deviceclock|
-------------------------------------------------------------
realtime clock  |sample1P,realtimeclock|     |sample2P,realtimeclock|

The mean of {sample2P,realtimeclock} and {sample1P,realtimeclock} is
used to generate sample1 for the realtime clock.

> And why is this kvm case so special, that it requires its own new flag
> within the generic PHC subsystem?

We get PTP_SYS_OFFSET_PRECISE support for users and use that to 
implement PTP_SYS_OFFSET as well. Other drivers could do the same, 
as long as their ->getcrosststamp callbacks return PTP synchronized
time.

Since the clock is moving forward, approximating the clock read at
"sample3" with the mean of "sample1" and "sample5" (in the first table
above) is a very good approximation.
Richard Cochran Jan. 23, 2017, 6:44 p.m. UTC | #11
On Mon, Jan 23, 2017 at 11:19:17AM -0200, Marcelo Tosatti wrote:
> This is needed to generate the PTP_SYS_OFFSET data: a table with read
> from realtime clock, read from device clock, read from realtime clock,
> ... :
> 
> time ->
> device clock	|	|sample2|	|sample4|	|sample6| ...
> -------------------------------------------------------------
> realtime clock  |sample1|       |sample3|	|sample5|

Here "realtime clock" is CLOCK_REALTIME on the guest, and "device
clock" is CLOCK_REALTIME on the host?
 
> Where sampleN is the read of the respective clock.
> 
> From the following PTP_SYS_OFFSET_PRECISE data:
> 
> device clock	|sample1P,deviceclock|       |sample2P,deviceclock|
> -------------------------------------------------------------
> realtime clock  |sample1P,realtimeclock|     |sample2P,realtimeclock|

Are |sample1P,deviceclock| and |sample1P,realtimeclock| taken at the
same instant in time?

If not, then calling that PTP_SYS_OFFSET_PRECISE is misleading.

Still don't get what you are doing here...

Thanks,
Richard
Paolo Bonzini Jan. 23, 2017, 7:44 p.m. UTC | #12
On 23/01/2017 19:44, Richard Cochran wrote:
>> device clock	   |sample1P,deviceclock|       |sample2P,deviceclock|
>> -------------------------------------------------------------
>> realtime clock  |sample1P,realtimeclock|     |sample2P,realtimeclock|
>
> Are |sample1P,deviceclock| and |sample1P,realtimeclock| taken at the
> same instant in time?
> 
> If not, then calling that PTP_SYS_OFFSET_PRECISE is misleading.

Yes, of course.  This was added for the e1000e drivers first, but chrony
isn't using it yet.  In the case of KVM, the same host TSC value is used
to produce the host clock and (converted to guest TSC) the guest clock.

If you just implement getclock64 the PTP_SYS_OFFSET output:

device clock	|	|sample2|	|sample4|	|sample6| ...
-------------------------------------------------------------
realtime clock  |sample1|       |sample3|	|sample5|

has a very large distance between samples on the same line (about 1 us),
and I think it is too noisy for userspace to make sense of the output.

So, on one hand chrony only uses the mean of realtime clock samples, in
an attempt to produce precise cross timestamps.  On the other hand, even
though KVM could produce those natively, chrony does not support
PTP_SYS_OFFSET_PRECISE.

Marcelo's patch then produces fake realtime clock samples that, however,
let chrony derive the cross timestamps that KVM produced in the first
place.  The outcome is really great accuracy compared to previous
versions of the patch, often just +/- 2 or 3 nanoseconds.

Paolo

> Still don't get what you are doing here...
Marcelo Tosatti Jan. 23, 2017, 11:06 p.m. UTC | #13
On Mon, Jan 23, 2017 at 07:44:15PM +0100, Richard Cochran wrote:
> On Mon, Jan 23, 2017 at 11:19:17AM -0200, Marcelo Tosatti wrote:
> > This is needed to generate the PTP_SYS_OFFSET data: a table with read
> > from realtime clock, read from device clock, read from realtime clock,
> > ... :
> > 
> > time ->
> > device clock	|	|sample2|	|sample4|	|sample6| ...
> > -------------------------------------------------------------
> > realtime clock  |sample1|       |sample3|	|sample5|
> 
> Here "realtime clock" is CLOCK_REALTIME on the guest, and "device
> clock" is CLOCK_REALTIME on the host?

Yes.

>  
> > Where sampleN is the read of the respective clock.
> > 
> > From the following PTP_SYS_OFFSET_PRECISE data:
> > 
> > device clock	|sample1P,deviceclock|       |sample2P,deviceclock|
> > -------------------------------------------------------------
> > realtime clock  |sample1P,realtimeclock|     |sample2P,realtimeclock|
> 
> Are |sample1P,deviceclock| and |sample1P,realtimeclock| taken at the
> same instant in time?

Yes they are.

> If not, then calling that PTP_SYS_OFFSET_PRECISE is misleading.

It is.

> Still don't get what you are doing here...

What KVM provides is reading two clocks (host clock and guest clock)
at the exact same time (because both are based on TSC, so you do RDTSC 
once and use that result to calculate two clocks: host and guest realtime
clocks).

So what this interface provides is the following: If a PTP driver
provides ->getcrosstimestamps callback (meaning the driver can read
the value of two clocks at the same instant), and does not provide 
->gettime64 callback, then you approximate the first "diagram" 
using the second "diagram" (ie: the arithmetic mean of the samples).

Can you please describe what problem exists with this scheme?

Would you prefer to get rid of the "emulation" of ->gettime64 
by implementing ->gettime64 directly? Can do that as well.

Thanks.
Richard Cochran Jan. 24, 2017, 5:32 a.m. UTC | #14
On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote:
> Can you please describe what problem exists with this scheme?

This new kernel code exists just because chrony doesn't implement the
PRECISE ioctl.  Instead of adding new "fake" modes, just teach chrony
about the better method.

I prefer to have kernel interfaces whose behavior is clear and
consistent.

Thanks,
Richard
Richard Cochran Jan. 24, 2017, 5:43 a.m. UTC | #15
On Mon, Jan 23, 2017 at 08:44:53PM +0100, Paolo Bonzini wrote:
> If you just implement getclock64 the PTP_SYS_OFFSET output:
> 
> device clock	|	|sample2|	|sample4|	|sample6| ...
> -------------------------------------------------------------
> realtime clock  |sample1|       |sample3|	|sample5|
> 
> has a very large distance between samples on the same line (about 1 us),
> and I think it is too noisy for userspace to make sense of the output.

One microsecond is not too bad at all.  The PCIe devices have
intervals of 5-6 usec, and the result is quite usable, certainly
better than generic NTP.

> Marcelo's patch then produces fake realtime clock samples that, however,
> let chrony derive the cross timestamps that KVM produced in the first
> place.  The outcome is really great accuracy compared to previous
> versions of the patch, often just +/- 2 or 3 nanoseconds.

Why can't chrony learn about PTP_SYS_OFFSET_PRECISE?

Thanks,
Richard
Miroslav Lichvar Jan. 24, 2017, 8:15 a.m. UTC | #16
On Tue, Jan 24, 2017 at 06:32:43AM +0100, Richard Cochran wrote:
> On Mon, Jan 23, 2017 at 09:06:20PM -0200, Marcelo Tosatti wrote:
> > Can you please describe what problem exists with this scheme?
> 
> This new kernel code exists just because chrony doesn't implement the
> PRECISE ioctl.  Instead of adding new "fake" modes, just teach chrony
> about the better method.

The latest development code of chrony now supports the PRECISE ioctl.

I did some tests with an e1000e NIC (i219) and it seemed the stability
was slightly worse than with the non-PRECISE ioctl, but there was a
400-500ns offset between the two, so it should be much more accurate
(we finally have something that avoids the asymmetry on the PCI-e
bus?). Configuring the refclock with a shorter dpoll should compensate
for the decrease in stability. In any case, there is a "nocrossts"
option to not use the PRECISE ioctl even if it's supported.
Marcelo Tosatti Jan. 24, 2017, 11:23 a.m. UTC | #17
On Mon, Jan 23, 2017 at 08:44:53PM +0100, Paolo Bonzini wrote:
> 
> 
> On 23/01/2017 19:44, Richard Cochran wrote:
> >> device clock	   |sample1P,deviceclock|       |sample2P,deviceclock|
> >> -------------------------------------------------------------
> >> realtime clock  |sample1P,realtimeclock|     |sample2P,realtimeclock|
> >
> > Are |sample1P,deviceclock| and |sample1P,realtimeclock| taken at the
> > same instant in time?
> > 
> > If not, then calling that PTP_SYS_OFFSET_PRECISE is misleading.
> 
> Yes, of course.  This was added for the e1000e drivers first, but chrony
> isn't using it yet.  In the case of KVM, the same host TSC value is used
> to produce the host clock and (converted to guest TSC) the guest clock.
> 
> If you just implement getclock64 the PTP_SYS_OFFSET output:
> 
> device clock	|	|sample2|	|sample4|	|sample6| ...
> -------------------------------------------------------------
> realtime clock  |sample1|       |sample3|	|sample5|
> 
> has a very large distance between samples on the same line (about 1 us),
> and I think it is too noisy for userspace to make sense of the output.
> 
> So, on one hand chrony only uses the mean of realtime clock samples, in
> an attempt to produce precise cross timestamps.  On the other hand, even
> though KVM could produce those natively, chrony does not support
> PTP_SYS_OFFSET_PRECISE.
> 
> Marcelo's patch then produces fake realtime clock samples that, however,
> let chrony derive the cross timestamps that KVM produced in the first
> place.  The outcome is really great accuracy compared to previous
> versions of the patch, often just +/- 2 or 3 nanoseconds.
> 
> Paolo

Nope, the clock offset is the value inside square brackets. The +/-
is the error. So the clock offset is actually up to 680ns.

Yes for greater accuracy userspace should implement _PRECISE
support.

I'm resending v5 with native support for ->gettime and 
->getcrosststamps.
Richard Cochran Jan. 24, 2017, 11:35 a.m. UTC | #18
On Tue, Jan 24, 2017 at 09:23:29AM -0200, Marcelo Tosatti wrote:
> I'm resending v5 with native support for ->gettime and 
> ->getcrosststamps.

And can you please drop the "fake" PTP_SYS_OFFSET stuff?

Here is another reason why this is illogical.  The application can
choose how many samples to take during PTP_SYS_OFFSET.  It will choose
more samples if it wants a better picture of the offsets, but in that
case the "fake" samples would cause more computation in the kernel for
no added benefit.

Thanks,
Richard
diff mbox

Patch

Index: kvm-ptpdriver/drivers/ptp/ptp_chardev.c
===================================================================
--- kvm-ptpdriver.orig/drivers/ptp/ptp_chardev.c	2017-01-19 15:34:52.181338789 -0200
+++ kvm-ptpdriver/drivers/ptp/ptp_chardev.c	2017-01-19 15:35:31.705421911 -0200
@@ -23,6 +23,7 @@ 
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/timekeeping.h>
+#include <linux/timekeeping_ptp.h>
 
 #include "ptp_private.h"
 
@@ -219,6 +220,11 @@ 
 			err = -EINVAL;
 			break;
 		}
+
+		if (ptp->info->emulate_ptp_sys_offset_mean) {
+			err = emulate_ptp_sys_offset(ptp->info, sysoff, arg);
+			break;
+		}
 		pct = &sysoff->ts[0];
 		for (i = 0; i < sysoff->n_samples; i++) {
 			getnstimeofday64(&ts);
Index: kvm-ptpdriver/include/linux/ptp_clock_kernel.h
===================================================================
--- kvm-ptpdriver.orig/include/linux/ptp_clock_kernel.h	2017-01-19 15:34:52.181338789 -0200
+++ kvm-ptpdriver/include/linux/ptp_clock_kernel.h	2017-01-19 15:35:31.706421913 -0200
@@ -99,6 +99,10 @@ 
  *            parameter func: the desired function to use.
  *            parameter chan: the function channel index to use.
  *
+ * @emulate_ptp_sys_offset_mean: Emulate PTP_SYS_OFFSET ioctl
+ *            using arithmetic mean of series of PTP_SYS_OFFSET_PRECISE
+ *            ioctl.
+ *
  * Drivers should embed their ptp_clock_info within a private
  * structure, obtaining a reference to it using container_of().
  *
@@ -115,6 +119,7 @@ 
 	int n_pins;
 	int pps;
 	struct ptp_pin_desc *pin_config;
+	bool emulate_ptp_sys_offset_mean;
 	int (*adjfine)(struct ptp_clock_info *ptp, long scaled_ppm);
 	int (*adjfreq)(struct ptp_clock_info *ptp, s32 delta);
 	int (*adjtime)(struct ptp_clock_info *ptp, s64 delta);
Index: kvm-ptpdriver/include/linux/timekeeping_ptp.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ kvm-ptpdriver/include/linux/timekeeping_ptp.h	2017-01-19 15:35:31.706421913 -0200
@@ -0,0 +1,10 @@ 
+#ifndef _LINUX_TIMEKEEPING_PTP_H
+#define _LINUX_TIMEKEEPING_PTP_H
+
+#include <uapi/linux/ptp_clock.h>
+#include <linux/ptp_clock_kernel.h>
+
+extern int emulate_ptp_sys_offset(struct ptp_clock_info *info,
+			struct ptp_sys_offset *sysoff,
+			unsigned long arg);
+#endif
Index: kvm-ptpdriver/kernel/time/timekeeping.c
===================================================================
--- kvm-ptpdriver.orig/kernel/time/timekeeping.c	2017-01-19 15:34:52.181338789 -0200
+++ kvm-ptpdriver/kernel/time/timekeeping.c	2017-01-20 09:29:13.978434149 -0200
@@ -23,6 +23,10 @@ 
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
 #include <linux/compiler.h>
+#include <linux/timekeeping_ptp.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
@@ -1168,6 +1172,81 @@ 
 }
 EXPORT_SYMBOL_GPL(get_device_system_crosststamp);
 
+static void mean_sys_realtime(struct system_device_crosststamp *xtstamp,
+			      struct system_device_crosststamp *n_xtstamp,
+			      struct ptp_clock_time *pct)
+{
+	ktime_t sys_realtime, n_sys_realtime;
+	struct timespec ts;
+	u64 ns;
+
+	/* estimate realtime with arithmetic mean of two crosststamp samples */
+	sys_realtime = xtstamp->sys_realtime;
+	n_sys_realtime = n_xtstamp->sys_realtime;
+	ns = ktime_divns(ktime_add(sys_realtime, n_sys_realtime), 2);
+	ts = ktime_to_timespec(ns_to_ktime(ns));
+	pct->sec = ts.tv_sec;
+	pct->nsec = ts.tv_nsec;
+}
+
+int emulate_ptp_sys_offset(struct ptp_clock_info *info,
+				  struct ptp_sys_offset *sysoff,
+				  unsigned long arg)
+{
+	unsigned long flags;
+	int i, err, len;
+	int n_samples;
+	struct system_device_crosststamp *xtstamps, *xtstamp, *n_xtstamp;
+	struct ptp_clock_time *pct;
+
+	n_samples = sysoff->n_samples + 2;
+
+	len = sizeof(struct system_device_crosststamp);
+	xtstamps = kzalloc(len * n_samples, GFP_KERNEL);
+	if (xtstamps == NULL)
+		return -ENOMEM;
+
+	/* stop update_wall_time, settimeofday */
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+
+	xtstamp = xtstamps;
+	for (i = 0; i < n_samples; i++) {
+		err = info->getcrosststamp(info, xtstamp);
+		if (err) {
+			kfree(xtstamps);
+			raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+			return err;
+		}
+		xtstamp++;
+	}
+
+	pct = &sysoff->ts[0];
+	xtstamp = n_xtstamp = xtstamps;
+	n_xtstamp++;
+	for (i = 0; i < sysoff->n_samples; i++) {
+		struct timespec ts;
+
+		mean_sys_realtime(xtstamp, n_xtstamp, pct);
+		pct++;
+
+		ts = ktime_to_timespec(n_xtstamp->device);
+		pct->sec = ts.tv_sec;
+		pct->nsec = ts.tv_nsec;
+		pct++;
+		xtstamp++;
+		n_xtstamp++;
+	}
+
+	mean_sys_realtime(xtstamp, n_xtstamp, pct);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	kfree(xtstamps);
+	if (copy_to_user((void __user *)arg, sysoff, sizeof(*sysoff)))
+		return -EFAULT;
+
+	return 0;
+}
+EXPORT_SYMBOL(emulate_ptp_sys_offset);
+
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:		pointer to the timeval to be set