mbox series

[v32,0/5] Timestamp synchronization of host - guest tracing session

Message ID 20210315061857.168570-1-tz.stoyanov@gmail.com (mailing list archive)
Headers show
Series Timestamp synchronization of host - guest tracing session | expand

Message

Tzvetomir Stoyanov (VMware) March 15, 2021, 6:18 a.m. UTC
Basic infrastructure for host - guest timestamp synchronization and a
PoC implementation of PTP-like and KVM algorithms. 

The KVM plugin is with higher weight than PTP, it is preferred in case both
are active. KVM is used only if x86-tsc clock is selected for the current
trace, PTP is used for all other cases.

These changes depend on:
   "Refactoring and improvements of time sync logic" patch set.
   "TSC trace clock to nanosecond conversion" patch set.

 v32 changes:
  - Reduced the patch set by moving the logic for TSC to nanosecond
    conversion in a separate set.
  - Moved the logic for internal refactoring of existing and already merged
    time sync logic in a separate set.
 v31 changes:
  - Removed POC label from the patch introducing the PTP time sync
    algorithm.
  - Implemented logic for TSC clock to nanoseconds conversion.
  - Select x86-tsc clock for the tracing session, is KVM timesync plugin is
    available and the user did not specify any trace clock.
 v30 changes:
  - Moved all time sync logic in the trace-cmd library.
  - Separate the logic for pthread barrier time sync threads synchronization
    in its own patch.
 v29 changes:
  - Included <linux/types.h> in trace-timesync-kvm.c for __s64 type.
  - Added pthread barrier to host time sync pthread, to ensure times are
    synchronized before starting the trace.
  - Removed already merged patches from the set.
 v28 changes:
  - Fix bugs, introduced by previous trace-cmd commits, related to tracefs
    library usage.
  - Removed unused type defines for trace packets.
 v27 changes:
  - Fixed broken compilation due to usage of old libtracefs APIs.
  - Rewrote complex macros to follow the Linux kernel requirements for macros.
  - Removed pathces from the set, that were already merged upstream.
 v26 changes:
  - Rebased on top of latest master.
  - Abstracted Time sync protocol names in a structure, to avoid using
    triple char pointers.
  - Removed a patch from the set, as it is already merged:
    "trace-cmd: Add new libtrasefs API to get the current trace clock".
  - Minor coding style changes, for better code readability.
 v25 changes:
  - Addressed Steven's comments.
  - Reorded the patches in the plugin - put the PTP and KVM PoC patches
    last and all other fixes and improvements before them.
  - Implemented a new libtrasefs API to get the current trace clock.
  - Replaced IDs of trace timestamps synchronization protocols with strings.
  - Implemented an "accuracy" parameter for timestamps synchronization
    protocols.
  - Added "scaling ratio" correction, calculated by the timestamp
    synchronization protocols.
  - Added flags, declared by the timestamps synchronization protocols,
    and stored in the trace.dat file, together with the calculated
    time offsets.
  - Implemented timestamp offset calculation per virtual CPU.
 v24 changes:
  - trace-cmd internal refactoring, needed for KVM timestamp synchronization
    plugin.
  - Added KVM timestamp synchronization plugin.

 v23 changes:
  - Added flags to PTP start message to control the behaviour of the
    algorithm. This is for development purposes mostly, to compare and
    evaluate how these changes affect the accuracy.
	PTP_FLAG_FASTEST_RESPONSE:  Consider only the probe with fastest
				    response time, otherwise make a histogram
				    from all probes.
	PTP_FLAG_USE_MARKER: Use trace marker to get the clock,
 			     otherwise use directly the system clock

 v22 changes, addressed Steven's comments:
  - Fixed error flow in read_qemu_guests_pids(), free allocated resources
    in case of an error.
  - Fixed initialisation of reallocated memmory in set_vcpu_pid_mapping().
  - Fixed bitmask logic in tracecmd_tsync_proto_select().
  - Fixed usage of wrong structure in make_trace_resp().

 v21 changes:
  - Rebased on top of latest master.
  - Remove these two patches from the set, as they are already merged:
       trace-cmd: Add new library API for local CPU count
       trace-cmd: Add support for negative time offsets in trace.dat file
  - Added more descriptive warning message when fails to extract Time Shift
    information from the trace.dat file.
  - Added a warning message when fails to obtain vcpu - pid mappings.
  - Handle the case with sparse VCPU numbers in VCPU - PID mapping
    array.
  - Fixed broken recording over FIFOs.
 v20 changes:
  - Rebased on top of latest master.
  - Removed the VCPUS_MAX hardcoded limit, reimplemented the cpu_pid[] array
    to be dynamically allocated.
  - Added a warning when reading of TRACECMD_OPTION_TIME_SHIFT option
    fails, due to unexpected option's size.
  - Improved loading of TRACECMD_OPTION_GUEST option data, as Steven
    suggested.
 v19 changes:
  - Rebased on top of latest master. The librtacefs is already merged,
    this allows to drop few patches from the set and use librtacefs APIs
    instead.
  - Reimplemented all new trace.dat options to be in binary format,
    instead of text. Leverage the new "trace-cmd dump" sub command to
    track what is written in the file.
  - Addressed Steven's comments.

 v18 changes: addressed Steven Rostdet comments:
  - Replaced semaphores with pthread mutexes.
  - Made bitmask with time sync protocols unlimited, so we can support
    more than 32 protocols. That required to redesign the trace request
    packet format.
  - A lot of small fixes.
	
 v17 changes:
  - Implemented new PTP logic for calculating the clocks offset, using
    histogram of all PTP samples. It gives better results than the logic
    with the fastest response time, so set the new one as default in the
    POC patch.

 v16 changes:
  - Fixed compilation in case no VSOCK is available (Thanks to Slavomir Kaslev)
  - Fixed a typo in trace-cmd-record.1.txt (Thanks to Slavomir Kaslev)
  - Added forgotten file in the patch "trace-cmd: Add new library APIs for
    ftrace instances." - trace-instance.c
  - Fixed few compilation warnings related to TSYNC_DEBUG code.
  - Removed a blank line at the end of "tsync_readme" file. 

 v15 changes:
  - Removed the patch for "--proc-map" from the series, as it should not
    be part of it.

 v14 changes:
  - Bring back the PTP-like algorithm and removed the ftrace event based logic.
  - Reimplemented the PTP-like algorithm to use raw ftrace markers, instead of clock_gettime() API.
  - Refactored the logic to be algorithm independent and plugin friendly.
  - Implemented continuous timestamps synchronization, while the trace is running.
  - Moved logic from trace-cmd application to libtracecmd, as new library APIs.
  - Implemented new trace id functionality.
  - Implemented new guest section in host trace.dat file.
	
 v13 changes:
  - Remove few patches from the set, as they were merged.
  - Rebased to the latest master, Slavomirs patchest "Add VM kernel tracing over
    vsockets and FIFOs" got merged!
 	
 v12 changes:
  - Rebased on top of Slavomir's v13 "Add VM kernel tracing over vsockets and FIFOs"

 v11 changes:
  - Rebased on top of Slavomir's v10 "Add VM kernel tracing over vsockets and FIFOs"
  - Addressed Slavomir's commnents from version 10 of the patch series.

 v10 changes:
  - Fixed broken compilation, call to timestamp_correction_calc() in timestamp_correct
    was smashed.
  - Replaced deprecated tep_data_event_from_type() API with tep_find_event().
  - Fixed a warning on assignment const to non const.

 v9 changes:
  - Fixed implementation of binary search algorithm in timestamp_correct()

 v8 changes:
  - Added rmdir() call in tracecmd_remove_instance(), to completely remove the instance. 
  However, there is an issue with deleting the instances using rmdir(), which is investigated.
  - Few changes in read_qemu_guests_pids(), timestamp_correct(), tsync_offset_load() 
 tracecmd_clock_context_new() and find_raw_events() suggested by Slavomir. 

 v7 changes:
  - Added warning messages in case time synchronization cannot be negotiated or fails.
  - Few optimizations and checks in read_qemu_guests_pids(), tsync_offset_load(),
    and find_raw_events(), suggested by Slavomir Kaslev.
  - Reworked timestamp_correct() to not use static variables.
  - Check TRACECMD_OPTION_TIME_SHIFT before reading time sync samples from the trace.dat file

 v6 changes:
  - Refactored tracecmd_msg_snd_time_sync() and tracecmd_msg_rcv_time_sync() functions:
    removed any time sync calculations logic as separate functions in trace-timesync.c file
  - Defined TSYNC_PROBE, TSYNC_REQ and TSYNC_RESP messages, in order to make the time sync
    protocol comprehensible.
  - Addressed Steven Rostedt comments.
  - Addressed Slavomir Kaslev commnets.

v5 changes:
  - Rebased to Slavomir's v8 "Add VM kernel tracing over vsockets and FIFOs"
    patch series.
  - Implemented an algorithm for time drift correction.
  - Addressed Slavomir's commnets.
  - Refactored the code: moved all time sync specific implementation in trace-timesync.c
  - Isolated all hardcoded event specific stuff in a structure, so it could be easily
    moved to external plugins.
  - Added a check for VSOCK support: do not perform vsock dependent time synchronisation
    in case there is no VSOCK support.

 v4 changes:
  - Removed the implementation of PTP-like algorithm. The current
    logic relies on matching time stamps of kvm_exit/virtio_transport_recv_pkt
    events on host to virtio_transport_alloc_pkt/vp_notify events on guest.
  - Rebased to Slavomir's v7 "Add VM kernel tracing over vsockets and FIFOs"
    patch series.
  - Decreased the time synch probes from 5000 to 300.
  - Addressed Steven Rostedt comments.
  - Code cleanup.

 v3 changes:
 - Removed any magic constants, used in the PTP-like algorithm,
   as Slavomir Kaslev suggested.
 - Implemented new algorithm, based on mapping kvm_exit events
   in host context to vsock_send events in guest context,
   suggested by Steven Rostedt.

 v2 changes:
  - Addressed Steven Rostedt comments.
  - Modified PTP-like timestamps sync algorithm to gain more accuracy, with the
    help of Yordan Karadzhov and Slavomir Kaslev.
Tzvetomir Stoyanov (VMware) (5):
  trace-cmd: Add dummy function to initialize timestamp sync logic
  trace-cmd: Add timestamp synchronization per vCPU
  trace-cmd: PTP-like algorithm for host - guest timestamp
    synchronization
  trace-cmd: Debug scripts for PTP-like algorithm for host - guest
    timestamp synchronization
  trace-cmd [POC]: Add KVM timestamp synchronization plugin

 lib/trace-cmd/Makefile                        |   4 +
 .../include/private/trace-cmd-private.h       |   2 +-
 lib/trace-cmd/include/trace-tsync-local.h     |  23 +-
 lib/trace-cmd/trace-input.c                   | 175 +++--
 lib/trace-cmd/trace-timesync-kvm.c            | 460 +++++++++++
 lib/trace-cmd/trace-timesync-ptp.c            | 715 ++++++++++++++++++
 lib/trace-cmd/trace-timesync.c                | 294 +++++--
 scripts/debug/tsync_hist.py                   |  57 ++
 scripts/debug/tsync_readme                    |  12 +
 scripts/debug/tsync_res.py                    |  46 ++
 tracecmd/trace-agent.c                        |   2 +
 tracecmd/trace-dump.c                         |  52 +-
 tracecmd/trace-record.c                       |  23 +-
 13 files changed, 1685 insertions(+), 180 deletions(-)
 create mode 100644 lib/trace-cmd/trace-timesync-kvm.c
 create mode 100644 lib/trace-cmd/trace-timesync-ptp.c
 create mode 100644 scripts/debug/tsync_hist.py
 create mode 100644 scripts/debug/tsync_readme
 create mode 100644 scripts/debug/tsync_res.py

Comments

Stefano De Venuto March 19, 2021, 10:08 a.m. UTC | #1
Hello,

On 3/15/21 7:18 AM, Tzvetomir Stoyanov (VMware) wrote:
> Basic infrastructure for host - guest timestamp synchronization and a
> PoC implementation of PTP-like and KVM algorithms. 
I'm trying to make some analysis on traces generated by host and guest,
so I applied this series and the related dependencies.

However, when the KVM algorithm is used, the merging process
of the two traces doesn't seem to happen properly.

The output of `trace-cmd report -i trace.dat -i trace-tumbleweed.dat` is:

           trace.dat: cpus=4
trace-tumbleweed.dat: cpus=1
           trace.dat:            sleep-23948 [002]61124758463654:
write_msr:            c0000100, value 7f35092875c0
           trace.dat:           <idle>-0     [000]61124760841508:
write_msr:            c0000100, value 7f6136b75640
           trace.dat:        CPU 0/KVM-1640  [000]61124760851620:
write_msr:            c0011020, value 40000000000000
           trace.dat:        CPU 0/KVM-1640  [000]61124760861778:
read_msr:             c0000101, value ffff9b1b76c00000
           trace.dat:        CPU 0/KVM-1640  [000]61124760862832:
read_msr:             c0000081, value 23001000000000
           trace.dat:        CPU 0/KVM-1640  [000]61124760864347:
read_msr:             c0000082, value ffffffffb5400010
           trace.dat:        CPU 0/KVM-1640  [000]61124760865225:
read_msr:             c0000083, value ffffffffb5401670
           .
           .
           .
trace-tumbleweed.dat:   IPC I/O Parent-4313 [000]8446232460209009653:
write_msr:            c0000100, value 7f94535c7640
trace-tumbleweed.dat:    IPC I/O Child-4503 [000]8446348845232796661:
write_msr:            c001011f, value 4
trace-tumbleweed.dat:    IPC I/O Child-4503 [000]8447532508154803189:
write_msr:            c0000100, value 7f945e8d5780
trace-tumbleweed.dat:           <idle>-0    [000]8452336523564659701:
write_msr:            c001011f, value 0
trace-tumbleweed.dat:           <idle>-0    [000]8452515224268944373:
write_msr:            6e0, value 731478a3bf62
trace-tumbleweed.dat:           <idle>-0    [000]8452610851715789813:
write_msr:            6e0, value 731478968f37
trace-tumbleweed.dat:           <idle>-0    [000]8555757214338495477:
write_msr:            6e0, value 731478853ec1
trace-tumbleweed.dat:           <idle>-0    [000]8555904475882173429:
write_msr:            6e0, value 7314778a60e8
trace-tumbleweed.dat:      kworker/0:1-7351 [000]8556355589182174197:
write_msr:            c0000100, value 7f8373573740

The TimeShift information is inside the guest trace file,
so I think this is not a transfer problem.

I also tried to remove the last patch, the one related to the KVM clock,
in order to use PTP and the two traces are merged correctly.

The commands used to record are:

Host:
# trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
msr:* -C x86-tsc sleep 1

Guest:
# echo x86-tsc > /sys/kernel/tracing/trace_clock
# trace-cmd agent

Am I doing something wrong?

If necessary, I can provide more info about my setup, or do more tests.


Thanks and Regards,

Stefano
Tzvetomir Stoyanov (VMware) March 19, 2021, 11:55 a.m. UTC | #2
Hi Stefano,

On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
<stefano.devenuto99@gmail.com> wrote:
>
> Hello,
>
> On 3/15/21 7:18 AM, Tzvetomir Stoyanov (VMware) wrote:
> > Basic infrastructure for host - guest timestamp synchronization and a
> > PoC implementation of PTP-like and KVM algorithms.
> I'm trying to make some analysis on traces generated by host and guest,
> so I applied this series and the related dependencies.
>
> However, when the KVM algorithm is used, the merging process
> of the two traces doesn't seem to happen properly.
>
> The output of `trace-cmd report -i trace.dat -i trace-tumbleweed.dat` is:
>
>            trace.dat: cpus=4
> trace-tumbleweed.dat: cpus=1
>            trace.dat:            sleep-23948 [002]61124758463654:
> write_msr:            c0000100, value 7f35092875c0
>            trace.dat:           <idle>-0     [000]61124760841508:
> write_msr:            c0000100, value 7f6136b75640
>            trace.dat:        CPU 0/KVM-1640  [000]61124760851620:
> write_msr:            c0011020, value 40000000000000
>            trace.dat:        CPU 0/KVM-1640  [000]61124760861778:
> read_msr:             c0000101, value ffff9b1b76c00000
>            trace.dat:        CPU 0/KVM-1640  [000]61124760862832:
> read_msr:             c0000081, value 23001000000000
>            trace.dat:        CPU 0/KVM-1640  [000]61124760864347:
> read_msr:             c0000082, value ffffffffb5400010
>            trace.dat:        CPU 0/KVM-1640  [000]61124760865225:
> read_msr:             c0000083, value ffffffffb5401670
>            .
>            .
>            .
> trace-tumbleweed.dat:   IPC I/O Parent-4313 [000]8446232460209009653:
> write_msr:            c0000100, value 7f94535c7640
> trace-tumbleweed.dat:    IPC I/O Child-4503 [000]8446348845232796661:
> write_msr:            c001011f, value 4
> trace-tumbleweed.dat:    IPC I/O Child-4503 [000]8447532508154803189:
> write_msr:            c0000100, value 7f945e8d5780
> trace-tumbleweed.dat:           <idle>-0    [000]8452336523564659701:
> write_msr:            c001011f, value 0
> trace-tumbleweed.dat:           <idle>-0    [000]8452515224268944373:
> write_msr:            6e0, value 731478a3bf62
> trace-tumbleweed.dat:           <idle>-0    [000]8452610851715789813:
> write_msr:            6e0, value 731478968f37
> trace-tumbleweed.dat:           <idle>-0    [000]8555757214338495477:
> write_msr:            6e0, value 731478853ec1
> trace-tumbleweed.dat:           <idle>-0    [000]8555904475882173429:
> write_msr:            6e0, value 7314778a60e8
> trace-tumbleweed.dat:      kworker/0:1-7351 [000]8556355589182174197:
> write_msr:            c0000100, value 7f8373573740
>
> The TimeShift information is inside the guest trace file,
> so I think this is not a transfer problem.
>
> I also tried to remove the last patch, the one related to the KVM clock,
> in order to use PTP and the two traces are merged correctly.
>
> The commands used to record are:
>
> Host:
> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> msr:* -C x86-tsc sleep 1

The guest trace clock is set automatically as the host, so this
command should be enough:
# trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
msr:*  sleep 1

>
> Guest:
> # echo x86-tsc > /sys/kernel/tracing/trace_clock

There is no need to set manually the guest clock, it will be
overwritten by trace-cmd agent.

> # trace-cmd agent
>
> Am I doing something wrong?

All your commands and workflow look good, nothing wrong.

>
> If necessary, I can provide more info about my setup, or do more tests.

Yes, please can you send me both host and guest trace files ?
Also, it will be useful to send me the content of the KVM debug files:
    /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset

>
>
> Thanks and Regards,
>
> Stefano

Thanks for testing this code!
Stefano De Venuto March 19, 2021, 5:44 p.m. UTC | #3
On 3/19/21 12:55 PM, Tzvetomir Stoyanov wrote:
> Hi Stefano,
>
> On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
> <stefano.devenuto99@gmail.com> wrote:
Hi!
>> The commands used to record are:
>>
>> Host:
>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
>> msr:* -C x86-tsc sleep 1
> The guest trace clock is set automatically as the host, so this
> command should be enough:
> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> msr:*  sleep 1
>
>> Guest:
>> # echo x86-tsc > /sys/kernel/tracing/trace_clock
> There is no need to set manually the guest clock, it will be
> overwritten by trace-cmd agent.
>
Thanks so much for the proper way to do it, really appreciated.
>> If necessary, I can provide more info about my setup, or do more tests.
> Yes, please can you send me both host and guest trace files ?
Here are the trace files, host and guest respectively:

- http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace.dat
- http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace-tumbleweed.dat

> Also, it will be useful to send me the content of the KVM debug files:
>     /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset
The guest has one vcpu (vcpu0) and the content of the file is:

255647917761327
>>
>> Thanks and Regards,
>>
>> Stefano
> Thanks for testing this code!
>
Thanks for your time,

Stefano
Tzvetomir Stoyanov (VMware) March 20, 2021, 6:25 a.m. UTC | #4
Hi Stefano,

On Fri, Mar 19, 2021 at 7:44 PM Stefano De Venuto
<stefano.devenuto99@gmail.com> wrote:
>
>
>
> On 3/19/21 12:55 PM, Tzvetomir Stoyanov wrote:
> > Hi Stefano,
> >
> > On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
> > <stefano.devenuto99@gmail.com> wrote:
> Hi!
> >> The commands used to record are:
> >>
> >> Host:
> >> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> >> msr:* -C x86-tsc sleep 1
> > The guest trace clock is set automatically as the host, so this
> > command should be enough:
> > # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> > msr:*  sleep 1
> >
> >> Guest:
> >> # echo x86-tsc > /sys/kernel/tracing/trace_clock
> > There is no need to set manually the guest clock, it will be
> > overwritten by trace-cmd agent.
> >
> Thanks so much for the proper way to do it, really appreciated.
> >> If necessary, I can provide more info about my setup, or do more tests.
> > Yes, please can you send me both host and guest trace files ?
> Here are the trace files, host and guest respectively:
>
> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace.dat
> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace-tumbleweed.dat
>
> > Also, it will be useful to send me the content of the KVM debug files:
> >     /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset
> The guest has one vcpu (vcpu0) and the content of the file is:
>
> 255647917761327

Looks like there is a scaling between host and guest clocks in your
setup, not just a simple offset. We did not test yet our
implementation with scaling, although both offset and scaling are part
of the calculations. That makes your use case very valuable for us, as
we have an opportunity to test it now. And yes, looks like we have a
bug here.
Please, when you have time, can you repeat again the tracing session
and send again both trace files + the content of the KVM debug files:
     /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-offset
     /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-scaling-ratio
I'm asking to do a new trace, as most probably these offset and
scaling could be different now.

Thanks!

> >>
> >> Thanks and Regards,
> >>
> >> Stefano
> > Thanks for testing this code!
> >
> Thanks for your time,
>
> Stefano
Tzvetomir Stoyanov (VMware) March 22, 2021, 10:13 a.m. UTC | #5
On Sat, Mar 20, 2021 at 6:17 PM Stefano De Venuto
<stefano.devenuto99@gmail.com> wrote:
>
>
>
> On 3/20/21 7:25 AM, Tzvetomir Stoyanov wrote:
> > Hi Stefano,
> >
> > On Fri, Mar 19, 2021 at 7:44 PM Stefano De Venuto
> > <stefano.devenuto99@gmail.com> wrote:
> >>
> >>
> >> On 3/19/21 12:55 PM, Tzvetomir Stoyanov wrote:
> >>> Hi Stefano,
> >>>
> >>> On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
> >>> <stefano.devenuto99@gmail.com> wrote:
> >> Hi!
> >>>> The commands used to record are:
> >>>>
> >>>> Host:
> >>>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> >>>> msr:* -C x86-tsc sleep 1
> >>> The guest trace clock is set automatically as the host, so this
> >>> command should be enough:
> >>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> >>> msr:*  sleep 1
> >>>
> >>>> Guest:
> >>>> # echo x86-tsc > /sys/kernel/tracing/trace_clock
> >>> There is no need to set manually the guest clock, it will be
> >>> overwritten by trace-cmd agent.
> >>>
> >> Thanks so much for the proper way to do it, really appreciated.
> >>>> If necessary, I can provide more info about my setup, or do more tests.
> >>> Yes, please can you send me both host and guest trace files ?
> >> Here are the trace files, host and guest respectively:
> >>
> >> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace.dat
> >> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace-tumbleweed.dat
> >>
> >>> Also, it will be useful to send me the content of the KVM debug files:
> >>>     /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset
> >> The guest has one vcpu (vcpu0) and the content of the file is:
> >>
> >> 255647917761327
> > Looks like there is a scaling between host and guest clocks in your
> > setup, not just a simple offset. We did not test yet our
> > implementation with scaling, although both offset and scaling are part
> > of the calculations. That makes your use case very valuable for us, as
> > we have an opportunity to test it now. And yes, looks like we have a
> > bug here.
> > Please, when you have time, can you repeat again the tracing session
> > and send again both trace files + the content of the KVM debug files:
> >      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-offset
> >      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-scaling-ratio
> > I'm asking to do a new trace, as most probably these offset and
> > scaling could be different now.
> Yes, the trace files are attached to this mail.
>
> The content of tsc-offset is:
> 453568564244284
>
> The content of tsc-scaling-ratio is:
> 4294967296
>

Confirmed, the problem is in the trace-cmd logic that works with the
KVM scaling.  Looks like tsc-scaling-ratio is the default KVM scaling
on your machine, (1 << 32), and in that case the scaling should be
ignored in our calculations. I still have no solution, going to
analyze how the KVM scaling works.

Thanks, Stafano!

>
> Thanks and Regards,
>
> Stefano
>
> >
> > Thanks!
> >
> >>>> Thanks and Regards,
> >>>>
> >>>> Stefano
> >>> Thanks for testing this code!
> >>>
> >> Thanks for your time,
> >>
> >> Stefano
> >
> >
>
Steven Rostedt March 22, 2021, 2:56 p.m. UTC | #6
On Mon, 22 Mar 2021 12:13:09 +0200
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> Confirmed, the problem is in the trace-cmd logic that works with the
> KVM scaling.  Looks like tsc-scaling-ratio is the default KVM scaling
> on your machine, (1 << 32), and in that case the scaling should be
> ignored in our calculations. I still have no solution, going to
> analyze how the KVM scaling works.

I'm guessing that this was run on a AMD CPU, as the default scaling (the
one to ignore) is 1 << 32 for AMD and for Intel it's 1 << 48.

-- Steve
Tzvetomir Stoyanov (VMware) March 25, 2021, 7:20 a.m. UTC | #7
Hi Stefano,

On Sat, Mar 20, 2021 at 6:17 PM Stefano De Venuto
<stefano.devenuto99@gmail.com> wrote:
>
>
>
> On 3/20/21 7:25 AM, Tzvetomir Stoyanov wrote:
> > Hi Stefano,
> >
> > On Fri, Mar 19, 2021 at 7:44 PM Stefano De Venuto
> > <stefano.devenuto99@gmail.com> wrote:
> >>
> >>
> >> On 3/19/21 12:55 PM, Tzvetomir Stoyanov wrote:
> >>> Hi Stefano,
> >>>
> >>> On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
> >>> <stefano.devenuto99@gmail.com> wrote:
> >> Hi!
> >>>> The commands used to record are:
> >>>>
> >>>> Host:
> >>>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> >>>> msr:* -C x86-tsc sleep 1
> >>> The guest trace clock is set automatically as the host, so this
> >>> command should be enough:
> >>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
> >>> msr:*  sleep 1
> >>>
> >>>> Guest:
> >>>> # echo x86-tsc > /sys/kernel/tracing/trace_clock
> >>> There is no need to set manually the guest clock, it will be
> >>> overwritten by trace-cmd agent.
> >>>
> >> Thanks so much for the proper way to do it, really appreciated.
> >>>> If necessary, I can provide more info about my setup, or do more tests.
> >>> Yes, please can you send me both host and guest trace files ?
> >> Here are the trace files, host and guest respectively:
> >>
> >> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace.dat
> >> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace-tumbleweed.dat
> >>
> >>> Also, it will be useful to send me the content of the KVM debug files:
> >>>     /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset
> >> The guest has one vcpu (vcpu0) and the content of the file is:
> >>
> >> 255647917761327
> > Looks like there is a scaling between host and guest clocks in your
> > setup, not just a simple offset. We did not test yet our
> > implementation with scaling, although both offset and scaling are part
> > of the calculations. That makes your use case very valuable for us, as
> > we have an opportunity to test it now. And yes, looks like we have a
> > bug here.
> > Please, when you have time, can you repeat again the tracing session
> > and send again both trace files + the content of the KVM debug files:
> >      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-offset
> >      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-scaling-ratio
> > I'm asking to do a new trace, as most probably these offset and
> > scaling could be different now.
> Yes, the trace files are attached to this mail.
>
> The content of tsc-offset is:
> 453568564244284
>
> The content of tsc-scaling-ratio is:
> 4294967296
>
Just submitted the v33 of the patch set, added a check for a
non-default KVM scaling. Now it should work on your setup, as the KVM
scaling there looks to be the default. I didn't test it, as we have no
machine which supports KVM scaling. This is just a workaround, we have
to implement support to KVM scaling in our algorithm.
Please, can you test if it is OK ? Note, you have to apply  these
patch sets also, as there are dependencies:
   v2 Refactoring and improvements of time sync logic
   v4 TSC trace clock to nanosecond conversion

Thanks for testing this code!

>
> Thanks and Regards,
>
> Stefano
>
> >
> > Thanks!
> >
> >>>> Thanks and Regards,
> >>>>
> >>>> Stefano
> >>> Thanks for testing this code!
> >>>
> >> Thanks for your time,
> >>
> >> Stefano
> >
> >
>
Stefano De Venuto March 25, 2021, 6:01 p.m. UTC | #8
On 3/25/21 8:20 AM, Tzvetomir Stoyanov wrote:
> Hi Stefano,
>
> On Sat, Mar 20, 2021 at 6:17 PM Stefano De Venuto
> <stefano.devenuto99@gmail.com> wrote:
>>
>>
>> On 3/20/21 7:25 AM, Tzvetomir Stoyanov wrote:
>>> Hi Stefano,
>>>
>>> On Fri, Mar 19, 2021 at 7:44 PM Stefano De Venuto
>>> <stefano.devenuto99@gmail.com> wrote:
>>>>
>>>> On 3/19/21 12:55 PM, Tzvetomir Stoyanov wrote:
>>>>> Hi Stefano,
>>>>>
>>>>> On Fri, Mar 19, 2021 at 12:08 PM Stefano De Venuto
>>>>> <stefano.devenuto99@gmail.com> wrote:
>>>> Hi!
>>>>>> The commands used to record are:
>>>>>>
>>>>>> Host:
>>>>>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
>>>>>> msr:* -C x86-tsc sleep 1
>>>>> The guest trace clock is set automatically as the host, so this
>>>>> command should be enough:
>>>>> # trace-cmd record -C x86-tsc -e kvm:* -e msr:* -A tumbleweed:823 -e
>>>>> msr:*  sleep 1
>>>>>
>>>>>> Guest:
>>>>>> # echo x86-tsc > /sys/kernel/tracing/trace_clock
>>>>> There is no need to set manually the guest clock, it will be
>>>>> overwritten by trace-cmd agent.
>>>>>
>>>> Thanks so much for the proper way to do it, really appreciated.
>>>>>> If necessary, I can provide more info about my setup, or do more tests.
>>>>> Yes, please can you send me both host and guest trace files ?
>>>> Here are the trace files, host and guest respectively:
>>>>
>>>> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace.dat
>>>> - http://xenbits.xen.org/people/dariof/tracing-examples/kvm/sync-kvmclock/trace-tumbleweed.dat
>>>>
>>>>> Also, it will be useful to send me the content of the KVM debug files:
>>>>>     /sys/kernel/debug/kvm/<guest ID>/vcpu<*>/tsc-offset
>>>> The guest has one vcpu (vcpu0) and the content of the file is:
>>>>
>>>> 255647917761327
>>> Looks like there is a scaling between host and guest clocks in your
>>> setup, not just a simple offset. We did not test yet our
>>> implementation with scaling, although both offset and scaling are part
>>> of the calculations. That makes your use case very valuable for us, as
>>> we have an opportunity to test it now. And yes, looks like we have a
>>> bug here.
>>> Please, when you have time, can you repeat again the tracing session
>>> and send again both trace files + the content of the KVM debug files:
>>>      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-offset
>>>      /sys/kernel/debug/kvm/<guest ID>/vcpu0/tsc-scaling-ratio
>>> I'm asking to do a new trace, as most probably these offset and
>>> scaling could be different now.
>> Yes, the trace files are attached to this mail.
>>
>> The content of tsc-offset is:
>> 453568564244284
>>
>> The content of tsc-scaling-ratio is:
>> 4294967296
>>
> Just submitted the v33 of the patch set, added a check for a
> non-default KVM scaling. Now it should work on your setup, as the KVM
> scaling there looks to be the default. I didn't test it, as we have no
> machine which supports KVM scaling. This is just a workaround, we have
> to implement support to KVM scaling in our algorithm.
> Please, can you test if it is OK ? Note, you have to apply  these
> patch sets also, as there are dependencies:
>    v2 Refactoring and improvements of time sync logic
>    v4 TSC trace clock to nanosecond conversion
>
> Thanks for testing this code!
I tested the patch series and now works properly on my setup!

Thanks,
Stefano
>
>> Thanks and Regards,
>>
>> Stefano
>>
>>> Thanks!
>>>
>>>>>> Thanks and Regards,
>>>>>>
>>>>>> Stefano
>>>>> Thanks for testing this code!
>>>>>
>>>> Thanks for your time,
>>>>
>>>> Stefano
>>>
>
Dario Faggioli March 25, 2021, 8:55 p.m. UTC | #9
On Mon, 2021-03-22 at 10:56 -0400, Steven Rostedt wrote:
> On Mon, 22 Mar 2021 12:13:09 +0200
> Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:
> 
> > Confirmed, the problem is in the trace-cmd logic that works with
> > the
> > KVM scaling.  Looks like tsc-scaling-ratio is the default KVM
> > scaling
> > on your machine, (1 << 32), and in that case the scaling should be
> > ignored in our calculations. I still have no solution, going to
> > analyze how the KVM scaling works.
> 
> I'm guessing that this was run on a AMD CPU, as the default scaling
> (the
> one to ignore) is 1 << 32 for AMD and for Intel it's 1 << 48.
> 
BTW, yes. Stefano could give us details (lscpu or anything) if
necessary, but this indeed is AMD.

Regards