diff mbox series

[v2,2/3] trace-cmd library: Add check before applying tsc2nsec offset

Message ID 20210416103409.24597-3-tz.stoyanov@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fix overflow when applying tsc2nsec calculations | expand

Commit Message

Tzvetomir Stoyanov (VMware) April 16, 2021, 10:34 a.m. UTC
When converting TSC timestamps to nanoseconds, an offset is used. That
offset is subtracted from each TSC timestamp, before the conversion.
However, if the timestamp is lower that the offset this causes an
overflow, as both timestamp and offset are unsigned long integers. A
check is added to verify the subtraction will not cause an overflow. In
case of an overflow, the timestamp is set to 0 and a warning message is
printed.

Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
---
 lib/trace-cmd/trace-input.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Steven Rostedt April 16, 2021, 8:12 p.m. UTC | #1
On Fri, 16 Apr 2021 13:34:08 +0300
"Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:

> When converting TSC timestamps to nanoseconds, an offset is used. That
> offset is subtracted from each TSC timestamp, before the conversion.
> However, if the timestamp is lower that the offset this causes an
> overflow, as both timestamp and offset are unsigned long integers. A
> check is added to verify the subtraction will not cause an overflow. In
> case of an overflow, the timestamp is set to 0 and a warning message is
> printed.
> 
> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> ---
>  lib/trace-cmd/trace-input.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
> index 974879e8..991abd5f 100644
> --- a/lib/trace-cmd/trace-input.c
> +++ b/lib/trace-cmd/trace-input.c
> @@ -1230,8 +1230,14 @@ static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
>  		ts *= handle->ts2secs;
>  	} else if (handle->tsc_calc.mult) {
>  		/* auto calculated TSC clock frequency */
> -		ts -= handle->tsc_calc.offset;
> -		ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> +		if (ts > handle->tsc_calc.offset) {
> +			ts -= handle->tsc_calc.offset;
> +			ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> +		} else {
> +			tracecmd_warning("Timestamp $llu is before the initial offset %llu, set it to 0",

   $llu  ??

Are you testing these? The bad trace files that I supplied triggers this.


> +					 ts, handle->tsc_calc.offset);
> +			ts = 0;

We should turn off the offset and not just zero out the timestamps,
otherwise we just have useless data:

  Timestamp $llu is before the initial offset 20094822433105, set it to 0
  Timestamp $llu is before the initial offset 20094822433105, set it to 0
  Timestamp $llu is before the initial offset 20094822460994, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094822961507, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094822967292, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094823302351, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094823306954, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094824963751, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094824965723, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094824972124, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094824972697, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094828868305, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094828871264, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094828881104, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094828882014, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094831874377, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094831876734, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094831881252, set it to 0
[..]
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094832060068, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094835262757, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094835265223, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094835270122, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094835273058, set it to 0
libtracecmd: File exists
  Timestamp $llu is before the initial offset 20094835273614, set it to 0
cpus=2
       trace-cmd-3196  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3196 runtime=85576 [ns] vruntime=12853032342 [ns]
       trace-cmd-3196  [000]     0.000000: sched_switch:         trace-cmd:3196 [120] S ==> trace-cmd:3199 [120]
       trace-cmd-3199  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3199 runtime=156103 [ns] vruntime=12857102867 [ns]
       trace-cmd-3199  [000]     0.000000: sched_switch:         trace-cmd:3199 [120] S ==> trace-cmd:3198 [120]
       trace-cmd-3198  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3198 runtime=100540 [ns] vruntime=12858890364 [ns]
       trace-cmd-3198  [000]     0.000000: sched_switch:         trace-cmd:3198 [120] S ==> swapper/0:0 [120]
          <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444054745103 function=tick_sched_timer/0x0
          <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444055726117 softexpires=1444055726117
          <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444055895819 function=tick_sched_timer/0x0
          <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444056726117 softexpires=1444056726117
          <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444056782207 function=tick_sched_timer/0x0
          <idle>-0     [000]     0.000000: softirq_raise:        vec=1 [action=TIMER]
          <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444057726117 softexpires=1444057726117
          <idle>-0     [000]     0.000000: softirq_entry:        vec=1 [action=TIMER]
          <idle>-0     [000]     0.000000: timer_cancel:         timer=0xffffa61a0006be40
          <idle>-0     [000]     0.000000: timer_expire_entry:   timer=0xffffa61a0006be40 function=process_timeout now=4296111312 baseclk=4296111312
          <idle>-0     [000]     0.000000: sched_waking:         comm=rcu_sched pid=12 prio=120 target_cpu=000

We don't want all timestamps zero. We want to disable the starting event.

By having this instead:

		} else {
			tracecmd_warning("Timestamp %llu is before the initial offset %llu\n"
					 "\tSetting offset to 0",
					 ts, handle->tsc_calc.offset);
			handle->tsc_calc.offset = 0;
			ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
		}

We get this as output:

  Timestamp 20094822433105 is before the initial offset 20159736262966
        Setting offset to 0
cpus=2
       trace-cmd-3196  [000]  5923.651995: sched_stat_runtime:   comm=trace-cmd pid=3196 runtime=85576 [ns] vruntime=12853032342 [ns]
       trace-cmd-3196  [000]  5923.652004: sched_switch:         trace-cmd:3196 [120] S ==> trace-cmd:3199 [120]
       trace-cmd-3199  [000]  5923.652151: sched_stat_runtime:   comm=trace-cmd pid=3199 runtime=156103 [ns] vruntime=12857102867 [ns]
       trace-cmd-3199  [000]  5923.652153: sched_switch:         trace-cmd:3199 [120] S ==> trace-cmd:3198 [120]
       trace-cmd-3198  [000]  5923.652252: sched_stat_runtime:   comm=trace-cmd pid=3198 runtime=100540 [ns] vruntime=12858890364 [ns]
       trace-cmd-3198  [000]  5923.652253: sched_switch:         trace-cmd:3198 [120] S ==> swapper/0:0 [120]
          <idle>-0     [000]  5923.652741: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]  5923.652742: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444054745103 function=tick_sched_timer/0x0
          <idle>-0     [000]  5923.652744: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
          <idle>-0     [000]  5923.652744: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444055726117 softexpires=1444055726117
          <idle>-0     [000]  5923.653892: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780

Much more usable.

-- Steve


> +		}
>  	}
>  
>  	/* User specified time offset with --ts-offset or --date options */
Tzvetomir Stoyanov (VMware) April 19, 2021, 8:08 a.m. UTC | #2
On Fri, Apr 16, 2021 at 11:12 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 16 Apr 2021 13:34:08 +0300
> "Tzvetomir Stoyanov (VMware)" <tz.stoyanov@gmail.com> wrote:
>
> > When converting TSC timestamps to nanoseconds, an offset is used. That
> > offset is subtracted from each TSC timestamp, before the conversion.
> > However, if the timestamp is lower that the offset this causes an
> > overflow, as both timestamp and offset are unsigned long integers. A
> > check is added to verify the subtraction will not cause an overflow. In
> > case of an overflow, the timestamp is set to 0 and a warning message is
> > printed.
> >
> > Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
> > ---
> >  lib/trace-cmd/trace-input.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
> > index 974879e8..991abd5f 100644
> > --- a/lib/trace-cmd/trace-input.c
> > +++ b/lib/trace-cmd/trace-input.c
> > @@ -1230,8 +1230,14 @@ static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> >               ts *= handle->ts2secs;
> >       } else if (handle->tsc_calc.mult) {
> >               /* auto calculated TSC clock frequency */
> > -             ts -= handle->tsc_calc.offset;
> > -             ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> > +             if (ts > handle->tsc_calc.offset) {
> > +                     ts -= handle->tsc_calc.offset;
> > +                     ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> > +             } else {
> > +                     tracecmd_warning("Timestamp $llu is before the initial offset %llu, set it to 0",
>
>    $llu  ??
>
> Are you testing these? The bad trace files that I supplied triggers this.
>
>
> > +                                      ts, handle->tsc_calc.offset);
> > +                     ts = 0;
>
> We should turn off the offset and not just zero out the timestamps,
> otherwise we just have useless data:
>
>   Timestamp $llu is before the initial offset 20094822433105, set it to 0
>   Timestamp $llu is before the initial offset 20094822433105, set it to 0
>   Timestamp $llu is before the initial offset 20094822460994, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094822961507, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094822967292, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094823302351, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094823306954, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094824963751, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094824965723, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094824972124, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094824972697, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094828868305, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094828871264, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094828881104, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094828882014, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094831874377, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094831876734, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094831881252, set it to 0
> [..]
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094832060068, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094835262757, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094835265223, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094835270122, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094835273058, set it to 0
> libtracecmd: File exists
>   Timestamp $llu is before the initial offset 20094835273614, set it to 0
> cpus=2
>        trace-cmd-3196  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3196 runtime=85576 [ns] vruntime=12853032342 [ns]
>        trace-cmd-3196  [000]     0.000000: sched_switch:         trace-cmd:3196 [120] S ==> trace-cmd:3199 [120]
>        trace-cmd-3199  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3199 runtime=156103 [ns] vruntime=12857102867 [ns]
>        trace-cmd-3199  [000]     0.000000: sched_switch:         trace-cmd:3199 [120] S ==> trace-cmd:3198 [120]
>        trace-cmd-3198  [000]     0.000000: sched_stat_runtime:   comm=trace-cmd pid=3198 runtime=100540 [ns] vruntime=12858890364 [ns]
>        trace-cmd-3198  [000]     0.000000: sched_switch:         trace-cmd:3198 [120] S ==> swapper/0:0 [120]
>           <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444054745103 function=tick_sched_timer/0x0
>           <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444055726117 softexpires=1444055726117
>           <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444055895819 function=tick_sched_timer/0x0
>           <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444056726117 softexpires=1444056726117
>           <idle>-0     [000]     0.000000: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444056782207 function=tick_sched_timer/0x0
>           <idle>-0     [000]     0.000000: softirq_raise:        vec=1 [action=TIMER]
>           <idle>-0     [000]     0.000000: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]     0.000000: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444057726117 softexpires=1444057726117
>           <idle>-0     [000]     0.000000: softirq_entry:        vec=1 [action=TIMER]
>           <idle>-0     [000]     0.000000: timer_cancel:         timer=0xffffa61a0006be40
>           <idle>-0     [000]     0.000000: timer_expire_entry:   timer=0xffffa61a0006be40 function=process_timeout now=4296111312 baseclk=4296111312
>           <idle>-0     [000]     0.000000: sched_waking:         comm=rcu_sched pid=12 prio=120 target_cpu=000
>
> We don't want all timestamps zero. We want to disable the starting event.
>
> By having this instead:
>
>                 } else {
>                         tracecmd_warning("Timestamp %llu is before the initial offset %llu\n"
>                                          "\tSetting offset to 0",
>                                          ts, handle->tsc_calc.offset);
>                         handle->tsc_calc.offset = 0;
>                         ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
>                 }
>

This will set the guest offset to 0, will not change the offsets of
the other files. The result is that the files will not be displayed in
sync, because of these few broken events. As this is only a sanity
check, and should not happen if the patch "trace-cmd: Get the
timestamp of the first recorded event as TSC offset" is applied, I
would suggest to zero the broken timestamps and to advise the user to
use "--raw-ts" option instead:
 "Timestamp %llu is before the initial offset %llu, set it to 0.
Display the files with --raw-ts option to see the original
timestamps.\n"


> We get this as output:
>
>   Timestamp 20094822433105 is before the initial offset 20159736262966
>         Setting offset to 0
> cpus=2
>        trace-cmd-3196  [000]  5923.651995: sched_stat_runtime:   comm=trace-cmd pid=3196 runtime=85576 [ns] vruntime=12853032342 [ns]
>        trace-cmd-3196  [000]  5923.652004: sched_switch:         trace-cmd:3196 [120] S ==> trace-cmd:3199 [120]
>        trace-cmd-3199  [000]  5923.652151: sched_stat_runtime:   comm=trace-cmd pid=3199 runtime=156103 [ns] vruntime=12857102867 [ns]
>        trace-cmd-3199  [000]  5923.652153: sched_switch:         trace-cmd:3199 [120] S ==> trace-cmd:3198 [120]
>        trace-cmd-3198  [000]  5923.652252: sched_stat_runtime:   comm=trace-cmd pid=3198 runtime=100540 [ns] vruntime=12858890364 [ns]
>        trace-cmd-3198  [000]  5923.652253: sched_switch:         trace-cmd:3198 [120] S ==> swapper/0:0 [120]
>           <idle>-0     [000]  5923.652741: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]  5923.652742: hrtimer_expire_entry: hrtimer=0xffff8cc5bd41f780 now=1444054745103 function=tick_sched_timer/0x0
>           <idle>-0     [000]  5923.652744: hrtimer_expire_exit:  hrtimer=0xffff8cc5bd41f780
>           <idle>-0     [000]  5923.652744: hrtimer_start:        hrtimer=0xffff8cc5bd41f780 function=tick_sched_timer/0x0 expires=1444055726117 softexpires=1444055726117
>           <idle>-0     [000]  5923.653892: hrtimer_cancel:       hrtimer=0xffff8cc5bd41f780
>
> Much more usable.
>
> -- Steve
>
>
> > +             }
> >       }
> >
> >       /* User specified time offset with --ts-offset or --date options */
>
Steven Rostedt April 19, 2021, 1:45 p.m. UTC | #3
On Mon, 19 Apr 2021 11:08:04 +0300
Tzvetomir Stoyanov <tz.stoyanov@gmail.com> wrote:

> > We don't want all timestamps zero. We want to disable the starting event.
> >
> > By having this instead:
> >
> >                 } else {
> >                         tracecmd_warning("Timestamp %llu is before the initial offset %llu\n"
> >                                          "\tSetting offset to 0",
> >                                          ts, handle->tsc_calc.offset);
> >                         handle->tsc_calc.offset = 0;
> >                         ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> >                 }
> >  
> 
> This will set the guest offset to 0, will not change the offsets of
> the other files. The result is that the files will not be displayed in
> sync, because of these few broken events. As this is only a sanity
> check, and should not happen if the patch "trace-cmd: Get the
> timestamp of the first recorded event as TSC offset" is applied, I
> would suggest to zero the broken timestamps and to advise the user to
> use "--raw-ts" option instead:
>  "Timestamp %llu is before the initial offset %llu, set it to 0.
> Display the files with --raw-ts option to see the original
> timestamps.\n"

I still don't like it, because a thousand events all with the same
timestamp will still screw up everything (including the output of
kernelshark, as you'll see a thousand events at time zero, and then the
rest of the events at some way off in the future time).

And as this is in the library, having a warning is only applicable to
trace-cmd, and not any other user case.

Actually, I think we are looking at the wrong place to do the check. All
event streams (CPUs) need to be in monotonic order, or it's just corrupted
to begin with.

The library should supply a check to see if each stream starts after the
offset, and if not, give a report back to the user (trace-cmd in this case,
or kernelsharks ftrace plugin). Then give the user a way to give another
offset to tell all the other streams to convert to.

That is:

	max = 0;
	for each stream:
		ret = read first event, compared to offset
		if (!ret) {
			/* ret will be how much off by offset */
			if (ret > max)
				max = ret;
		}
	if (max) {
		for each stream:
			Update offset by subtracting max
	}

Look at each stream, and have some callback give you how much ahead the
first event is from its given offset. Take the biggest value from reading
all the streams, then tell all the streams to subtract its offset by that
max value. The end result is that all streams now start at a positive value.

-- Steve
Steven Rostedt April 19, 2021, 3:14 p.m. UTC | #4
On Mon, 19 Apr 2021 09:45:43 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:


> That is:
> 
> 	max = 0;
> 	for each stream:
> 		ret = read first event, compared to offset
> 		if (!ret) {
> 			/* ret will be how much off by offset */
> 			if (ret > max)
> 				max = ret;
> 		}
> 	if (max) {
> 		for each stream:
> 			Update offset by subtracting max
> 	}
> 
> Look at each stream, and have some callback give you how much ahead the
> first event is from its given offset. Take the biggest value from reading
> all the streams, then tell all the streams to subtract its offset by that
> max value. The end result is that all streams now start at a positive value.

From our call, here's the pseudo code that I was talking about:

> diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
> index b17b36e0..f03dadd3 100644
> --- a/lib/trace-cmd/trace-input.c
> +++ b/lib/trace-cmd/trace-input.c
> @@ -1302,7 +1302,7 @@ static unsigned long long timestamp_host_sync(unsigned long long ts, int cpu,
>  					 &tsync->ts_samples[mid+1]);
>  }
>  
> -static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> +static unsigned long long pre_timestamp_calc(unsigned long long ts, int cpu,
>  					 struct tracecmd_input *handle)

I pulled out the timestamp_calc into a helper function.

>  {
>  	/* do not modify raw timestamps */
> @@ -1318,17 +1318,44 @@ static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
>  		ts *= handle->ts2secs;
>  	} else if (handle->tsc_calc.mult) {
>  		/* auto calculated TSC clock frequency */
> -		ts -= handle->tsc_calc.offset;

And removed the calc offset.

>  		ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
>  	}
>  
>  	/* User specified time offset with --ts-offset or --date options */
> -	if (handle->ts_offset)
> -		ts += handle->ts_offset;
> +	ts += handle->ts_offset;

As we mentioned (and this can be a separate patch), the if statement is
useless.

>  
>  	return ts;
>  }
>  
> +static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> +					 struct tracecmd_input *handle)
> +{
> +	static int once;
> +
> +	ts = pre_timestamp_calc(ts, cpu, handle);
> +	if (!once && ts > handle->start_ts_offset) {
> +		once++;
> +		tracecmd_warning();
> +	}
> +	ts -= handle->start_ts_offset;

After looking at this more, I think we should just have the ts_offset and
start_ts_offset be the same. And remove the ts += handle->ts_offset, from
the pre_timestamp_calc() above, and have this check test just ts_offset.

So now the timestamp_calc() will get the timestamp and then apply the
ts_offset separately (and warn if the offset is greater than the ts).

> +}
> +
> +
> +
> +long long tracecmd_cpu_first_ts_offset(struct tracecmd_input *handle, int cpu)
> +}
> +	struct tep_record rec;
> +
> +	rec = first_event(handle, cpu);
> +	return pre_timestamp_calc(rec->ts, cpu, handle) - handle->ts_offset;
> +}

Add an API that shows the difference between the first stream event
timestamp against the user supplied (or file supplied) ts_offset.


> +
> +int tracecmd_modify_ts_offset(struct tracecmd_input *handle, long long offset)
> +{
> +	handle->ts_offset += offset;
> +}

Allow the user to tweak that offset. As we already have:

tracecmd_set_ts_offset(handle, offset) to set ts_offset, if the user found
that the offset was before, it could tweak it.

> +
> +
>  /*
>   * Page is mapped, now read in the page header info.
>   */
> 


That is, in the options, we would need to have the calc offset from the
file (doing the sync), do:

	handle->ts_offset -= start_offset.

Or something like that. I'll let you look at this code and see what you
come up with, and we can discuss this further.

-- Steve
Tzvetomir Stoyanov (VMware) April 28, 2021, 12:31 p.m. UTC | #5
On Mon, Apr 19, 2021 at 6:14 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Mon, 19 Apr 2021 09:45:43 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> > That is:
> >
> >       max = 0;
> >       for each stream:
> >               ret = read first event, compared to offset
> >               if (!ret) {
> >                       /* ret will be how much off by offset */
> >                       if (ret > max)
> >                               max = ret;
> >               }
> >       if (max) {
> >               for each stream:
> >                       Update offset by subtracting max
> >       }
> >
> > Look at each stream, and have some callback give you how much ahead the
> > first event is from its given offset. Take the biggest value from reading
> > all the streams, then tell all the streams to subtract its offset by that
> > max value. The end result is that all streams now start at a positive value.
>
> From our call, here's the pseudo code that I was talking about:
>
> > diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
> > index b17b36e0..f03dadd3 100644
> > --- a/lib/trace-cmd/trace-input.c
> > +++ b/lib/trace-cmd/trace-input.c
> > @@ -1302,7 +1302,7 @@ static unsigned long long timestamp_host_sync(unsigned long long ts, int cpu,
> >                                        &tsync->ts_samples[mid+1]);
> >  }
> >
> > -static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> > +static unsigned long long pre_timestamp_calc(unsigned long long ts, int cpu,
> >                                        struct tracecmd_input *handle)
>
> I pulled out the timestamp_calc into a helper function.
>
> >  {
> >       /* do not modify raw timestamps */
> > @@ -1318,17 +1318,44 @@ static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> >               ts *= handle->ts2secs;
> >       } else if (handle->tsc_calc.mult) {
> >               /* auto calculated TSC clock frequency */
> > -             ts -= handle->tsc_calc.offset;
>
> And removed the calc offset.
>
> >               ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
> >       }
> >
> >       /* User specified time offset with --ts-offset or --date options */
> > -     if (handle->ts_offset)
> > -             ts += handle->ts_offset;
> > +     ts += handle->ts_offset;
>
> As we mentioned (and this can be a separate patch), the if statement is
> useless.
>
> >
> >       return ts;
> >  }
> >
> > +static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
> > +                                      struct tracecmd_input *handle)
> > +{
> > +     static int once;
> > +
> > +     ts = pre_timestamp_calc(ts, cpu, handle);
> > +     if (!once && ts > handle->start_ts_offset) {
> > +             once++;
> > +             tracecmd_warning();
> > +     }
> > +     ts -= handle->start_ts_offset;
>
> After looking at this more, I think we should just have the ts_offset and
> start_ts_offset be the same. And remove the ts += handle->ts_offset, from
> the pre_timestamp_calc() above, and have this check test just ts_offset.
>
> So now the timestamp_calc() will get the timestamp and then apply the
> ts_offset separately (and warn if the offset is greater than the ts).
>
> > +}
> > +
> > +
> > +
> > +long long tracecmd_cpu_first_ts_offset(struct tracecmd_input *handle, int cpu)
> > +}
> > +     struct tep_record rec;
> > +
> > +     rec = first_event(handle, cpu);
> > +     return pre_timestamp_calc(rec->ts, cpu, handle) - handle->ts_offset;
> > +}
>
> Add an API that shows the difference between the first stream event
> timestamp against the user supplied (or file supplied) ts_offset.
>
>
> > +
> > +int tracecmd_modify_ts_offset(struct tracecmd_input *handle, long long offset)
> > +{
> > +     handle->ts_offset += offset;
> > +}
>
> Allow the user to tweak that offset. As we already have:
>
> tracecmd_set_ts_offset(handle, offset) to set ts_offset, if the user found
> that the offset was before, it could tweak it.
>
> > +
> > +
> >  /*
> >   * Page is mapped, now read in the page header info.
> >   */
> >
>
>
> That is, in the options, we would need to have the calc offset from the
> file (doing the sync), do:
>
>         handle->ts_offset -= start_offset.
>
> Or something like that. I'll let you look at this code and see what you
> come up with, and we can discuss this further.
>
> -- Steve

On Thu, Apr 22, 2021 at 11:25 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 16 Apr 2021 10:01:18 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Fri, 16 Apr 2021 09:59:08 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >
> > >
> > > As for pr_stat(), I think we should rename it to tep_info() and tep_vinfo()
> > > that acts just like tep_warning(), except it is for informational output
> > > (stdout instead of stderr). This is similar to what the kernel has.
> > >
> > > Since tep_vwarning() takes a name, so can tep_vinfo(), and I was thinking
> > > that we should expose this string to the application.
> > >
> >
> > Oh, and we could add a tep_critical() and tep_vcritical() which means that
> > the error is something really bad happened, (like failed to allocate).
>
> Any thoughts on this?

These changes are superseded by "RFC [PATCH 0/5] tsc2nsec fixes",
where some of these suggestions are implemented.
https://lore.kernel.org/linux-trace-devel/20210428122839.805296-1-tz.stoyanov@gmail.com/




--
Tzvetomir (Ceco) Stoyanov
VMware Open Source Technology Center
diff mbox series

Patch

diff --git a/lib/trace-cmd/trace-input.c b/lib/trace-cmd/trace-input.c
index 974879e8..991abd5f 100644
--- a/lib/trace-cmd/trace-input.c
+++ b/lib/trace-cmd/trace-input.c
@@ -1230,8 +1230,14 @@  static unsigned long long timestamp_calc(unsigned long long ts, int cpu,
 		ts *= handle->ts2secs;
 	} else if (handle->tsc_calc.mult) {
 		/* auto calculated TSC clock frequency */
-		ts -= handle->tsc_calc.offset;
-		ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
+		if (ts > handle->tsc_calc.offset) {
+			ts -= handle->tsc_calc.offset;
+			ts = mul_u64_u32_shr(ts, handle->tsc_calc.mult, handle->tsc_calc.shift);
+		} else {
+			tracecmd_warning("Timestamp $llu is before the initial offset %llu, set it to 0",
+					 ts, handle->tsc_calc.offset);
+			ts = 0;
+		}
 	}
 
 	/* User specified time offset with --ts-offset or --date options */