mbox series

[0/9] Trace2 stopwatch timers and global counters

Message ID pull.1099.git.1640012469.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Trace2 stopwatch timers and global counters | expand

Message

Koji Nakamaru via GitGitGadget Dec. 20, 2021, 3:01 p.m. UTC
Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

 1. Stopwatch Timers

A stopwatch timer is a thread-safe timer that may be repeatedly started and
stopped to measure intervals of time spent in spans of code. A single
summary "timer" event record is written to the Trace2 event stream at the
end of the program. Timers are accumulated in TLS, so it can also report
per-thread interval times when desired.

Timer events are automatically written during the Trace2 "atexit" handler,
so various subsystems don't need to worry about that.

New timers may be defined by adding a new enum trace2_timer_id value and a
row to the trace2/tr2_tmr.c:tr2tmr_def_block[] global table.

Timer events include the number of intervals (start+stop calls), the total
elapsed time, and min/max intervals.

Two test timers are predefined and used by t/helper/test-trace2.c and the
t/t0211 and t/0212 tests.

 2. Global Counters

A global counter is a lighter weight version of the above that just
accumulates integer values, but without the timing and min/max statistics.

Counter events are written during the Trace2 "atexit" handler automatically,
so subsystems that use these counters don't need to create their own.

New counters may be defined by adding a new enum trace2_counter_id value and
a row to the trace2/tr2_ctr.c:tr2ctr_def_block[] global table.

 3. Rationale

Timers and counters are an alternative to the existing "region" and "data"
events. The latter are intended to trace the major flow (or phases) of the
program and possibly capture the amount of work performed within a loop, for
example. The former are offered as a way to measure activity that is not
localized, such as the time spent in zlib or lstat, which may be called from
many different parts of the program.

There are currently several places in the Git code where we want to measure
such activity -- changed-path Bloom filter stats, topo-walk commit counts,
and tree-walk counts and max-depths. A conversation in [1] suggested that we
should investigate a more general mechanism to collect stats so that each
instance doesn't need to recreate their own atexit handling mechanism.

This is an attempt to address that and let us easily explore other areas in
the future.

This patch series does not attempt to refactor those three instances to use
the new timers and counters. That should be a separate effort -- in part
because we may want to retool them rather than just translate them. For
example, rather than just translating the existing four Bloom filter counts
(in revision.c) into Trace2 counters, we may instead want to have a "happy
path timer" and a "sad path timer" if that would provide more insight.

 4. Notes

The first two commits in this series attend to some cleanup that was
discussed in [2] and [3]. The first (using size_t rather than int) is
harmless and could be done in a separate series if desired. The second
(using a char* rather than a strbuf for the thread-name) is a nice cleanup
before I change how I use the thread-name in a later commit in the series.

[1]
https://lore.kernel.org/git/cbc17f1b-57fc-497f-f1ab-baa8cc84620d@gmail.com/
[2] https://lore.kernel.org/all/YULF3hoaDxA9ENdO@nand.local/ [3]
https://lore.kernel.org/all/xmqqa6kdwo24.fsf@gitster.g/

Jeff Hostetler (9):
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
  trace2: convert tr2tls_thread_ctx.thread_name from strbuf to char*
  trace2: defer free of TLS CTX until program exit.
  trace2: add thread-name override to event target
  trace2: add thread-name override to perf target
  trace2: add timer events to perf and event target formats
  trace2: add stopwatch timers
  trace2: add counter events to perf and event target formats
  trace2: add global counters

 Documentation/technical/api-trace2.txt | 159 ++++++++++++++++++++-
 Makefile                               |   2 +
 t/helper/test-trace2.c                 | 184 +++++++++++++++++++++++++
 t/t0211-trace2-perf.sh                 |  49 +++++++
 t/t0212-trace2-event.sh                |  69 ++++++++++
 trace2.c                               | 114 +++++++++++++++
 trace2.h                               |  75 ++++++++++
 trace2/tr2_ctr.c                       |  65 +++++++++
 trace2/tr2_ctr.h                       |  75 ++++++++++
 trace2/tr2_tgt.h                       |  39 ++++++
 trace2/tr2_tgt_event.c                 | 122 ++++++++++++----
 trace2/tr2_tgt_normal.c                |   2 +
 trace2/tr2_tgt_perf.c                  | 112 +++++++++++----
 trace2/tr2_tls.c                       | 110 ++++++++++++---
 trace2/tr2_tls.h                       |  42 +++++-
 trace2/tr2_tmr.c                       | 126 +++++++++++++++++
 trace2/tr2_tmr.h                       | 120 ++++++++++++++++
 17 files changed, 1386 insertions(+), 79 deletions(-)
 create mode 100644 trace2/tr2_ctr.c
 create mode 100644 trace2/tr2_ctr.h
 create mode 100644 trace2/tr2_tmr.c
 create mode 100644 trace2/tr2_tmr.h


base-commit: e773545c7fe7eca21b134847f4fc2cbc9547fa14
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1099%2Fjeffhostetler%2Ftrace2-stopwatch-v2-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1099/jeffhostetler/trace2-stopwatch-v2-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1099

Comments

Derrick Stolee Dec. 21, 2021, 2:51 p.m. UTC | #1
On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> Extend Trace2 to provide multiple "stopwatch timers" and "global counters".

>  3. Rationale
> 
> Timers and counters are an alternative to the existing "region" and "data"
> events. The latter are intended to trace the major flow (or phases) of the
> program and possibly capture the amount of work performed within a loop, for
> example. The former are offered as a way to measure activity that is not
> localized, such as the time spent in zlib or lstat, which may be called from
> many different parts of the program.
> 
> There are currently several places in the Git code where we want to measure
> such activity -- changed-path Bloom filter stats, topo-walk commit counts,
> and tree-walk counts and max-depths. A conversation in [1] suggested that we
> should investigate a more general mechanism to collect stats so that each
> instance doesn't need to recreate their own atexit handling mechanism.
> 
> This is an attempt to address that and let us easily explore other areas in
> the future.
> 
> This patch series does not attempt to refactor those three instances to use
> the new timers and counters. That should be a separate effort -- in part
> because we may want to retool them rather than just translate them. For
> example, rather than just translating the existing four Bloom filter counts
> (in revision.c) into Trace2 counters, we may instead want to have a "happy
> path timer" and a "sad path timer" if that would provide more insight.

I'm excited for these API features. It might be nice to have an RFC-
quality series demonstrating how these examples could work with the
new API. Makes sense to delay in case there were recommended changes
to the API from review in this v1.

I also like your attention to thread contexts. I think these timers
would be very interesting to use in parallel checkout. CC'ing Matheus
for his thoughts on where he would want timer summaries for that
feature. I would probably want the per-thread summary to know if we
are blocked on one really long thread while the others finish quickly.
Within that: what are the things causing us to be slow? Is it zlib?
Is it lstat()?

Thanks,
-Stolee
Matheus Tavares Dec. 21, 2021, 11:27 p.m. UTC | #2
On Tue, Dec 21, 2021 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/20/2021 10:01 AM, Jeff Hostetler via GitGitGadget wrote:
> >
> >  3. Rationale
> >
> > Timers and counters are an alternative to the existing "region" and "data"
> > events. The latter are intended to trace the major flow (or phases) of the
> > program and possibly capture the amount of work performed within a loop, for
> > example. The former are offered as a way to measure activity that is not
> > localized, such as the time spent in zlib or lstat, which may be called from
> > many different parts of the program.
>
> I'm excited for these API features.

Me too! This would have been very useful on some experiments I had to
run in the past.

Thanks for working on it, Jeff :)

> I also like your attention to thread contexts. I think these timers
> would be very interesting to use in parallel checkout. CC'ing Matheus
> for his thoughts on where he would want timer summaries for that
> feature.

For parallel checkout, I think it would be interesting to have timer
summaries for open/close, fstat/lstat, write, and
inflation/delta-reconstruction. Perhaps pkt-line routines too, so that
we can see how much time we spend in inter-process communication.

It would be nice to have timer information for disk reading as well
(more on that below), but I don't think it is possible since we read
the objects through mmap() and thus, we cannot easily isolate the
actual reading time from the decompression time :(

> I would probably want the per-thread summary to know if we
> are blocked on one really long thread while the others finish quickly.

That would be interesting. Parallel checkout actually uses
subprocesses, but I can see the per-thread summary being useful on
grep, for example. (Nevertheless, the use case you mentioned for the
timers -- to evaluate the work balance on parallel checkout -- seems
very interesting.)

> Within that: what are the things causing us to be slow? Is it zlib?
> Is it lstat()?

On my tests, the bottleneck on checkout heavily depended on the
underlying storage type. On HDDs, the bottleneck was object reading
(i.e. page faults on mmap()-ed files), with about 70% to 80% of the
checkout runtime.

On SSDs, reading was much faster, so CPU (i.e. inflation) became the
bottleneck, with 50% of the runtime. (Inflation only lost to reading
when checking out from *many* loose objects.)

Finally, on NFS, file creation with open(O_CREAT | O_EXCL) and fstat()
(which makes the NFS client flush previously cached writes to the
server) were the bottlenecks, with about 40% of the total runtime
each.

These numbers come from a (sequential) `git checkout .` execution on
an empty working tree of the Linux kernel (v5.12), and they were
gathered using eBPF-based profilers. For other operations, especially
ones that require many file removals or more laborious tree merging in
unpack_trees(), I suspect the bottlenecks may change.

If anyone would be interested in seeing the flamegraphs and other
plots for these profiling numbers, I have them at:
https://matheustavares.gitlab.io/annexes/parallel-checkout/profiling

And there is a bit more context at:
https://matheustavares.gitlab.io/posts/parallel-checkout