mbox series

[0/7] Final optimization batch (#15): use memory pools

Message ID pull.990.git.1627044897.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Final optimization batch (#15): use memory pools | expand

Message

Linus Arver via GitGitGadget July 23, 2021, 12:54 p.m. UTC
This series textually depends on en/ort-perf-batch-14, but the ideas are
orthogonal to it and orthogonal to previous series. It can be reviewed
independently.

This series is more about strmaps & memory pools than merge logic. CC'ing
Peff since he reviewed the strmap work[1], and that work included a number
of decisions that specifically had this series in mind.

[1]
https://lore.kernel.org/git/20201111200701.GB39046@coredump.intra.peff.net/

=== Basic Optimization idea ===

In this series, I make use of memory pools to get faster allocations and
deallocations for many data structures that tend to all be deallocated at
the same time anyway.

=== Results ===

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28), the
changes in just this series improves the performance as follows:

                     Before Series           After Series
no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


As a reminder, before any merge-ort/diffcore-rename performance work, the
performance results we started with were:

no-renames-am:      6.940 s ±  0.485 s
no-renames:        18.912 s ±  0.174 s
mega-renames:    5964.031 s ± 10.459 s
just-one-mega:    149.583 s ±  0.751 s


=== Overall Results across all optimization work ===

This is my final prepared optimization series. It might be worth reviewing
how my optimizations fared overall, comparing the original merge-recursive
timings with three things: how much merge-recursive improved (as a
side-effect of optimizing merge-ort), how much improvement we would have
gotten from a hypothetical infinite parallelization of rename detection, and
what I achieved at the end with merge-ort:

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
no-renames:       18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
no-renames:         1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565


And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3


=== Caveat ===

It may be worth noting, though, that my optimization numbers above for
merge-ort use test-tool fast-rebase. git rebase -s ort on the three
testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
respectively). At this point, any further optimization work should go into
making a faster full-featured rebase by copying the ideas from fast-rebase:
avoid unnecessary process forking, avoid updating the index and working copy
until either the rebase is finished or you hit a conflict (and don't write
rebase metadata to disk until that point either), get rid of the glacially
slow revision walking of the upstream side of history (nuke
can_fast_forward(), make --reapply-cherry-picks the default) or at least
don't revision walk so many times (multiple calls to get_merge_bases in
can_fast_forward() plus a is_linear_history() walk, checking for upstream
cherry-picks, probably more), turn off per-commit hooks that probably should
have never been on anyway, etc.

Elijah Newren (7):
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: set up a memory pool
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  merge-ort: switch our strmaps over to using memory pools
  diffcore-rename, merge-ort: add wrapper functions for filepair
    alloc/dealloc
  merge-ort: store filepairs and filespecs in our mem_pool
  merge-ort: reuse path strings in pool_alloc_filespec

 diffcore-rename.c |  66 +++++++++++--
 diffcore.h        |   3 +
 merge-ort.c       | 247 +++++++++++++++++++++++++++++++++++-----------
 3 files changed, 251 insertions(+), 65 deletions(-)


base-commit: c9ada8369e6575be488028aae0f654422a9b1410
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-990%2Fnewren%2Fort-perf-batch-15-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-990/newren/ort-perf-batch-15-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/990

Comments

Derrick Stolee July 26, 2021, 2:44 p.m. UTC | #1
On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
...
> === Basic Optimization idea ===
> 
> In this series, I make use of memory pools to get faster allocations and
> deallocations for many data structures that tend to all be deallocated at
> the same time anyway.

Makes sense. This is appropriate for a final optimization, since the gains
tend to be quite small.

> === Results ===
> 
> For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> performance work; instrument with trace2_region_* calls", 2020-10-28), the
> changes in just this series improves the performance as follows:
> 
>                      Before Series           After Series
> no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms


But these are larger than I anticipated! Amazing.

> === Overall Results across all optimization work ===

I enjoyed reading this section. I'm excited to make ORT the default in
the microsoft/git fork and see how this improves the lives of our users.

> === Caveat ===
> 
> It may be worth noting, though, that my optimization numbers above for
> merge-ort use test-tool fast-rebase. git rebase -s ort on the three
> testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
> respectively).

The performance and behavior changes recommended here should definitely
be considered. However, the benefits still apply and at the moment users
do not expect immediate responses from 'git rebase' so we have some time
to approach with caution.

I only had one small question that is not even important to the
correctness of this series, so feel free to ignore it. The patches tell
a convincing story.

Just to be careful, have you taken the time to run the merge-ORT tests
with --valgrind?

Thanks,
-Stolee
Elijah Newren July 28, 2021, 10:52 p.m. UTC | #2
On Mon, Jul 26, 2021 at 8:44 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 7/23/2021 8:54 AM, Elijah Newren via GitGitGadget wrote:
> ...
> > === Basic Optimization idea ===
> >
> > In this series, I make use of memory pools to get faster allocations and
> > deallocations for many data structures that tend to all be deallocated at
> > the same time anyway.
>
> Makes sense. This is appropriate for a final optimization, since the gains
> tend to be quite small.
>
> > === Results ===
> >
> > For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
> > performance work; instrument with trace2_region_* calls", 2020-10-28), the
> > changes in just this series improves the performance as follows:
> >
> >                      Before Series           After Series
> > no-renames:      204.2  ms ±  3.0  ms    198.3 ms ±  2.9 ms
> > mega-renames:      1.076 s ±  0.015 s    661.8 ms ±  5.9 ms
> > just-one-mega:   364.1  ms ±  7.0  ms    264.6 ms ±  2.5 ms
>
>
> But these are larger than I anticipated! Amazing.
>
> > === Overall Results across all optimization work ===
>
> I enjoyed reading this section. I'm excited to make ORT the default in
> the microsoft/git fork and see how this improves the lives of our users.
>
> > === Caveat ===
> >
> > It may be worth noting, though, that my optimization numbers above for
> > merge-ort use test-tool fast-rebase. git rebase -s ort on the three
> > testcases above is 5-20 times slower (taking 3.835s, 6.798s, and 1.235s,
> > respectively).
>
> The performance and behavior changes recommended here should definitely
> be considered. However, the benefits still apply and at the moment users
> do not expect immediate responses from 'git rebase' so we have some time
> to approach with caution.
>
> I only had one small question that is not even important to the
> correctness of this series, so feel free to ignore it. The patches tell
> a convincing story.
>
> Just to be careful, have you taken the time to run the merge-ORT tests
> with --valgrind?

Yes.  In addition to the testsuite, I also ran the testcases above
under valgrind (especially mega-renames) -- and with those testcases I
had the leak checker turned on.  It was somewhat surprising how much
slowdown I saw when I introduced some accidental memory leaks while
optimizing.  But it all runs clean with the patches I submitted.