mbox series

[v2,00/20] fundamentals of merge-ort implementation

Message ID 20201102204344.342633-1-newren@gmail.com (mailing list archive)
Headers show
Series fundamentals of merge-ort implementation | expand

Message

Elijah Newren Nov. 2, 2020, 8:43 p.m. UTC
This series depends on a merge of en/strmap (after updating to v3) and
en/merge-ort-api-null-impl.

As promised, here's the update of the series due to the strmap
updates...and two other tiny updates.

Changes since v1:
  * updates needed based on changes to made in v3 of strmap series
  * fixed a typo in a comment
  * tiny tweak to move a strmap_put() into setup_paths()

Elijah Newren (20):
  merge-ort: setup basic internal data structures
  merge-ort: add some high-level algorithm structure
  merge-ort: port merge_start() from merge-recursive
  merge-ort: use histogram diff
  merge-ort: add an err() function similar to one from merge-recursive
  merge-ort: implement a very basic collect_merge_info()
  merge-ort: avoid repeating fill_tree_descriptor() on the same tree
  merge-ort: compute a few more useful fields for collect_merge_info
  merge-ort: record stage and auxiliary info for every path
  merge-ort: avoid recursing into identical trees
  merge-ort: add a preliminary simple process_entries() implementation
  merge-ort: have process_entries operate in a defined order
  merge-ort: step 1 of tree writing -- record basenames, modes, and oids
  merge-ort: step 2 of tree writing -- function to create tree object
  merge-ort: step 3 of tree writing -- handling subdirectories as we go
  merge-ort: basic outline for merge_switch_to_result()
  merge-ort: add implementation of checkout()
  tree: enable cmp_cache_name_compare() to be used elsewhere
  merge-ort: add implementation of record_unmerged_index_entries()
  merge-ort: free data structures in merge_finalize()

 merge-ort.c | 929 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 tree.c      |   2 +-
 tree.h      |   2 +
 3 files changed, 929 insertions(+), 4 deletions(-)

Comments

Derrick Stolee Nov. 3, 2020, 2:49 p.m. UTC | #1
On 11/2/2020 3:43 PM, Elijah Newren wrote:
> This series depends on a merge of en/strmap (after updating to v3) and
> en/merge-ort-api-null-impl.
> 
> As promised, here's the update of the series due to the strmap
> updates...and two other tiny updates.

Hi Elijah,

I'm sorry that I've been unavailable to read and review your series
on this topic. I'm very excited about the opportunities here, and I
wanted to take your topic and merge it with our microsoft/git fork
so I could test the performance in a Scalar-enabled monorepo. My
branch is available in my fork [1]

[1] https://github.com/derrickstolee/git/tree/merge-ort-vfs

However, I'm unable to discover how to trigger your ort strategy,
even for a simple rebase. Perhaps you could supply a recommended
command for testing?

Thanks,
-Stolee
Elijah Newren Nov. 3, 2020, 4:36 p.m. UTC | #2
On Tue, Nov 3, 2020 at 6:50 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > This series depends on a merge of en/strmap (after updating to v3) and
> > en/merge-ort-api-null-impl.
> >
> > As promised, here's the update of the series due to the strmap
> > updates...and two other tiny updates.
>
> Hi Elijah,
>
> I'm sorry that I've been unavailable to read and review your series
> on this topic. I'm very excited about the opportunities here, and I
> wanted to take your topic and merge it with our microsoft/git fork
> so I could test the performance in a Scalar-enabled monorepo. My
> branch is available in my fork [1]
>
> [1] https://github.com/derrickstolee/git/tree/merge-ort-vfs
>
> However, I'm unable to discover how to trigger your ort strategy,
> even for a simple rebase. Perhaps you could supply a recommended
> command for testing?
>
> Thanks,
> -Stolee

If you want to test performance, you shouldn't test this particular
submission, you should test the end result which exists as the 'ort'
branch of my repo.  It actually passes all the tests rather than just
trivial cherry-picks and rebases, and has lots (and lots) of
performance work that hasn't even begun at the point of the
'ort-basics' branch.  (However, it also contains some unrelated memory
cleanup in revision.c, chdir-notify.c, and a number of other places
because I was annoyed that a rebase wouldn't run valgrind-free and
made it harder to spot my memory leaks.  And the day I went hunting
those memory "leaks", I went and grabbed some unrelated memory leaks
too.  If it causes you merge conflicts, let me know and I'll try to
create a branch for you that hash the minimal changes outside of
merge-ort*.[ch] and diffcore*.[ch])

All that said, for testing either branch you just need to first set
pull.twohead=ort in your git config (see
https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
Elijah Newren Nov. 7, 2020, 6:06 a.m. UTC | #3
Hi Derrick,

On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Tue, Nov 3, 2020 at 6:50 AM Derrick Stolee <stolee@gmail.com> wrote:
> >
> > On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > > This series depends on a merge of en/strmap (after updating to v3) and
> > > en/merge-ort-api-null-impl.
> > >
> > > As promised, here's the update of the series due to the strmap
> > > updates...and two other tiny updates.
> >
> > Hi Elijah,
> >
> > I'm sorry that I've been unavailable to read and review your series
> > on this topic. I'm very excited about the opportunities here, and I
> > wanted to take your topic and merge it with our microsoft/git fork
> > so I could test the performance in a Scalar-enabled monorepo. My
> > branch is available in my fork [1]
> >
> > [1] https://github.com/derrickstolee/git/tree/merge-ort-vfs
> >
> > However, I'm unable to discover how to trigger your ort strategy,
> > even for a simple rebase. Perhaps you could supply a recommended
> > command for testing?
> >
> > Thanks,
> > -Stolee
>
> If you want to test performance, you shouldn't test this particular
> submission, you should test the end result which exists as the 'ort'
> branch of my repo.  It actually passes all the tests rather than just
> trivial cherry-picks and rebases, and has lots (and lots) of
> performance work that hasn't even begun at the point of the
> 'ort-basics' branch.  (However, it also contains some unrelated memory
> cleanup in revision.c, chdir-notify.c, and a number of other places
> because I was annoyed that a rebase wouldn't run valgrind-free and
> made it harder to spot my memory leaks.  And the day I went hunting
> those memory "leaks", I went and grabbed some unrelated memory leaks
> too.  If it causes you merge conflicts, let me know and I'll try to
> create a branch for you that hash the minimal changes outside of
> merge-ort*.[ch] and diffcore*.[ch])
>
> All that said, for testing either branch you just need to first set
> pull.twohead=ort in your git config (see
> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.

I probably also should have mentioned that merge-ort does not (yet?)
heed merge.renames configuration setting; it always detects renames.
I know you run with merge.renames=false, so you won't quite get an
apples-to-apples comparison.  However, part of my point was I wanted
to make renames fast enough that they could be left turned on, even
for the large scale repos, so I'm very interested in your experience.
If you need an escape hatch, though, just put a "return 1" at the top
of detect_and_process_renames() to turn it off.

Oh, and I went through and re-merged all the merge commits in the
linux kernel and found a bug in merge-ort while doing that (causing it
to die, not to merge badly).  I'm kind of surprised that none of my
testcases triggered that failure earlier; if you're testing it out,
you might want to update to get the fix (commit 067e5c1a38,
"merge-ort: fix bug with cached_target_names not being initialized in
redos", 2020-11-06).
Derrick Stolee Nov. 7, 2020, 3:02 p.m. UTC | #4
On 11/7/20 1:06 AM, Elijah Newren wrote:
> Hi Derrick,
> 
> On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>> All that said, for testing either branch you just need to first set
>> pull.twohead=ort in your git config (see
>> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
>> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
> 
> I probably also should have mentioned that merge-ort does not (yet?)
> heed merge.renames configuration setting; it always detects renames.
> I know you run with merge.renames=false, so you won't quite get an
> apples-to-apples comparison.  However, part of my point was I wanted
> to make renames fast enough that they could be left turned on, even
> for the large scale repos, so I'm very interested in your experience.
> If you need an escape hatch, though, just put a "return 1" at the top
> of detect_and_process_renames() to turn it off.
> 
> Oh, and I went through and re-merged all the merge commits in the
> linux kernel and found a bug in merge-ort while doing that (causing it
> to die, not to merge badly).  I'm kind of surprised that none of my
> testcases triggered that failure earlier; if you're testing it out,
> you might want to update to get the fix (commit 067e5c1a38,
> "merge-ort: fix bug with cached_target_names not being initialized in
> redos", 2020-11-06).

I did manage to do some testing to see what happens with
a large repo under a small sparse-checkout. And using
trace2, I was able to see that your code is being exercised.
Unfortunately, I didn't see any performance improvement, and
that is likely due to needing to expand the index entirely
when checking out the merge commit.

Is there a command to construct a merge commit without
actually checking it out? That would reduce the time spent
expanding the index, which would allow your algorithm to
really show its benefits!

Thanks,
-Stolee
Elijah Newren Nov. 7, 2020, 7:39 p.m. UTC | #5
On Sat, Nov 7, 2020 at 7:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/7/20 1:06 AM, Elijah Newren wrote:
> > Hi Derrick,
> >
> > On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
> >> All that said, for testing either branch you just need to first set
> >> pull.twohead=ort in your git config (see
> >> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
> >> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
> >
> > I probably also should have mentioned that merge-ort does not (yet?)
> > heed merge.renames configuration setting; it always detects renames.
> > I know you run with merge.renames=false, so you won't quite get an
> > apples-to-apples comparison.  However, part of my point was I wanted
> > to make renames fast enough that they could be left turned on, even
> > for the large scale repos, so I'm very interested in your experience.
> > If you need an escape hatch, though, just put a "return 1" at the top
> > of detect_and_process_renames() to turn it off.
> >
> > Oh, and I went through and re-merged all the merge commits in the
> > linux kernel and found a bug in merge-ort while doing that (causing it
> > to die, not to merge badly).  I'm kind of surprised that none of my
> > testcases triggered that failure earlier; if you're testing it out,
> > you might want to update to get the fix (commit 067e5c1a38,
> > "merge-ort: fix bug with cached_target_names not being initialized in
> > redos", 2020-11-06).
>
> I did manage to do some testing to see what happens with
> a large repo under a small sparse-checkout. And using
> trace2, I was able to see that your code is being exercised.
> Unfortunately, I didn't see any performance improvement, and
> that is likely due to needing to expand the index entirely
> when checking out the merge commit.
>
> Is there a command to construct a merge commit without
> actually checking it out? That would reduce the time spent
> expanding the index, which would allow your algorithm to
> really show its benefits!

Wow, very interesting.  I am working on a --remerge-diff option for
log, which implies -p and is similar to -c or --cc in that it makes
merge commits show a diff, but which in particular remerges the two
parent commits complete with conflict markers and such and then diffs
the merge commit against that intermediate remerge.  That's a case
that constructs a merge commit without ever touching the index (or
working tree)...but there's no equivalent comparison point for
merge-recursive.  So, it doesn't provide something to compare against
(and while the code can be used I don't actually have a --remerge-diff
option yet -- it just hardcodes the behavior on if wanted or not), so
I'm not sure if you'd be interested in it.  If you are, let me know
though, and I'll send details.

However, I'm really surprised here, because merge-recursive always
reads and writes the index too (the index is the basis for its whole
algorithm).  In fact, merge-recursive always reads the index at least
*twice* (it unconditionally discards and re-reads the index), so you
must have some kind of specialized tweaking of merge-recursive if it
somehow avoids a full index read/write.  In order to do an
apples-to-apples comparison, we'd need to make those same tweaks to
merge-ort, but I don't have a clue what kind of tweaks you've made
here.  So, some investigation points:

*1*. Could you give me the accumulated times from the trace2_regions
so we can verify where the time is spent?  The 'summarize-perf' script
at the toplevel of the repo in my ort branch might be helpful for
this; just prefix any git command with that script and it accumulates
the trace2 region times and prints them out.  For example, I could run
'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:

=== BEGIN OUTPUT ===
$ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto
HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
Done.
Accumulated times:
    0.031 : <unmeasured> ( 3.2%)
    0.837 : 35 : label:incore_nonrecursive
       0.003 : <unmeasured> ( 0.4%)
       0.476 : 41 : ..label:collect_merge_info
          0.001 : <unmeasured> ( 0.2%)
          0.475 : 41 : ....label:traverse_trees
       0.298 : 41 : ..label:renames
          0.015 : <unmeasured> ( 5.1%)
          0.280 : 41 : ....label:regular renames
             0.036 : <unmeasured> (12.7%)
             0.244 : 6 : ......label:diffcore_rename
                0.001 : <unmeasured> ( 0.4%)
                0.078 : 6 : ........label:dir rename setup
                0.055 : 6 : ........label:basename matches
                0.051 : 6 : ........label:exact renames
                0.031 : 6 : ........label:write back to queue
                0.017 : 6 : ........label:setup
                0.009 : 6 : ........label:cull basename
                0.003 : 6 : ........label:cull exact
          0.002 : 35 : ....label:directory renames
          0.001 : 35 : ....label:process renames
       0.052 : 35 : ..label:process_entries
          0.001 : <unmeasured> ( 1.7%)
          0.033 : 35 : ....label:processing
          0.017 : 35 : ....label:process_entries setup
             0.001 : <unmeasured> ( 5.8%)
             0.008 : 35 : ......label:plist copy
             0.008 : 35 : ......label:plist sort
             0.000 : 35 : ......label:plist grow
          0.001 : 35 : ....label:finalize
       0.005 : 35 : ..label:merge_start
          0.001 : <unmeasured> (18.8%)
          0.004 : 34 : ....label:reset_maps
          0.000 : 35 : ....label:sanity checks
          0.000 : 1 : ....label:allocate/init
       0.003 : 6 : ..label:reset_maps
    0.035 : 1 : label:do_write_index
/home/newren/floss/linux-stable/.git/index.lock
    0.034 : 1 : label:checkout
       0.034 : <unmeasured> (99.9%)
       0.000 : 1 : ..label:Filtering content
    0.009 : 1 : label:do_read_index .git/index
    0.000 : 1 : label:write_auto_merge
    0.000 : 1 : label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 679):
0.006790000000000001
Timing including forking:  0.960 (0.013 additional seconds)
=== END OUTPUT ===
This was a run that took just under 1s (and was a hot-cache case; I
had just done the same rebase before to warm the caches), and the
combination of index/working tree bits (everything at and after
do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
corresponding to just over 8.1% of overall time.  I'm curious where
that lands for your repository testcase; if the larger time ends up
somewhere under the indented label:incore_nonrecursive region, then
it's due to something other than index reading/updating/writing.

*2*. If it really is due to index reading/updating/writing, then index
handling in merge-ort is confined to two functions: checkout() and
record_unmerged_index_entries().  Both functions aren't too long, and
neither one calls into any other function within merge-ort.c.
(Further, checkout() is a near copy of code from merge_working_tree()
in builtin/checkout.c, or at least a copy of that function from a year
or so ago.)  As such, it's possible you can go in and make whatever
special tweaks you have for partial index reading/writing to those
functions.

I'm curious to hear back more on this.

Elijah
Derrick Stolee Nov. 9, 2020, 12:30 p.m. UTC | #6
On 11/7/2020 2:39 PM, Elijah Newren wrote:
> On Sat, Nov 7, 2020 at 7:02 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> On 11/7/20 1:06 AM, Elijah Newren wrote:
>>> Hi Derrick,
>>>
>>> On Tue, Nov 3, 2020 at 8:36 AM Elijah Newren <newren@gmail.com> wrote:
>>>> All that said, for testing either branch you just need to first set
>>>> pull.twohead=ort in your git config (see
>>>> https://lore.kernel.org/git/61217a83bd7ff0ce9016eb4df9ded4fdf29a506c.1604360734.git.gitgitgadget@gmail.com/),
>>>> or, if running regression tests, set GIT_TEST_MERGE_ALGORITHM=ort.
>>>
>>> I probably also should have mentioned that merge-ort does not (yet?)
>>> heed merge.renames configuration setting; it always detects renames.
>>> I know you run with merge.renames=false, so you won't quite get an
>>> apples-to-apples comparison.  However, part of my point was I wanted
>>> to make renames fast enough that they could be left turned on, even
>>> for the large scale repos, so I'm very interested in your experience.
>>> If you need an escape hatch, though, just put a "return 1" at the top
>>> of detect_and_process_renames() to turn it off.
>>>
>>> Oh, and I went through and re-merged all the merge commits in the
>>> linux kernel and found a bug in merge-ort while doing that (causing it
>>> to die, not to merge badly).  I'm kind of surprised that none of my
>>> testcases triggered that failure earlier; if you're testing it out,
>>> you might want to update to get the fix (commit 067e5c1a38,
>>> "merge-ort: fix bug with cached_target_names not being initialized in
>>> redos", 2020-11-06).
>>
>> I did manage to do some testing to see what happens with
>> a large repo under a small sparse-checkout. And using
>> trace2, I was able to see that your code is being exercised.
>> Unfortunately, I didn't see any performance improvement, and
>> that is likely due to needing to expand the index entirely
>> when checking out the merge commit.
>>
>> Is there a command to construct a merge commit without
>> actually checking it out? That would reduce the time spent
>> expanding the index, which would allow your algorithm to
>> really show its benefits!
> 
> Wow, very interesting.  I am working on a --remerge-diff option for
> log, which implies -p and is similar to -c or --cc in that it makes
> merge commits show a diff, but which in particular remerges the two
> parent commits complete with conflict markers and such and then diffs
> the merge commit against that intermediate remerge.  That's a case
> that constructs a merge commit without ever touching the index (or
> working tree)...but there's no equivalent comparison point for
> merge-recursive.  So, it doesn't provide something to compare against
> (and while the code can be used I don't actually have a --remerge-diff
> option yet -- it just hardcodes the behavior on if wanted or not), so
> I'm not sure if you'd be interested in it.  If you are, let me know
> though, and I'll send details.
> 
> However, I'm really surprised here, because merge-recursive always
> reads and writes the index too (the index is the basis for its whole
> algorithm).  In fact, merge-recursive always reads the index at least
> *twice* (it unconditionally discards and re-reads the index), so you
> must have some kind of specialized tweaking of merge-recursive if it
> somehow avoids a full index read/write.  In order to do an
> apples-to-apples comparison, we'd need to make those same tweaks to
> merge-ort, but I don't have a clue what kind of tweaks you've made
> here.  So, some investigation points:
> 
> *1*. Could you give me the accumulated times from the trace2_regions
> so we can verify where the time is spent?  The 'summarize-perf' script
> at the toplevel of the repo in my ort branch might be helpful for
> this; just prefix any git command with that script and it accumulates
> the trace2 region times and prints them out.  For example, I could run
> 'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
> fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:
> 
> === BEGIN OUTPUT ===
> $ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto
> HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
> Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
> Done.
> Accumulated times:
>     0.031 : <unmeasured> ( 3.2%)
>     0.837 : 35 : label:incore_nonrecursive
>        0.003 : <unmeasured> ( 0.4%)
>        0.476 : 41 : ..label:collect_merge_info
>           0.001 : <unmeasured> ( 0.2%)
>           0.475 : 41 : ....label:traverse_trees
>        0.298 : 41 : ..label:renames
>           0.015 : <unmeasured> ( 5.1%)
>           0.280 : 41 : ....label:regular renames
>              0.036 : <unmeasured> (12.7%)
>              0.244 : 6 : ......label:diffcore_rename
>                 0.001 : <unmeasured> ( 0.4%)
>                 0.078 : 6 : ........label:dir rename setup
>                 0.055 : 6 : ........label:basename matches
>                 0.051 : 6 : ........label:exact renames
>                 0.031 : 6 : ........label:write back to queue
>                 0.017 : 6 : ........label:setup
>                 0.009 : 6 : ........label:cull basename
>                 0.003 : 6 : ........label:cull exact
>           0.002 : 35 : ....label:directory renames
>           0.001 : 35 : ....label:process renames
>        0.052 : 35 : ..label:process_entries
>           0.001 : <unmeasured> ( 1.7%)
>           0.033 : 35 : ....label:processing
>           0.017 : 35 : ....label:process_entries setup
>              0.001 : <unmeasured> ( 5.8%)
>              0.008 : 35 : ......label:plist copy
>              0.008 : 35 : ......label:plist sort
>              0.000 : 35 : ......label:plist grow
>           0.001 : 35 : ....label:finalize
>        0.005 : 35 : ..label:merge_start
>           0.001 : <unmeasured> (18.8%)
>           0.004 : 34 : ....label:reset_maps
>           0.000 : 35 : ....label:sanity checks
>           0.000 : 1 : ....label:allocate/init
>        0.003 : 6 : ..label:reset_maps
>     0.035 : 1 : label:do_write_index
> /home/newren/floss/linux-stable/.git/index.lock
>     0.034 : 1 : label:checkout
>        0.034 : <unmeasured> (99.9%)
>        0.000 : 1 : ..label:Filtering content
>     0.009 : 1 : label:do_read_index .git/index
>     0.000 : 1 : label:write_auto_merge
>     0.000 : 1 : label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 679):
> 0.006790000000000001
> Timing including forking:  0.960 (0.013 additional seconds)
> === END OUTPUT ===
> This was a run that took just under 1s (and was a hot-cache case; I
> had just done the same rebase before to warm the caches), and the
> combination of index/working tree bits (everything at and after
> do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
> corresponding to just over 8.1% of overall time.  I'm curious where
> that lands for your repository testcase; if the larger time ends up
> somewhere under the indented label:incore_nonrecursive region, then
> it's due to something other than index reading/updating/writing.
> 
> *2*. If it really is due to index reading/updating/writing, then index
> handling in merge-ort is confined to two functions: checkout() and
> record_unmerged_index_entries().  Both functions aren't too long, and
> neither one calls into any other function within merge-ort.c.
> (Further, checkout() is a near copy of code from merge_working_tree()
> in builtin/checkout.c, or at least a copy of that function from a year
> or so ago.)  As such, it's possible you can go in and make whatever
> special tweaks you have for partial index reading/writing to those
> functions.
> 
> I'm curious to hear back more on this.

I don't have a lot of time to dig into this right now, but here are
the stats for my rebases and merges with and without your option.

The first thing I notice for each is that there is a significant
amount of "unmeasured" time at the beginning of each, and that
could possibly be improved separately.

First, try a rebase forward and backward.

$ /_git/git/summarize-perf git rebase --onto to from test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.511 : <unmeasured> (74.9%)
    1.331 : 1 : ......label:unpack_trees
       0.200 : <unmeasured> (15.1%)
       0.580 : 1 : ........label:traverse_trees
       0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
       0.126 : 1 : ........label:check_updates
          0.126 : <unmeasured> (100.0%)
          0.000 : 1 : ..........label:Filtering content
       0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ........label:fully_valid
    1.059 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
       0.930 : <unmeasured> (87.9%)
       0.128 : 1 : ........label:write/extension/cache_tree
    0.455 : 2 : ......label:fully_valid
    0.001 : 1 : ......label:traverse_trees
    0.000 : 1 : ......label:check_updates
Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
Timing including forking: 11.382 (0.026 additional seconds)

$ /_git/git/summarize-perf git rebase --onto from to test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.556 : <unmeasured> (75.2%)
    1.315 : 1 : ......label:unpack_trees
       0.197 : <unmeasured> (15.0%)
       0.580 : 1 : ........label:traverse_trees
       0.391 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
       0.126 : 1 : ........label:check_updates
          0.126 : <unmeasured> (100.0%)
          0.000 : 1 : ..........label:Filtering content
       0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ........label:fully_valid
    1.071 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
       0.942 : <unmeasured> (88.0%)
       0.129 : 1 : ........label:write/extension/cache_tree
    0.431 : 2 : ......label:fully_valid
    0.001 : 1 : ......label:traverse_trees
    0.000 : 1 : ......label:check_updates
Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
Timing including forking: 11.399 (0.026 additional seconds)

Then do the same with the ort strategy.

$ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto to from test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.350 : <unmeasured> (73.2%)
    1.403 : 1 : ....label:checkout  
       0.000 : <unmeasured> ( 0.0%)
       1.403 : 1 : ......label:unpack_trees
          0.312 : <unmeasured> (22.3%)
          0.539 : 1 : ........label:traverse_trees
          0.401 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
          0.128 : 1 : ........label:check_updates
             0.128 : <unmeasured> (100.0%)
             0.000 : 1 : ..........label:Filtering content
          0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ........label:fully_valid
    1.081 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
       0.951 : <unmeasured> (88.1%)
       0.129 : 1 : ......label:write/extension/cache_tree
    0.432 : 2 : ....label:fully_valid
    0.143 : 1 : ....label:do_read_index .git/index
       0.019 : <unmeasured> (13.1%)
       0.125 : 1 : label:read/extension/cache_tree
    0.004 : 1 : ....label:incore_nonrecursive
       0.001 : <unmeasured> (25.8%)
       0.002 : 1 : ......label:process_entries
          0.000 : <unmeasured> ( 2.6%)
          0.001 : 1 : ........label:finalize
          0.001 : 1 : ........label:process_entries setup
             0.000 : <unmeasured> ( 6.7%)
             0.001 : 1 : ..........label:plist sort
             0.000 : 1 : ..........label:plist copy
             0.000 : 1 : ..........label:plist grow
          0.000 : 1 : ........label:processing
       0.001 : 1 : ......label:collect_merge_info
          0.000 : <unmeasured> (35.3%)
          0.001 : 1 : ........label:traverse_trees
       0.000 : 1 : ......label:merge_start
          0.000 : <unmeasured> (42.3%)
          0.000 : 1 : ........label:allocate/init
          0.000 : 1 : ........label:sanity checks
       0.000 : 1 : ......label:renames 
    0.001 : 1 : ....label:traverse_trees
    0.000 : 1 : ....label:write_auto_merge
    0.000 : 1 : ....label:check_updates
    0.000 : 1 : ....label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
Timing including forking: 11.442 (0.027 additional seconds)

$ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto from to test
Successfully rebased and updated refs/heads/test.
Accumulated times:
    8.337 : <unmeasured> (73.2%)
    1.395 : 1 : ....label:checkout  
       0.000 : <unmeasured> ( 0.0%)
       1.395 : 1 : ......label:unpack_trees
          0.309 : <unmeasured> (22.1%)
          0.537 : 1 : ........label:traverse_trees
          0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
          0.124 : 1 : ........label:check_updates
             0.124 : <unmeasured> (100.0%)
             0.000 : 1 : ..........label:Filtering content
          0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ........label:fully_valid
    1.084 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
       0.955 : <unmeasured> (88.1%)
       0.129 : 1 : ......label:write/extension/cache_tree
    0.436 : 2 : ....label:fully_valid
    0.137 : 1 : ....label:do_read_index .git/index
       0.013 : <unmeasured> ( 9.3%)
       0.125 : 1 : label:read/extension/cache_tree
    0.004 : 1 : ....label:incore_nonrecursive
       0.001 : <unmeasured> (24.5%)
       0.002 : 1 : ......label:process_entries
          0.000 : <unmeasured> ( 2.5%)
          0.001 : 1 : ........label:finalize
          0.001 : 1 : ........label:process_entries setup
             0.000 : <unmeasured> ( 6.5%)
             0.001 : 1 : ..........label:plist sort
             0.000 : 1 : ..........label:plist copy
             0.000 : 1 : ..........label:plist grow
          0.000 : 1 : ........label:processing
       0.001 : 1 : ......label:collect_merge_info
          0.000 : <unmeasured> (26.5%)
          0.001 : 1 : ........label:traverse_trees
       0.000 : 1 : ......label:merge_start
          0.000 : <unmeasured> (43.1%)
          0.000 : 1 : ........label:allocate/init
          0.000 : 1 : ........label:sanity checks
       0.000 : 1 : ......label:renames 
    0.001 : 1 : ....label:traverse_trees
    0.000 : 1 : ....label:write_auto_merge
    0.000 : 1 : ....label:check_updates
    0.000 : 1 : ....label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
Timing including forking: 11.418 (0.024 additional seconds)

And here are timings for a simple merge. Two files at root were changed in the
commits I made, but there are also some larger changes from the commit history.
These should all be seen as "this tree updated in one of the two, so take that
tree".

$ git reset --hard test2 && /_git/git/summarize-perf git merge test -m test
Merge made by the 'recursive' strategy.
Accumulated times:
    2.647 : <unmeasured> (48.6%)
    1.384 : 1 : ..label:unpack_trees
       0.267 : <unmeasured> (19.3%)
       0.582 : 1 : ....label:traverse_trees
       0.391 : 1 : ....label:clear_ce_flags/0x00000000_0x02000000
       0.124 : 1 : ....label:check_updates
          0.124 : <unmeasured> (100.0%)
          0.000 : 1 : ......label:Filtering content
       0.021 : 1 : ....label:clear_ce_flags/0x00080000_0x42000000
       0.000 : 1 : ....label:fully_valid
    1.060 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
       0.931 : <unmeasured> (87.9%)
       0.128 : 1 : ....label:write/extension/cache_tree
    0.226 : 1 : ..label:fully_valid 
    0.134 : 1 : ..label:do_read_index .git/index
       0.008 : <unmeasured> ( 5.8%)
       0.126 : 1 : label:read/extension/cache_tree
    0.001 : 1 : ..label:traverse_trees
    0.000 : 1 : ..label:check_updates
    0.000 : 1 : ..label:setup       
    0.000 : 1 : ..label:write back to queue
Estimated measurement overhead (.010 ms/region-measure * 20): 0.0002
Timing including forking:  5.466 (0.015 additional seconds)

$ git reset --hard test2 && /_git/git/summarize-perf git -c pull.twohead=ort merge test -m test
Merge made by the 'ort' strategy.
Accumulated times:
    2.531 : <unmeasured> (49.1%)
    1.328 : 1 : ..label:checkout    
       0.000 : <unmeasured> ( 0.0%)
       1.328 : 1 : ....label:unpack_trees
          0.228 : <unmeasured> (17.2%)
          0.566 : 1 : ......label:traverse_trees
          0.388 : 1 : ......label:clear_ce_flags/0x00000000_0x02000000
          0.125 : 1 : ......label:check_updates
             0.125 : <unmeasured> (100.0%)
             0.000 : 1 : ........label:Filtering content
          0.021 : 1 : ......label:clear_ce_flags/0x00080000_0x42000000
          0.000 : 1 : ......label:fully_valid
    1.067 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
       0.938 : <unmeasured> (87.9%)
       0.129 : 1 : ....label:write/extension/cache_tree
    0.230 : 1 : ..label:fully_valid 
    0.002 : 1 : ..label:incore_recursive
       0.001 : <unmeasured> (22.3%)
       0.001 : 1 : ....label:collect_merge_info
          0.001 : <unmeasured> (60.2%)
          0.000 : 1 : ......label:traverse_trees
       0.001 : 1 : ....label:process_entries
          0.000 : <unmeasured> ( 2.8%)
          0.001 : 1 : ......label:finalize
          0.000 : 1 : ......label:process_entries setup
             0.000 : <unmeasured> ( 6.9%)
             0.000 : 1 : ........label:plist sort
             0.000 : 1 : ........label:plist copy
             0.000 : 1 : ........label:plist grow
          0.000 : 1 : ......label:processing
       0.000 : 1 : ....label:merge_start
          0.000 : <unmeasured> (50.0%)
          0.000 : 1 : ......label:allocate/init
          0.000 : 1 : ......label:sanity checks
       0.000 : 1 : ....label:renames   
    0.001 : 1 : ..label:traverse_trees
    0.000 : 1 : ..label:write_auto_merge
    0.000 : 1 : ..label:check_updates
    0.000 : 1 : ..label:setup       
    0.000 : 1 : ..label:display messages
    0.000 : 1 : ..label:write back to queue
    0.000 : 1 : ..label:record_unmerged
Estimated measurement overhead (.010 ms/region-measure * 36): 0.00036
Timing including forking:  5.174 (0.015 additional seconds)

Thanks,
-Stolee
Elijah Newren Nov. 9, 2020, 5:13 p.m. UTC | #7
Hi Derrick,

On Mon, Nov 9, 2020 at 4:30 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/7/2020 2:39 PM, Elijah Newren wrote:

> > *1*. Could you give me the accumulated times from the trace2_regions
> > so we can verify where the time is spent?  The 'summarize-perf' script
> > at the toplevel of the repo in my ort branch might be helpful for
> > this; just prefix any git command with that script and it accumulates
> > the trace2 region times and prints them out.  For example, I could run
> > 'summarize-perf git merge --no-edit B^0' or 'summarize-perf test-tool
> > fast-rebase --onto HEAD ca76bea9 myfeature'.  Here's an example:
> >
> > === BEGIN OUTPUT ===
> > $ /home/newren/floss/git/summarize-perf test-tool fast-rebase --onto
> > HEAD 4703d9119972bf586d2cca76ec6438f819ffa30e hwmon-updates
> > Rebasing fd8bdb23b91876ac1e624337bb88dc1dcc21d67e...
> > Done.
> > Accumulated times:
> >     0.031 : <unmeasured> ( 3.2%)
> >     0.837 : 35 : label:incore_nonrecursive
> >        0.003 : <unmeasured> ( 0.4%)
> >        0.476 : 41 : ..label:collect_merge_info
> >           0.001 : <unmeasured> ( 0.2%)
> >           0.475 : 41 : ....label:traverse_trees
> >        0.298 : 41 : ..label:renames
> >           0.015 : <unmeasured> ( 5.1%)
> >           0.280 : 41 : ....label:regular renames
> >              0.036 : <unmeasured> (12.7%)
> >              0.244 : 6 : ......label:diffcore_rename
> >                 0.001 : <unmeasured> ( 0.4%)
> >                 0.078 : 6 : ........label:dir rename setup
> >                 0.055 : 6 : ........label:basename matches
> >                 0.051 : 6 : ........label:exact renames
> >                 0.031 : 6 : ........label:write back to queue
> >                 0.017 : 6 : ........label:setup
> >                 0.009 : 6 : ........label:cull basename
> >                 0.003 : 6 : ........label:cull exact
> >           0.002 : 35 : ....label:directory renames
> >           0.001 : 35 : ....label:process renames
> >        0.052 : 35 : ..label:process_entries
> >           0.001 : <unmeasured> ( 1.7%)
> >           0.033 : 35 : ....label:processing
> >           0.017 : 35 : ....label:process_entries setup
> >              0.001 : <unmeasured> ( 5.8%)
> >              0.008 : 35 : ......label:plist copy
> >              0.008 : 35 : ......label:plist sort
> >              0.000 : 35 : ......label:plist grow
> >           0.001 : 35 : ....label:finalize
> >        0.005 : 35 : ..label:merge_start
> >           0.001 : <unmeasured> (18.8%)
> >           0.004 : 34 : ....label:reset_maps
> >           0.000 : 35 : ....label:sanity checks
> >           0.000 : 1 : ....label:allocate/init
> >        0.003 : 6 : ..label:reset_maps
> >     0.035 : 1 : label:do_write_index
> > /home/newren/floss/linux-stable/.git/index.lock
> >     0.034 : 1 : label:checkout
> >        0.034 : <unmeasured> (99.9%)
> >        0.000 : 1 : ..label:Filtering content
> >     0.009 : 1 : label:do_read_index .git/index
> >     0.000 : 1 : label:write_auto_merge
> >     0.000 : 1 : label:record_unmerged
> > Estimated measurement overhead (.010 ms/region-measure * 679):
> > 0.006790000000000001
> > Timing including forking:  0.960 (0.013 additional seconds)
> > === END OUTPUT ===
> > This was a run that took just under 1s (and was a hot-cache case; I
> > had just done the same rebase before to warm the caches), and the
> > combination of index/working tree bits (everything at and after
> > do_write_index in the output) was 0.035+0.034+0.009+0+0=0.078 seconds,
> > corresponding to just over 8.1% of overall time.  I'm curious where
> > that lands for your repository testcase; if the larger time ends up
> > somewhere under the indented label:incore_nonrecursive region, then
> > it's due to something other than index reading/updating/writing.
> >
> > *2*. If it really is due to index reading/updating/writing, then index
> > handling in merge-ort is confined to two functions: checkout() and
> > record_unmerged_index_entries().  Both functions aren't too long, and
> > neither one calls into any other function within merge-ort.c.
> > (Further, checkout() is a near copy of code from merge_working_tree()
> > in builtin/checkout.c, or at least a copy of that function from a year
> > or so ago.)  As such, it's possible you can go in and make whatever
> > special tweaks you have for partial index reading/writing to those
> > functions.
> >
> > I'm curious to hear back more on this.
>
> I don't have a lot of time to dig into this right now, but here are
> the stats for my rebases and merges with and without your option.

Actually, this was pretty enlightening.  I think I know about what's
happening...

First, a few years ago, Ben said that merges in the Microsoft repos
took about an hour[1]:
"For the repro that I have been using this drops the merge time from ~1 hour to
~5 minutes and the unmerged entries goes down from ~40,000 to 1."
The change he made to drop it that far was to turn off rename detection.

[1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/

Keep that in mind, especially since your times are actually
significantly less than 5 minutes...

> The first thing I notice for each is that there is a significant
> amount of "unmeasured" time at the beginning of each, and that
> could possibly be improved separately.
>
> First, try a rebase forward and backward.
>
> $ /_git/git/summarize-perf git rebase --onto to from test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.511 : <unmeasured> (74.9%)

Wild guess: This is setup_git_directory() loading your ~3 million entry index.

>     1.331 : 1 : ......label:unpack_trees
>        0.200 : <unmeasured> (15.1%)
>        0.580 : 1 : ........label:traverse_trees
>        0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>        0.126 : 1 : ........label:check_updates
>           0.126 : <unmeasured> (100.0%)
>           0.000 : 1 : ..........label:Filtering content
>        0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ........label:fully_valid
>     1.059 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
>        0.930 : <unmeasured> (87.9%)
>        0.128 : 1 : ........label:write/extension/cache_tree
>     0.455 : 2 : ......label:fully_valid
>     0.001 : 1 : ......label:traverse_trees
>     0.000 : 1 : ......label:check_updates
> Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
> Timing including forking: 11.382 (0.026 additional seconds)
>
> $ /_git/git/summarize-perf git rebase --onto from to test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.556 : <unmeasured> (75.2%)
>     1.315 : 1 : ......label:unpack_trees
>        0.197 : <unmeasured> (15.0%)
>        0.580 : 1 : ........label:traverse_trees
>        0.391 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>        0.126 : 1 : ........label:check_updates
>           0.126 : <unmeasured> (100.0%)
>           0.000 : 1 : ..........label:Filtering content
>        0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ........label:fully_valid
>     1.071 : 1 : ......label:do_write_index /_git/office/src/.git/index.lock
>        0.942 : <unmeasured> (88.0%)
>        0.129 : 1 : ........label:write/extension/cache_tree
>     0.431 : 2 : ......label:fully_valid
>     0.001 : 1 : ......label:traverse_trees
>     0.000 : 1 : ......label:check_updates
> Estimated measurement overhead (.010 ms/region-measure * 41): 0.00041000000000000005
> Timing including forking: 11.399 (0.026 additional seconds)

Did you include two runs of recursive and two runs of ort just to show
that the timings were stable and thus there wasn't warm or cold disk
cache issues affecting things?  If so, good plan.  (If there was
another reason, let me know; I missed it.)

> Then do the same with the ort strategy.
>
> $ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto to from test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.350 : <unmeasured> (73.2%)
>     1.403 : 1 : ....label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.403 : 1 : ......label:unpack_trees
>           0.312 : <unmeasured> (22.3%)
>           0.539 : 1 : ........label:traverse_trees
>           0.401 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>           0.128 : 1 : ........label:check_updates
>              0.128 : <unmeasured> (100.0%)
>              0.000 : 1 : ..........label:Filtering content
>           0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ........label:fully_valid
>     1.081 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
>        0.951 : <unmeasured> (88.1%)
>        0.129 : 1 : ......label:write/extension/cache_tree
>     0.432 : 2 : ....label:fully_valid
>     0.143 : 1 : ....label:do_read_index .git/index
>        0.019 : <unmeasured> (13.1%)
>        0.125 : 1 : label:read/extension/cache_tree
>     0.004 : 1 : ....label:incore_nonrecursive
>        0.001 : <unmeasured> (25.8%)
>        0.002 : 1 : ......label:process_entries
>           0.000 : <unmeasured> ( 2.6%)
>           0.001 : 1 : ........label:finalize
>           0.001 : 1 : ........label:process_entries setup
>              0.000 : <unmeasured> ( 6.7%)
>              0.001 : 1 : ..........label:plist sort
>              0.000 : 1 : ..........label:plist copy
>              0.000 : 1 : ..........label:plist grow
>           0.000 : 1 : ........label:processing
>        0.001 : 1 : ......label:collect_merge_info
>           0.000 : <unmeasured> (35.3%)
>           0.001 : 1 : ........label:traverse_trees
>        0.000 : 1 : ......label:merge_start
>           0.000 : <unmeasured> (42.3%)
>           0.000 : 1 : ........label:allocate/init
>           0.000 : 1 : ........label:sanity checks
>        0.000 : 1 : ......label:renames
>     0.001 : 1 : ....label:traverse_trees
>     0.000 : 1 : ....label:write_auto_merge
>     0.000 : 1 : ....label:check_updates
>     0.000 : 1 : ....label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
> Timing including forking: 11.442 (0.027 additional seconds)

.004s on label:incore_nonrecursive -- that's the actual merge
operation.  This was a trivial rebase, and the merging took just 4
milliseconds.  But the overall run took 11.442 seconds because working
with 3M+ entries in the index just takes forever, and my code didn't
touch any on-disk formats, certainly not the index format.

_All_ of my optimization work was on the merging piece, not the stuff
outside.  But for what you're testing here, it appears to be
irrelevant compared to the overhead.

> $ /_git/git/summarize-perf git -c pull.twohead=ort rebase --onto from to test
> Successfully rebased and updated refs/heads/test.
> Accumulated times:
>     8.337 : <unmeasured> (73.2%)
>     1.395 : 1 : ....label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.395 : 1 : ......label:unpack_trees
>           0.309 : <unmeasured> (22.1%)
>           0.537 : 1 : ........label:traverse_trees
>           0.403 : 1 : ........label:clear_ce_flags/0x00000000_0x02000000
>           0.124 : 1 : ........label:check_updates
>              0.124 : <unmeasured> (100.0%)
>              0.000 : 1 : ..........label:Filtering content
>           0.021 : 1 : ........label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ........label:fully_valid
>     1.084 : 1 : ....label:do_write_index /_git/office/src/.git/index.lock
>        0.955 : <unmeasured> (88.1%)
>        0.129 : 1 : ......label:write/extension/cache_tree
>     0.436 : 2 : ....label:fully_valid
>     0.137 : 1 : ....label:do_read_index .git/index
>        0.013 : <unmeasured> ( 9.3%)
>        0.125 : 1 : label:read/extension/cache_tree
>     0.004 : 1 : ....label:incore_nonrecursive
>        0.001 : <unmeasured> (24.5%)
>        0.002 : 1 : ......label:process_entries
>           0.000 : <unmeasured> ( 2.5%)
>           0.001 : 1 : ........label:finalize
>           0.001 : 1 : ........label:process_entries setup
>              0.000 : <unmeasured> ( 6.5%)
>              0.001 : 1 : ..........label:plist sort
>              0.000 : 1 : ..........label:plist copy
>              0.000 : 1 : ..........label:plist grow
>           0.000 : 1 : ........label:processing
>        0.001 : 1 : ......label:collect_merge_info
>           0.000 : <unmeasured> (26.5%)
>           0.001 : 1 : ........label:traverse_trees
>        0.000 : 1 : ......label:merge_start
>           0.000 : <unmeasured> (43.1%)
>           0.000 : 1 : ........label:allocate/init
>           0.000 : 1 : ........label:sanity checks
>        0.000 : 1 : ......label:renames
>     0.001 : 1 : ....label:traverse_trees
>     0.000 : 1 : ....label:write_auto_merge
>     0.000 : 1 : ....label:check_updates
>     0.000 : 1 : ....label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 56): 0.0005600000000000001
> Timing including forking: 11.418 (0.024 additional seconds)

Ah, you included two copies for merge-ort too.  I'm guessing you did
that just to show there wasn't some cold cache issues or something and
that the runs showed consistent timings?


> And here are timings for a simple merge. Two files at root were changed in the
> commits I made, but there are also some larger changes from the commit history.
> These should all be seen as "this tree updated in one of the two, so take that
> tree".

Ahah!  That's a microsoft-specific optimization you guys made in the
recursive strategy, yes?  It does NOT exist in upstream git.  It's
also one that is nearly incompatible with rename detection; it turns
out you can only do that optimization in the face of rename detection
if you do a HUGE amount of specialized work and tracking in order to
determine when it's safe _despite_ needing to detect renames.  I
thought that optimization was totally incompatible with rename
detection for a long time; I tried it a couple times while working on
ort and watched it break all kinds of rename tests...but I eventually
discovered some tricks involving a lot of work to be able to run that
optimization.

So, you aren't comparing upstream "recursive" to "ort", you're
comparing a tweaked version of recursive, and one that is incompatible
with how recursive's rename detection work.  In fact, just to be clear
in case you go looking, I suspect that this tweak is to be found
within unpack_trees.c (which recursive relies on heavily).

Further, you've set it up so there are only a few files changed after
unpack_trees returns.

In total, you have: (1) turned off rename detection (most my
optimizations are for improving this factor, meaning I can't show an
advantage), (2) you took advantage of no rename detection to implement
trivial-tree merges (thus killing the main second advantage my
algorithm has), and (3) you are looking at a case with a tiny number
of changes for the merge algorithm to process (thus killing a third
optimization that removes quadratic performance).  Those are my three
big optimizations, and you've made them all irrelevant.  In fact,
you're in an area I would have been worried that ort would do _worse_
than recursive.  I track an awful lot of things and there is overhead
in checking and filling all that information in; if there are only a
few entries to merge, then all that information was a waste to collect
and ort might be slower than recursive.  But then again, that should
be a case where both algorithms are "nearly instantaneous" (or would
be if it weren't for your 3M+ index entry repo causing run_builtin()'s
call to setup_git_directory() in git.c to take a huge amount of time
before the builtin is even called.)


> $ git reset --hard test2 && /_git/git/summarize-perf git merge test -m test
> Merge made by the 'recursive' strategy.
> Accumulated times:
>     2.647 : <unmeasured> (48.6%)
>     1.384 : 1 : ..label:unpack_trees
>        0.267 : <unmeasured> (19.3%)
>        0.582 : 1 : ....label:traverse_trees
>        0.391 : 1 : ....label:clear_ce_flags/0x00000000_0x02000000
>        0.124 : 1 : ....label:check_updates
>           0.124 : <unmeasured> (100.0%)
>           0.000 : 1 : ......label:Filtering content
>        0.021 : 1 : ....label:clear_ce_flags/0x00080000_0x42000000
>        0.000 : 1 : ....label:fully_valid
>     1.060 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
>        0.931 : <unmeasured> (87.9%)
>        0.128 : 1 : ....label:write/extension/cache_tree
>     0.226 : 1 : ..label:fully_valid
>     0.134 : 1 : ..label:do_read_index .git/index
>        0.008 : <unmeasured> ( 5.8%)
>        0.126 : 1 : label:read/extension/cache_tree
>     0.001 : 1 : ..label:traverse_trees
>     0.000 : 1 : ..label:check_updates
>     0.000 : 1 : ..label:setup
>     0.000 : 1 : ..label:write back to queue
> Estimated measurement overhead (.010 ms/region-measure * 20): 0.0002
> Timing including forking:  5.466 (0.015 additional seconds)

5 seconds.  I do have to hand it to Ben and anyone else involved,
though.  From 1 hour down to 5 seconds is pretty good, even if it was
done by hacks (turning off rename detection, and then implementing
trivial-tree merging that would have broken rename detection).  I
suspect that whoever did that work might have found the unconditional
discarding and re-reading of the index and fixed it as well?

> $ git reset --hard test2 && /_git/git/summarize-perf git -c pull.twohead=ort merge test -m test
> Merge made by the 'ort' strategy.
> Accumulated times:
>     2.531 : <unmeasured> (49.1%)
>     1.328 : 1 : ..label:checkout
>        0.000 : <unmeasured> ( 0.0%)
>        1.328 : 1 : ....label:unpack_trees
>           0.228 : <unmeasured> (17.2%)
>           0.566 : 1 : ......label:traverse_trees
>           0.388 : 1 : ......label:clear_ce_flags/0x00000000_0x02000000
>           0.125 : 1 : ......label:check_updates
>              0.125 : <unmeasured> (100.0%)
>              0.000 : 1 : ........label:Filtering content
>           0.021 : 1 : ......label:clear_ce_flags/0x00080000_0x42000000
>           0.000 : 1 : ......label:fully_valid
>     1.067 : 1 : ..label:do_write_index /_git/office/src/.git/index.lock
>        0.938 : <unmeasured> (87.9%)
>        0.129 : 1 : ....label:write/extension/cache_tree
>     0.230 : 1 : ..label:fully_valid
>     0.002 : 1 : ..label:incore_recursive
>        0.001 : <unmeasured> (22.3%)
>        0.001 : 1 : ....label:collect_merge_info
>           0.001 : <unmeasured> (60.2%)
>           0.000 : 1 : ......label:traverse_trees
>        0.001 : 1 : ....label:process_entries
>           0.000 : <unmeasured> ( 2.8%)
>           0.001 : 1 : ......label:finalize
>           0.000 : 1 : ......label:process_entries setup
>              0.000 : <unmeasured> ( 6.9%)
>              0.000 : 1 : ........label:plist sort
>              0.000 : 1 : ........label:plist copy
>              0.000 : 1 : ........label:plist grow
>           0.000 : 1 : ......label:processing
>        0.000 : 1 : ....label:merge_start
>           0.000 : <unmeasured> (50.0%)
>           0.000 : 1 : ......label:allocate/init
>           0.000 : 1 : ......label:sanity checks
>        0.000 : 1 : ....label:renames
>     0.001 : 1 : ..label:traverse_trees
>     0.000 : 1 : ..label:write_auto_merge
>     0.000 : 1 : ..label:check_updates
>     0.000 : 1 : ..label:setup
>     0.000 : 1 : ..label:display messages
>     0.000 : 1 : ..label:write back to queue
>     0.000 : 1 : ..label:record_unmerged
> Estimated measurement overhead (.010 ms/region-measure * 36): 0.00036
> Timing including forking:  5.174 (0.015 additional seconds)

Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
milliseconds to create the actual merge tree.  That does suggest you
might have fun with 'git log -p --remerge-diff'; if you can redo
merges in 2 milliseconds, showing them in git log output is very
reasonable.  :-)


Could we have some fun, though?  What if you have some merge or rebase
involving lots of changes, and you turn rename detection back on, and
you disable that trivial-tree resolution optimization that breaks
recursive's rename detection handling...and then compare recursive and
ort?  (It might be easiest to just compare upstream recursive rather
than the one with all the microsoft changes to make sure you undid
whatever trivial tree handling work exists.)

For example, my testcase in the linux kernel was finding a series of a
few dozen patches I could rebase back to an older version, but
tweaking the "older" version by renaming drivers/ -> pilots/ (with
about 26K files under that directory, that meant about 26K renames).
So, I got to see rebasing of dozens of real changes across a massive
rename boundary -- and the massive rename boundary also guaranteed
there were lots of entries for the merge algorithm to deal with.


In the end, though, 4 milliseconds for the rebase and 2 milliseconds
for the merge, with the rest all being overhead of interfacing to the
index and working tree actually seems pretty good to me.  I'm just
curious if we can check how things work for more involved cases.
Derrick Stolee Nov. 9, 2020, 7:51 p.m. UTC | #8
On 11/9/20 12:13 PM, Elijah Newren wrote:> Actually, this was pretty enlightening.  I think I know about what's
> happening...
> 
> First, a few years ago, Ben said that merges in the Microsoft repos
> took about an hour[1]:
> "For the repro that I have been using this drops the merge time from ~1 hour to
> ~5 minutes and the unmerged entries goes down from ~40,000 to 1."
> The change he made to drop it that far was to turn off rename detection.
> 
> [1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/
> 
> Keep that in mind, especially since your times are actually
> significantly less than 5 minutes...

Yes, the other thing to keep in mind is that this is
a Scalar repo with the default cone-mode sparse-checkout
of only the files at root. For this repo, that means that
there are only ~10 files actually present.

I wanted to remove any working directory updates/checks
from the performance check as possible.

>> $ /_git/git/summarize-perf git rebase --onto to from test
>> Successfully rebased and updated refs/heads/test.
>> Accumulated times:
>>     8.511 : <unmeasured> (74.9%)
> 
> Wild guess: This is setup_git_directory() loading your ~3 million entry index.

I think there is also some commit walking happening, but
it shouldn't be too much. 'from' and 'to' are not very
far away.

> Did you include two runs of recursive and two runs of ort just to show
> that the timings were stable and thus there wasn't warm or cold disk
> cache issues affecting things?  If so, good plan.  (If there was
> another reason, let me know; I missed it.)

For the rebase, I did "--onto to from test" and "--onto from to test"
to show both directions of the rebase. The merge I did twice for the
cache issues ;)

> .004s on label:incore_nonrecursive -- that's the actual merge
> operation.  This was a trivial rebase, and the merging took just 4
> milliseconds.  But the overall run took 11.442 seconds because working
> with 3M+ entries in the index just takes forever, and my code didn't
> touch any on-disk formats, certainly not the index format.
> 
> _All_ of my optimization work was on the merging piece, not the stuff
> outside.  But for what you're testing here, it appears to be
> irrelevant compared to the overhead.

OK, so since we already disable rename detection through config,
the machinery that you are changing is already fast with the old
algorithm in these trivial cases.

To actually show any benefits, we would need to disable rename
detection or use a larger change.
>> And here are timings for a simple merge. Two files at root were changed in the
>> commits I made, but there are also some larger changes from the commit history.
>> These should all be seen as "this tree updated in one of the two, so take that
>> tree".
> 
> Ahah!  That's a microsoft-specific optimization you guys made in the
> recursive strategy, yes? 

I'm not aware of any logic we have that's different from core Git.
The config we use [1] includes "merge.stat = false" and "merge.renames
= false" but otherwise seems to be using stock Git.

[1] https://github.com/microsoft/scalar/blob/1d7938d2df6921f7a3b4f3f1cce56a00929adc40/Scalar.Common/Maintenance/ConfigStep.cs#L100-L127

I'm CC'ing Jeff Hostetler to see if he knows anything about a custom
merge algorithm in microsoft/git.

> It does NOT exist in upstream git.  It's
> also one that is nearly incompatible with rename detection; it turns
> out you can only do that optimization in the face of rename detection
> if you do a HUGE amount of specialized work and tracking in order to
> determine when it's safe _despite_ needing to detect renames. 

Perhaps merge.renames=false is enough to trigger this logic already?

> I
> thought that optimization was totally incompatible with rename
> detection for a long time; I tried it a couple times while working on
> ort and watched it break all kinds of rename tests...but I eventually
> discovered some tricks involving a lot of work to be able to run that
> optimization.

I will try to keep this in mind.

> So, you aren't comparing upstream "recursive" to "ort", you're
> comparing a tweaked version of recursive, and one that is incompatible
> with how recursive's rename detection work.  In fact, just to be clear
> in case you go looking, I suspect that this tweak is to be found
> within unpack_trees.c (which recursive relies on heavily).
> 
> Further, you've set it up so there are only a few files changed after
> unpack_trees returns.
> 
> In total, you have: (1) turned off rename detection (most my
> optimizations are for improving this factor, meaning I can't show an
> advantage), (2) you took advantage of no rename detection to implement
> trivial-tree merges (thus killing the main second advantage my
> algorithm has), and (3) you are looking at a case with a tiny number
> of changes for the merge algorithm to process (thus killing a third
> optimization that removes quadratic performance).  Those are my three
> big optimizations, and you've made them all irrelevant.  In fact,
> you're in an area I would have been worried that ort would do _worse_
> than recursive.  I track an awful lot of things and there is overhead
> in checking and filling all that information in; if there are only a
> few entries to merge, then all that information was a waste to collect
> and ort might be slower than recursive.  But then again, that should
> be a case where both algorithms are "nearly instantaneous" (or would
> be if it weren't for your 3M+ index entry repo causing run_builtin()'s
> call to setup_git_directory() in git.c to take a huge amount of time
> before the builtin is even called.)

Thanks for your time isolating this case. I appreciate knowing exactly
which portions of the merge algorithm are being touched and which are
not.
> 5 seconds.  I do have to hand it to Ben and anyone else involved,
> though.  From 1 hour down to 5 seconds is pretty good, even if it was
> done by hacks (turning off rename detection, and then implementing
> trivial-tree merging that would have broken rename detection).  I
> suspect that whoever did that work might have found the unconditional
> discarding and re-reading of the index and fixed it as well?

As you can probably tell from my general confusion, I had nothing
to do with it. ;)

> Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
> milliseconds to create the actual merge tree.  That does suggest you
> might have fun with 'git log -p --remerge-diff'; if you can redo
> merges in 2 milliseconds, showing them in git log output is very
> reasonable.  :-)

Yeah, 'git merge-tree' is very fast for these cases, so I assumed
that something else was going on for that command.

> Could we have some fun, though?  What if you have some merge or rebase
> involving lots of changes, and you turn rename detection back on, and
> you disable that trivial-tree resolution optimization that breaks
> recursive's rename detection handling...and then compare recursive and
> ort?  (It might be easiest to just compare upstream recursive rather
> than the one with all the microsoft changes to make sure you undid
> whatever trivial tree handling work exists.)

I can try these kinds of cases, but it won't be today. I'm on kid duty
today, and answering emails in between running around with them.

> For example, my testcase in the linux kernel was finding a series of a
> few dozen patches I could rebase back to an older version, but
> tweaking the "older" version by renaming drivers/ -> pilots/ (with
> about 26K files under that directory, that meant about 26K renames).
> So, I got to see rebasing of dozens of real changes across a massive
> rename boundary -- and the massive rename boundary also guaranteed
> there were lots of entries for the merge algorithm to deal with.
> 
> In the end, though, 4 milliseconds for the rebase and 2 milliseconds
> for the merge, with the rest all being overhead of interfacing to the
> index and working tree actually seems pretty good to me.  I'm just
> curious if we can check how things work for more involved cases.

I'm definitely interested in identifying how your algorithm improves
over the previous cases, and perhaps re-enabling rename detection for
merges is enough of a benefit to justify the new one.

Eventually, I hope to actually engage with your patches in the form
of review. Just trying to build a mental model for what's going on
first.

Thanks,
-Stolee
Elijah Newren Nov. 9, 2020, 10:44 p.m. UTC | #9
Hi Derrick,

On Mon, Nov 9, 2020 at 11:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/9/20 12:13 PM, Elijah Newren wrote:> Actually, this was pretty enlightening.  I think I know about what's
> > happening...
> >
> > First, a few years ago, Ben said that merges in the Microsoft repos
> > took about an hour[1]:
> > "For the repro that I have been using this drops the merge time from ~1 hour to
> > ~5 minutes and the unmerged entries goes down from ~40,000 to 1."
> > The change he made to drop it that far was to turn off rename detection.
> >
> > [1] https://lore.kernel.org/git/20180426205202.23056-1-benpeart@microsoft.com/
> >
> > Keep that in mind, especially since your times are actually
> > significantly less than 5 minutes...
>
> Yes, the other thing to keep in mind is that this is
> a Scalar repo with the default cone-mode sparse-checkout
> of only the files at root. For this repo, that means that
> there are only ~10 files actually present.
>
> I wanted to remove any working directory updates/checks
> from the performance check as possible.

Ah, that explains how you got under 20s.  I remember elsewhere on the
list someone (I think it was Ben again) mentioned that a "git checkout
-b <newbranch>" took 20s, despite no need to update the working tree
or index.

I have only done one cursory test of merge-ort with sparse-checkouts;
I should do more.  There might be a bug somewhere, though it does at
least pass the regression tests and I think for the most part it's
actually better: there are cases where merge-recursive will vivify
files outside the sparse-checkout which were not conflicted (see e.g.
https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/);
in contrast, merge-ort shouldn't have any such cases -- it'll only add
files to the working copy if they match the sparsity patterns or the
path has conflicts.

> >> $ /_git/git/summarize-perf git rebase --onto to from test
> >> Successfully rebased and updated refs/heads/test.
> >> Accumulated times:
> >>     8.511 : <unmeasured> (74.9%)
> >
> > Wild guess: This is setup_git_directory() loading your ~3 million entry index.
>
> I think there is also some commit walking happening, but
> it shouldn't be too much. 'from' and 'to' are not very
> far away.

Makes sense.  I suspect that with your commit-graphs this ends up
being fast enough that you might have difficulty even measuring it,
though.

> > Did you include two runs of recursive and two runs of ort just to show
> > that the timings were stable and thus there wasn't warm or cold disk
> > cache issues affecting things?  If so, good plan.  (If there was
> > another reason, let me know; I missed it.)
>
> For the rebase, I did "--onto to from test" and "--onto from to test"
> to show both directions of the rebase. The merge I did twice for the
> cache issues ;)

Oh, good call.  Thanks for pointing it out, I missed that on first reading.

> > .004s on label:incore_nonrecursive -- that's the actual merge
> > operation.  This was a trivial rebase, and the merging took just 4
> > milliseconds.  But the overall run took 11.442 seconds because working
> > with 3M+ entries in the index just takes forever, and my code didn't
> > touch any on-disk formats, certainly not the index format.
> >
> > _All_ of my optimization work was on the merging piece, not the stuff
> > outside.  But for what you're testing here, it appears to be
> > irrelevant compared to the overhead.
>
> OK, so since we already disable rename detection through config,
> the machinery that you are changing is already fast with the old
> algorithm in these trivial cases.
>
> To actually show any benefits, we would need to disable rename
> detection or use a larger change.

...or both.  :-)

> >> And here are timings for a simple merge. Two files at root were changed in the
> >> commits I made, but there are also some larger changes from the commit history.
> >> These should all be seen as "this tree updated in one of the two, so take that
> >> tree".
> >
> > Ahah!  That's a microsoft-specific optimization you guys made in the
> > recursive strategy, yes?
>
> I'm not aware of any logic we have that's different from core Git.
> The config we use [1] includes "merge.stat = false" and "merge.renames
> = false" but otherwise seems to be using stock Git.
>
> [1] https://github.com/microsoft/scalar/blob/1d7938d2df6921f7a3b4f3f1cce56a00929adc40/Scalar.Common/Maintenance/ConfigStep.cs#L100-L127
>
> I'm CC'ing Jeff Hostetler to see if he knows anything about a custom
> merge algorithm in microsoft/git.

Oh, I took your wording that 'These should all be seen as "this tree
updated in one of the two, so take that tree"' as an implication that
you had a special merge tweak and wanted to verify it didn't regress.
I think I read too much into your wording.

Also, thinking over it more, I remember now that Ben also turned on
unpack_opts.aggressive when rename detection was turned off -- see
commit 6f10a09e0a ("merge: pass aggressive when rename detection is
turned off", 2018-05-02).  That isn't quite as advantageous as doing a
trivial tree merge, but if the algorithm that does the trivial tree
merge has to end up updating a complete index later anyway via the
checkout logic of unpack_trees, then the differences are basically a
wash.

> > It does NOT exist in upstream git.  It's
> > also one that is nearly incompatible with rename detection; it turns
> > out you can only do that optimization in the face of rename detection
> > if you do a HUGE amount of specialized work and tracking in order to
> > determine when it's safe _despite_ needing to detect renames.
>
> Perhaps merge.renames=false is enough to trigger this logic already?

Yeah, since I read too much into what you wrote and know that I
remember the "if (no_renames) o.aggressive = 1" bit, then yeah this
would be enough.

> > I
> > thought that optimization was totally incompatible with rename
> > detection for a long time; I tried it a couple times while working on
> > ort and watched it break all kinds of rename tests...but I eventually
> > discovered some tricks involving a lot of work to be able to run that
> > optimization.
>
> I will try to keep this in mind.
>
> > So, you aren't comparing upstream "recursive" to "ort", you're
> > comparing a tweaked version of recursive, and one that is incompatible
> > with how recursive's rename detection work.  In fact, just to be clear
> > in case you go looking, I suspect that this tweak is to be found
> > within unpack_trees.c (which recursive relies on heavily).
> >
> > Further, you've set it up so there are only a few files changed after
> > unpack_trees returns.
> >
> > In total, you have: (1) turned off rename detection (most my
> > optimizations are for improving this factor, meaning I can't show an
> > advantage), (2) you took advantage of no rename detection to implement
> > trivial-tree merges (thus killing the main second advantage my
> > algorithm has), and (3) you are looking at a case with a tiny number
> > of changes for the merge algorithm to process (thus killing a third
> > optimization that removes quadratic performance).  Those are my three
> > big optimizations, and you've made them all irrelevant.  In fact,
> > you're in an area I would have been worried that ort would do _worse_
> > than recursive.  I track an awful lot of things and there is overhead
> > in checking and filling all that information in; if there are only a
> > few entries to merge, then all that information was a waste to collect
> > and ort might be slower than recursive.  But then again, that should
> > be a case where both algorithms are "nearly instantaneous" (or would
> > be if it weren't for your 3M+ index entry repo causing run_builtin()'s
> > call to setup_git_directory() in git.c to take a huge amount of time
> > before the builtin is even called.)
>
> Thanks for your time isolating this case. I appreciate knowing exactly
> which portions of the merge algorithm are being touched and which are
> not.
> > 5 seconds.  I do have to hand it to Ben and anyone else involved,
> > though.  From 1 hour down to 5 seconds is pretty good, even if it was
> > done by hacks (turning off rename detection, and then implementing
> > trivial-tree merging that would have broken rename detection).  I
> > suspect that whoever did that work might have found the unconditional
> > discarding and re-reading of the index and fixed it as well?
>
> As you can probably tell from my general confusion, I had nothing
> to do with it. ;)
>
> > Heh, yeah 0.002 seconds for ..label:incore_recursive.  Only 2
> > milliseconds to create the actual merge tree.  That does suggest you
> > might have fun with 'git log -p --remerge-diff'; if you can redo
> > merges in 2 milliseconds, showing them in git log output is very
> > reasonable.  :-)
>
> Yeah, 'git merge-tree' is very fast for these cases, so I assumed
> that something else was going on for that command.

Oh, interesting.  I forgot about merge-tree.  Maybe I should make a
version based on merge-ort (and then it'd handle rename detection too,
something it doesn't currently do.)?  However, that wouldn't be
comparing merge algorithms, because builtin/merge-tree.c doesn't use
merge-recursive.[ch].  (It would be easy to get confused into thinking
it does, since merge-recursive.[ch] defines a function called
merge_trees(), but builtin/merge-tree.c doesn't use it despite the
name similarity.)

> > Could we have some fun, though?  What if you have some merge or rebase
> > involving lots of changes, and you turn rename detection back on, and
> > you disable that trivial-tree resolution optimization that breaks
> > recursive's rename detection handling...and then compare recursive and
> > ort?  (It might be easiest to just compare upstream recursive rather
> > than the one with all the microsoft changes to make sure you undid
> > whatever trivial tree handling work exists.)
>
> I can try these kinds of cases, but it won't be today. I'm on kid duty
> today, and answering emails in between running around with them.

One word of caution: merge.renameLimit may get in your way.  The
default of 1000 means that you're likely to hit that limit on your
first run, and get a warning message like the following printed out:

warning: inexact rename detection was skipped due to too many files.
warning: you may want to set your merge.renamelimit variable to at
least 27328 and retry the command.

You then need to undo your rebase or merge, bump the limit, and
re-run.  Also, you will need a higher limit for merge-recursive than
you do for merge-ort.  The default of 1000 is enough for merge-ort to
detect all the renames in my 26K-files-in-a-directory rename testcase
of the linux kernel, but the value needs to be bumped to 27328 for
merge-recursive.  And if you don't have the limit high enough, then
one algorithm is doing the work to detect renames and the other is
bailing and skipping it, so it's not an apples-to-apples comparison.
If that warning doesn't appear for either backend, then you have an
apples-to-apples comparison.

> > For example, my testcase in the linux kernel was finding a series of a
> > few dozen patches I could rebase back to an older version, but
> > tweaking the "older" version by renaming drivers/ -> pilots/ (with
> > about 26K files under that directory, that meant about 26K renames).
> > So, I got to see rebasing of dozens of real changes across a massive
> > rename boundary -- and the massive rename boundary also guaranteed
> > there were lots of entries for the merge algorithm to deal with.
> >
> > In the end, though, 4 milliseconds for the rebase and 2 milliseconds
> > for the merge, with the rest all being overhead of interfacing to the
> > index and working tree actually seems pretty good to me.  I'm just
> > curious if we can check how things work for more involved cases.
>
> I'm definitely interested in identifying how your algorithm improves
> over the previous cases, and perhaps re-enabling rename detection for
> merges is enough of a benefit to justify the new one.
>
> Eventually, I hope to actually engage with your patches in the form
> of review. Just trying to build a mental model for what's going on
> first.

Ooh, I can help with that; here's what's going on:  *** Magic ***

(Black, evil magic in the case of merge-recurisve.  Good magic in the
case of merge-ort.)

Glad I could help clear things up for you.  :-)
Derrick Stolee Nov. 11, 2020, 5:08 p.m. UTC | #10
On 11/2/2020 3:43 PM, Elijah Newren wrote:
> Elijah Newren (20):
>   merge-ort: setup basic internal data structures
>   merge-ort: add some high-level algorithm structure
>   merge-ort: port merge_start() from merge-recursive
>   merge-ort: use histogram diff
>   merge-ort: add an err() function similar to one from merge-recursive
>   merge-ort: implement a very basic collect_merge_info()
>   merge-ort: avoid repeating fill_tree_descriptor() on the same tree
>   merge-ort: compute a few more useful fields for collect_merge_info
>   merge-ort: record stage and auxiliary info for every path
>   merge-ort: avoid recursing into identical trees
>   merge-ort: add a preliminary simple process_entries() implementation
>   merge-ort: have process_entries operate in a defined order

I got this far before my attention to detail really started slipping.

>   merge-ort: step 1 of tree writing -- record basenames, modes, and oids
>   merge-ort: step 2 of tree writing -- function to create tree object
>   merge-ort: step 3 of tree writing -- handling subdirectories as we go
>   merge-ort: basic outline for merge_switch_to_result()
>   merge-ort: add implementation of checkout()
>   tree: enable cmp_cache_name_compare() to be used elsewhere
>   merge-ort: add implementation of record_unmerged_index_entries()
>   merge-ort: free data structures in merge_finalize()

I'll try to take another pass on these commits tomorrow.

For the series as a whole I'd love to see at least one test that
demonstrates that this code does something, if even only for a very
narrow case.

There's a lot of code being moved here, and it would be nice to have
even a very simple test case that can check that we didn't leave any
important die("not implemented") calls lying around or worse accessing
an uninitialized pointer or something.

Thanks,
-Stolee
Elijah Newren Nov. 11, 2020, 6:35 p.m. UTC | #11
On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > Elijah Newren (20):
> >   merge-ort: setup basic internal data structures
> >   merge-ort: add some high-level algorithm structure
> >   merge-ort: port merge_start() from merge-recursive
> >   merge-ort: use histogram diff
> >   merge-ort: add an err() function similar to one from merge-recursive
> >   merge-ort: implement a very basic collect_merge_info()
> >   merge-ort: avoid repeating fill_tree_descriptor() on the same tree
> >   merge-ort: compute a few more useful fields for collect_merge_info
> >   merge-ort: record stage and auxiliary info for every path
> >   merge-ort: avoid recursing into identical trees
> >   merge-ort: add a preliminary simple process_entries() implementation
> >   merge-ort: have process_entries operate in a defined order
>
> I got this far before my attention to detail really started slipping.
>
> >   merge-ort: step 1 of tree writing -- record basenames, modes, and oids
> >   merge-ort: step 2 of tree writing -- function to create tree object
> >   merge-ort: step 3 of tree writing -- handling subdirectories as we go
> >   merge-ort: basic outline for merge_switch_to_result()
> >   merge-ort: add implementation of checkout()
> >   tree: enable cmp_cache_name_compare() to be used elsewhere
> >   merge-ort: add implementation of record_unmerged_index_entries()
> >   merge-ort: free data structures in merge_finalize()
>
> I'll try to take another pass on these commits tomorrow.
>
> For the series as a whole I'd love to see at least one test that
> demonstrates that this code does something, if even only for a very
> narrow case.
>
> There's a lot of code being moved here, and it would be nice to have
> even a very simple test case that can check that we didn't leave any
> important die("not implemented") calls lying around or worse accessing
> an uninitialized pointer or something.

We absolutely left several die("not implemented") calls lying around.
The series was long enough at 20 patches; reviewers lose steam at 10
(at least both you and Jonathan have), so maybe I should have left
even more in there as an attempt to split up this series more.

However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
then this series drops the number of failures in the testsuite from
around 2200, down to 1500.  So, there's about 700 testcases for you.

Also, there were several preparatory series all designed for getting
the testsuite in order for this new merge algorithm.  See the
following currently cooking topics:
  * en/merge-tests topic
  * en/dir-rename-tests
and the following topics that were previously merged:
  * 36d225c7d4 ("Merge branch 'en/merge-tests'", 2020-08-19)
  * cf372dc815 ("Merge branch 'en/test-cleanup'", 2020-03-09)
  * ac193e0e0a ("Merge branch 'en/merge-path-collision'", 2019-01-04)
  * c99033060f ("Merge branch
'en/t7405-recursive-submodule-conflicts'", 2018-08-02)
  * e6da45c7cd ("Merge branch 'en/t6036-merge-recursive-tests'", 2018-08-02)
  * 84e74c6403 ("Merge branch
'en/t6042-insane-merge-rename-testcases'", 2018-08-02)
  * bba1a5559c ("Merge branch 'en/t6036-recursive-corner-cases'", 2018-08-02)
  * 93b74a7cfa ("Merge branch 'en/merge-recursive-tests'", 2018-06-25)
and maybe others I missed.
Derrick Stolee Nov. 11, 2020, 8:48 p.m. UTC | #12
On 11/11/2020 1:35 PM, Elijah Newren wrote:
> On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
>> For the series as a whole I'd love to see at least one test that
>> demonstrates that this code does something, if even only for a very
>> narrow case.
>>
>> There's a lot of code being moved here, and it would be nice to have
>> even a very simple test case that can check that we didn't leave any
>> important die("not implemented") calls lying around or worse accessing
>> an uninitialized pointer or something.
> 
> We absolutely left several die("not implemented") calls lying around.
> The series was long enough at 20 patches; reviewers lose steam at 10
> (at least both you and Jonathan have), so maybe I should have left
> even more in there as an attempt to split up this series more.
> 
> However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
> then this series drops the number of failures in the testsuite from
> around 2200, down to 1500.  So, there's about 700 testcases for you.

Sorry that I'm jumping in to the series-of-series in the middle, so
I am unfamiliar with the previous progress and testing strategy. This
"number of test failures" metric is sufficient to demonstrate the
progress provided in this series. Perhaps it was even in your v1 cover
letter.

Thanks,
-Stolee
Elijah Newren Nov. 11, 2020, 9:18 p.m. UTC | #13
Hi Derrick,

On Wed, Nov 11, 2020 at 12:48 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/11/2020 1:35 PM, Elijah Newren wrote:
> > On Wed, Nov 11, 2020 at 9:09 AM Derrick Stolee <stolee@gmail.com> wrote:
> >> For the series as a whole I'd love to see at least one test that
> >> demonstrates that this code does something, if even only for a very
> >> narrow case.
> >>
> >> There's a lot of code being moved here, and it would be nice to have
> >> even a very simple test case that can check that we didn't leave any
> >> important die("not implemented") calls lying around or worse accessing
> >> an uninitialized pointer or something.
> >
> > We absolutely left several die("not implemented") calls lying around.
> > The series was long enough at 20 patches; reviewers lose steam at 10
> > (at least both you and Jonathan have), so maybe I should have left
> > even more in there as an attempt to split up this series more.
> >
> > However, if you run the testsuite with GIT_TEST_MERGE_ALGORITHM=ort,
> > then this series drops the number of failures in the testsuite from
> > around 2200, down to 1500.  So, there's about 700 testcases for you.
>
> Sorry that I'm jumping in to the series-of-series in the middle, so
> I am unfamiliar with the previous progress and testing strategy. This

Not a problem at all.  Thanks much for jumping in and taking a look!
You always provide some good feedback and suggestions.

(Besides, those testcase changes have been spread over two and a half
years...hard to stay on top of all of them.)

> "number of test failures" metric is sufficient to demonstrate the
> progress provided in this series. Perhaps it was even in your v1 cover
> letter.

Um, oops; it's not.  I did mention there were still some "not
implemented" messages left, but didn't mention the testcase counts.
But even that mention is apparently in the v1 cover letter rather than
v2, and v2 wasn't sent in-reply-to v1, so it's harder to catch that.
Sorry about that; I'll include the testcase counts in the v3 cover
letter.