mbox series

[v2,0/6] Use generation numbers for --topo-order

Message ID pull.25.v2.git.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Use generation numbers for --topo-order | expand

Message

Johannes Schindelin via GitGitGadget Sept. 18, 2018, 4:08 a.m. UTC
This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
  HEAD, w/ commit-graph: 0.02 s

Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
  HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph and
generation numbers, then I recommend reading 
Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

Since revision.c is an incredibly important (and old) portion of the
codebase -- and because there are so many orthogonal options in 'struct
rev_info' -- I consider this submission to be "RFC quality". That is, I am
not confident that I am not missing anything, or that my solution is the
best it can be. I did merge this branch with ds/commit-graph-with-grafts and
the "DO-NOT-MERGE: write and read commit-graph always" commit that computes
a commit-graph with every 'git commit' command. The test suite passed with
that change, available on GitHub [3]. To ensure that I cover at least the
case I think are interesting, I added tests to t6600-test-reach.sh to verify
the walks report the correct results for the three cases there (no
commit-graph, full commit-graph, and a partial commit-graph so the walk
starts at GENERATION_NUMBER_INFINITY).

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending commits
are UNINTERESTING. Since this code depends on commit_list instead of the
prio_queue we are using here, I chose to leave it untouched for now. We can
revisit it in a separate series later. Since handle_commit() turns on
revs->limited when a commit is UNINTERESTING, we do not hit the new code in
this case. Removing this 'revs->limited = 1;' line yields correct results,
but the performance is worse.

This series was based on ds/reachable, but is now based on 'master' to not
conflict with 182070 "commit: use timestamp_t for author_date_slab". There
is a small conflict with md/filter-trees, because it renamed a flag in
revisions.h in the line before I add new flags. Hopefully this conflict is
not too difficult to resolve.

Thanks, -Stolee

[1] 
https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] 
https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/testA branch
containing this series along with commits to compute commit-graph in entire
test suite.

Derrick Stolee (6):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: refactor basic topo-order logic

 commit.c                   |  11 +-
 commit.h                   |   8 ++
 object.h                   |   4 +-
 prio-queue.c               |   9 ++
 prio-queue.h               |   6 +
 revision.c                 | 232 ++++++++++++++++++++++++++++++++++++-
 revision.h                 |   6 +
 t/helper/test-prio-queue.c |  10 +-
 t/t6600-test-reach.sh      |  98 +++++++++++++++-
 9 files changed, 361 insertions(+), 23 deletions(-)


base-commit: 2d3b1c576c85b7f5db1f418907af00ab88e0c303
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-25%2Fderrickstolee%2Ftopo-order%2Fprogress-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-25/derrickstolee/topo-order/progress-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/25

Range-diff vs v1:

 1:  5e55669f4d = 1:  cc1ec4c270 prio-queue: add 'peek' operation
 2:  9628396af1 = 2:  404c918608 test-reach: add run_three_modes method
 3:  708b4550a1 = 3:  30dee58c61 test-reach: add rev-list tests
 4:  908442417d ! 4:  a74ae13d4e revision.c: begin refactoring --topo-order logic
     @@ -168,4 +168,4 @@
      +	struct topo_walk_info *topo_walk_info;
       };
       
     - extern int ref_excluded(struct string_list *, const char *path);
     + int ref_excluded(struct string_list *, const char *path);
 5:  a7272f2799 ! 5:  0e64fc144c commit/revisions: bookkeeping before refactoring
     @@ -27,7 +27,7 @@
       define_commit_slab(indegree_slab, int);
       
      -/* record author-date for each commit object */
     --define_commit_slab(author_date_slab, unsigned long);
     +-define_commit_slab(author_date_slab, timestamp_t);
      -
      -static void record_author_date(struct author_date_slab *author_date,
      -			       struct commit *commit)
 6:  73713bcbee ! 6:  3b185ac3b1 revision.c: refactor basic topo-order logic
     @@ -153,11 +153,11 @@
       
       /*
        * object flag allocation:
     -- * revision.h:               0---------10                                26
     -+ * revision.h:               0---------10                                26--28
     -  * fetch-pack.c:             0----5
     +- * revision.h:               0---------10                              2526
     ++ * revision.h:               0---------10                              25----28
     +  * fetch-pack.c:             01
     +  * negotiator/default.c:       2--5
        * walker.c:                 0-2
     -  * upload-pack.c:                4       11-----14  16-----19
      @@
        * builtin/show-branch.c:    0-------------------------------------------26
        * builtin/unpack-objects.c:                                 2021
     @@ -404,11 +404,11 @@
      --- a/revision.h
      +++ b/revision.h
      @@
     - #define PATCHSAME	(1u<<9)
     - #define BOTTOM		(1u<<10)
     + #define USER_GIVEN	(1u<<25) /* given directly by the user */
       #define TRACK_LINEAR	(1u<<26)
     + #define ALL_REV_FLAGS	(((1u<<11)-1) | USER_GIVEN | TRACK_LINEAR)
      +#define TOPO_WALK_EXPLORED (1u<<27)
      +#define TOPO_WALK_INDEGREE (1u<<28)
     - #define ALL_REV_FLAGS	(((1u<<11)-1) | TRACK_LINEAR)
       
       #define DECORATE_SHORT_REFS	1
     + #define DECORATE_FULL_REFS	2

Comments

Ævar Arnfjörð Bjarmason Sept. 18, 2018, 6:05 a.m. UTC | #1
On Tue, Sep 18 2018, Derrick Stolee via GitGitGadget wrote:

Thanks. Good to see the commit graph used for more stuff.

> On the Linux repository, I got the following performance results when
> comparing to the previous version with or without a commit-graph:
>
> Test: git rev-list --topo-order -100 HEAD
> HEAD~1, no commit-graph: 6.80 s
> HEAD~1, w/ commit-graph: 0.77 s
>   HEAD, w/ commit-graph: 0.02 s
>
> Test: git rev-list --topo-order -100 HEAD -- tools
> HEAD~1, no commit-graph: 9.63 s
> HEAD~1, w/ commit-graph: 6.06 s
>   HEAD, w/ commit-graph: 0.06 s

It would be great if this were made into a t/perf/ test shipped with
this series, that would be later quoted in a commit, as in
e.g. 3b41fb0cb2 ("fsck: use oidset instead of oid_array for skipList",
2018-09-03).

Although generalizing that "-- tools" part (i.e. finding a candidate
dir) will require some heuristic, but would make it useful when running
this against other erpos.

> If you want to read this series but are unfamiliar with the commit-graph and
> generation numbers, then I recommend reading
> Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
> subject. In particular, the three-part walk described in "revision.c:
> refactor basic topo-order logic" is present (but underexplained) as an
> animated PNG [2].

We discussed some of this in private E-Mail, and this isn't really
feedback on *this* series in particular, just on the general
commit-graph work.

Right now git-config(1) just matter-of-factly says how to enable it, and
points to git-commit-graph(1) for further info, which just shows how to
run the tool. But nothing's describing what stuff is sped up, and those
sorts of docs aren't being updated as new optimizations (e.g. this
--topo-order walk) are added.

For that you need to scour a combination of your blogpost & commits in
git.git (with quoted perf numbers).
Derrick Stolee Sept. 21, 2018, 3:47 p.m. UTC | #2
On 9/18/2018 2:05 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Sep 18 2018, Derrick Stolee via GitGitGadget wrote:
>
> Thanks. Good to see the commit graph used for more stuff.
>
>> On the Linux repository, I got the following performance results when
>> comparing to the previous version with or without a commit-graph:
>>
>> Test: git rev-list --topo-order -100 HEAD
>> HEAD~1, no commit-graph: 6.80 s
>> HEAD~1, w/ commit-graph: 0.77 s
>>    HEAD, w/ commit-graph: 0.02 s
>>
>> Test: git rev-list --topo-order -100 HEAD -- tools
>> HEAD~1, no commit-graph: 9.63 s
>> HEAD~1, w/ commit-graph: 6.06 s
>>    HEAD, w/ commit-graph: 0.06 s
> It would be great if this were made into a t/perf/ test shipped with
> this series, that would be later quoted in a commit, as in
> e.g. 3b41fb0cb2 ("fsck: use oidset instead of oid_array for skipList",
> 2018-09-03).
>
> Although generalizing that "-- tools" part (i.e. finding a candidate
> dir) will require some heuristic, but would make it useful when running
> this against other erpos.

t/perf/p4211-line-log.sh has the following test:


     test_perf 'git log --oneline --raw --parents -1000' '
             git log --oneline --raw --parents -1000 >/dev/null
     '

We could add the following to the end of that script to get similar 
values, since it already selects a file randomly at the top of the script:

     test_perf 'git log --oneline --raw --parents -1000 -- <file>' '
             git log --oneline --raw --parents -1000 -- $file >/dev/null
     '

>
>> If you want to read this series but are unfamiliar with the commit-graph and
>> generation numbers, then I recommend reading
>> Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
>> subject. In particular, the three-part walk described in "revision.c:
>> refactor basic topo-order logic" is present (but underexplained) as an
>> animated PNG [2].
> We discussed some of this in private E-Mail, and this isn't really
> feedback on *this* series in particular, just on the general
> commit-graph work.
>
> Right now git-config(1) just matter-of-factly says how to enable it, and
> points to git-commit-graph(1) for further info, which just shows how to
> run the tool. But nothing's describing what stuff is sped up, and those
> sorts of docs aren't being updated as new optimizations (e.g. this
> --topo-order walk) are added.
>
> For that you need to scour a combination of your blogpost & commits in
> git.git (with quoted perf numbers).

Thanks for reminding me. I have this on my list of TODOs.

-Stolee