[v5,0/7] Use generation numbers for --topo-order
mbox series

Message ID 20181101134623.84055-1-dstolee@microsoft.com
Headers show
Series
  • Use generation numbers for --topo-order
Related show

Message

Derrick Stolee Nov. 1, 2018, 1:46 p.m. UTC
This patch series performs a decently-sized refactoring of the
revision-walk machinery. Well, "refactoring" is probably the wrong word,
as I don't actually remove the old code. Instead, when we see certain
options in the 'rev_info' struct, we redirect the commit-walk logic to
a new set of methods that distribute the workload differently. By using
generation numbers in the commit-graph, we can significantly improve
'git log --graph' commands (and the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

    Test: git rev-list --topo-order -100 HEAD
    HEAD~1, no commit-graph: 6.80 s
    HEAD~1, w/ commit-graph: 0.77 s
      HEAD, w/ commit-graph: 0.02 s

    Test: git rev-list --topo-order -100 HEAD -- tools
    HEAD~1, no commit-graph: 9.63 s
    HEAD~1, w/ commit-graph: 6.06 s
      HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph
and generation numbers, then I recommend reading
`Documentation/technical/commit-graph.txt` or a blog post [1] I wrote on
the subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

**UPDATED** Now that we have had some review and some dogfooding, I'm
removing the paragraph I had here about "RFC quality". I think this is
ready to merge!

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending
commits are UNINTERESTING. Since this code depends on commit_list instead
of the prio_queue we are using here, I chose to leave it untouched for now.
We can revisit it in a separate series later. Since handle_commit() turns
on revs->limited when a commit is UNINTERESTING, we do not hit the new
code in this case. Removing this 'revs->limited = 1;' line yields correct
results, but the performance can be worse.

**UPDATED** See the discussion about Generation Number V2 [4] for more
on this topic.

Changes in V5: Thanks Jakub for feedback on the huge commit! I think
I've responded to all the code feedback. See the range-diff at the
end of this cover-page.

Thanks,
-Stolee

[1] https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
   Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
    Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/test
    A branch containing this series along with commits to compute commit-graph in entire test suite.

[4] https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51effef@gmail.com/
    [RFC] Generation Number v2

Note: I'm not submitting this version via GitGitGadget because it's
currently struggling with how to handle a PR in a conflict state.
The new flags in revision.h have a conflict with recent changes in
master.

Derrick Stolee (7):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: generation-based topo-order algorithm
  t6012: make rev-list tests more interesting

 commit.c                     |   9 +-
 commit.h                     |   7 +
 object.h                     |   4 +-
 prio-queue.c                 |   9 ++
 prio-queue.h                 |   6 +
 revision.c                   | 243 +++++++++++++++++++++++++++++++++--
 revision.h                   |   6 +
 t/helper/test-prio-queue.c   |  26 ++--
 t/t0009-prio-queue.sh        |  14 ++
 t/t6012-rev-list-simplify.sh |  45 +++++--
 t/t6600-test-reach.sh        |  96 +++++++++++++-
 11 files changed, 426 insertions(+), 39 deletions(-)


base-commit: 2d3b1c576c85b7f5db1f418907af00ab88e0c303