diff mbox series

Teach git-rev-list --simplify-forks

Message ID df0b9e59-e6d7-8839-ca3b-86145dc3bdf3@gmail.com (mailing list archive)
State New, archived
Headers show
Series Teach git-rev-list --simplify-forks | expand

Commit Message

Antonio Russo April 27, 2020, 5:07 a.m. UTC
When used with --graph, instead of displaying the full graph, display a
spanning subgraph produced by a depth-first search of the graph visiting
parents from left to right.  Edges to already visited commits are
discarded.  This process is repeated for every commit to be displayed.

This is valuable to reduce visual clutter when there are many merges
that were not rebased onto each other and the user is primarily
interested in the state of the branch being merged into.

Also adds documentation and tests of the above.

Signed-off-by: Antonio Russo <antonio.e.russo@gmail.com>
---
 Documentation/rev-list-options.txt         |  8 +++
 revision.c                                 | 67 +++++++++++++++++++++-
 revision.h                                 |  6 ++
 t/t6016-rev-list-graph-simplify-history.sh | 50 ++++++++++++++++
 4 files changed, 130 insertions(+), 1 deletion(-)

Hello,

I'm trying to address a long-standing problem I've had visualizing complex
git repositories with git log --graph.  I think that a picture is worth a
thousand words, so I'll just show what the patch does for git master:

% git log --graph --abbrev-commit --pretty=oneline --simplify-forks e870325ee8

> * e870325ee8 (HEAD -> master, origin/master, origin/HEAD) The third batch
> *   a397e9c236 Merge branch 'jk/credential-parsing-end-of-host-in-URL'
> |\
> | * 4c5971e18a credential: treat "?" and "#" in URLs as end of host
> *   d6d561db1c Merge branch 'jt/rebase-allow-duplicate'
> |\
> | * 0fcb4f6b62 rebase --merge: optionally skip upstreamed commits
> *   c7d8f69da5 Merge branch 'en/rebase-no-keep-empty'
> |\
> | * 50ed76148a rebase: fix an incompatible-options error message
> | * b9cbd2958f rebase: reinstate --no-keep-empty
> | * 1b5735f75c rebase -i: mark commits that begin empty in todo editor
> *   8b39dfdf47 Merge branch 'js/mingw-is-hidden-test-fix'
> |\
> | * 176a66a748 t: restrict `is_hidden` to be called only on Windows
> | * 9814d0a4ad mingw: make test_path_is_hidden more robust
> | * 7c2dfca7e8 t: consolidate the `is_hidden` functions
> *   a41b41ca74 Merge branch 'js/mingw-isilon-nfs'
> |\
> | * 23eafd924a mingw: cope with the Isilon network file system
> *   33feaca6bf Merge branch 'js/flush-prompt-before-interative-input'
> |\
> | * 1f09aed834 interactive: explicitly `fflush` stdout before expecting input
> | * 08d383f23e interactive: refactor code asking the user for interactive input
> *   9af3a7cb4d Merge branch 'ds/revision-show-pulls'
> |\
> | * 8d049e182e revision: --show-pulls adds helpful merges
> *   82fa169d55 Merge branch 'ma/simplify-merge-config-parsing'
> |\
> | * 9881b451f3 merge: use skip_prefix to parse config key
> *   b3eb70e0f8 Merge branch 'js/mingw-fixes'

Compare this to before:

% git log --graph --abbrev-commit --pretty=oneline e870325ee8

> * e870325ee8 (HEAD -> master, origin/master, origin/HEAD) The third batch
> *   a397e9c236 Merge branch 'jk/credential-parsing-end-of-host-in-URL'
> |\
> | * 4c5971e18a credential: treat "?" and "#" in URLs as end of host
> * |   d6d561db1c Merge branch 'jt/rebase-allow-duplicate'
> |\ \
> | * | 0fcb4f6b62 rebase --merge: optionally skip upstreamed commits
> * | | c7d8f69da5 Merge branch 'en/rebase-no-keep-empty'
> |\| |
> | * | 50ed76148a rebase: fix an incompatible-options error message
> | * | b9cbd2958f rebase: reinstate --no-keep-empty
> | * | 1b5735f75c rebase -i: mark commits that begin empty in todo editor
> * | |   8b39dfdf47 Merge branch 'js/mingw-is-hidden-test-fix'
> |\ \ \
> | * | | 176a66a748 t: restrict `is_hidden` to be called only on Windows
> | * | | 9814d0a4ad mingw: make test_path_is_hidden more robust
> | * | | 7c2dfca7e8 t: consolidate the `is_hidden` functions
> * | | |   a41b41ca74 Merge branch 'js/mingw-isilon-nfs'
> |\ \ \ \
> | * | | | 23eafd924a mingw: cope with the Isilon network file system
> | |/ / /
> * | | |   33feaca6bf Merge branch 'js/flush-prompt-before-interative-input'
> |\ \ \ \
> | * | | | 1f09aed834 interactive: explicitly `fflush` stdout before expecting input
> | * | | | 08d383f23e interactive: refactor code asking the user for interactive input
> | |/ / /
> * | | |   9af3a7cb4d Merge branch 'ds/revision-show-pulls'
> |\ \ \ \
> | * | | | 8d049e182e revision: --show-pulls adds helpful merges
> | |/ / /
> * | | |   82fa169d55 Merge branch 'ma/simplify-merge-config-parsing'
> |\ \ \ \
> | * | | | 9881b451f3 merge: use skip_prefix to parse config key
> | |/ / /
> * | | |   b3eb70e0f8 Merge branch 'js/mingw-fixes'
> |\ \ \ \

Roughly, the tool produces a spanning tree by traversing the commit graph,
leftmost parents first. It forgets information about where forks occurred,
removing the "tails" that connect branches back to their original fork
point.  These "tails" can be very long and often do not contain information
that is very important for someone glancing through commits.

=== Details

It's about 40% faster.  Tested this on current master (of git):

git log --graph --pretty=oneline --simplify-forks e870325ee8 > /dev/null  0.83s user 0.06s system 99% cpu 0.886 total

git log --graph --pretty=oneline e870325ee8 > /dev/null  1.41s user 0.03s system 99% cpu 1.443 total

This is because the produced graph is much simpler---it is a tree in the
above case (though not in general).

The exact approach uses two flags (reusing bits 27 and 28 that are used
by the topological sort walk).  One says whether the commit has been ever
visited, and the other says whether the commit was visited from the current
root.  The walk does not continue down any commit that was visited, and
leaves the connection to the parent only if has not been visited from the
current root.  Each commit that is to be shown is treated as its own root
(most of these root commits are immediately skipped, because they are
likely visited in an earlier traversal).

=== Problems

Travis CI is showing issues with the tests I've added [1].  I cannot
reproduce them locally, neither with gcc 4.8.5 (RHEL) nor 9.3.0 (Debian).
They're also only failing on one of the runners.


Thank you for taking the time to look at this,
Antonio Russo


[1] https://travis-ci.org/github/aerusso/git
[2] https://github.com/aerusso/git/tree/mrs/simplify-history-forks

Comments

Derrick Stolee April 27, 2020, 10:55 a.m. UTC | #1
On 4/27/2020 1:07 AM, Antonio Russo wrote:
> === Problems
> 
> Travis CI is showing issues with the tests I've added [1].  I cannot
> reproduce them locally, neither with gcc 4.8.5 (RHEL) nor 9.3.0 (Debian).
> They're also only failing on one of the runners.

This is probably because the tests run a second round with GIT_TEST_COMMIT_GRAPH=1, which enables the commit-graph feature. This triggers a different set of logic for the topo-order, which ignores the logic the way you inserted it here.

Thanks,
-Stolee
Antonio Russo April 28, 2020, 12:49 p.m. UTC | #2
On 4/27/20 4:55 AM, Derrick Stolee wrote:
> 
> This is probably because the tests run a second round with GIT_TEST_COMMIT_GRAPH=1, which enables the commit-graph feature. This triggers a different set of logic for the topo-order, which ignores the logic the way you inserted it here.
> 

Thank you for pointing me at this.  If I now understand correctly, the commit information
is not yet necessarily loaded for all commits at this point, and therefore the logic here
will need to be called later on (and makes it more complicated).

Am I correct that this loading of parents happens during traverse_commit_list_filtered
(for the case of rev-list)?  Also, am I correct that there are not yet any hooks to
filter out edges (of the graph of commits)?

Thank you,
Antonio
Derrick Stolee April 28, 2020, 2:11 p.m. UTC | #3
On 4/28/2020 8:49 AM, Antonio Russo wrote:
> On 4/27/20 4:55 AM, Derrick Stolee wrote:
>>
>> This is probably because the tests run a second round with GIT_TEST_COMMIT_GRAPH=1, which enables the commit-graph feature. This triggers a different set of logic for the topo-order, which ignores the logic the way you inserted it here.
>>
> 
> Thank you for pointing me at this.  If I now understand correctly, the commit information
> is not yet necessarily loaded for all commits at this point, and therefore the logic here
> will need to be called later on (and makes it more complicated).
> 
> Am I correct that this loading of parents happens during traverse_commit_list_filtered
> (for the case of rev-list)?  Also, am I correct that there are not yet any hooks to
> filter out edges (of the graph of commits)?

I mean that if you have a commit-graph, then a completely disjoint set of code
is run instead of the code you modified. See [1] for the blocks of code that
are run instead.

[1] https://github.com/git/git/commit/b45424181e9e8b2284a48c6db7b8db635bbfccc8

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index 04ad7dd36e..cbac09028c 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -363,6 +363,14 @@  Default mode::
 	merges from the resulting history, as there are no selected
 	commits contributing to this merge.

+--simplify-forks::
+	Convert the commit graph into a spanning subgraph produced by a
+	depth-first-search of the history graph, searching the leftmost
+	parent first, and discarding edges to commits already visited.
+	Useful with `--graph` to visualize repositories with many merges
+	when you are interested in was added to master, and not when the
+	branch was last rebased.
+
 --ancestry-path::
 	When given a range of commits to display (e.g. 'commit1..commit2'
 	or 'commit2 {caret}commit1'), only display commits that exist
diff --git a/revision.c b/revision.c
index 5bc96444b6..06d9697306 100644
--- a/revision.c
+++ b/revision.c
@@ -2082,6 +2082,8 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->simplify_by_decoration = 1;
 		revs->limited = 1;
 		revs->prune = 1;
+	} else if (!strcmp(arg, "--simplify-forks")) {
+		revs->simplify_forks = 1;
 	} else if (!strcmp(arg, "--date-order")) {
 		revs->sort_order = REV_SORT_BY_COMMIT_DATE;
 		revs->topo_order = 1;
@@ -3095,6 +3097,63 @@  static void simplify_merges(struct rev_info *revs)
 	}
 }

+static void simplify_forks(struct rev_info *revs)
+{
+	struct commit_list *stack, *list_lr, *iter_list;
+	struct commit_list **parents;
+	struct commit *commit, *parent;
+
+	stack = NULL;
+	list_lr = NULL;
+
+	clear_object_flags(SIMP_FORK_VISITED);
+
+	for(iter_list = revs->commits; iter_list; iter_list = iter_list->next) {
+		/* process every commit to be displayed exactly once */
+		if(iter_list->item->object.flags & SIMP_FORK_VISITED)
+			continue;
+		clear_object_flags(SIMP_FORK_VISITING);
+		commit_list_insert(iter_list->item, &stack);
+		iter_list->item->object.flags |= SIMP_FORK_VISITED | SIMP_FORK_VISITING;
+		while(stack) {
+			commit = pop_commit(&stack);
+			/* process the parent nodes: removing links to
+			 * commits already visited (creating a spanning tree)
+			 */
+			parents = &(commit->parents);
+			while(*parents) {
+				parent = (*parents)->item;
+				if(parent->object.flags & SIMP_FORK_VISITING) {
+					/* We have already visited this commit, from the same root.
+					 * We do not explore it at all.
+					 */
+					pop_commit(parents);
+				} else if(parent->object.flags & SIMP_FORK_VISITED) {
+					/* We visited this commit before, but from a different root.
+					 * Leave it attached, but do not explore it further.
+					 */
+					parents = &((*parents)->next);
+				} else {
+					/* We have not visited this commit yet. Explore it, as usual.
+					 */
+					parent->object.flags |= SIMP_FORK_VISITED | SIMP_FORK_VISITING;
+					commit_list_insert(parent, &list_lr);
+					parents = &((*parents)->next);
+				}
+			}
+
+			/* feed the parents, right to left (reversed) onto the
+			 * stack to do a depth-first traversal of the commit graph
+			 */
+			while(list_lr) {
+				commit_list_insert(pop_commit(&list_lr), &stack);
+			}
+		}
+	}
+
+	clear_object_flags(SIMP_FORK_VISITED | SIMP_FORK_VISITING);
+}
+
 static void set_children(struct rev_info *revs)
 {
 	struct commit_list *l;
@@ -3392,10 +3451,16 @@  int prepare_revision_walk(struct rev_info *revs)
 	if (revs->limited) {
 		if (limit_list(revs) < 0)
 			return -1;
+		if (revs->simplify_forks)
+			simplify_forks(revs);
 		if (revs->topo_order)
 			sort_in_topological_order(&revs->commits, revs->sort_order);
-	} else if (revs->topo_order)
+	} else if (revs->topo_order) {
+		if (revs->simplify_forks)
+			simplify_forks(revs);
 		init_topo_walk(revs);
+	} else if (revs->simplify_forks)
+		simplify_forks(revs);
 	if (revs->line_level_traverse)
 		line_log_filter(revs);
 	if (revs->simplify_merges)
diff --git a/revision.h b/revision.h
index c1af164b30..f1abdb26b0 100644
--- a/revision.h
+++ b/revision.h
@@ -51,6 +51,11 @@ 
 #define TOPO_WALK_EXPLORED	(1u<<27)
 #define TOPO_WALK_INDEGREE	(1u<<28)

+/* Re-use the TOPO_WALK flagspace for simplify_forks
+ */
+#define SIMP_FORK_VISITED	(1u<<27)
+#define SIMP_FORK_VISITING	(1u<<28)
+
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2

@@ -132,6 +137,7 @@  struct rev_info {
 			no_walk:2,
 			remove_empty_trees:1,
 			simplify_history:1,
+			simplify_forks:1,
 			show_pulls:1,
 			topo_order:1,
 			simplify_merges:1,
diff --git a/t/t6016-rev-list-graph-simplify-history.sh b/t/t6016-rev-list-graph-simplify-history.sh
index f5e6e92f5b..d99214b6df 100755
--- a/t/t6016-rev-list-graph-simplify-history.sh
+++ b/t/t6016-rev-list-graph-simplify-history.sh
@@ -85,6 +85,28 @@  test_expect_success '--graph --all' '
 	test_cmp expected actual
 	'

+# Make sure that simplify_histpry_forks produces a spanning tree
+test_expect_success '--graph --simplify-forks --all' '
+	rm -f expected &&
+	echo "* $A7" >> expected &&
+	echo "*   $A6" >> expected &&
+	echo "|\  " >> expected &&
+	echo "| * $C4" >> expected &&
+	echo "| * $C3" >> expected &&
+	echo "* $A5" >> expected &&
+	echo "*-.   $A4" >> expected &&
+	echo "|\ \  " >> expected &&
+	echo "| | * $C2" >> expected &&
+	echo "| | * $C1" >> expected &&
+	echo "| * $B2" >> expected &&
+	echo "| * $B1" >> expected &&
+	echo "* $A3" >> expected &&
+	echo "* $A2" >> expected &&
+	echo "* $A1" >> expected &&
+	git rev-list --graph --simplify-forks --all > actual &&
+	test_cmp expected actual
+	'
+
 # Make sure the graph_is_interesting() code still realizes
 # that undecorated merges are interesting, even with --simplify-by-decoration
 test_expect_success '--graph --simplify-by-decoration' '
@@ -157,6 +179,20 @@  test_expect_success '--graph --full-history -- bar.txt' '
 	test_cmp expected actual
 	'

+test_expect_success '--graph --simplify-forks --full-history -- bar.txt' '
+	rm -f expected &&
+	echo "* $A7" >> expected &&
+	echo "*   $A6" >> expected &&
+	echo "|\\  " >> expected &&
+	echo "| * $C4" >> expected &&
+	echo "* $A5" >> expected &&
+	echo "* $A4" >> expected &&
+	echo "* $A3" >> expected &&
+	echo "* $A2" >> expected &&
+	git rev-list --graph --simplify-forks --full-history --all -- bar.txt > actual &&
+	test_cmp expected actual
+	'
+
 test_expect_success '--graph --full-history --simplify-merges -- bar.txt' '
 	rm -f expected &&
 	echo "* $A7" >> expected &&
@@ -172,6 +208,20 @@  test_expect_success '--graph --full-history --simplify-merges -- bar.txt' '
 	test_cmp expected actual
 	'

+test_expect_success '--graph --simplify-forks --full-history --simplify-merges -- bar.txt' '
+	rm -f expected &&
+	echo "* $A7" >> expected &&
+	echo "*   $A6" >> expected &&
+	echo "|\\  " >> expected &&
+	echo "| * $C4" >> expected &&
+	echo "* $A5" >> expected &&
+	echo "* $A3" >> expected &&
+	echo "* $A2" >> expected &&
+	git rev-list --graph --simplify-forks --full-history --simplify-merges --all \
+		-- bar.txt > actual &&
+	test_cmp expected actual
+	'
+
 test_expect_success '--graph -- bar.txt' '
 	rm -f expected &&
 	echo "* $A7" >> expected &&