diff mbox series

[1/1] Add feature to show log as a tree

Message ID 20190303103751.6523-1-nelissen.micha@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/1] Add feature to show log as a tree | expand

Commit Message

Micha Nelissen March 3, 2019, 10:37 a.m. UTC
Modifies the topo-order code to keep track of the first child,
using a heuristic. The heuristic works by assigning a (depth)
level to all nodes. The first child is the node of which this
node is a parent of and has the lowest level. Then it cuts all
the links that are not the first child, resulting in a tree.

It also uses the level to sort the nodes: trying to keep
descendent line (of a merge) together as a group.

Add commandline option "--tree" to use this new feature.

Signed-off-by: Micha Nelissen <nelissen.micha@gmail.com>
---
 commit.c   | 136 +++++++++++++++++++++++++++++++++++++++++++++++------
 commit.h   |   1 +
 revision.c |   4 ++
 3 files changed, 127 insertions(+), 14 deletions(-)

Comments

Junio C Hamano March 3, 2019, 1:16 p.m. UTC | #1
Micha Nelissen <nelissen.micha@gmail.com> writes:

> Modifies the topo-order code to keep track of the first child,
> using a heuristic. The heuristic works by assigning a (depth)
> level to all nodes. The first child is the node of which this
> node is a parent of and has the lowest level. Then it cuts all
> the links that are not the first child, resulting in a tree.

I am not sure what you mean by a "tree".  It definitely is not a
tree perceived by Git users (which is what represents a directory
structure), and abusing the established term is confusing.

There is no inherent ordering among children of a given commit, so
you'd invent some way to define what "the first child" is.  That
much can be read from the above, but what is missing is why do you
even need to bother.  What benefit do users get by identifying "the
first" child and cutting all others off---that is the necessary
justification for this change, which is missing from the above
explanation.

"A (depth) level" is poorly explained---in fact, it is not explained
what it means at all.  "We designate the first child of each commit
by some magic" is all we can read from it.

Please try again.

After explaining "some magic" in a more meaningful way, perhaps
you'd realize that you'd want to stop saying "first child".  It
feels to me that you want to use a heuristic to identify the most
important single line of history leading to the tip.  If that is
what you are doing, then the resulting shape of the history would
not be a tree (where branching is an important aspect, and both
sides of the branches are important) but a linear line (where you
ignore all side branches and concentrate only on the trunk of a
tree).  And I'd probably call each child that is on the "trunk" line
of the most important lineage "primary child" or something like
that.

> It also uses the level to sort the nodes: trying to keep
> descendent line (of a merge) together as a group.
>
> Add commandline option "--tree" to use this new feature.

In any case, if this is to sit next to and be friends with
"--topo-order", the option should be named as "--some-order".
The option name "--tree" (or "--blob", "--tag", or "--commit" for
that matter) is obviously unacceptable if the option is about
specifying how the commits in the "git log" output are ordered.

Having said that, if you are omitting commits that are not on the
primary lineage of the history, then this is not "--anything-order"
option, in which case it may want to sit next to and be friends with
"--first-parent", perhaps?  with the fuzzy description and
explanation above, I cannot quite guess the answer to this question
myself so I'll stop here.

> Signed-off-by: Micha Nelissen <nelissen.micha@gmail.com>
> ---
>  commit.c   | 136 +++++++++++++++++++++++++++++++++++++++++++++++------
>  commit.h   |   1 +
>  revision.c |   4 ++

Without clear explanation of what the change attempts to achieve in
docs and tests, there is no way to review to see if the new code is
correct.
Micha Nelissen March 4, 2019, 9:13 a.m. UTC | #2
On 03-03-2019 14:16, Junio C Hamano wrote:
> I am not sure what you mean by a "tree".  It definitely is not a
> tree perceived by Git users (which is what represents a directory
> structure), and abusing the established term is confusing.

I called it tree because the output looks like (right half) of a tree. 
I'm open to alternatives though.

> There is no inherent ordering among children of a given commit, so
> you'd invent some way to define what "the first child" is.  That
> much can be read from the above, but what is missing is why do you
> even need to bother.  What benefit do users get by identifying "the
> first" child and cutting all others off---that is the necessary
> justification for this change, which is missing from the above
> explanation.

Right, in my mind the problem statement is so clear and I didn't write 
it down. There should have been an introduction paragraph.

Problem I want to solve. The --graph output is useful but often 
unreadable, due to the many parallel lines. On the other hand, with the 
default straight line log output all structure is lost. I am trying some 
middle ground to see merge structure, but keep history readable. I 
noticed in particular the branch point causes a line to go around or go 
in between many commits leading to clutter, so I want to remove these. 
By itself the exact point branched from is rarely important (at least to 
me). And if it is, then there still is --graph :-).

> "A (depth) level" is poorly explained---in fact, it is not explained
> what it means at all.  "We designate the first child of each commit
> by some magic" is all we can read from it.

When processing the commit nodes. Start with depth level 0. Given some 
current depth level D, assign depth level D, D+1, ... D+N-1 to the N 
parents of a (merge) commit. Recurse.

In the output, the depth level is more or less the column number the 
star '*' is in. (Although after merges of merges, there will be gaps in 
the level numbering.)

> After explaining "some magic" in a more meaningful way, perhaps
> you'd realize that you'd want to stop saying "first child".  It

OK well, the term "first parent" is already in the man page, right? 
--first-parent. So the first child of a commit is the commit C among all 
children H, such that when all of H are traced back to a common point S, 
the trace line from C is the one that ends being the first parent of S.

> ignore all side branches and concentrate only on the trunk of a
> tree).  And I'd probably call each child that is on the "trunk" line
> of the most important lineage "primary child" or something like
> that.

Sort of, but remember this works recursive, there can be merges of 
merges. Therefore the output is not a single trunk line. In other words, 
the above commit S doesn't need to be on the trunk (or, '*' completely 
left in the graph output).

> In any case, if this is to sit next to and be friends with
> "--topo-order", the option should be named as "--some-order".
> The option name "--tree" (or "--blob", "--tag", or "--commit" for
> that matter) is obviously unacceptable if the option is about
> specifying how the commits in the "git log" output are ordered.

I see it as alternative to the --graph option. I also made it jump 
there, but maybe gotos are rejected from style point of view. In this 
case I thought it would be clear to see it as alternative to --graph.

It doesn't do justice in my opinion to call it --graph --micha-order (or 
whatever ;-)), as it is not only about changing the order of commit 
display. But again, I'm open to alternatives, I can make a personal 
alias anyway :-).

> Having said that, if you are omitting commits that are not on the
> primary lineage of the history, then this is not "--anything-order"
> option, in which case it may want to sit next to and be friends with

Ah no, it doesn't omit any commits. All commits are displayed.

> Without clear explanation of what the change attempts to achieve in
> docs and tests, there is no way to review to see if the new code is
> correct.

How can I add tests for this feature? I'm also open to what kinds of 
documentation to add, as I find it way easier to explain when you 
actually run it to see the output it generates, than from abstract 
paragraph of documentation text (like a manpage has). Some 
mathematical-type of explanation like above is also not really 
immediately enlightening, and also not necessary to use it I think.

Thanks for review, Micha
diff mbox series

Patch

diff --git commit.c commit.c
index a5333c7ac6..340757adc2 100644
--- commit.c
+++ commit.c
@@ -662,7 +662,12 @@  struct commit *pop_commit(struct commit_list **stack)
  */
 
 /* count number of children that have not been emitted */
-define_commit_slab(indegree_slab, int);
+struct indegree {
+	unsigned short indegree;
+	unsigned short level;	  /* first level this commit has been seen at */
+};
+define_commit_slab(indegree_slab, struct indegree);
+define_commit_slab(first_child_slab, struct commit *);
 
 define_commit_slab(author_date_slab, timestamp_t);
 
@@ -708,6 +713,22 @@  int compare_commits_by_author_date(const void *a_, const void *b_,
 	return 0;
 }
 
+static int compare_commits_by_tree_level(const void *a_, const void *b_,
+					 void *cb_data)
+{
+	const struct commit *a = a_, *b = b_;
+	struct indegree_slab *indegree = cb_data;
+	unsigned short a_level = indegree_slab_at(indegree, a)->level;
+	unsigned short b_level = indegree_slab_at(indegree, b)->level;
+
+	/* deepest tree level commits first */
+	if (a_level < b_level)
+		return 1;
+	else if (a_level > b_level)
+		return -1;
+	return 0;
+}
+
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused)
 {
 	const struct commit *a = a_, *b = b_;
@@ -745,6 +766,7 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	struct commit_list *next, *orig = *list;
 	struct commit_list **pptr;
 	struct indegree_slab indegree;
+	struct first_child_slab first_child;
 	struct prio_queue queue;
 	struct commit *commit;
 	struct author_date_slab author_date;
@@ -760,6 +782,11 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	default: /* REV_SORT_IN_GRAPH_ORDER */
 		queue.compare = NULL;
 		break;
+	case REV_SORT_IN_TREE_ORDER:
+		init_first_child_slab(&first_child);
+		queue.compare = compare_commits_by_tree_level;
+		queue.cb_data = &indegree;
+		break;
 	case REV_SORT_BY_COMMIT_DATE:
 		queue.compare = compare_commits_by_commit_date;
 		break;
@@ -773,7 +800,8 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	/* Mark them and clear the indegree */
 	for (next = orig; next; next = next->next) {
 		struct commit *commit = next->item;
-		*(indegree_slab_at(&indegree, commit)) = 1;
+		struct indegree *pi = indegree_slab_at(&indegree, commit);
+		pi->indegree = 0, pi->level = 0;
 		/* also record the author dates, if needed */
 		if (sort_order == REV_SORT_BY_AUTHOR_DATE)
 			record_author_date(&author_date, commit);
@@ -782,13 +810,12 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	/* update the indegree */
 	for (next = orig; next; next = next->next) {
 		struct commit_list *parents = next->item->parents;
-		while (parents) {
+		for (; parents; parents = parents->next) {
 			struct commit *parent = parents->item;
-			int *pi = indegree_slab_at(&indegree, parent);
-
-			if (*pi)
-				(*pi)++;
-			parents = parents->next;
+			struct indegree *pi = indegree_slab_peek(&indegree, parent);
+			if (!pi)
+				continue;
+			pi->indegree++;
 		}
 	}
 
@@ -801,9 +828,12 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	 */
 	for (next = orig; next; next = next->next) {
 		struct commit *commit = next->item;
+		struct indegree *pi = indegree_slab_at(&indegree, commit);
 
-		if (*(indegree_slab_at(&indegree, commit)) == 1)
+		if (pi->indegree == 0) {
+			pi->level = 1;
 			prio_queue_put(&queue, commit);
+		}
 	}
 
 	/*
@@ -820,31 +850,109 @@  void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so
 	*list = NULL;
 	while ((commit = prio_queue_get(&queue)) != NULL) {
 		struct commit_list *parents;
+		unsigned short commit_level, parent_level;
+		commit_level = parent_level = indegree_slab_at(&indegree, commit)->level;
 
-		for (parents = commit->parents; parents ; parents = parents->next) {
+		for (parents = commit->parents; parents; parents = parents->next) {
 			struct commit *parent = parents->item;
-			int *pi = indegree_slab_at(&indegree, parent);
+			struct indegree *pi = indegree_slab_peek(&indegree, parent);
 
-			if (!*pi)
+			if (!pi)
 				continue;
 
+			if (sort_order == REV_SORT_IN_TREE_ORDER) {
+				struct commit **pfirst_child =
+					first_child_slab_at(&first_child, parent);
+				if (*pfirst_child != NULL) {
+					/* already set a first child, if it is from higher
+					   level than we are, set ourselves as first */
+					struct indegree *old_pi =
+						indegree_slab_at(&indegree, *pfirst_child);
+					if (old_pi->level >= commit_level)
+						*pfirst_child = NULL;
+				}
+				if (*pfirst_child == NULL)
+					*pfirst_child = commit;
+
+				if (!pi->level || parent_level < pi->level) {
+					struct commit_list *gparent_list;
+					struct commit *gparent = parent;
+					struct indegree *gpi;
+					/* mark this 'branch' as this level */
+					pi->level = parent_level;
+					while ((gparent_list = gparent->parents) != NULL) {
+						gparent = gparent_list->item;
+						gpi = indegree_slab_peek(&indegree, gparent);
+						if (!gpi ||
+						    (gpi->level && gpi->level < parent_level))
+							break;
+						gpi->level = parent_level;
+					}
+				}
+				if (pi->level >= parent_level)
+					parent_level = pi->level + 1;
+			}
+
 			/*
 			 * parents are only enqueued for emission
 			 * when all their children have been emitted thereby
 			 * guaranteeing topological order.
 			 */
-			if (--(*pi) == 1)
+			if (--pi->indegree == 0)
 				prio_queue_put(&queue, parent);
 		}
 		/*
 		 * all children of commit have already been
 		 * emitted. we can emit it now.
 		 */
-		*(indegree_slab_at(&indegree, commit)) = 0;
+		indegree_slab_at(&indegree, commit)->indegree = 0;
 
 		pptr = &commit_list_insert(commit, pptr)->next;
 	}
 
+	if (sort_order == REV_SORT_IN_TREE_ORDER) {
+		struct commit *commit, *next_commit;
+		/*
+		 * go through the commit list to cut all the non-first
+		 * parent-child links, so we get a tree
+		 */
+		next = *list;
+		if (next) {
+			commit = next->item;
+			next = next->next;
+		}
+		for (next = *list; next; next = next->next, commit = next_commit) {
+			struct commit_list *parents, **pparents = &commit->parents;
+			next_commit = next->item;
+			for (parents = commit->parents; parents; parents = parents->next) {
+				/* leave link between sequential commits alone because
+				   the level is not 100% to column mapping. level might
+				   be higher due to merges of merges from the same
+				   origin; except if the next commit is on top level
+				   then for sure it's not the same column */
+				struct commit *parent = parents->item;
+				int cut = 1;
+				if (parent == next_commit) {
+					struct indegree *pi =
+						indegree_slab_peek(&indegree, parent);
+					/* cut as in, allow to cut */
+					cut = pi && pi->level == 1;
+				}
+				if (cut) {
+					struct commit **pfirst_child =
+						first_child_slab_at(&first_child, parent);
+					cut = commit != *pfirst_child;
+				}
+				if (cut)
+					*pparents = parents->next;
+				else
+					pparents = &parents->next;
+			}
+		}
+
+		clear_first_child_slab(&first_child);
+	}
+
 	clear_indegree_slab(&indegree);
 	clear_prio_queue(&queue);
 	if (sort_order == REV_SORT_BY_AUTHOR_DATE)
diff --git commit.h commit.h
index 42728c2906..25b236cb49 100644
--- commit.h
+++ commit.h
@@ -205,6 +205,7 @@  void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark);
 
 enum rev_sort_order {
 	REV_SORT_IN_GRAPH_ORDER = 0,
+	REV_SORT_IN_TREE_ORDER,
 	REV_SORT_BY_COMMIT_DATE,
 	REV_SORT_BY_AUTHOR_DATE
 };
diff --git revision.c revision.c
index 162d511d46..09074e2c08 100644
--- revision.c
+++ revision.c
@@ -2031,6 +2031,9 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--topo-order")) {
 		revs->sort_order = REV_SORT_IN_GRAPH_ORDER;
 		revs->topo_order = 1;
+	} else if (!strcmp(arg, "--tree")) {
+		revs->sort_order = REV_SORT_IN_TREE_ORDER;
+		goto graph;
 	} else if (!strcmp(arg, "--simplify-merges")) {
 		revs->simplify_merges = 1;
 		revs->topo_order = 1;
@@ -2227,6 +2230,7 @@  static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->pretty_given = 1;
 		revs->abbrev_commit = 1;
 	} else if (!strcmp(arg, "--graph")) {
+	    graph:
 		revs->topo_order = 1;
 		revs->rewrite_parents = 1;
 		revs->graph = graph_init(revs);