Message ID | 20190303103751.6523-1-nelissen.micha@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/1] Add feature to show log as a tree | expand |
Micha Nelissen <nelissen.micha@gmail.com> writes: > Modifies the topo-order code to keep track of the first child, > using a heuristic. The heuristic works by assigning a (depth) > level to all nodes. The first child is the node of which this > node is a parent of and has the lowest level. Then it cuts all > the links that are not the first child, resulting in a tree. I am not sure what you mean by a "tree". It definitely is not a tree perceived by Git users (which is what represents a directory structure), and abusing the established term is confusing. There is no inherent ordering among children of a given commit, so you'd invent some way to define what "the first child" is. That much can be read from the above, but what is missing is why do you even need to bother. What benefit do users get by identifying "the first" child and cutting all others off---that is the necessary justification for this change, which is missing from the above explanation. "A (depth) level" is poorly explained---in fact, it is not explained what it means at all. "We designate the first child of each commit by some magic" is all we can read from it. Please try again. After explaining "some magic" in a more meaningful way, perhaps you'd realize that you'd want to stop saying "first child". It feels to me that you want to use a heuristic to identify the most important single line of history leading to the tip. If that is what you are doing, then the resulting shape of the history would not be a tree (where branching is an important aspect, and both sides of the branches are important) but a linear line (where you ignore all side branches and concentrate only on the trunk of a tree). And I'd probably call each child that is on the "trunk" line of the most important lineage "primary child" or something like that. > It also uses the level to sort the nodes: trying to keep > descendent line (of a merge) together as a group. > > Add commandline option "--tree" to use this new feature. In any case, if this is to sit next to and be friends with "--topo-order", the option should be named as "--some-order". The option name "--tree" (or "--blob", "--tag", or "--commit" for that matter) is obviously unacceptable if the option is about specifying how the commits in the "git log" output are ordered. Having said that, if you are omitting commits that are not on the primary lineage of the history, then this is not "--anything-order" option, in which case it may want to sit next to and be friends with "--first-parent", perhaps? with the fuzzy description and explanation above, I cannot quite guess the answer to this question myself so I'll stop here. > Signed-off-by: Micha Nelissen <nelissen.micha@gmail.com> > --- > commit.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++------ > commit.h | 1 + > revision.c | 4 ++ Without clear explanation of what the change attempts to achieve in docs and tests, there is no way to review to see if the new code is correct.
On 03-03-2019 14:16, Junio C Hamano wrote: > I am not sure what you mean by a "tree". It definitely is not a > tree perceived by Git users (which is what represents a directory > structure), and abusing the established term is confusing. I called it tree because the output looks like (right half) of a tree. I'm open to alternatives though. > There is no inherent ordering among children of a given commit, so > you'd invent some way to define what "the first child" is. That > much can be read from the above, but what is missing is why do you > even need to bother. What benefit do users get by identifying "the > first" child and cutting all others off---that is the necessary > justification for this change, which is missing from the above > explanation. Right, in my mind the problem statement is so clear and I didn't write it down. There should have been an introduction paragraph. Problem I want to solve. The --graph output is useful but often unreadable, due to the many parallel lines. On the other hand, with the default straight line log output all structure is lost. I am trying some middle ground to see merge structure, but keep history readable. I noticed in particular the branch point causes a line to go around or go in between many commits leading to clutter, so I want to remove these. By itself the exact point branched from is rarely important (at least to me). And if it is, then there still is --graph :-). > "A (depth) level" is poorly explained---in fact, it is not explained > what it means at all. "We designate the first child of each commit > by some magic" is all we can read from it. When processing the commit nodes. Start with depth level 0. Given some current depth level D, assign depth level D, D+1, ... D+N-1 to the N parents of a (merge) commit. Recurse. In the output, the depth level is more or less the column number the star '*' is in. (Although after merges of merges, there will be gaps in the level numbering.) > After explaining "some magic" in a more meaningful way, perhaps > you'd realize that you'd want to stop saying "first child". It OK well, the term "first parent" is already in the man page, right? --first-parent. So the first child of a commit is the commit C among all children H, such that when all of H are traced back to a common point S, the trace line from C is the one that ends being the first parent of S. > ignore all side branches and concentrate only on the trunk of a > tree). And I'd probably call each child that is on the "trunk" line > of the most important lineage "primary child" or something like > that. Sort of, but remember this works recursive, there can be merges of merges. Therefore the output is not a single trunk line. In other words, the above commit S doesn't need to be on the trunk (or, '*' completely left in the graph output). > In any case, if this is to sit next to and be friends with > "--topo-order", the option should be named as "--some-order". > The option name "--tree" (or "--blob", "--tag", or "--commit" for > that matter) is obviously unacceptable if the option is about > specifying how the commits in the "git log" output are ordered. I see it as alternative to the --graph option. I also made it jump there, but maybe gotos are rejected from style point of view. In this case I thought it would be clear to see it as alternative to --graph. It doesn't do justice in my opinion to call it --graph --micha-order (or whatever ;-)), as it is not only about changing the order of commit display. But again, I'm open to alternatives, I can make a personal alias anyway :-). > Having said that, if you are omitting commits that are not on the > primary lineage of the history, then this is not "--anything-order" > option, in which case it may want to sit next to and be friends with Ah no, it doesn't omit any commits. All commits are displayed. > Without clear explanation of what the change attempts to achieve in > docs and tests, there is no way to review to see if the new code is > correct. How can I add tests for this feature? I'm also open to what kinds of documentation to add, as I find it way easier to explain when you actually run it to see the output it generates, than from abstract paragraph of documentation text (like a manpage has). Some mathematical-type of explanation like above is also not really immediately enlightening, and also not necessary to use it I think. Thanks for review, Micha
diff --git commit.c commit.c index a5333c7ac6..340757adc2 100644 --- commit.c +++ commit.c @@ -662,7 +662,12 @@ struct commit *pop_commit(struct commit_list **stack) */ /* count number of children that have not been emitted */ -define_commit_slab(indegree_slab, int); +struct indegree { + unsigned short indegree; + unsigned short level; /* first level this commit has been seen at */ +}; +define_commit_slab(indegree_slab, struct indegree); +define_commit_slab(first_child_slab, struct commit *); define_commit_slab(author_date_slab, timestamp_t); @@ -708,6 +713,22 @@ int compare_commits_by_author_date(const void *a_, const void *b_, return 0; } +static int compare_commits_by_tree_level(const void *a_, const void *b_, + void *cb_data) +{ + const struct commit *a = a_, *b = b_; + struct indegree_slab *indegree = cb_data; + unsigned short a_level = indegree_slab_at(indegree, a)->level; + unsigned short b_level = indegree_slab_at(indegree, b)->level; + + /* deepest tree level commits first */ + if (a_level < b_level) + return 1; + else if (a_level > b_level) + return -1; + return 0; +} + int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, void *unused) { const struct commit *a = a_, *b = b_; @@ -745,6 +766,7 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so struct commit_list *next, *orig = *list; struct commit_list **pptr; struct indegree_slab indegree; + struct first_child_slab first_child; struct prio_queue queue; struct commit *commit; struct author_date_slab author_date; @@ -760,6 +782,11 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so default: /* REV_SORT_IN_GRAPH_ORDER */ queue.compare = NULL; break; + case REV_SORT_IN_TREE_ORDER: + init_first_child_slab(&first_child); + queue.compare = compare_commits_by_tree_level; + queue.cb_data = &indegree; + break; case REV_SORT_BY_COMMIT_DATE: queue.compare = compare_commits_by_commit_date; break; @@ -773,7 +800,8 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so /* Mark them and clear the indegree */ for (next = orig; next; next = next->next) { struct commit *commit = next->item; - *(indegree_slab_at(&indegree, commit)) = 1; + struct indegree *pi = indegree_slab_at(&indegree, commit); + pi->indegree = 0, pi->level = 0; /* also record the author dates, if needed */ if (sort_order == REV_SORT_BY_AUTHOR_DATE) record_author_date(&author_date, commit); @@ -782,13 +810,12 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so /* update the indegree */ for (next = orig; next; next = next->next) { struct commit_list *parents = next->item->parents; - while (parents) { + for (; parents; parents = parents->next) { struct commit *parent = parents->item; - int *pi = indegree_slab_at(&indegree, parent); - - if (*pi) - (*pi)++; - parents = parents->next; + struct indegree *pi = indegree_slab_peek(&indegree, parent); + if (!pi) + continue; + pi->indegree++; } } @@ -801,9 +828,12 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so */ for (next = orig; next; next = next->next) { struct commit *commit = next->item; + struct indegree *pi = indegree_slab_at(&indegree, commit); - if (*(indegree_slab_at(&indegree, commit)) == 1) + if (pi->indegree == 0) { + pi->level = 1; prio_queue_put(&queue, commit); + } } /* @@ -820,31 +850,109 @@ void sort_in_topological_order(struct commit_list **list, enum rev_sort_order so *list = NULL; while ((commit = prio_queue_get(&queue)) != NULL) { struct commit_list *parents; + unsigned short commit_level, parent_level; + commit_level = parent_level = indegree_slab_at(&indegree, commit)->level; - for (parents = commit->parents; parents ; parents = parents->next) { + for (parents = commit->parents; parents; parents = parents->next) { struct commit *parent = parents->item; - int *pi = indegree_slab_at(&indegree, parent); + struct indegree *pi = indegree_slab_peek(&indegree, parent); - if (!*pi) + if (!pi) continue; + if (sort_order == REV_SORT_IN_TREE_ORDER) { + struct commit **pfirst_child = + first_child_slab_at(&first_child, parent); + if (*pfirst_child != NULL) { + /* already set a first child, if it is from higher + level than we are, set ourselves as first */ + struct indegree *old_pi = + indegree_slab_at(&indegree, *pfirst_child); + if (old_pi->level >= commit_level) + *pfirst_child = NULL; + } + if (*pfirst_child == NULL) + *pfirst_child = commit; + + if (!pi->level || parent_level < pi->level) { + struct commit_list *gparent_list; + struct commit *gparent = parent; + struct indegree *gpi; + /* mark this 'branch' as this level */ + pi->level = parent_level; + while ((gparent_list = gparent->parents) != NULL) { + gparent = gparent_list->item; + gpi = indegree_slab_peek(&indegree, gparent); + if (!gpi || + (gpi->level && gpi->level < parent_level)) + break; + gpi->level = parent_level; + } + } + if (pi->level >= parent_level) + parent_level = pi->level + 1; + } + /* * parents are only enqueued for emission * when all their children have been emitted thereby * guaranteeing topological order. */ - if (--(*pi) == 1) + if (--pi->indegree == 0) prio_queue_put(&queue, parent); } /* * all children of commit have already been * emitted. we can emit it now. */ - *(indegree_slab_at(&indegree, commit)) = 0; + indegree_slab_at(&indegree, commit)->indegree = 0; pptr = &commit_list_insert(commit, pptr)->next; } + if (sort_order == REV_SORT_IN_TREE_ORDER) { + struct commit *commit, *next_commit; + /* + * go through the commit list to cut all the non-first + * parent-child links, so we get a tree + */ + next = *list; + if (next) { + commit = next->item; + next = next->next; + } + for (next = *list; next; next = next->next, commit = next_commit) { + struct commit_list *parents, **pparents = &commit->parents; + next_commit = next->item; + for (parents = commit->parents; parents; parents = parents->next) { + /* leave link between sequential commits alone because + the level is not 100% to column mapping. level might + be higher due to merges of merges from the same + origin; except if the next commit is on top level + then for sure it's not the same column */ + struct commit *parent = parents->item; + int cut = 1; + if (parent == next_commit) { + struct indegree *pi = + indegree_slab_peek(&indegree, parent); + /* cut as in, allow to cut */ + cut = pi && pi->level == 1; + } + if (cut) { + struct commit **pfirst_child = + first_child_slab_at(&first_child, parent); + cut = commit != *pfirst_child; + } + if (cut) + *pparents = parents->next; + else + pparents = &parents->next; + } + } + + clear_first_child_slab(&first_child); + } + clear_indegree_slab(&indegree); clear_prio_queue(&queue); if (sort_order == REV_SORT_BY_AUTHOR_DATE) diff --git commit.h commit.h index 42728c2906..25b236cb49 100644 --- commit.h +++ commit.h @@ -205,6 +205,7 @@ void clear_commit_marks_many(int nr, struct commit **commit, unsigned int mark); enum rev_sort_order { REV_SORT_IN_GRAPH_ORDER = 0, + REV_SORT_IN_TREE_ORDER, REV_SORT_BY_COMMIT_DATE, REV_SORT_BY_AUTHOR_DATE }; diff --git revision.c revision.c index 162d511d46..09074e2c08 100644 --- revision.c +++ revision.c @@ -2031,6 +2031,9 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--topo-order")) { revs->sort_order = REV_SORT_IN_GRAPH_ORDER; revs->topo_order = 1; + } else if (!strcmp(arg, "--tree")) { + revs->sort_order = REV_SORT_IN_TREE_ORDER; + goto graph; } else if (!strcmp(arg, "--simplify-merges")) { revs->simplify_merges = 1; revs->topo_order = 1; @@ -2227,6 +2230,7 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg revs->pretty_given = 1; revs->abbrev_commit = 1; } else if (!strcmp(arg, "--graph")) { + graph: revs->topo_order = 1; revs->rewrite_parents = 1; revs->graph = graph_init(revs);
Modifies the topo-order code to keep track of the first child, using a heuristic. The heuristic works by assigning a (depth) level to all nodes. The first child is the node of which this node is a parent of and has the lowest level. Then it cuts all the links that are not the first child, resulting in a tree. It also uses the level to sort the nodes: trying to keep descendent line (of a merge) together as a group. Add commandline option "--tree" to use this new feature. Signed-off-by: Micha Nelissen <nelissen.micha@gmail.com> --- commit.c | 136 +++++++++++++++++++++++++++++++++++++++++++++++------ commit.h | 1 + revision.c | 4 ++ 3 files changed, 127 insertions(+), 14 deletions(-)