Message ID | pull.599.v3.git.1586521183873.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] revision: --show-pulls adds helpful merges | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn > \ \ \ \ / / / / > \ \__.. \ \/ ..__T1 / Tn > \ \__.. /\ ..__T2 / > \_____________________B \____________________/ > > If the commits T1, T2, ... Tn did not change the file, then all of > P1 through Pn will be TREESAME to their first parent, but not > TREESAME to their second. This means that all of those merge commits > appear in the --full-history view, with edges that immediately > collapse into the lower history without introducing interesting > single-parent commits. > > The --simplify-merges option was introduced to remove these extra > merge commits. By noticing that the rewritten parents are reachable > from their first parents, those edges can be simplified away. Finally, > the commits now look like single-parent commits that are TREESAME to > their "only" parent. Thus, they are removed and this issue does not > cause issues anymore. However, this also ends up removing the commit > M from the history view! Even worse, the --simplify-merges option > requires walking the entire history before returning a single result. True. It is not advisable to use --simplify-merges unless you are limiting the history at the bottom for that reason. > In some sense, users are asking for the "first" merge commit to > bring in the change to their branch. As long as the parent order is > consistent, this can be handled with the following rule: > > Include a merge commit if it is not TREESAME to its first > parent, but is TREESAME to a later parent. "but is" -> "even if it is" would make it a bit more accurate, I would think. Normally, such a merge that is treesame to some side branch is omitted from the output, but the rule wants it to be shown even if all the changes were brought in from one single parent. > Update Documentation/rev-list-options.txt with significant details > around this option. This requires updating the example in the > History Simplification section to demonstrate some of the problems > with TREESAME second parents. Good. > diff --git a/revision.c b/revision.c > index 8136929e236..f89dd6caa55 100644 > --- a/revision.c > +++ b/revision.c > @@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) > } > parent->next = NULL; > commit->parents = parent; > - commit->object.flags |= TREESAME; > + > + /* > + * A merge commit is a "diversion" if it is not > + * TREESAME to its first parent but is TREESAME > + * to a later parent. In the simplified history, > + * we "divert" the history walk to the later > + * parent. These commits are shown when "show_pulls" > + * is enabled, so do not mark the object as > + * TREESAME here. As we no longer use the word "diversion", this explanation should shift the focus from defining the word "diversion" and giving its background to explaining why the above parent rewriting is done and why the TREESAME marking is conditional, e.g. The tree of the merge and of the parent are the same; from here on, we stop following histories of all other parents but this one by culling commit->parents list. We also normally mark the merge commit TREESAME as the merge itself did not introduce any change relative to the parent, but we refrain from doing so for the first parent under "--show-pulls" mode, in order to let the output phase to show the merge, which is the last commit before we divert our walk to a side history. or something along that line. > + if (!revs->show_pulls || !nth_parent) > + commit->object.flags |= TREESAME; > + > return; > @@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) > relevant_change = 1; > else > irrelevant_change = 1; > + > + if (!nth_parent) > + commit->object.flags |= PULL_MERGE; For a three-parent merge that brings in changes to the first parent, if the result matches the second parent, we would have returned in the previous hunk before having a chance to inspect the third one and marking the merge result with PULL_MERGE, but if you swap the order of the second and the third parent, the second parent, which has different tree from the result, would not return in the previous hunk and cause the merge with PULL_MERGE here. And then when we inspect the third parent, the previous hunk's return would kick in. So for two merges that merge exactly the same two branches on top of exactly the same commit on the mainline, you sometimes mark the result with PULL_MERGE and sometimes don't, depending on the order of the second and the third parent. That feels iffy. Treating the first parent differently from others is what we intend to do with this change, bhut this hunk treats the other parents differently depending on the merge order. > @@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c > if (!cnt || > (commit->object.flags & UNINTERESTING) || > !(commit->object.flags & TREESAME) || > - (parent = one_relevant_parent(revs, commit->parents)) == NULL) > + (parent = one_relevant_parent(revs, commit->parents)) == NULL || > + (revs->show_pulls && (commit->object.flags & PULL_MERGE))) > st->simplified = commit; ... hence, wouldn't we see different result here ... > @@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi > /* drop merges unless we want parenthood */ > if (!want_ancestry(revs)) > return commit_ignore; > + > + if (revs->show_pulls && (commit->object.flags & PULL_MERGE)) > + return commit_show; ... and also here? Thanks.
On 4/10/2020 4:06 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn >> \ \ \ \ / / / / >> \ \__.. \ \/ ..__T1 / Tn >> \ \__.. /\ ..__T2 / >> \_____________________B \____________________/ >> >> If the commits T1, T2, ... Tn did not change the file, then all of >> P1 through Pn will be TREESAME to their first parent, but not >> TREESAME to their second. This means that all of those merge commits >> appear in the --full-history view, with edges that immediately >> collapse into the lower history without introducing interesting >> single-parent commits. >> >> The --simplify-merges option was introduced to remove these extra >> merge commits. By noticing that the rewritten parents are reachable >> from their first parents, those edges can be simplified away. Finally, >> the commits now look like single-parent commits that are TREESAME to >> their "only" parent. Thus, they are removed and this issue does not >> cause issues anymore. However, this also ends up removing the commit >> M from the history view! Even worse, the --simplify-merges option >> requires walking the entire history before returning a single result. > > True. It is not advisable to use --simplify-merges unless you are > limiting the history at the bottom for that reason. I will modify my advice to include this "limiting the history at the bottom" advice. >> In some sense, users are asking for the "first" merge commit to >> bring in the change to their branch. As long as the parent order is >> consistent, this can be handled with the following rule: >> >> Include a merge commit if it is not TREESAME to its first >> parent, but is TREESAME to a later parent. > > "but is" -> "even if it is" would make it a bit more accurate, I > would think. Normally, such a merge that is treesame to some side > branch is omitted from the output, but the rule wants it to be shown > even if all the changes were brought in from one single parent. This is an excellent clarification, since the way I wrote it does not describe every merge that we include. I wrote it to include the "extra" merges that are included. A better strategy is to include a complete description of every merge commit that appears in the output, as you defined it. >> Update Documentation/rev-list-options.txt with significant details >> around this option. This requires updating the example in the >> History Simplification section to demonstrate some of the problems >> with TREESAME second parents. > > Good. > >> diff --git a/revision.c b/revision.c >> index 8136929e236..f89dd6caa55 100644 >> --- a/revision.c >> +++ b/revision.c >> @@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) >> } >> parent->next = NULL; >> commit->parents = parent; >> - commit->object.flags |= TREESAME; >> + >> + /* >> + * A merge commit is a "diversion" if it is not >> + * TREESAME to its first parent but is TREESAME >> + * to a later parent. In the simplified history, >> + * we "divert" the history walk to the later >> + * parent. These commits are shown when "show_pulls" >> + * is enabled, so do not mark the object as >> + * TREESAME here. > > As we no longer use the word "diversion", this explanation should > shift the focus from defining the word "diversion" and giving its > background to explaining why the above parent rewriting is done and > why the TREESAME marking is conditional, e.g. Sorry about this. I intended to replace all references to this old terminology, but failed. > The tree of the merge and of the parent are > the same; from here on, we stop following > histories of all other parents but this one > by culling commit->parents list. We also > normally mark the merge commit TREESAME as > the merge itself did not introduce any > change relative to the parent, but we > refrain from doing so for the first parent > under "--show-pulls" mode, in order to let > the output phase to show the merge, which is > the last commit before we divert our walk to > a side history. > > or something along that line. This is pretty good. I plan to take it with very minor modifications in my next version. > >> + if (!revs->show_pulls || !nth_parent) >> + commit->object.flags |= TREESAME; >> + >> return; > >> @@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) >> relevant_change = 1; >> else >> irrelevant_change = 1; >> + >> + if (!nth_parent) >> + commit->object.flags |= PULL_MERGE; > > For a three-parent merge that brings in changes to the first parent, > if the result matches the second parent, we would have returned in > the previous hunk before having a chance to inspect the third one > and marking the merge result with PULL_MERGE, but if you swap the > order of the second and the third parent, the second parent, which > has different tree from the result, would not return in the previous > hunk and cause the merge with PULL_MERGE here. And then when we > inspect the third parent, the previous hunk's return would kick in. > So for two merges that merge exactly the same two branches on top of > exactly the same commit on the mainline, you sometimes mark the > result with PULL_MERGE and sometimes don't, depending on the order > of the second and the third parent. > > That feels iffy. Treating the first parent differently from others > is what we intend to do with this change, bhut this hunk treats the > other parents differently depending on the merge order. It is worth adding a test with an octopus merge to really be clear about the intended behavior. When describing the concept, I've been careful to say "TREESAME to the first parent" and "not TREESAME to a later parent" instead of assuming two parents. In the default mode, we should not reach here unless we found a TREESAME parent that was not the first parent, since we stop visiting parents after we find any TREESAME parent. But this would change for --full-history or --simplify-merges. >> @@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c >> if (!cnt || >> (commit->object.flags & UNINTERESTING) || >> !(commit->object.flags & TREESAME) || >> - (parent = one_relevant_parent(revs, commit->parents)) == NULL) >> + (parent = one_relevant_parent(revs, commit->parents)) == NULL || >> + (revs->show_pulls && (commit->object.flags & PULL_MERGE))) >> st->simplified = commit; > > ... hence, wouldn't we see different result here ... > >> @@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi >> /* drop merges unless we want parenthood */ >> if (!want_ancestry(revs)) >> return commit_ignore; >> + >> + if (revs->show_pulls && (commit->object.flags & PULL_MERGE)) >> + return commit_show; > > ... and also here? If there is a flaw in how I set PULL_MERGE, then hopefully the fix is where I assign the PULL_MERGE flag and these lines do not need to change. Thanks, -Stolee
diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt index bfd02ade991..04ad7dd36eb 100644 --- a/Documentation/rev-list-options.txt +++ b/Documentation/rev-list-options.txt @@ -342,6 +342,12 @@ Default mode:: branches if the end result is the same (i.e. merging branches with the same content) +--show-pulls:: + Include all commits from the default mode, but also any merge + commits that are not TREESAME to the first parent but are + TREESAME to a later parent. This mode is helpful for showing + the merge commits that "first introduced" a change to a branch. + --full-history:: Same as the default mode, but does not prune some history. @@ -534,7 +540,7 @@ Note the major differences in `N`, `P`, and `Q` over `--full-history`: parent and is TREESAME. -- -Finally, there is a fifth simplification mode available: +There is another simplification mode available: --ancestry-path:: Limit the displayed commits to those directly on the ancestry @@ -573,6 +579,132 @@ option does. Applied to the 'D..M' range, it results in: L--M ----------------------------------------------------------------------- +Before discussing another option, `--show-pulls`, we need to +create a new example history. ++ +A common problem users face when looking at simplified history is that a +commit they know changed a file somehow does not appear in the file's +simplified history. Let's demonstrate a new example and show how options +such as `--full-history` and `--simplify-merges` works in that case: ++ +----------------------------------------------------------------------- + .-A---M-----C--N---O---P + / / \ \ \/ / / + I B \ R-'`-Z' / + \ / \/ / + \ / /\ / + `---X--' `---Y--' +----------------------------------------------------------------------- ++ +For this example, suppose `I` created `file.txt` which was modified by +`A`, `B`, and `X` in different ways. The single-parent commits `C`, `Z`, +and `Y` do not change `file.txt`. The merge commit `M` was created by +resolving the merge conflict to include both changes from `A` and `B` +and hence is not TREESAME to either. The merge commit `R`, however, was +created by ignoring the contents of `file.txt` at `M` and taking only +the contents of `file.txt` at `X`. Hence, `R` is TREESAME to `X` but not +`M`. Finally, the natural merge resolution to create `N` is to take the +contents of `file.txt` at `R`, so `N` is TREESAME to `R` but not `C`. +The merge commits `O` and `P` are TREESAME to their first parents, but +not to their second parents, `Z` and `Y` respectively. ++ +When using the default mode, `N` and `R` both have a TREESAME parent, so +those edges are walked and the others are ignored. The resulting history +graph is: ++ +----------------------------------------------------------------------- + I---X +----------------------------------------------------------------------- ++ +When using `--full-history`, Git walks every edge. This will discover +the commits `A` and `B` and the merge `M`, but also will reveal the +merge commits `O` and `P`. With parent rewriting, the resulting graph is: ++ +----------------------------------------------------------------------- + .-A---M--------N---O---P + / / \ \ \/ / / + I B \ R-'`--' / + \ / \/ / + \ / /\ / + `---X--' `------' +----------------------------------------------------------------------- ++ +Here, the merge commits `O` and `P` contribute extra noise, as they did +not actually contribute a change to `file.txt`. They only merged a topic +that was based on an older version of `file.txt`. This is a common +issue in repositories using a workflow where many contributors work in +parallel and merge their topic branches along a single trunk: manu +unrelated merges appear in the `--full-history` results. ++ +When using the `--simplify-merges` option, the commits `O` and `P` +disappear from the results. This is because the rewritten second parents +of `O` and `P` are reachable from their first parents. Those edges are +removed and then the commits look like single-parent commits that are +TREESAME to their parent. This also happens to the commit `N`, resulting +in a history view as follows: ++ +----------------------------------------------------------------------- + .-A---M--. + / / \ + I B R + \ / / + \ / / + `---X--' +----------------------------------------------------------------------- ++ +In this view, we see all of the important single-parent changes from +`A`, `B`, and `X`. We also see the carefully-resolved merge `M` and the +not-so-carefully-resolved merge `R`. This is usually enough information +to determine why the commits `A` and `B` "disappeared" from history in +the default view. However, there are a few issues with this approach. ++ +The first issue is performance. Unlike any previous option, the +`--simplify-merges` option requires walking the entire commit history +before returning a single result. This can make the option difficult to +use for very large repositories. ++ +The second issue is one of auditing. When many contributors are working +on the same repository, it is important which merge commits introduced +a change into an important branch. The problematic merge `R` above is +not likely to be the merge commit that was used to merge into an +important branch. Instead, the merge `N` was used to merge `R` and `X` +into the important branch. This commit may have information about why +the change `X` came to override the changes from `A` and `B` in its +commit message. ++ +The `--show-pulls` option helps with both of these issues by adding more +merge commits to the history results. If a merge is not TREESAME to its +first parent but is TREESAME to a later parent, then that merge is +treated as if it "pulled" the change from another branch. When using +`--show-pulls` on this example (and no other options) the resulting +graph is: ++ +----------------------------------------------------------------------- + I---X---R---N +----------------------------------------------------------------------- ++ +Here, the merge commits `R` and `N` are included because they pulled +the commits `X` and `R` into the base branch, respectively. These +merges are the reason the commits `A` and `B` do not appear in the +default history. ++ +When `--show-pulls` is paired with `--simplify-merges`, the +graph includes all of the necessary information: ++ +----------------------------------------------------------------------- + .-A---M--. N + / / \ / + I B R + \ / / + \ / / + `---X--' +----------------------------------------------------------------------- ++ +Notice that since `M` is reachable from `R`, the edge from `N` to `M` +was simplified away. However, `N` still appears in the history as an +important commit because it "pulled" the change `R` into the main +branch. + The `--simplify-by-decoration` option allows you to view only the big picture of the topology of the history, by omitting commits that are not referenced by tags. Commits are marked as !TREESAME diff --git a/object.h b/object.h index 2dbabfca0ab..b22328b8383 100644 --- a/object.h +++ b/object.h @@ -59,7 +59,7 @@ struct object_array { /* * object flag allocation: - * revision.h: 0---------10 25----28 + * revision.h: 0---------10 15 25----28 * fetch-pack.c: 01 * negotiator/default.c: 2--5 * walker.c: 0-2 diff --git a/revision.c b/revision.c index 8136929e236..f89dd6caa55 100644 --- a/revision.c +++ b/revision.c @@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) } parent->next = NULL; commit->parents = parent; - commit->object.flags |= TREESAME; + + /* + * A merge commit is a "diversion" if it is not + * TREESAME to its first parent but is TREESAME + * to a later parent. In the simplified history, + * we "divert" the history walk to the later + * parent. These commits are shown when "show_pulls" + * is enabled, so do not mark the object as + * TREESAME here. + */ + if (!revs->show_pulls || !nth_parent) + commit->object.flags |= TREESAME; + return; case REV_TREE_NEW: @@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit) relevant_change = 1; else irrelevant_change = 1; + + if (!nth_parent) + commit->object.flags |= PULL_MERGE; + continue; } die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid)); @@ -2265,6 +2281,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg } else if (!strcmp(arg, "--full-diff")) { revs->diff = 1; revs->full_diff = 1; + } else if (!strcmp(arg, "--show-pulls")) { + revs->show_pulls = 1; } else if (!strcmp(arg, "--full-history")) { revs->simplify_history = 0; } else if (!strcmp(arg, "--relative-date")) { @@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c if (!cnt || (commit->object.flags & UNINTERESTING) || !(commit->object.flags & TREESAME) || - (parent = one_relevant_parent(revs, commit->parents)) == NULL) + (parent = one_relevant_parent(revs, commit->parents)) == NULL || + (revs->show_pulls && (commit->object.flags & PULL_MERGE))) st->simplified = commit; else { pst = locate_simplify_state(revs, parent); @@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi /* drop merges unless we want parenthood */ if (!want_ancestry(revs)) return commit_ignore; + + if (revs->show_pulls && (commit->object.flags & PULL_MERGE)) + return commit_show; + /* * If we want ancestry, then need to keep any merges * between relevant commits to tie together topology. diff --git a/revision.h b/revision.h index 475f048fb61..70899eb246c 100644 --- a/revision.h +++ b/revision.h @@ -34,6 +34,9 @@ #define SYMMETRIC_LEFT (1u<<8) #define PATCHSAME (1u<<9) #define BOTTOM (1u<<10) + +/* WARNING: This is also used as REACHABLE in commit-graph.c. */ +#define PULL_MERGE (1u<<15) /* * Indicates object was reached by traversal. i.e. not given by user on * command-line or stdin. @@ -43,7 +46,7 @@ */ #define NOT_USER_GIVEN (1u<<25) #define TRACK_LINEAR (1u<<26) -#define ALL_REV_FLAGS (((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR) +#define ALL_REV_FLAGS (((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | PULL_MERGE) #define TOPO_WALK_EXPLORED (1u<<27) #define TOPO_WALK_INDEGREE (1u<<28) @@ -129,6 +132,7 @@ struct rev_info { no_walk:2, remove_empty_trees:1, simplify_history:1, + show_pulls:1, topo_order:1, simplify_merges:1, simplify_by_decoration:1, diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh index a10f0df02b0..b6fa43ace01 100755 --- a/t/t6012-rev-list-simplify.sh +++ b/t/t6012-rev-list-simplify.sh @@ -154,4 +154,124 @@ test_expect_success '--full-diff is not affected by --parents' ' test_cmp expected actual ' +# +# Create a new history to demonstrate the value of --show-pulls +# with respect to the subtleties of simplified history, --full-history, +# and --simplify-merges. +# +# .-A---M-----C--N---O---P +# / / \ \ \/ / / +# I B \ R-'`-Z' / +# \ / \/ / +# \ / /\ / +# `---X--' `---Y--' +# +# This example is explained in Documentation/rev-list-options.txt + +test_expect_success 'rebuild repo' ' + rm -rf .git * && + git init && + git switch -c main && + + echo base >file && + git add file && + test_commit I && + + echo A >file && + git add file && + test_commit A && + + git switch -c branchB I && + echo B >file && + git add file && + test_commit B && + + git switch main && + test_must_fail git merge -m "M" B && + echo A >file && + echo B >>file && + git add file && + git merge --continue && + note M && + + echo C >other && + git add other && + test_commit C && + + git switch -c branchX I && + echo X >file && + git add file && + test_commit X && + + git switch -c branchR M && + git merge -m R -Xtheirs X && + note R && + + git switch main && + git merge -m N R && + note N && + + git switch -c branchY M && + echo Y >y && + git add y && + test_commit Y && + + git switch -c branchZ C && + echo Z >z && + git add z && + test_commit Z && + + git switch main && + git merge -m O Z && + note O && + + git merge -m P Y && + note P +' + +check_result 'X I' -- file +check_result 'N R X I' --show-pulls -- file + +check_result 'P O N R X M B A I' --full-history --topo-order -- file +check_result 'N R X M B A I' --simplify-merges --topo-order --show-pulls -- file +check_result 'R X M B A I' --simplify-merges --topo-order -- file +check_result 'N M A I' --first-parent -- file +check_result 'N M A I' --first-parent --show-pulls -- file + +# --ancestry-path implies --full-history +check_result 'P O N R M' --topo-order \ + --ancestry-path A..HEAD -- file +check_result 'P O N R M' --topo-order \ + --show-pulls \ + --ancestry-path A..HEAD -- file +check_result 'P O N R M' --topo-order \ + --full-history \ + --ancestry-path A..HEAD -- file +check_result 'R M' --topo-order \ + --simplify-merges \ + --ancestry-path A..HEAD -- file +check_result 'N R M' --topo-order \ + --simplify-merges --show-pulls \ + --ancestry-path A..HEAD -- file + +test_expect_success 'log --graph --simplify-merges --show-pulls' ' + cat >expect <<-\EOF && + * N + * R + |\ + | * X + * | M + |\ \ + | * | B + | |/ + * / A + |/ + * I + EOF + git log --graph --pretty="%s" \ + --simplify-merges --show-pulls \ + -- file >actual && + test_cmp expect actual +' + test_done