Message ID | pull.1734.v2.git.1731073383564.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v2] range-diff: optionally include merge commits' diffs in the analysis | expand |
On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget <gitgitgadget@gmail.com> wrote: > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > The `git log` command already offers support for including diffs for > merges, via the `--diff-merges=<format>` option. > > Let's add corresponding support for `git range-diff`, too. This makes it > more convenient to spot differences between iterations of non-linear > contributions, where so-called "evil merges" are sometimes necessary and > need to be reviewed, too. > > In my code reviews, I found the `--diff-merges=first-parent` option > particularly useful. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > Support diff merges option in range diff > > The git range-diff command does the same with merge commits as git > rebase: It ignores them. > > However, when comparing branch thickets it can be quite illuminating to > watch out for inadvertent changes in merge commits, in particular when > some "evil" merges have been replayed, i.e. merges that needed to > introduce changes outside of the merge conflicts (e.g. when one branch > changed a function's signature and another branch introduced a caller of > said function), in case the replayed merge is no longer "evil" and > therefore potentially incorrect. > > Let's introduce support for the --diff-merges option that is passed > through to those git log commands. > > I had a need for this earlier this year and got it working, leaving the > GitGitGadget PR in a draft mode. Phil Blain found it and kindly > nerd-sniped me into readying it for submitting, so say thanks to Phil! > > Changes since v1: > > * Changed the documentation to recommend first-parent mode instead of > vaguely talking about various modes' merits. > * Disallowed the no-arg --diff-merges option (because --diff-merges > requires an argument). > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/1734 > > Range-diff vs v1: > > 1: 11361e07af8 ! 1: d01a395900b range-diff: optionally include merge commits' diffs in the analysis > @@ Documentation/git-range-diff.txt: to revert to color all lines according to the > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > + and include them in the comparison. > ++ > -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in > -+the context of the `range-diff` command than other formats, so choose wisely! > ++Note: In the common case, the `first-parent` mode will be the most natural one > ++to use, as it is consistent with the idea that a merge is kind of a "meta > ++patch", comprising all the merged commits' patches into a single one. > + > --[no-]notes[=<ref>]:: > This flag is passed to the `git log` program > @@ builtin/range-diff.c: int cmd_range_diff(int argc, > N_("notes"), N_("passed to 'git log'"), > PARSE_OPT_OPTARG), > + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, > -+ N_("style"), N_("passed to 'git log'"), > -+ PARSE_OPT_OPTARG), > ++ N_("style"), N_("passed to 'git log'"), 0), > OPT_BOOL(0, "left-only", &left_only, > N_("only emit output related to the first range")), > OPT_BOOL(0, "right-only", &right_only, > > > Documentation/git-range-diff.txt | 11 ++++++++++- > builtin/range-diff.c | 10 ++++++++++ > range-diff.c | 15 +++++++++++---- > range-diff.h | 1 + > t/t3206-range-diff.sh | 16 ++++++++++++++++ > 5 files changed, 48 insertions(+), 5 deletions(-) > > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt > index fbdbe0befeb..17a85957877 100644 > --- a/Documentation/git-range-diff.txt > +++ b/Documentation/git-range-diff.txt > @@ -10,7 +10,7 @@ SYNOPSIS > [verse] > 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>] > [--no-dual-color] [--creation-factor=<factor>] > - [--left-only | --right-only] > + [--left-only | --right-only] [--diff-merges=<format>] > ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> ) > [[--] <path>...] > > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers > Suppress commits that are missing from the second specified range > (or the "right range" when using the `<rev1>...<rev2>` format). > > +--diff-merges=<format>:: > + Instead of ignoring merge commits, generate diffs for them using the > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > + and include them in the comparison. > ++ > +Note: In the common case, the `first-parent` mode will be the most natural one > +to use, I think we need more wording around "common case"; I believe this "common case" is when you are diffing against a merely transplanted series of commits (a series created by rebasing without inserting, removing, or minimally modifying those commits in the process) that `first-parent` makes sense. And it only makes sense in that case. I think `remerge-diff` generally makes sense here, both in the cases when `first-parent` makes sense and when `first-parent` does not. It could be improved by suppressing the inclusion of short commit ids (and maybe also commit messages) in the labels of conflict markers. I suspect that issue might make `remerge-diff` less useful than `first-parent` in simple common cases currently, but I think it's the right thing to build upon for what you are trying to view. If `remerge-diff` didn't exist, I think I'd always use `dense-combined` over `first-parent` because of this merely-transplanted limitation. I suspect dense-combined would probably be kind of ugly and hard to parse when there is an actual evil merge, but it'd at least limit what it shows to the evil merge content. > as it is consistent with the idea that a merge is kind of a "meta > +patch", comprising all the merged commits' patches into a single one. Doesn't this wording of yours naturally lead to the idea that `first-parent` is a bad choice if patches leading to the merge have been inserted, removed, or more than minimally modified? It points out that first-parent is a diff over the whole range, and so range-diff shows you a diff of the diffs over the whole range. If you want to look at the "evilness" of merge commits, i.e. the user-generated portion of the diffs included in by merge commits, you need either remerge-diff or dense-combined. > + > --[no-]notes[=<ref>]:: > This flag is passed to the `git log` program > (see linkgit:git-log[1]) that generates the patches. > diff --git a/builtin/range-diff.c b/builtin/range-diff.c > index 1b33ab66a7b..901de5d133d 100644 > --- a/builtin/range-diff.c > +++ b/builtin/range-diff.c > @@ -21,6 +21,7 @@ int cmd_range_diff(int argc, > { > struct diff_options diffopt = { NULL }; > struct strvec other_arg = STRVEC_INIT; > + struct strvec diff_merges_arg = STRVEC_INIT; > struct range_diff_options range_diff_opts = { > .creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT, > .diffopt = &diffopt, > @@ -36,6 +37,8 @@ int cmd_range_diff(int argc, > OPT_PASSTHRU_ARGV(0, "notes", &other_arg, > N_("notes"), N_("passed to 'git log'"), > PARSE_OPT_OPTARG), > + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, > + N_("style"), N_("passed to 'git log'"), 0), > OPT_BOOL(0, "left-only", &left_only, > N_("only emit output related to the first range")), > OPT_BOOL(0, "right-only", &right_only, > @@ -62,6 +65,12 @@ int cmd_range_diff(int argc, > if (!simple_color) > diffopt.use_color = 1; > > + /* If `--diff-merges` was specified, imply `--merges` */ > + if (diff_merges_arg.nr) { > + range_diff_opts.include_merges = 1; > + strvec_pushv(&other_arg, diff_merges_arg.v); > + } > + > for (i = 0; i < argc; i++) > if (!strcmp(argv[i], "--")) { > dash_dash = i; > @@ -155,6 +164,7 @@ int cmd_range_diff(int argc, > res = show_range_diff(range1.buf, range2.buf, &range_diff_opts); > > strvec_clear(&other_arg); > + strvec_clear(&diff_merges_arg); > strbuf_release(&range1); > strbuf_release(&range2); > > diff --git a/range-diff.c b/range-diff.c > index bbb0952264b..9e59733059b 100644 > --- a/range-diff.c > +++ b/range-diff.c > @@ -38,7 +38,8 @@ struct patch_util { > * as struct object_id (will need to be free()d). > */ > static int read_patches(const char *range, struct string_list *list, > - const struct strvec *other_arg) > + const struct strvec *other_arg, > + unsigned int include_merges) > { > struct child_process cp = CHILD_PROCESS_INIT; > struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT; > @@ -49,7 +50,7 @@ static int read_patches(const char *range, struct string_list *list, > size_t size; > int ret = -1; > > - strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", > + strvec_pushl(&cp.args, "log", "--no-color", "-p", > "--reverse", "--date-order", "--decorate=no", > "--no-prefix", "--submodule=short", > /* > @@ -64,6 +65,8 @@ static int read_patches(const char *range, struct string_list *list, > "--pretty=medium", > "--show-notes-by-default", > NULL); > + if (!include_merges) > + strvec_push(&cp.args, "--no-merges"); > strvec_push(&cp.args, range); > if (other_arg) > strvec_pushv(&cp.args, other_arg->v); > @@ -96,11 +99,14 @@ static int read_patches(const char *range, struct string_list *list, > } > > if (skip_prefix(line, "commit ", &p)) { > + char *q; > if (util) { > string_list_append(list, buf.buf)->util = util; > strbuf_reset(&buf); > } > CALLOC_ARRAY(util, 1); > + if (include_merges && (q = strstr(p, " (from "))) > + *q = '\0'; > if (repo_get_oid(the_repository, p, &util->oid)) { > error(_("could not parse commit '%s'"), p); > FREE_AND_NULL(util); > @@ -571,13 +577,14 @@ int show_range_diff(const char *range1, const char *range2, > > struct string_list branch1 = STRING_LIST_INIT_DUP; > struct string_list branch2 = STRING_LIST_INIT_DUP; > + unsigned int include_merges = range_diff_opts->include_merges; > > if (range_diff_opts->left_only && range_diff_opts->right_only) > res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only"); > > - if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg)) > + if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges)) > res = error(_("could not parse log for '%s'"), range1); > - if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg)) > + if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges)) > res = error(_("could not parse log for '%s'"), range2); > > if (!res) { > diff --git a/range-diff.h b/range-diff.h > index 2f69f6a434d..cd85000b5a0 100644 > --- a/range-diff.h > +++ b/range-diff.h > @@ -16,6 +16,7 @@ struct range_diff_options { > int creation_factor; > unsigned dual_color:1; > unsigned left_only:1, right_only:1; > + unsigned include_merges:1; > const struct diff_options *diffopt; /* may be NULL */ > const struct strvec *other_arg; /* may be NULL */ > }; > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index 86010931ab6..c18a3fdab83 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule' > test_cmp expect actual > ' > > +test_expect_success '--diff-merges' ' > + renamed_oid=$(git rev-parse --short renamed-file) && > + tree=$(git merge-tree unmodified renamed-file) && > + clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) && > + clean_oid=$(git rev-parse --short $clean) && > + conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) && > + conflict_oid=$(git rev-parse --short $conflict) && > + > + git range-diff --diff-merges=1 $clean...$conflict >actual && > + cat >expect <<-EOF && > + 1: $renamed_oid < -: ------- s/12/B/ > + 2: $clean_oid = 1: $conflict_oid merge > + EOF > + test_cmp expect actual > +' > + > test_done > > base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2 > -- > gitgitgadget
Hi Dscho, Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit : > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > The `git log` command already offers support for including diffs for > merges, via the `--diff-merges=<format>` option. > > Let's add corresponding support for `git range-diff`, too. This makes it > more convenient to spot differences between iterations of non-linear > contributions, where so-called "evil merges" are sometimes necessary and > need to be reviewed, too. Maybe "between commit ranges that include merge commits" would be more workflow-agnostic ? Just thinking out loud here, I think what you wrote is also OK. > In my code reviews, I found the `--diff-merges=first-parent` option > particularly useful. > > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > Support diff merges option in range diff > > The git range-diff command does the same with merge commits as git > rebase: It ignores them. > > However, when comparing branch thickets it can be quite illuminating to > watch out for inadvertent changes in merge commits, in particular when > some "evil" merges have been replayed, i.e. merges that needed to > introduce changes outside of the merge conflicts (e.g. when one branch > changed a function's signature and another branch introduced a caller of > said function), in case the replayed merge is no longer "evil" and > therefore potentially incorrect. > > Let's introduce support for the --diff-merges option that is passed > through to those git log commands. > > I had a need for this earlier this year and got it working, leaving the > GitGitGadget PR in a draft mode. Phil Blain found it and kindly > nerd-sniped me into readying it for submitting, so say thanks to Phil! > > Changes since v1: > > * Changed the documentation to recommend first-parent mode instead of > vaguely talking about various modes' merits. > * Disallowed the no-arg --diff-merges option (because --diff-merges > requires an argument). > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/1734 > > Range-diff vs v1: > > 1: 11361e07af8 ! 1: d01a395900b range-diff: optionally include merge commits' diffs in the analysis > @@ Documentation/git-range-diff.txt: to revert to color all lines according to the > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > + and include them in the comparison. > ++ > -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in > -+the context of the `range-diff` command than other formats, so choose wisely! > ++Note: In the common case, the `first-parent` mode will be the most natural one > ++to use, as it is consistent with the idea that a merge is kind of a "meta > ++patch", comprising all the merged commits' patches into a single one. > + > --[no-]notes[=<ref>]:: > This flag is passed to the `git log` program > @@ builtin/range-diff.c: int cmd_range_diff(int argc, > N_("notes"), N_("passed to 'git log'"), > PARSE_OPT_OPTARG), > + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, > -+ N_("style"), N_("passed to 'git log'"), > -+ PARSE_OPT_OPTARG), > ++ N_("style"), N_("passed to 'git log'"), 0), > OPT_BOOL(0, "left-only", &left_only, > N_("only emit output related to the first range")), > OPT_BOOL(0, "right-only", &right_only, > > > Documentation/git-range-diff.txt | 11 ++++++++++- > builtin/range-diff.c | 10 ++++++++++ > range-diff.c | 15 +++++++++++---- > range-diff.h | 1 + > t/t3206-range-diff.sh | 16 ++++++++++++++++ > 5 files changed, 48 insertions(+), 5 deletions(-) > > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt > index fbdbe0befeb..17a85957877 100644 > --- a/Documentation/git-range-diff.txt > +++ b/Documentation/git-range-diff.txt > @@ -10,7 +10,7 @@ SYNOPSIS > [verse] > 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>] > [--no-dual-color] [--creation-factor=<factor>] > - [--left-only | --right-only] > + [--left-only | --right-only] [--diff-merges=<format>] > ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> ) > [[--] <path>...] > > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers > Suppress commits that are missing from the second specified range > (or the "right range" when using the `<rev1>...<rev2>` format). > > +--diff-merges=<format>:: > + Instead of ignoring merge commits, generate diffs for them using the > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > + and include them in the comparison. > ++ > +Note: In the common case, the `first-parent` mode will be the most natural one > +to use, as it is consistent with the idea that a merge is kind of a "meta > +patch", comprising all the merged commits' patches into a single one. I think I agree with Elijah that we probably should also highlight at least 'remerge'. Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s. > diff --git a/builtin/range-diff.c b/builtin/range-diff.c > index 1b33ab66a7b..901de5d133d 100644 > --- a/builtin/range-diff.c > +++ b/builtin/range-diff.c The changes look good to me. Maybe it would be nice to add a corresponding 'range-diff.diffMerges' config option to allow users to configure the behaviour more permanently ? > diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh > index 86010931ab6..c18a3fdab83 100755 > --- a/t/t3206-range-diff.sh > +++ b/t/t3206-range-diff.sh > @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule' > test_cmp expect actual > ' > > +test_expect_success '--diff-merges' ' > + renamed_oid=$(git rev-parse --short renamed-file) && > + tree=$(git merge-tree unmodified renamed-file) && > + clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) && > + clean_oid=$(git rev-parse --short $clean) && > + conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) && > + conflict_oid=$(git rev-parse --short $conflict) && > + > + git range-diff --diff-merges=1 $clean...$conflict >actual && > + cat >expect <<-EOF && > + 1: $renamed_oid < -: ------- s/12/B/ > + 2: $clean_oid = 1: $conflict_oid merge > + EOF > + test_cmp expect actual > +' The test looks good. Thanks for submitting that patch! Philippe.
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> writes: > * Disallowed the no-arg --diff-merges option (because --diff-merges > requires an argument). Yup, I was happy to see it in the range-diff output, because I was wondering where the optarg came from in the initial submission. > +--diff-merges=<format>:: > + Instead of ignoring merge commits, generate diffs for them using the > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > + and include them in the comparison. > ++ > +Note: In the common case, the `first-parent` mode will be the most natural one > +to use, as it is consistent with the idea that a merge is kind of a "meta > +patch", comprising all the merged commits' patches into a single one. I'll let you and Elijah figure out the best phrasing around here, including whether we recommend only first-parent or give other options. For me personally, I find that use of `first-parent` here is consistent with the mindset of those users who prefer to read "git log --first-parent -p", and to a smaller degree, those who use squash merges. To them, a merge is merely bringing in lots of changes into the mainline as a single unit, and other than that these lots of changes are broken out for finer grained inspection if/when they wanted to, they prefer to treat the changes brought in by merges just like a single-parent commit. And from that point of view, (1) reordering the changes inside the side branch is immaterial for the purpose of talking about the result, the net effect the merged topic makes to the mainline, and (2) dropping or adding new changes to the side branch is treated just like a newer iteration of a single parent commit having fewer or more changes. So it is very natural that first-parent helps to make the world view very simplistic, and more readable range-diff is all about how you can present the two iterations in a simple and flattened form. There _may_ need a tweak of the matching algorithm to allow the "same" merge on both sides to match, even if they diverge a lot, though. A range-diff that pairs a merge in the previous iteration with a single patch in the updated iteration may be hard to read.
On Sun, Nov 10, 2024 at 4:37 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com> > writes: > [...] > > +--diff-merges=<format>:: > > + Instead of ignoring merge commits, generate diffs for them using the > > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > > + and include them in the comparison. > > ++ > > +Note: In the common case, the `first-parent` mode will be the most natural one > > +to use, as it is consistent with the idea that a merge is kind of a "meta > > +patch", comprising all the merged commits' patches into a single one. > > I'll let you and Elijah figure out the best phrasing around here, > including whether we recommend only first-parent or give other > options. I'd probably suggest: Note: When trying to see how users adjusted the contents of a merge commit, `remerge-diff` is a natural choice. When viewing merges as a "meta patch", comprising a squash of all the merged commits' patches into a single one (similar to `git log --first-parent -p`), `first-parent` is a natural choice. > For me personally, I find that use of `first-parent` here is > consistent with the mindset of those users who prefer to read "git > log --first-parent -p", and to a smaller degree, those who use > squash merges. To them, a merge is merely bringing in lots of > changes into the mainline as a single unit, and other than that > these lots of changes are broken out for finer grained inspection > if/when they wanted to, they prefer to treat the changes brought in > by merges just like a single-parent commit. And from that point of > view, (1) reordering the changes inside the side branch is > immaterial for the purpose of talking about the result, the net > effect the merged topic makes to the mainline, and (2) dropping or > adding new changes to the side branch is treated just like a newer > iteration of a single parent commit having fewer or more changes. > So it is very natural that first-parent helps to make the world view > very simplistic, and more readable range-diff is all about how you > can present the two iterations in a simple and flattened form. > > There _may_ need a tweak of the matching algorithm to allow the > "same" merge on both sides to match, even if they diverge a lot, > though. A range-diff that pairs a merge in the previous iteration > with a single patch in the updated iteration may be hard to read. Sounds like you are arguing that there is an angle/usecase from which `first-parent` makes sense, namely the viewpoint that "a merge is merely bringing in lots of changes into the mainline as a single unit" as you put it. What was surprising to me, though, is that it's a completely different usecase than the one that was brought up in the commit message for this feature, namely "so-called 'evil merges' are sometimes necessary and need to be reviewed too". If you want to review "evilness" or the differences between the changes that `git merge` would make and what got recorded in the commit, then `first-parent` is not that great an approximation of what you are looking for, BUT it happens to work beautifully in common cases where you are merely transplanting commits due to the fact that the huge amounts of extras being put into the diff happen to be equal on the two sides, so they cancel out and leave you with the "evilness" you are interested in. But if you stray from that common case (by e.g. inserting, removing, or modifying commits while rebasing) and are still interested in "evilness", your approximation quickly drops from accurately pulling out just the bits you are interested in towards being completely swamped by the noise. My concern with the commit message and patch as they stand, is that I feel the `first-parent` mode is being recommended as a solution for a certain usecase where it is really only an approximation that luckily is highly accurate in special but common scenarios while being very inaccurate in others. I think that will lead to users getting negatively surprised when they move outside those common cases.
Hi Elijah, On Fri, 8 Nov 2024, Elijah Newren wrote: > On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget > <gitgitgadget@gmail.com> wrote: > > > > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt > > index fbdbe0befeb..17a85957877 100644 > > --- a/Documentation/git-range-diff.txt > > +++ b/Documentation/git-range-diff.txt > > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers > > Suppress commits that are missing from the second specified range > > (or the "right range" when using the `<rev1>...<rev2>` format). > > > > +--diff-merges=<format>:: > > + Instead of ignoring merge commits, generate diffs for them using the > > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > > + and include them in the comparison. > > ++ > > +Note: In the common case, the `first-parent` mode will be the most natural one > > +to use, > > I think we need more wording around "common case"; I believe this > "common case" is when you are diffing against a merely transplanted > series of commits (a series created by rebasing without inserting, > removing, or minimally modifying those commits in the process) that > `first-parent` makes sense. And it only makes sense in that case. I tried to reproduce some of the common cases (which were more along the lines of stacked PRs than transplanted series of commits), and you're right: with the remerge bug fix, the range-diffs are much more easily interpreted. For example, I had to squash in a few fixes in https://github.com/git/git-scm.com/pull/1915, and I try to combine all of my currently-open git-scm.com PRs into the `gh-pages` branch in my fork. With `--diff-merges=1`, all of those single-commit merges that update the various translations of the book had diffs identical to the diffs of the commits they merged. Which confused range-diff quite a bit, often picking a non-merge as matching a merge (which is of course suboptimal). With `--diff-merges=remerge`, these trivial merges had no diffs at all, which made the review much easier. > I think `remerge-diff` generally makes sense here, both in the cases > when `first-parent` makes sense and when `first-parent` does not. It > could be improved by suppressing the inclusion of short commit ids > (and maybe also commit messages) in the labels of conflict markers. I > suspect that issue might make `remerge-diff` less useful than > `first-parent` in simple common cases currently, but I think it's the > right thing to build upon for what you are trying to view. Interesting idea about the suppressed commit IDs. I'll play with that. Ciao, Johannes
Hi Philippe, On Sun, 10 Nov 2024, Philippe Blain wrote: > Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit : > > From: Johannes Schindelin <johannes.schindelin@gmx.de> > > > > The `git log` command already offers support for including diffs for > > merges, via the `--diff-merges=<format>` option. > > > > Let's add corresponding support for `git range-diff`, too. This makes it > > more convenient to spot differences between iterations of non-linear > > contributions, where so-called "evil merges" are sometimes necessary and > > need to be reviewed, too. > > Maybe "between commit ranges that include merge commits" would be more > workflow-agnostic ? Good idea, this is much clearer than what I wrote, too. > > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt > > index fbdbe0befeb..17a85957877 100644 > > --- a/Documentation/git-range-diff.txt > > +++ b/Documentation/git-range-diff.txt > > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers > > Suppress commits that are missing from the second specified range > > (or the "right range" when using the `<rev1>...<rev2>` format). > > > > +--diff-merges=<format>:: > > + Instead of ignoring merge commits, generate diffs for them using the > > + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], > > + and include them in the comparison. > > ++ > > +Note: In the common case, the `first-parent` mode will be the most natural one > > +to use, as it is consistent with the idea that a merge is kind of a "meta > > +patch", comprising all the merged commits' patches into a single one. > > I think I agree with Elijah that we probably should also highlight at least > 'remerge'. > > Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are > a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s. Right, I did not want to deviate too much from the surrounding style. Besides, I am not _so_ versed in AsciiDoc, I consider Markdown to be much more common, so I am much more familiar with that. I had no idea that there was a proper AsciiDoc [NOTE]. > > diff --git a/builtin/range-diff.c b/builtin/range-diff.c > > index 1b33ab66a7b..901de5d133d 100644 > > --- a/builtin/range-diff.c > > +++ b/builtin/range-diff.c > > The changes look good to me. Maybe it would be nice to add a corresponding > 'range-diff.diffMerges' config option to allow users to configure the > behaviour more permanently ? Seeing as there are no existing `rangeDiff.*` options, I am loathe to introduce the first one lest I am asked why I don't balloon this patch series into introducing config settings for the other options, too. Ciao, Johannes
Elijah Newren <newren@gmail.com> writes: >> There _may_ need a tweak of the matching algorithm to allow the >> "same" merge on both sides to match, even if they diverge a lot, >> though. A range-diff that pairs a merge in the previous iteration >> with a single patch in the updated iteration may be hard to read. > > Sounds like you are arguing that there is an angle/usecase from which > `first-parent` makes sense, namely the viewpoint that "a merge is > merely bringing in lots of changes into the mainline as a single unit" > as you put it. What was surprising to me, though, is that it's a > completely different usecase than the one that was brought up in the > commit message for this feature, namely "so-called 'evil merges' are > sometimes necessary and need to be reviewed too". What I had in mind when I wrote the example you are responding to is based on what sometimes happens while I make repeated merges (and as you may know, I make lots of them). In the first attempt I miss the fact that I need semantic adjustments and then in the second attempt I know what additional changes are necessary in the same merge (i.e. merging exactly the same iteration of the same topic). If you run the first-parent range-diff between one iteration of 'seen' and another, the "additional changes" I would make in the second attempt would be the only thing that will appear in the output, showing the "evil merge". There can also be updates in the topic itself when I rebuild 'seen', in addition to merge-fixes to adjust for semantic conflicts. Such a change would also appear in the first-parent view. If you used other views, like dense-combined or remerge-diff, these updates in the topic itself may be hidden, as these other views are specifically designed to highlight conflict resolutions and evil merges by discarding mechanically resolvable changes. So it all depends on what you are looking for and what you are deliberately excluding from the lower level of diff generation when you prepare your range-diff, which is a "diff of diff". Giving an impression that first-parent is the most useful may risk misleading readers in that sense.
diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt index fbdbe0befeb..17a85957877 100644 --- a/Documentation/git-range-diff.txt +++ b/Documentation/git-range-diff.txt @@ -10,7 +10,7 @@ SYNOPSIS [verse] 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>] [--no-dual-color] [--creation-factor=<factor>] - [--left-only | --right-only] + [--left-only | --right-only] [--diff-merges=<format>] ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> ) [[--] <path>...] @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers Suppress commits that are missing from the second specified range (or the "right range" when using the `<rev1>...<rev2>` format). +--diff-merges=<format>:: + Instead of ignoring merge commits, generate diffs for them using the + corresponding `--diff-merges=<format>` option of linkgit:git-log[1], + and include them in the comparison. ++ +Note: In the common case, the `first-parent` mode will be the most natural one +to use, as it is consistent with the idea that a merge is kind of a "meta +patch", comprising all the merged commits' patches into a single one. + --[no-]notes[=<ref>]:: This flag is passed to the `git log` program (see linkgit:git-log[1]) that generates the patches. diff --git a/builtin/range-diff.c b/builtin/range-diff.c index 1b33ab66a7b..901de5d133d 100644 --- a/builtin/range-diff.c +++ b/builtin/range-diff.c @@ -21,6 +21,7 @@ int cmd_range_diff(int argc, { struct diff_options diffopt = { NULL }; struct strvec other_arg = STRVEC_INIT; + struct strvec diff_merges_arg = STRVEC_INIT; struct range_diff_options range_diff_opts = { .creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT, .diffopt = &diffopt, @@ -36,6 +37,8 @@ int cmd_range_diff(int argc, OPT_PASSTHRU_ARGV(0, "notes", &other_arg, N_("notes"), N_("passed to 'git log'"), PARSE_OPT_OPTARG), + OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg, + N_("style"), N_("passed to 'git log'"), 0), OPT_BOOL(0, "left-only", &left_only, N_("only emit output related to the first range")), OPT_BOOL(0, "right-only", &right_only, @@ -62,6 +65,12 @@ int cmd_range_diff(int argc, if (!simple_color) diffopt.use_color = 1; + /* If `--diff-merges` was specified, imply `--merges` */ + if (diff_merges_arg.nr) { + range_diff_opts.include_merges = 1; + strvec_pushv(&other_arg, diff_merges_arg.v); + } + for (i = 0; i < argc; i++) if (!strcmp(argv[i], "--")) { dash_dash = i; @@ -155,6 +164,7 @@ int cmd_range_diff(int argc, res = show_range_diff(range1.buf, range2.buf, &range_diff_opts); strvec_clear(&other_arg); + strvec_clear(&diff_merges_arg); strbuf_release(&range1); strbuf_release(&range2); diff --git a/range-diff.c b/range-diff.c index bbb0952264b..9e59733059b 100644 --- a/range-diff.c +++ b/range-diff.c @@ -38,7 +38,8 @@ struct patch_util { * as struct object_id (will need to be free()d). */ static int read_patches(const char *range, struct string_list *list, - const struct strvec *other_arg) + const struct strvec *other_arg, + unsigned int include_merges) { struct child_process cp = CHILD_PROCESS_INIT; struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT; @@ -49,7 +50,7 @@ static int read_patches(const char *range, struct string_list *list, size_t size; int ret = -1; - strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges", + strvec_pushl(&cp.args, "log", "--no-color", "-p", "--reverse", "--date-order", "--decorate=no", "--no-prefix", "--submodule=short", /* @@ -64,6 +65,8 @@ static int read_patches(const char *range, struct string_list *list, "--pretty=medium", "--show-notes-by-default", NULL); + if (!include_merges) + strvec_push(&cp.args, "--no-merges"); strvec_push(&cp.args, range); if (other_arg) strvec_pushv(&cp.args, other_arg->v); @@ -96,11 +99,14 @@ static int read_patches(const char *range, struct string_list *list, } if (skip_prefix(line, "commit ", &p)) { + char *q; if (util) { string_list_append(list, buf.buf)->util = util; strbuf_reset(&buf); } CALLOC_ARRAY(util, 1); + if (include_merges && (q = strstr(p, " (from "))) + *q = '\0'; if (repo_get_oid(the_repository, p, &util->oid)) { error(_("could not parse commit '%s'"), p); FREE_AND_NULL(util); @@ -571,13 +577,14 @@ int show_range_diff(const char *range1, const char *range2, struct string_list branch1 = STRING_LIST_INIT_DUP; struct string_list branch2 = STRING_LIST_INIT_DUP; + unsigned int include_merges = range_diff_opts->include_merges; if (range_diff_opts->left_only && range_diff_opts->right_only) res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only"); - if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg)) + if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges)) res = error(_("could not parse log for '%s'"), range1); - if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg)) + if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges)) res = error(_("could not parse log for '%s'"), range2); if (!res) { diff --git a/range-diff.h b/range-diff.h index 2f69f6a434d..cd85000b5a0 100644 --- a/range-diff.h +++ b/range-diff.h @@ -16,6 +16,7 @@ struct range_diff_options { int creation_factor; unsigned dual_color:1; unsigned left_only:1, right_only:1; + unsigned include_merges:1; const struct diff_options *diffopt; /* may be NULL */ const struct strvec *other_arg; /* may be NULL */ }; diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh index 86010931ab6..c18a3fdab83 100755 --- a/t/t3206-range-diff.sh +++ b/t/t3206-range-diff.sh @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule' test_cmp expect actual ' +test_expect_success '--diff-merges' ' + renamed_oid=$(git rev-parse --short renamed-file) && + tree=$(git merge-tree unmodified renamed-file) && + clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) && + clean_oid=$(git rev-parse --short $clean) && + conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) && + conflict_oid=$(git rev-parse --short $conflict) && + + git range-diff --diff-merges=1 $clean...$conflict >actual && + cat >expect <<-EOF && + 1: $renamed_oid < -: ------- s/12/B/ + 2: $clean_oid = 1: $conflict_oid merge + EOF + test_cmp expect actual +' + test_done