diff mbox series

[v2] range-diff: optionally include merge commits' diffs in the analysis

Message ID pull.1734.v2.git.1731073383564.gitgitgadget@gmail.com (mailing list archive)
State New
Headers show
Series [v2] range-diff: optionally include merge commits' diffs in the analysis | expand

Commit Message

Johannes Schindelin Nov. 8, 2024, 1:43 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.

Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between iterations of non-linear
contributions, where so-called "evil merges" are sometimes necessary and
need to be reviewed, too.

In my code reviews, I found the `--diff-merges=first-parent` option
particularly useful.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Support diff merges option in range diff
    
    The git range-diff command does the same with merge commits as git
    rebase: It ignores them.
    
    However, when comparing branch thickets it can be quite illuminating to
    watch out for inadvertent changes in merge commits, in particular when
    some "evil" merges have been replayed, i.e. merges that needed to
    introduce changes outside of the merge conflicts (e.g. when one branch
    changed a function's signature and another branch introduced a caller of
    said function), in case the replayed merge is no longer "evil" and
    therefore potentially incorrect.
    
    Let's introduce support for the --diff-merges option that is passed
    through to those git log commands.
    
    I had a need for this earlier this year and got it working, leaving the
    GitGitGadget PR in a draft mode. Phil Blain found it and kindly
    nerd-sniped me into readying it for submitting, so say thanks to Phil!
    
    Changes since v1:
    
     * Changed the documentation to recommend first-parent mode instead of
       vaguely talking about various modes' merits.
     * Disallowed the no-arg --diff-merges option (because --diff-merges
       requires an argument).

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1734

Range-diff vs v1:

 1:  11361e07af8 ! 1:  d01a395900b range-diff: optionally include merge commits' diffs in the analysis
     @@ Documentation/git-range-diff.txt: to revert to color all lines according to the
      +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
      +	and include them in the comparison.
      ++
     -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
     -+the context of the `range-diff` command than other formats, so choose wisely!
     ++Note: In the common case, the `first-parent` mode will be the most natural one
     ++to use, as it is consistent with the idea that a merge is kind of a "meta
     ++patch", comprising all the merged commits' patches into a single one.
      +
       --[no-]notes[=<ref>]::
       	This flag is passed to the `git log` program
     @@ builtin/range-diff.c: int cmd_range_diff(int argc,
       				  N_("notes"), N_("passed to 'git log'"),
       				  PARSE_OPT_OPTARG),
      +		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
     -+				  N_("style"), N_("passed to 'git log'"),
     -+				  PARSE_OPT_OPTARG),
     ++				  N_("style"), N_("passed to 'git log'"), 0),
       		OPT_BOOL(0, "left-only", &left_only,
       			 N_("only emit output related to the first range")),
       		OPT_BOOL(0, "right-only", &right_only,


 Documentation/git-range-diff.txt | 11 ++++++++++-
 builtin/range-diff.c             | 10 ++++++++++
 range-diff.c                     | 15 +++++++++++----
 range-diff.h                     |  1 +
 t/t3206-range-diff.sh            | 16 ++++++++++++++++
 5 files changed, 48 insertions(+), 5 deletions(-)


base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2

Comments

Elijah Newren Nov. 8, 2024, 5:04 p.m. UTC | #1
On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> The `git log` command already offers support for including diffs for
> merges, via the `--diff-merges=<format>` option.
>
> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.
>
> In my code reviews, I found the `--diff-merges=first-parent` option
> particularly useful.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Support diff merges option in range diff
>
>     The git range-diff command does the same with merge commits as git
>     rebase: It ignores them.
>
>     However, when comparing branch thickets it can be quite illuminating to
>     watch out for inadvertent changes in merge commits, in particular when
>     some "evil" merges have been replayed, i.e. merges that needed to
>     introduce changes outside of the merge conflicts (e.g. when one branch
>     changed a function's signature and another branch introduced a caller of
>     said function), in case the replayed merge is no longer "evil" and
>     therefore potentially incorrect.
>
>     Let's introduce support for the --diff-merges option that is passed
>     through to those git log commands.
>
>     I had a need for this earlier this year and got it working, leaving the
>     GitGitGadget PR in a draft mode. Phil Blain found it and kindly
>     nerd-sniped me into readying it for submitting, so say thanks to Phil!
>
>     Changes since v1:
>
>      * Changed the documentation to recommend first-parent mode instead of
>        vaguely talking about various modes' merits.
>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).
>
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1734
>
> Range-diff vs v1:
>
>  1:  11361e07af8 ! 1:  d01a395900b range-diff: optionally include merge commits' diffs in the analysis
>      @@ Documentation/git-range-diff.txt: to revert to color all lines according to the
>       + corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>       + and include them in the comparison.
>       ++
>      -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>      -+the context of the `range-diff` command than other formats, so choose wisely!
>      ++Note: In the common case, the `first-parent` mode will be the most natural one
>      ++to use, as it is consistent with the idea that a merge is kind of a "meta
>      ++patch", comprising all the merged commits' patches into a single one.
>       +
>        --[no-]notes[=<ref>]::
>         This flag is passed to the `git log` program
>      @@ builtin/range-diff.c: int cmd_range_diff(int argc,
>                                   N_("notes"), N_("passed to 'git log'"),
>                                   PARSE_OPT_OPTARG),
>       +         OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
>      -+                           N_("style"), N_("passed to 'git log'"),
>      -+                           PARSE_OPT_OPTARG),
>      ++                           N_("style"), N_("passed to 'git log'"), 0),
>                 OPT_BOOL(0, "left-only", &left_only,
>                          N_("only emit output related to the first range")),
>                 OPT_BOOL(0, "right-only", &right_only,
>
>
>  Documentation/git-range-diff.txt | 11 ++++++++++-
>  builtin/range-diff.c             | 10 ++++++++++
>  range-diff.c                     | 15 +++++++++++----
>  range-diff.h                     |  1 +
>  t/t3206-range-diff.sh            | 16 ++++++++++++++++
>  5 files changed, 48 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index fbdbe0befeb..17a85957877 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
>         [--no-dual-color] [--creation-factor=<factor>]
> -       [--left-only | --right-only]
> +       [--left-only | --right-only] [--diff-merges=<format>]
>         ( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
>         [[--] <path>...]
>
> @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
>         Suppress commits that are missing from the second specified range
>         (or the "right range" when using the `<rev1>...<rev2>` format).
>
> +--diff-merges=<format>::
> +       Instead of ignoring merge commits, generate diffs for them using the
> +       corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +       and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use,

I think we need more wording around "common case"; I believe this
"common case" is when you are diffing against a merely transplanted
series of commits (a series created by rebasing without inserting,
removing, or minimally modifying those commits in the process) that
`first-parent` makes sense.  And it only makes sense in that case.

I think `remerge-diff` generally makes sense here, both in the cases
when `first-parent` makes sense and when `first-parent` does not.  It
could be improved by suppressing the inclusion of short commit ids
(and maybe also commit messages) in the labels of conflict markers.  I
suspect that issue might make `remerge-diff` less useful than
`first-parent` in simple common cases currently, but I think it's the
right thing to build upon for what you are trying to view.

If `remerge-diff` didn't exist, I think I'd always use
`dense-combined` over `first-parent` because of this
merely-transplanted limitation.  I suspect dense-combined would
probably be kind of ugly and hard to parse when there is an actual
evil merge, but it'd at least limit what it shows to the evil merge
content.

> as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

Doesn't this wording of yours naturally lead to the idea that
`first-parent` is a bad choice if patches leading to the merge have
been inserted, removed, or more than minimally modified?  It points
out that first-parent is a diff over the whole range, and so
range-diff shows you a diff of the diffs over the whole range.  If you
want to look at the "evilness" of merge commits, i.e. the
user-generated portion of the diffs included in by merge commits, you
need either remerge-diff or dense-combined.

> +
>  --[no-]notes[=<ref>]::
>         This flag is passed to the `git log` program
>         (see linkgit:git-log[1]) that generates the patches.
> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 1b33ab66a7b..901de5d133d 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c
> @@ -21,6 +21,7 @@ int cmd_range_diff(int argc,
>  {
>         struct diff_options diffopt = { NULL };
>         struct strvec other_arg = STRVEC_INIT;
> +       struct strvec diff_merges_arg = STRVEC_INIT;
>         struct range_diff_options range_diff_opts = {
>                 .creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
>                 .diffopt = &diffopt,
> @@ -36,6 +37,8 @@ int cmd_range_diff(int argc,
>                 OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
>                                   N_("notes"), N_("passed to 'git log'"),
>                                   PARSE_OPT_OPTARG),
> +               OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
> +                                 N_("style"), N_("passed to 'git log'"), 0),
>                 OPT_BOOL(0, "left-only", &left_only,
>                          N_("only emit output related to the first range")),
>                 OPT_BOOL(0, "right-only", &right_only,
> @@ -62,6 +65,12 @@ int cmd_range_diff(int argc,
>         if (!simple_color)
>                 diffopt.use_color = 1;
>
> +       /* If `--diff-merges` was specified, imply `--merges` */
> +       if (diff_merges_arg.nr) {
> +               range_diff_opts.include_merges = 1;
> +               strvec_pushv(&other_arg, diff_merges_arg.v);
> +       }
> +
>         for (i = 0; i < argc; i++)
>                 if (!strcmp(argv[i], "--")) {
>                         dash_dash = i;
> @@ -155,6 +164,7 @@ int cmd_range_diff(int argc,
>         res = show_range_diff(range1.buf, range2.buf, &range_diff_opts);
>
>         strvec_clear(&other_arg);
> +       strvec_clear(&diff_merges_arg);
>         strbuf_release(&range1);
>         strbuf_release(&range2);
>
> diff --git a/range-diff.c b/range-diff.c
> index bbb0952264b..9e59733059b 100644
> --- a/range-diff.c
> +++ b/range-diff.c
> @@ -38,7 +38,8 @@ struct patch_util {
>   * as struct object_id (will need to be free()d).
>   */
>  static int read_patches(const char *range, struct string_list *list,
> -                       const struct strvec *other_arg)
> +                       const struct strvec *other_arg,
> +                       unsigned int include_merges)
>  {
>         struct child_process cp = CHILD_PROCESS_INIT;
>         struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT;
> @@ -49,7 +50,7 @@ static int read_patches(const char *range, struct string_list *list,
>         size_t size;
>         int ret = -1;
>
> -       strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
> +       strvec_pushl(&cp.args, "log", "--no-color", "-p",
>                      "--reverse", "--date-order", "--decorate=no",
>                      "--no-prefix", "--submodule=short",
>                      /*
> @@ -64,6 +65,8 @@ static int read_patches(const char *range, struct string_list *list,
>                      "--pretty=medium",
>                      "--show-notes-by-default",
>                      NULL);
> +       if (!include_merges)
> +               strvec_push(&cp.args, "--no-merges");
>         strvec_push(&cp.args, range);
>         if (other_arg)
>                 strvec_pushv(&cp.args, other_arg->v);
> @@ -96,11 +99,14 @@ static int read_patches(const char *range, struct string_list *list,
>                 }
>
>                 if (skip_prefix(line, "commit ", &p)) {
> +                       char *q;
>                         if (util) {
>                                 string_list_append(list, buf.buf)->util = util;
>                                 strbuf_reset(&buf);
>                         }
>                         CALLOC_ARRAY(util, 1);
> +                       if (include_merges && (q = strstr(p, " (from ")))
> +                               *q = '\0';
>                         if (repo_get_oid(the_repository, p, &util->oid)) {
>                                 error(_("could not parse commit '%s'"), p);
>                                 FREE_AND_NULL(util);
> @@ -571,13 +577,14 @@ int show_range_diff(const char *range1, const char *range2,
>
>         struct string_list branch1 = STRING_LIST_INIT_DUP;
>         struct string_list branch2 = STRING_LIST_INIT_DUP;
> +       unsigned int include_merges = range_diff_opts->include_merges;
>
>         if (range_diff_opts->left_only && range_diff_opts->right_only)
>                 res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only");
>
> -       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg))
> +       if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range1);
> -       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg))
> +       if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges))
>                 res = error(_("could not parse log for '%s'"), range2);
>
>         if (!res) {
> diff --git a/range-diff.h b/range-diff.h
> index 2f69f6a434d..cd85000b5a0 100644
> --- a/range-diff.h
> +++ b/range-diff.h
> @@ -16,6 +16,7 @@ struct range_diff_options {
>         int creation_factor;
>         unsigned dual_color:1;
>         unsigned left_only:1, right_only:1;
> +       unsigned include_merges:1;
>         const struct diff_options *diffopt; /* may be NULL */
>         const struct strvec *other_arg; /* may be NULL */
>  };
> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 86010931ab6..c18a3fdab83 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule'
>         test_cmp expect actual
>  '
>
> +test_expect_success '--diff-merges' '
> +       renamed_oid=$(git rev-parse --short renamed-file) &&
> +       tree=$(git merge-tree unmodified renamed-file) &&
> +       clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
> +       clean_oid=$(git rev-parse --short $clean) &&
> +       conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
> +       conflict_oid=$(git rev-parse --short $conflict) &&
> +
> +       git range-diff --diff-merges=1 $clean...$conflict >actual &&
> +       cat >expect <<-EOF &&
> +       1:  $renamed_oid < -:  ------- s/12/B/
> +       2:  $clean_oid = 1:  $conflict_oid merge
> +       EOF
> +       test_cmp expect actual
> +'
> +
>  test_done
>
> base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2
> --
> gitgitgadget
Philippe Blain Nov. 10, 2024, 8:30 p.m. UTC | #2
Hi Dscho,

Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit :
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> 
> The `git log` command already offers support for including diffs for
> merges, via the `--diff-merges=<format>` option.
> 
> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.

Maybe "between commit ranges that include merge commits" would be more
workflow-agnostic ? Just thinking out loud here, I think what you wrote 
is also OK.

> In my code reviews, I found the `--diff-merges=first-parent` option
> particularly useful.
> 
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>     Support diff merges option in range diff
>     
>     The git range-diff command does the same with merge commits as git
>     rebase: It ignores them.
>     
>     However, when comparing branch thickets it can be quite illuminating to
>     watch out for inadvertent changes in merge commits, in particular when
>     some "evil" merges have been replayed, i.e. merges that needed to
>     introduce changes outside of the merge conflicts (e.g. when one branch
>     changed a function's signature and another branch introduced a caller of
>     said function), in case the replayed merge is no longer "evil" and
>     therefore potentially incorrect.
>     
>     Let's introduce support for the --diff-merges option that is passed
>     through to those git log commands.
>     
>     I had a need for this earlier this year and got it working, leaving the
>     GitGitGadget PR in a draft mode. Phil Blain found it and kindly
>     nerd-sniped me into readying it for submitting, so say thanks to Phil!
>     
>     Changes since v1:
>     
>      * Changed the documentation to recommend first-parent mode instead of
>        vaguely talking about various modes' merits.
>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).
> 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1734
> 
> Range-diff vs v1:
> 
>  1:  11361e07af8 ! 1:  d01a395900b range-diff: optionally include merge commits' diffs in the analysis
>      @@ Documentation/git-range-diff.txt: to revert to color all lines according to the
>       +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>       +	and include them in the comparison.
>       ++
>      -+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>      -+the context of the `range-diff` command than other formats, so choose wisely!
>      ++Note: In the common case, the `first-parent` mode will be the most natural one
>      ++to use, as it is consistent with the idea that a merge is kind of a "meta
>      ++patch", comprising all the merged commits' patches into a single one.
>       +
>        --[no-]notes[=<ref>]::
>        	This flag is passed to the `git log` program
>      @@ builtin/range-diff.c: int cmd_range_diff(int argc,
>        				  N_("notes"), N_("passed to 'git log'"),
>        				  PARSE_OPT_OPTARG),
>       +		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
>      -+				  N_("style"), N_("passed to 'git log'"),
>      -+				  PARSE_OPT_OPTARG),
>      ++				  N_("style"), N_("passed to 'git log'"), 0),
>        		OPT_BOOL(0, "left-only", &left_only,
>        			 N_("only emit output related to the first range")),
>        		OPT_BOOL(0, "right-only", &right_only,
> 
> 
>  Documentation/git-range-diff.txt | 11 ++++++++++-
>  builtin/range-diff.c             | 10 ++++++++++
>  range-diff.c                     | 15 +++++++++++----
>  range-diff.h                     |  1 +
>  t/t3206-range-diff.sh            | 16 ++++++++++++++++
>  5 files changed, 48 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> index fbdbe0befeb..17a85957877 100644
> --- a/Documentation/git-range-diff.txt
> +++ b/Documentation/git-range-diff.txt
> @@ -10,7 +10,7 @@ SYNOPSIS
>  [verse]
>  'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
>  	[--no-dual-color] [--creation-factor=<factor>]
> -	[--left-only | --right-only]
> +	[--left-only | --right-only] [--diff-merges=<format>]
>  	( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
>  	[[--] <path>...]
>  
> @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
>  	Suppress commits that are missing from the second specified range
>  	(or the "right range" when using the `<rev1>...<rev2>` format).
>  
> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use, as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

I think I agree with Elijah that we probably should also highlight at least
'remerge'.

Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are 
a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s.

> diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> index 1b33ab66a7b..901de5d133d 100644
> --- a/builtin/range-diff.c
> +++ b/builtin/range-diff.c

The changes look good to me. Maybe it would be nice to add a corresponding
'range-diff.diffMerges' config option to allow users to configure the 
behaviour more permanently ?

> diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
> index 86010931ab6..c18a3fdab83 100755
> --- a/t/t3206-range-diff.sh
> +++ b/t/t3206-range-diff.sh
> @@ -909,4 +909,20 @@ test_expect_success 'submodule changes are shown irrespective of diff.submodule'
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--diff-merges' '
> +	renamed_oid=$(git rev-parse --short renamed-file) &&
> +	tree=$(git merge-tree unmodified renamed-file) &&
> +	clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
> +	clean_oid=$(git rev-parse --short $clean) &&
> +	conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
> +	conflict_oid=$(git rev-parse --short $conflict) &&
> +
> +	git range-diff --diff-merges=1 $clean...$conflict >actual &&
> +	cat >expect <<-EOF &&
> +	1:  $renamed_oid < -:  ------- s/12/B/
> +	2:  $clean_oid = 1:  $conflict_oid merge
> +	EOF
> +	test_cmp expect actual
> +'

The test looks good. 

Thanks for submitting that patch!

Philippe.
Junio C Hamano Nov. 11, 2024, 12:37 a.m. UTC | #3
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

>      * Disallowed the no-arg --diff-merges option (because --diff-merges
>        requires an argument).

Yup, I was happy to see it in the range-diff output, because I was
wondering where the optarg came from in the initial submission.

> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: In the common case, the `first-parent` mode will be the most natural one
> +to use, as it is consistent with the idea that a merge is kind of a "meta
> +patch", comprising all the merged commits' patches into a single one.

I'll let you and Elijah figure out the best phrasing around here,
including whether we recommend only first-parent or give other
options.

For me personally, I find that use of `first-parent` here is
consistent with the mindset of those users who prefer to read "git
log --first-parent -p", and to a smaller degree, those who use
squash merges.  To them, a merge is merely bringing in lots of
changes into the mainline as a single unit, and other than that
these lots of changes are broken out for finer grained inspection
if/when they wanted to, they prefer to treat the changes brought in
by merges just like a single-parent commit.  And from that point of
view, (1) reordering the changes inside the side branch is
immaterial for the purpose of talking about the result, the net
effect the merged topic makes to the mainline, and (2) dropping or
adding new changes to the side branch is treated just like a newer
iteration of a single parent commit having fewer or more changes.
So it is very natural that first-parent helps to make the world view
very simplistic, and more readable range-diff is all about how you
can present the two iterations in a simple and flattened form.

There _may_ need a tweak of the matching algorithm to allow the
"same" merge on both sides to match, even if they diverge a lot,
though.  A range-diff that pairs a merge in the previous iteration
with a single patch in the updated iteration may be hard to read.
Elijah Newren Nov. 11, 2024, 4:51 p.m. UTC | #4
On Sun, Nov 10, 2024 at 4:37 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
[...]
> > +--diff-merges=<format>::
> > +     Instead of ignoring merge commits, generate diffs for them using the
> > +     corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +     and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use, as it is consistent with the idea that a merge is kind of a "meta
> > +patch", comprising all the merged commits' patches into a single one.
>
> I'll let you and Elijah figure out the best phrasing around here,
> including whether we recommend only first-parent or give other
> options.

I'd probably suggest:

Note: When trying to see how users adjusted the contents of a merge
commit, `remerge-diff` is a natural choice.  When viewing merges as a
"meta patch", comprising a squash of all the merged commits' patches
into a single one (similar to `git log --first-parent -p`),
`first-parent` is a natural choice.

> For me personally, I find that use of `first-parent` here is
> consistent with the mindset of those users who prefer to read "git
> log --first-parent -p", and to a smaller degree, those who use
> squash merges.  To them, a merge is merely bringing in lots of
> changes into the mainline as a single unit, and other than that
> these lots of changes are broken out for finer grained inspection
> if/when they wanted to, they prefer to treat the changes brought in
> by merges just like a single-parent commit.  And from that point of
> view, (1) reordering the changes inside the side branch is
> immaterial for the purpose of talking about the result, the net
> effect the merged topic makes to the mainline, and (2) dropping or
> adding new changes to the side branch is treated just like a newer
> iteration of a single parent commit having fewer or more changes.
> So it is very natural that first-parent helps to make the world view
> very simplistic, and more readable range-diff is all about how you
> can present the two iterations in a simple and flattened form.
>
> There _may_ need a tweak of the matching algorithm to allow the
> "same" merge on both sides to match, even if they diverge a lot,
> though.  A range-diff that pairs a merge in the previous iteration
> with a single patch in the updated iteration may be hard to read.

Sounds like you are arguing that there is an angle/usecase from which
`first-parent` makes sense, namely the viewpoint that "a merge is
merely bringing in lots of changes into the mainline as a single unit"
as you put it.  What was surprising to me, though, is that it's a
completely different usecase than the one that was brought up in the
commit message for this feature, namely "so-called 'evil merges' are
sometimes necessary and need to be reviewed too".

If you want to review "evilness" or the differences between the
changes that `git merge` would make and what got recorded in the
commit, then `first-parent` is not that great an approximation of what
you are looking for, BUT it happens to work beautifully in common
cases where you are merely transplanting commits due to the fact that
the huge amounts of extras being put into the diff happen to be equal
on the two sides, so they cancel out and leave you with the "evilness"
you are interested in.  But if you stray from that common case (by
e.g. inserting, removing, or modifying commits while rebasing) and are
still interested in "evilness", your approximation quickly drops from
accurately pulling out just the bits you are interested in towards
being completely swamped by the noise.

My concern with the commit message and patch as they stand, is that I
feel the `first-parent` mode is being recommended as a solution for a
certain usecase where it is really only an approximation that luckily
is highly accurate in special but common scenarios while being very
inaccurate in others.  I think that will lead to users getting
negatively surprised when they move outside those common cases.
Johannes Schindelin Nov. 11, 2024, 7:55 p.m. UTC | #5
Hi Elijah,

On Fri, 8 Nov 2024, Elijah Newren wrote:

> On Fri, Nov 8, 2024 at 5:43 AM Johannes Schindelin via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> > index fbdbe0befeb..17a85957877 100644
> > --- a/Documentation/git-range-diff.txt
> > +++ b/Documentation/git-range-diff.txt
> > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
> >         Suppress commits that are missing from the second specified range
> >         (or the "right range" when using the `<rev1>...<rev2>` format).
> >
> > +--diff-merges=<format>::
> > +       Instead of ignoring merge commits, generate diffs for them using the
> > +       corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +       and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use,
>
> I think we need more wording around "common case"; I believe this
> "common case" is when you are diffing against a merely transplanted
> series of commits (a series created by rebasing without inserting,
> removing, or minimally modifying those commits in the process) that
> `first-parent` makes sense.  And it only makes sense in that case.

I tried to reproduce some of the common cases (which were more along the
lines of stacked PRs than transplanted series of commits), and you're
right: with the remerge bug fix, the range-diffs are much more easily
interpreted.

For example, I had to squash in a few fixes in
https://github.com/git/git-scm.com/pull/1915, and I try to combine all of
my currently-open git-scm.com PRs into the `gh-pages` branch in my fork.

With `--diff-merges=1`, all of those single-commit merges that update the
various translations of the book had diffs identical to the diffs of the
commits they merged. Which confused range-diff quite a bit, often picking
a non-merge as matching a merge (which is of course suboptimal).

With `--diff-merges=remerge`, these trivial merges had no diffs at all,
which made the review much easier.

> I think `remerge-diff` generally makes sense here, both in the cases
> when `first-parent` makes sense and when `first-parent` does not.  It
> could be improved by suppressing the inclusion of short commit ids
> (and maybe also commit messages) in the labels of conflict markers.  I
> suspect that issue might make `remerge-diff` less useful than
> `first-parent` in simple common cases currently, but I think it's the
> right thing to build upon for what you are trying to view.

Interesting idea about the suppressed commit IDs. I'll play with that.

Ciao,
Johannes
Johannes Schindelin Nov. 11, 2024, 8:07 p.m. UTC | #6
Hi Philippe,

On Sun, 10 Nov 2024, Philippe Blain wrote:

> Le 2024-11-08 à 08:43, Johannes Schindelin via GitGitGadget a écrit :
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> >
> > The `git log` command already offers support for including diffs for
> > merges, via the `--diff-merges=<format>` option.
> >
> > Let's add corresponding support for `git range-diff`, too. This makes it
> > more convenient to spot differences between iterations of non-linear
> > contributions, where so-called "evil merges" are sometimes necessary and
> > need to be reviewed, too.
>
> Maybe "between commit ranges that include merge commits" would be more
> workflow-agnostic ?

Good idea, this is much clearer than what I wrote, too.

> > diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
> > index fbdbe0befeb..17a85957877 100644
> > --- a/Documentation/git-range-diff.txt
> > +++ b/Documentation/git-range-diff.txt
> > @@ -81,6 +81,15 @@ to revert to color all lines according to the outer diff markers
> >  	Suppress commits that are missing from the second specified range
> >  	(or the "right range" when using the `<rev1>...<rev2>` format).
> >
> > +--diff-merges=<format>::
> > +	Instead of ignoring merge commits, generate diffs for them using the
> > +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +	and include them in the comparison.
> > ++
> > +Note: In the common case, the `first-parent` mode will be the most natural one
> > +to use, as it is consistent with the idea that a merge is kind of a "meta
> > +patch", comprising all the merged commits' patches into a single one.
>
> I think I agree with Elijah that we probably should also highlight at least
> 'remerge'.
>
> Also, is it worth making this a proper Asciidoc "[NOTE]" ? (I'm not sure, there are
> a lot of "Notes:" in our doc that are not Asciidoc "[NOTE]"s.

Right, I did not want to deviate too much from the surrounding style.

Besides, I am not _so_ versed in AsciiDoc, I consider Markdown to be much
more common, so I am much more familiar with that. I had no idea that
there was a proper AsciiDoc [NOTE].

> > diff --git a/builtin/range-diff.c b/builtin/range-diff.c
> > index 1b33ab66a7b..901de5d133d 100644
> > --- a/builtin/range-diff.c
> > +++ b/builtin/range-diff.c
>
> The changes look good to me. Maybe it would be nice to add a corresponding
> 'range-diff.diffMerges' config option to allow users to configure the
> behaviour more permanently ?

Seeing as there are no existing `rangeDiff.*` options, I am loathe to
introduce the first one lest I am asked why I don't balloon this patch
series into introducing config settings for the other options, too.

Ciao,
Johannes
Junio C Hamano Nov. 12, 2024, 12:29 a.m. UTC | #7
Elijah Newren <newren@gmail.com> writes:

>> There _may_ need a tweak of the matching algorithm to allow the
>> "same" merge on both sides to match, even if they diverge a lot,
>> though.  A range-diff that pairs a merge in the previous iteration
>> with a single patch in the updated iteration may be hard to read.
>
> Sounds like you are arguing that there is an angle/usecase from which
> `first-parent` makes sense, namely the viewpoint that "a merge is
> merely bringing in lots of changes into the mainline as a single unit"
> as you put it.  What was surprising to me, though, is that it's a
> completely different usecase than the one that was brought up in the
> commit message for this feature, namely "so-called 'evil merges' are
> sometimes necessary and need to be reviewed too".

What I had in mind when I wrote the example you are responding to is
based on what sometimes happens while I make repeated merges (and as
you may know, I make lots of them).  In the first attempt I miss the
fact that I need semantic adjustments and then in the second attempt
I know what additional changes are necessary in the same merge (i.e.
merging exactly the same iteration of the same topic).  If you run
the first-parent range-diff between one iteration of 'seen' and
another, the "additional changes" I would make in the second attempt
would be the only thing that will appear in the output, showing the
"evil merge".

There can also be updates in the topic itself when I rebuild 'seen',
in addition to merge-fixes to adjust for semantic conflicts.  Such a
change would also appear in the first-parent view.  If you used
other views, like dense-combined or remerge-diff, these updates in
the topic itself may be hidden, as these other views are
specifically designed to highlight conflict resolutions and evil
merges by discarding mechanically resolvable changes.  So it all
depends on what you are looking for and what you are deliberately
excluding from the lower level of diff generation when you prepare
your range-diff, which is a "diff of diff".  Giving an impression
that first-parent is the most useful may risk misleading readers in
that sense.
diff mbox series

Patch

diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
index fbdbe0befeb..17a85957877 100644
--- a/Documentation/git-range-diff.txt
+++ b/Documentation/git-range-diff.txt
@@ -10,7 +10,7 @@  SYNOPSIS
 [verse]
 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
 	[--no-dual-color] [--creation-factor=<factor>]
-	[--left-only | --right-only]
+	[--left-only | --right-only] [--diff-merges=<format>]
 	( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
 	[[--] <path>...]
 
@@ -81,6 +81,15 @@  to revert to color all lines according to the outer diff markers
 	Suppress commits that are missing from the second specified range
 	(or the "right range" when using the `<rev1>...<rev2>` format).
 
+--diff-merges=<format>::
+	Instead of ignoring merge commits, generate diffs for them using the
+	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
+	and include them in the comparison.
++
+Note: In the common case, the `first-parent` mode will be the most natural one
+to use, as it is consistent with the idea that a merge is kind of a "meta
+patch", comprising all the merged commits' patches into a single one.
+
 --[no-]notes[=<ref>]::
 	This flag is passed to the `git log` program
 	(see linkgit:git-log[1]) that generates the patches.
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 1b33ab66a7b..901de5d133d 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -21,6 +21,7 @@  int cmd_range_diff(int argc,
 {
 	struct diff_options diffopt = { NULL };
 	struct strvec other_arg = STRVEC_INIT;
+	struct strvec diff_merges_arg = STRVEC_INIT;
 	struct range_diff_options range_diff_opts = {
 		.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
 		.diffopt = &diffopt,
@@ -36,6 +37,8 @@  int cmd_range_diff(int argc,
 		OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
 				  N_("notes"), N_("passed to 'git log'"),
 				  PARSE_OPT_OPTARG),
+		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
+				  N_("style"), N_("passed to 'git log'"), 0),
 		OPT_BOOL(0, "left-only", &left_only,
 			 N_("only emit output related to the first range")),
 		OPT_BOOL(0, "right-only", &right_only,
@@ -62,6 +65,12 @@  int cmd_range_diff(int argc,
 	if (!simple_color)
 		diffopt.use_color = 1;
 
+	/* If `--diff-merges` was specified, imply `--merges` */
+	if (diff_merges_arg.nr) {
+		range_diff_opts.include_merges = 1;
+		strvec_pushv(&other_arg, diff_merges_arg.v);
+	}
+
 	for (i = 0; i < argc; i++)
 		if (!strcmp(argv[i], "--")) {
 			dash_dash = i;
@@ -155,6 +164,7 @@  int cmd_range_diff(int argc,
 	res = show_range_diff(range1.buf, range2.buf, &range_diff_opts);
 
 	strvec_clear(&other_arg);
+	strvec_clear(&diff_merges_arg);
 	strbuf_release(&range1);
 	strbuf_release(&range2);
 
diff --git a/range-diff.c b/range-diff.c
index bbb0952264b..9e59733059b 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -38,7 +38,8 @@  struct patch_util {
  * as struct object_id (will need to be free()d).
  */
 static int read_patches(const char *range, struct string_list *list,
-			const struct strvec *other_arg)
+			const struct strvec *other_arg,
+			unsigned int include_merges)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT;
@@ -49,7 +50,7 @@  static int read_patches(const char *range, struct string_list *list,
 	size_t size;
 	int ret = -1;
 
-	strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
+	strvec_pushl(&cp.args, "log", "--no-color", "-p",
 		     "--reverse", "--date-order", "--decorate=no",
 		     "--no-prefix", "--submodule=short",
 		     /*
@@ -64,6 +65,8 @@  static int read_patches(const char *range, struct string_list *list,
 		     "--pretty=medium",
 		     "--show-notes-by-default",
 		     NULL);
+	if (!include_merges)
+		strvec_push(&cp.args, "--no-merges");
 	strvec_push(&cp.args, range);
 	if (other_arg)
 		strvec_pushv(&cp.args, other_arg->v);
@@ -96,11 +99,14 @@  static int read_patches(const char *range, struct string_list *list,
 		}
 
 		if (skip_prefix(line, "commit ", &p)) {
+			char *q;
 			if (util) {
 				string_list_append(list, buf.buf)->util = util;
 				strbuf_reset(&buf);
 			}
 			CALLOC_ARRAY(util, 1);
+			if (include_merges && (q = strstr(p, " (from ")))
+				*q = '\0';
 			if (repo_get_oid(the_repository, p, &util->oid)) {
 				error(_("could not parse commit '%s'"), p);
 				FREE_AND_NULL(util);
@@ -571,13 +577,14 @@  int show_range_diff(const char *range1, const char *range2,
 
 	struct string_list branch1 = STRING_LIST_INIT_DUP;
 	struct string_list branch2 = STRING_LIST_INIT_DUP;
+	unsigned int include_merges = range_diff_opts->include_merges;
 
 	if (range_diff_opts->left_only && range_diff_opts->right_only)
 		res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only");
 
-	if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg))
+	if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges))
 		res = error(_("could not parse log for '%s'"), range1);
-	if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg))
+	if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges))
 		res = error(_("could not parse log for '%s'"), range2);
 
 	if (!res) {
diff --git a/range-diff.h b/range-diff.h
index 2f69f6a434d..cd85000b5a0 100644
--- a/range-diff.h
+++ b/range-diff.h
@@ -16,6 +16,7 @@  struct range_diff_options {
 	int creation_factor;
 	unsigned dual_color:1;
 	unsigned left_only:1, right_only:1;
+	unsigned include_merges:1;
 	const struct diff_options *diffopt; /* may be NULL */
 	const struct strvec *other_arg; /* may be NULL */
 };
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 86010931ab6..c18a3fdab83 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -909,4 +909,20 @@  test_expect_success 'submodule changes are shown irrespective of diff.submodule'
 	test_cmp expect actual
 '
 
+test_expect_success '--diff-merges' '
+	renamed_oid=$(git rev-parse --short renamed-file) &&
+	tree=$(git merge-tree unmodified renamed-file) &&
+	clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
+	clean_oid=$(git rev-parse --short $clean) &&
+	conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
+	conflict_oid=$(git rev-parse --short $conflict) &&
+
+	git range-diff --diff-merges=1 $clean...$conflict >actual &&
+	cat >expect <<-EOF &&
+	1:  $renamed_oid < -:  ------- s/12/B/
+	2:  $clean_oid = 1:  $conflict_oid merge
+	EOF
+	test_cmp expect actual
+'
+
 test_done