diff mbox series

range-diff: optionally include merge commits' diffs in the analysis

Message ID pull.1734.git.1731000007391.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series range-diff: optionally include merge commits' diffs in the analysis | expand

Commit Message

Johannes Schindelin Nov. 7, 2024, 5:20 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

The `git log` command already offers support for including diffs for
merges, via the `--diff-merges=<format>` option.

Let's add corresponding support for `git range-diff`, too. This makes it
more convenient to spot differences between iterations of non-linear
contributions, where so-called "evil merges" are sometimes necessary and
need to be reviewed, too.

In my code reviews, I found the `--diff-merges=first-parent` option
particularly useful.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    Support diff merges option in range diff
    
    The git range-diff command does the same with merge commits as git
    rebase: It ignores them.
    
    However, when comparing branch thickets it can be quite illuminating to
    watch out for inadvertent changes in merge commits, in particular when
    some "evil" merges have been replayed, i.e. merges that needed to
    introduce changes outside of the merge conflicts (e.g. when one branch
    changed a function's signature and another branch introduced a caller of
    said function), in case the replayed merge is no longer "evil" and
    therefore potentially incorrect.
    
    Let's introduce support for the --diff-merges option that is passed
    through to those git log commands.
    
    I had a need for this earlier this year and got it working, leaving the
    GitGitGadget PR in a draft mode. Phil Blain found it and kindly
    nerd-sniped me into readying it for submitting, so say thanks to Phil!

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1734%2Fdscho%2Fsupport-diff-merges-option-in-range-diff-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1734/dscho/support-diff-merges-option-in-range-diff-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1734

 Documentation/git-range-diff.txt | 10 +++++++++-
 builtin/range-diff.c             | 11 +++++++++++
 range-diff.c                     | 15 +++++++++++----
 range-diff.h                     |  1 +
 t/t3206-range-diff.sh            | 16 ++++++++++++++++
 5 files changed, 48 insertions(+), 5 deletions(-)


base-commit: 777489f9e09c8d0dd6b12f9d90de6376330577a2

Comments

Junio C Hamano Nov. 8, 2024, 3:46 a.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> Let's add corresponding support for `git range-diff`, too. This makes it
> more convenient to spot differences between iterations of non-linear
> contributions, where so-called "evil merges" are sometimes necessary and
> need to be reviewed, too.

Nice addition.

As the "diff of the diff" outer diff part would be two-"blob" diff,
we won't have to worry about this option causing confusions to
users, unlike things like "-U8", to which layer of diffs the option
would apply.

Will queue.  Thanks.
Johannes Sixt Nov. 8, 2024, 6:53 a.m. UTC | #2
Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
> +--diff-merges=<format>::
> +	Instead of ignoring merge commits, generate diffs for them using the
> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> +	and include them in the comparison.
> ++
> +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> +the context of the `range-diff` command than other formats, so choose wisely!

Can we please be a bit more specific which options are usable or which
are not usable? Perhaps you even mean to say that 'first-parent' is the
only one that makes sense? At a minimum, we should mention that it is
the one that makes most sense (if that is the case).

-- Hannes
Johannes Schindelin Nov. 8, 2024, 10:53 a.m. UTC | #3
Hi Hannes

On Fri, 8 Nov 2024, Johannes Sixt wrote:

> Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
> > +--diff-merges=<format>::
> > +	Instead of ignoring merge commits, generate diffs for them using the
> > +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
> > +	and include them in the comparison.
> > ++
> > +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
> > +the context of the `range-diff` command than other formats, so choose wisely!
>
> Can we please be a bit more specific which options are usable or which
> are not usable?

There are a couple of reasons:

- Most importantly: I had not even studied that question in depth when I
  wrote the original patch, and not even when I submitted the first
  iteration.

  In my reviews of branch thickets, I used `--diff-merges=1` and it served
  my needs well. Therefore I can speak to this mode, but not really to the
  other modes.

- Which options are usable or not may lack a universally-correct answer.

  One person might find `--diff-merges=separate` confusing while the next
  person may find it quite useful.

- If the documentation points out an unsuitable mode, then an obvious
  follow-up wish would be to reject this mode in the command, with an
  error message. Some users might, however, need precisely this mode, so
  I am loathe to do that.

Maybe this compromise: Change the "Note:" to recommend the `first-parent`
mode, with a bit more explanation why that is so (it is consistent with
the idea that a merge is kind of a "meta patch", comprising all the merged
commits' patches, which is equivalent to the first-parent diff).

> Perhaps you even mean to say that 'first-parent' is the only one that
> makes sense?

I would not go _so_ far as to say that.

Before submitting the patch yesterday, I played a little with a few
modes, using the commit graph as per the new t3206 test case.

What that test case range-diffs looks like this:

 - * - changeA - changeB ------------
     \                     \          \
      \                      conflict  \
       \                   /            \
         rename - changeA - changeB ----- clean

In other words, the main branch modifies a file's contents twice, the
topic branch renames it first, then makes the same modifications.

The clean merge simply merges the topic, which makes the main branch
tree-same to the topic branch.

The conflicting merge misses the tip commit of the topic branch, i.e.
changeB on the renamed file. The result is tree-same to the clean merge
because changeB on the non-renamed file is already part of the main
branch.

Out of habit, I used `--diff-merge=first-parent` in the new test case,
whose output reiterates what I just said: both merges introduce the same
first-parent diff (renaming the file, no other changes):

	1:  1e6226b < -:  ------- s/12/B/
	2:  043e738 = 1:  6ddec15 merge

What about `separate`?


	2:  043e738 = 1:  6ddec15 merge
	1:  1e6226b ! 2:  6ddec15 s/12/B/
	    @@
	      ## Metadata ##
	    -Author: Thomas Rast <trast@inf.ethz.ch>
	    +Author: A U Thor <author@example.com>

	      ## Commit message ##
	    -    s/12/B/
	    +    merge

	      ## renamed-file ##
	     @@ renamed-file: A

This might look confusing at first because there are two merge diffs on
the right-hand side, but only one on the left-hand side. But that is all
good because the diff of the clean merge between merge head and merge is
empty and therefore not shown. And the `range-diff` demonstrates that the
conflicting merge had to recapitulate the tip commit of the `renamed-file`
topic branch.

The `combined` and `dense-combined` modes produce identical output to
`first-parent` in this instance, which is expected.

Now, let's use `remerge`, which should be illuminating with regards to
"evil merges" [*1*], as it shows what was done on top of the automatic
merge:

	1:  1e6226b < -:  ------- s/12/B/
	2:  043e738 < -:  ------- merge
	-:  ------- > 1:  6ddec15 merge

Huh. That is a bit surprising at first. Let's use a higher creation factor
to ask `range-diff` to match correspondences more loosely (I used 999,
which kinda forces even non-sensical pairings):

	1:  1e6226b ! 1:  6ddec15 s/12/B/
	    @@
	      ## Metadata ##
	    -Author: Thomas Rast <trast@inf.ethz.ch>
	    +Author: A U Thor <author@example.com>

	      ## Commit message ##
	    -    s/12/B/
	    +    merge

	      ## renamed-file ##
	    + remerge CONFLICT (content): Merge conflict in renamed-file
	    + index 3f1c0da..25ddf7a 100644
	    + --- renamed-file
	    + +++ renamed-file
	     @@ renamed-file: A
	      9
	      10
	      B
	    +-<<<<<<< 2901f77 (s/12/B/):file
	    + B
	    +-=======
	     -12
	    -+B
	    +->>>>>>> 3ce7af6 (s/11/B/):renamed-file
	      13
	      14
	      15
	2:  043e738 < -:  ------- merge

(Note that if you repeat this experiment, you will see something different
due to a bug in `--remerge-diff` that I will fix separately.)

Hold on, that looks wrong! It is actually not wrong, `range-diff` just
finds the changeB a better pairing than the merge commit with an empty
diff...

So: This mode can be useful to detect where merge conflicts had to be
resolved.

In short, I think that in `range-diff`'s context all of these modes have
merit, depending on a particular use case. It's just that `first-parent`
is probably the most natural to use in the common case.

> At a minimum, we should mention that it is the one that makes most sense
> (if that is the case).

Yeah, I'll go with that and drop saying anything about other modes.

Ciao,
Johannes

Footnote *1*: We should really find a much better name for this than "evil
merge". There is nothing evil about me having to add `#include
"environment.h"` in v2.45.1^, for example. It was necessary so that the
code would build. Tedious, yes, but not evil.
Junio C Hamano Nov. 8, 2024, 11:56 a.m. UTC | #4
Johannes Sixt <j6t@kdbg.org> writes:

> Am 07.11.24 um 18:20 schrieb Johannes Schindelin via GitGitGadget:
>> +--diff-merges=<format>::
>> +	Instead of ignoring merge commits, generate diffs for them using the
>> +	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
>> +	and include them in the comparison.
>> ++
>> +Note: Some of the formats supported by linkgit:git-log[1] make less sense in
>> +the context of the `range-diff` command than other formats, so choose wisely!
>
> Can we please be a bit more specific which options are usable or which
> are not usable? Perhaps you even mean to say that 'first-parent' is the
> only one that makes sense? At a minimum, we should mention that it is
> the one that makes most sense (if that is the case).

Good suggestion.

I am curious to see how well things like "--cc" and "--remerge-diff"
fare in this use case.  I suspect that in a case where the original
round forgot to make necessary semantic conflict resolutions (aka
"evil merge") that the new round has, all of them (including the
"--first-parent") would essentially show the same change, but what
is shown in such a case would be more readable with "--first-parent"
by treating a merge as if it were just a single parent commit making
a lot of changes, iow, merely one among other commits with equal
footing.
diff mbox series

Patch

diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
index fbdbe0befeb..a964e856c3c 100644
--- a/Documentation/git-range-diff.txt
+++ b/Documentation/git-range-diff.txt
@@ -10,7 +10,7 @@  SYNOPSIS
 [verse]
 'git range-diff' [--color=[<when>]] [--no-color] [<diff-options>]
 	[--no-dual-color] [--creation-factor=<factor>]
-	[--left-only | --right-only]
+	[--left-only | --right-only] [--diff-merges=<format>]
 	( <range1> <range2> | <rev1>...<rev2> | <base> <rev1> <rev2> )
 	[[--] <path>...]
 
@@ -81,6 +81,14 @@  to revert to color all lines according to the outer diff markers
 	Suppress commits that are missing from the second specified range
 	(or the "right range" when using the `<rev1>...<rev2>` format).
 
+--diff-merges=<format>::
+	Instead of ignoring merge commits, generate diffs for them using the
+	corresponding `--diff-merges=<format>` option of linkgit:git-log[1],
+	and include them in the comparison.
++
+Note: Some of the formats supported by linkgit:git-log[1] make less sense in
+the context of the `range-diff` command than other formats, so choose wisely!
+
 --[no-]notes[=<ref>]::
 	This flag is passed to the `git log` program
 	(see linkgit:git-log[1]) that generates the patches.
diff --git a/builtin/range-diff.c b/builtin/range-diff.c
index 1b33ab66a7b..e41719e0f0d 100644
--- a/builtin/range-diff.c
+++ b/builtin/range-diff.c
@@ -21,6 +21,7 @@  int cmd_range_diff(int argc,
 {
 	struct diff_options diffopt = { NULL };
 	struct strvec other_arg = STRVEC_INIT;
+	struct strvec diff_merges_arg = STRVEC_INIT;
 	struct range_diff_options range_diff_opts = {
 		.creation_factor = RANGE_DIFF_CREATION_FACTOR_DEFAULT,
 		.diffopt = &diffopt,
@@ -36,6 +37,9 @@  int cmd_range_diff(int argc,
 		OPT_PASSTHRU_ARGV(0, "notes", &other_arg,
 				  N_("notes"), N_("passed to 'git log'"),
 				  PARSE_OPT_OPTARG),
+		OPT_PASSTHRU_ARGV(0, "diff-merges", &diff_merges_arg,
+				  N_("style"), N_("passed to 'git log'"),
+				  PARSE_OPT_OPTARG),
 		OPT_BOOL(0, "left-only", &left_only,
 			 N_("only emit output related to the first range")),
 		OPT_BOOL(0, "right-only", &right_only,
@@ -62,6 +66,12 @@  int cmd_range_diff(int argc,
 	if (!simple_color)
 		diffopt.use_color = 1;
 
+	/* If `--diff-merges` was specified, imply `--merges` */
+	if (diff_merges_arg.nr) {
+		range_diff_opts.include_merges = 1;
+		strvec_pushv(&other_arg, diff_merges_arg.v);
+	}
+
 	for (i = 0; i < argc; i++)
 		if (!strcmp(argv[i], "--")) {
 			dash_dash = i;
@@ -155,6 +165,7 @@  int cmd_range_diff(int argc,
 	res = show_range_diff(range1.buf, range2.buf, &range_diff_opts);
 
 	strvec_clear(&other_arg);
+	strvec_clear(&diff_merges_arg);
 	strbuf_release(&range1);
 	strbuf_release(&range2);
 
diff --git a/range-diff.c b/range-diff.c
index bbb0952264b..9e59733059b 100644
--- a/range-diff.c
+++ b/range-diff.c
@@ -38,7 +38,8 @@  struct patch_util {
  * as struct object_id (will need to be free()d).
  */
 static int read_patches(const char *range, struct string_list *list,
-			const struct strvec *other_arg)
+			const struct strvec *other_arg,
+			unsigned int include_merges)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
 	struct strbuf buf = STRBUF_INIT, contents = STRBUF_INIT;
@@ -49,7 +50,7 @@  static int read_patches(const char *range, struct string_list *list,
 	size_t size;
 	int ret = -1;
 
-	strvec_pushl(&cp.args, "log", "--no-color", "-p", "--no-merges",
+	strvec_pushl(&cp.args, "log", "--no-color", "-p",
 		     "--reverse", "--date-order", "--decorate=no",
 		     "--no-prefix", "--submodule=short",
 		     /*
@@ -64,6 +65,8 @@  static int read_patches(const char *range, struct string_list *list,
 		     "--pretty=medium",
 		     "--show-notes-by-default",
 		     NULL);
+	if (!include_merges)
+		strvec_push(&cp.args, "--no-merges");
 	strvec_push(&cp.args, range);
 	if (other_arg)
 		strvec_pushv(&cp.args, other_arg->v);
@@ -96,11 +99,14 @@  static int read_patches(const char *range, struct string_list *list,
 		}
 
 		if (skip_prefix(line, "commit ", &p)) {
+			char *q;
 			if (util) {
 				string_list_append(list, buf.buf)->util = util;
 				strbuf_reset(&buf);
 			}
 			CALLOC_ARRAY(util, 1);
+			if (include_merges && (q = strstr(p, " (from ")))
+				*q = '\0';
 			if (repo_get_oid(the_repository, p, &util->oid)) {
 				error(_("could not parse commit '%s'"), p);
 				FREE_AND_NULL(util);
@@ -571,13 +577,14 @@  int show_range_diff(const char *range1, const char *range2,
 
 	struct string_list branch1 = STRING_LIST_INIT_DUP;
 	struct string_list branch2 = STRING_LIST_INIT_DUP;
+	unsigned int include_merges = range_diff_opts->include_merges;
 
 	if (range_diff_opts->left_only && range_diff_opts->right_only)
 		res = error(_("options '%s' and '%s' cannot be used together"), "--left-only", "--right-only");
 
-	if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg))
+	if (!res && read_patches(range1, &branch1, range_diff_opts->other_arg, include_merges))
 		res = error(_("could not parse log for '%s'"), range1);
-	if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg))
+	if (!res && read_patches(range2, &branch2, range_diff_opts->other_arg, include_merges))
 		res = error(_("could not parse log for '%s'"), range2);
 
 	if (!res) {
diff --git a/range-diff.h b/range-diff.h
index 2f69f6a434d..cd85000b5a0 100644
--- a/range-diff.h
+++ b/range-diff.h
@@ -16,6 +16,7 @@  struct range_diff_options {
 	int creation_factor;
 	unsigned dual_color:1;
 	unsigned left_only:1, right_only:1;
+	unsigned include_merges:1;
 	const struct diff_options *diffopt; /* may be NULL */
 	const struct strvec *other_arg; /* may be NULL */
 };
diff --git a/t/t3206-range-diff.sh b/t/t3206-range-diff.sh
index 86010931ab6..c18a3fdab83 100755
--- a/t/t3206-range-diff.sh
+++ b/t/t3206-range-diff.sh
@@ -909,4 +909,20 @@  test_expect_success 'submodule changes are shown irrespective of diff.submodule'
 	test_cmp expect actual
 '
 
+test_expect_success '--diff-merges' '
+	renamed_oid=$(git rev-parse --short renamed-file) &&
+	tree=$(git merge-tree unmodified renamed-file) &&
+	clean=$(git commit-tree -m merge -p unmodified -p renamed-file $tree) &&
+	clean_oid=$(git rev-parse --short $clean) &&
+	conflict=$(git commit-tree -m merge -p unmodified -p renamed-file^ $tree) &&
+	conflict_oid=$(git rev-parse --short $conflict) &&
+
+	git range-diff --diff-merges=1 $clean...$conflict >actual &&
+	cat >expect <<-EOF &&
+	1:  $renamed_oid < -:  ------- s/12/B/
+	2:  $clean_oid = 1:  $conflict_oid merge
+	EOF
+	test_cmp expect actual
+'
+
 test_done