
[v3,09/10] builtin/diff-tree: learn --merge-base

Message ID c0d27b125e969e13c52b0fa806a8e3caa8c20ac6.1600328336.git.liu.denton@gmail.com
State Superseded
Series builtin/diff: learn --merge-base

Commit Message

Denton Liu Sept. 17, 2020, 7:44 a.m. UTC
In order to get the diff between a commit and its merge base, the
currently preferred method is to use `git diff A...B`. However, the
range-notation with diff has, time and time again, been noted as a point
of confusion and thus, it should be avoided. Although we have a
substitute for the double-dot notation, we don't have any replacement
for the triple-dot notation.

Introduce the --merge-base flag as a replacement for triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`, allowing us to gently deprecate
range-notation completely.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Documentation/git-diff-tree.txt      |  7 ++++-
 Documentation/git-diff.txt           |  9 ++++---
 builtin/diff-tree.c                  | 18 +++++++++++++
 builtin/diff.c                       | 40 +++++++++++++++++++---------
 t/t4068-diff-symmetric-merge-base.sh | 34 +++++++++++++++++++++++
 5 files changed, 91 insertions(+), 17 deletions(-)

Comments

Junio C Hamano Sept. 17, 2020, 6:23 p.m. UTC | #1
Denton Liu <liu.denton@gmail.com> writes:

> +	if (read_stdin && merge_base)
> +		die(_("--stdin and --merge-base are mutually exclusive"));
> +
> +	if (merge_base) {
> +		struct object_id oid;
> +
> +		if (opt->pending.nr != 2)
> +			die(_("--merge-base only works with two commits"));
> +
> +		diff_get_merge_base(opt, &oid);
> +		opt->pending.objects[0].item = lookup_object(the_repository, &oid);
> +	}
> +

This looks quite straight-forward.

> -	/*
> -	 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
> -	 * swap them.
> -	 */
> -	if (ent1->item->flags & UNINTERESTING)
> -		swap = 1;
> -	oid[swap] = &ent0->item->oid;
> -	oid[1 - swap] = &ent1->item->oid;
> +	if (merge_base) {
> +		diff_get_merge_base(revs, &mb_oid);
> +		oid[0] = &mb_oid;
> +		oid[1] = &revs->pending.objects[1].item->oid;
> +	} else {
> +		int swap = 0;
> +
> +		/*
> +		 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
> +		 * swap them.
> +		 */
> +		if (ent1->item->flags & UNINTERESTING)
> +			swap = 1;
> +		oid[swap] = &ent0->item->oid;
> +		oid[1 - swap] = &ent1->item->oid;
> +	}

It is not entirely clear why the original has to become an [else]
clause here, unlike the change we saw earlier in cmd_diff_tree().
It feels quite inconsistent.

Thanks.
Denton Liu Sept. 18, 2020, 10:48 a.m. UTC | #2
On Thu, Sep 17, 2020 at 11:23:54AM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > +	if (read_stdin && merge_base)
> > +		die(_("--stdin and --merge-base are mutually exclusive"));
> > +
> > +	if (merge_base) {
> > +		struct object_id oid;
> > +
> > +		if (opt->pending.nr != 2)
> > +			die(_("--merge-base only works with two commits"));
> > +
> > +		diff_get_merge_base(opt, &oid);
> > +		opt->pending.objects[0].item = lookup_object(the_repository, &oid);
> > +	}
> > +
> 
> This looks quite straight-forward.
> 
> > -	/*
> > -	 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
> > -	 * swap them.
> > -	 */
> > -	if (ent1->item->flags & UNINTERESTING)
> > -		swap = 1;
> > -	oid[swap] = &ent0->item->oid;
> > -	oid[1 - swap] = &ent1->item->oid;
> > +	if (merge_base) {
> > +		diff_get_merge_base(revs, &mb_oid);
> > +		oid[0] = &mb_oid;
> > +		oid[1] = &revs->pending.objects[1].item->oid;
> > +	} else {
> > +		int swap = 0;
> > +
> > +		/*
> > +		 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
> > +		 * swap them.
> > +		 */
> > +		if (ent1->item->flags & UNINTERESTING)
> > +			swap = 1;
> > +		oid[swap] = &ent0->item->oid;
> > +		oid[1 - swap] = &ent1->item->oid;
> > +	}
> 
> It is not entirely clear why the original has to become an [else]
> clause here, unlike the change we saw earlier in cmd_diff_tree().
> It feels quite inconsistent.

Since we're only interested in the oids, I thought that it would be
possible to save a lookup_object() and just use the oids directly. If
it's clearer, this can be written as something like this but the lookup
feels unnecessary:

	/*
	 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
	 * swap them.
	 */
	if (ent1->item->flags & UNINTERESTING)
		swap = 1;

	if (merge_base) {
		struct object_id mb_oid;
		if (swap)
			BUG("swap is unexpectedly set");
		if (diff_get_merge_base(revs, &mb_oid))
			exit(128);
		ent0->item = lookup_object(the_repository, &mb_oid);
	}


	oid[swap] = &ent0->item->oid;
	oid[1 - swap] = &ent1->item->oid;

Thanks,
Denton
Junio C Hamano Sept. 18, 2020, 4:52 p.m. UTC | #3
Denton Liu <liu.denton@gmail.com> writes:

> Since we're only interested in the oids, I thought that it would be
> possible to save a lookup_object() and just use the oids directly. If
> it's clearer, this can be written as something like this but the lookup
> feels unnecessary:

When running the tree diff, we'd need the object anyway, and the
result of the look-up made here is cached, right?

That is why I expected it would just be an insertion before the
existing code, like the other side.

But the existing "if we got either ^A B or B ^A, treat it as A..B"
logic is just like "if we got '--merge-base A B', treat it as
something else" we are adding, and they (and any future such special
syntax) should not interact with each other.  So in that sense, the
code structure you have in the originally posted patch (not the code
snippet in your message I am responding to) that does

    ...
    if (using merge-base feature) {
	do the merge base thing to populate oid[]
    } else if (user used A..B) {
	ensure "^A B" and "B ^A" both have A in oid[0] and B in oid[1]
    }
    ...
    call diff-tree between oid[0] and oid[1]

makes a lot more sense than anything else we discussed so far.
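
A minimal C sketch of that structure, under loud assumptions: `toy_object`, `TOY_UNINTERESTING` and `toy_pick_sides` are hypothetical stand-ins invented for illustration, not Git's real types or functions, and the merge base is passed in precomputed rather than looked up. It only shows that the merge-base branch and the A..B normalisation branch each populate the two sides independently.

```c
/* Hypothetical stand-ins for Git's internal types; not the real definitions. */
#define TOY_UNINTERESTING 1u

struct toy_object {
	unsigned flags;	/* TOY_UNINTERESTING marks the "^A" side of A..B */
	int id;		/* stands in for the object ID */
};

/*
 * Fill out[0] (the "before" side) and out[1] (the "after" side).
 * mb is an already computed merge base; it is used only when
 * use_merge_base is nonzero.
 */
void toy_pick_sides(const struct toy_object *ent0,
		    const struct toy_object *ent1,
		    int use_merge_base,
		    const struct toy_object *mb,
		    const struct toy_object *out[2])
{
	if (use_merge_base) {
		/* the merge base thing: base on the left, second commit on the right */
		out[0] = mb;
		out[1] = ent1;
	} else {
		/* "^A B" and "B ^A" must both end up as A in out[0], B in out[1] */
		int swap = (ent1->flags & TOY_UNINTERESTING) ? 1 : 0;

		out[swap] = ent0;
		out[1 - swap] = ent1;
	}
}
```

With this shape, any later special syntax would become one more branch that fills `out[]`, without touching the existing two.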

I wonder if turning the builtin/diff-tree.c to match that structure
make the result easier to understand (and I'll be perfectly happy if
the answer to this question turns out to be "no, the result of the
posted patch is the easiest to follow").

Thanks.
Denton Liu Sept. 20, 2020, 11:01 a.m. UTC | #4
Hi Junio,

On Fri, Sep 18, 2020 at 09:52:39AM -0700, Junio C Hamano wrote:
> I wonder if turning the builtin/diff-tree.c to match that structure
> make the result easier to understand (and I'll be perfectly happy if
> the answer to this question turns out to be "no, the result of the
> posted patch is the easiest to follow").

git diff-tree does not even recognise ranges so as a result, the else
case does not even need to exist there, unlike in git diff.
Junio C Hamano Sept. 21, 2020, 4:05 p.m. UTC | #5
Denton Liu <liu.denton@gmail.com> writes:

> Hi Junio,
>
> On Fri, Sep 18, 2020 at 09:52:39AM -0700, Junio C Hamano wrote:
>> I wonder if turning the builtin/diff-tree.c to match that structure
>> make the result easier to understand (and I'll be perfectly happy if
>> the answer to this question turns out to be "no, the result of the
>> posted patch is the easiest to follow").
>
> git diff-tree does not even recognise ranges so as a result, the else
> case does not even need to exist there, unlike in git diff.

(caution: before morning caffeine so what I say may be totally off)

Do you mean "git diff-tree HEAD^..HEAD" would fail, or something
else?

Thanks.
Denton Liu Sept. 21, 2020, 5:27 p.m. UTC | #6
Hi Junio,

On Mon, Sep 21, 2020 at 09:05:26AM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > Hi Junio,
> >
> > On Fri, Sep 18, 2020 at 09:52:39AM -0700, Junio C Hamano wrote:
> >> I wonder if turning the builtin/diff-tree.c to match that structure
> >> make the result easier to understand (and I'll be perfectly happy if
> >> the answer to this question turns out to be "no, the result of the
> >> posted patch is the easiest to follow").
> >
> > git diff-tree does not even recognise ranges so as a result, the else
> > case does not even need to exist there, unlike in git diff.
> 
> (caution: before morning caffeine so what I say may be totally off)
> 
> Do you mean "git diff-tree HEAD^..HEAD" would fail, or something
> else?

Yes, that is what I meant but I can see that what I wrote is totally
wrong. I was reading git-diff-tree.txt and I assumed that ranges were
not supported at all.

Anyway, now that I've realised my mistake, I've rewritten the diff-tree
part so that the structure matches what was written in diff and it
should be easier to follow.

-- >8 --

From: Denton Liu <liu.denton@gmail.com>
Date: Mon, 14 Sep 2020 11:36:52 -0700
Subject: [PATCH] builtin/diff-tree: learn --merge-base

The previous commit introduced --merge-base, a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.

Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 Documentation/git-diff-tree.txt      |  7 ++++-
 Documentation/git-diff.txt           |  8 ++++--
 builtin/diff-tree.c                  | 17 +++++++++++-
 builtin/diff.c                       | 39 +++++++++++++++++++---------
 t/t4068-diff-symmetric-merge-base.sh | 34 ++++++++++++++++++++++++
 5 files changed, 89 insertions(+), 16 deletions(-)

diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
index 5c8a2a5e97..2fc24c542f 100644
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git diff-tree' [--stdin] [-m] [-s] [-v] [--no-commit-id] [--pretty]
-	      [-t] [-r] [-c | --cc] [--combined-all-paths] [--root]
+	      [-t] [-r] [-c | --cc] [--combined-all-paths] [--root] [--merge-base]
 	      [<common diff options>] <tree-ish> [<tree-ish>] [<path>...]
 
 DESCRIPTION
@@ -43,6 +43,11 @@ include::diff-options.txt[]
 	When `--root` is specified the initial commit will be shown as a big
 	creation event. This is equivalent to a diff against the NULL tree.
 
+--merge-base::
+	Instead of comparing the <tree-ish>s directly, use the merge
+	base between the two <tree-ish>s as the "before" side.  There
+	must be two <tree-ish>s given and they must both be commits.
+
 --stdin::
 	When `--stdin` is specified, the command does not take
 	<tree-ish> arguments from the command line.  Instead, it
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 762ee6d074..7f4c8a8ce7 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git diff' [<options>] [<commit>] [--] [<path>...]
 'git diff' [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
-'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
+'git diff' [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
 'git diff' [<options>] <commit>...<commit> [--] [<path>...]
 'git diff' [<options>] <blob> <blob>
 'git diff' [<options>] --no-index [--] <path> <path>
@@ -62,10 +62,14 @@ of <commit> and HEAD.  `git diff --merge-base A` is equivalent to
 	branch name to compare with the tip of a different
 	branch.
 
-'git diff' [<options>] <commit> <commit> [--] [<path>...]::
+'git diff' [<options>] [--merge-base] <commit> <commit> [--] [<path>...]::
 
 	This is to view the changes between two arbitrary
 	<commit>.
++
+If --merge-base is given, use the merge base of the two commits for the
+"before" side.  `git diff --merge-base A B` is equivalent to
+`git diff $(git merge-base A B) B`.
 
 'git diff' [<options>] <commit> <commit>... <commit> [--] [<path>...]::
 
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 802363d0a2..9fc95e959f 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -111,6 +111,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	struct setup_revision_opt s_r_opt;
 	struct userformat_want w;
 	int read_stdin = 0;
+	int merge_base = 0;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_tree_usage);
@@ -143,9 +144,18 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 			read_stdin = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--merge-base")) {
+			merge_base = 1;
+			continue;
+		}
 		usage(diff_tree_usage);
 	}
 
+	if (read_stdin && merge_base)
+		die(_("--stdin and --merge-base are mutually exclusive"));
+	if (merge_base && opt->pending.nr != 2)
+		die(_("--merge-base only works with two commits"));
+
 	/*
 	 * NOTE!  We expect "a..b" to expand to "^a b" but it is
 	 * perfectly valid for revision range parser to yield "b ^a",
@@ -165,7 +175,12 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	case 2:
 		tree1 = opt->pending.objects[0].item;
 		tree2 = opt->pending.objects[1].item;
-		if (tree2->flags & UNINTERESTING) {
+		if (merge_base) {
+			struct object_id oid;
+
+			diff_get_merge_base(opt, &oid);
+			tree1 = lookup_object(the_repository, &oid);
+		} else if (tree2->flags & UNINTERESTING) {
 			SWAP(tree2, tree1);
 		}
 		diff_tree_oid(&tree1->oid, &tree2->oid, "", &opt->diffopt);
diff --git a/builtin/diff.c b/builtin/diff.c
index 1baea18ae0..b50fc68c2a 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -26,7 +26,7 @@
 static const char builtin_diff_usage[] =
 "git diff [<options>] [<commit>] [--] [<path>...]\n"
 "   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
-"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
+"   or: git diff [<options>] <commit> [--merge-base] [<commit>...] <commit> [--] [<path>...]\n"
 "   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
 "   or: git diff [<options>] <blob> <blob>]\n"
 "   or: git diff [<options>] --no-index [--] <path> <path>]\n"
@@ -172,19 +172,34 @@ static int builtin_diff_tree(struct rev_info *revs,
 			     struct object_array_entry *ent1)
 {
 	const struct object_id *(oid[2]);
-	int swap = 0;
+	struct object_id mb_oid;
+	int merge_base = 0;
 
-	if (argc > 1)
-		usage(builtin_diff_usage);
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--merge-base"))
+			merge_base = 1;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
 
-	/*
-	 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
-	 * swap them.
-	 */
-	if (ent1->item->flags & UNINTERESTING)
-		swap = 1;
-	oid[swap] = &ent0->item->oid;
-	oid[1 - swap] = &ent1->item->oid;
+	if (merge_base) {
+		diff_get_merge_base(revs, &mb_oid);
+		oid[0] = &mb_oid;
+		oid[1] = &revs->pending.objects[1].item->oid;
+	} else {
+		int swap = 0;
+
+		/*
+		 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
+		 * swap them.
+		 */
+		if (ent1->item->flags & UNINTERESTING)
+			swap = 1;
+		oid[swap] = &ent0->item->oid;
+		oid[1 - swap] = &ent1->item->oid;
+	}
 	diff_tree_oid(oid[0], oid[1], "", &revs->diffopt);
 	log_tree_diff_flush(revs);
 	return 0;
diff --git a/t/t4068-diff-symmetric-merge-base.sh b/t/t4068-diff-symmetric-merge-base.sh
index 49432379cb..03487cc945 100755
--- a/t/t4068-diff-symmetric-merge-base.sh
+++ b/t/t4068-diff-symmetric-merge-base.sh
@@ -156,4 +156,38 @@ do
 	'
 done
 
+for cmd in diff-tree diff
+do
+	test_expect_success "$cmd --merge-base with two commits" '
+		git $cmd commit-C master >expect &&
+		git $cmd --merge-base br2 master >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "$cmd --merge-base commit and non-commit" '
+		test_must_fail git $cmd --merge-base br2 master^{tree} 2>err &&
+		test_i18ngrep "fatal: --merge-base only works with commits" err
+	'
+
+	test_expect_success "$cmd --merge-base with no merge bases and two commits" '
+		test_must_fail git $cmd --merge-base br2 br3 2>err &&
+		test_i18ngrep "fatal: no merge base found" err
+	'
+
+	test_expect_success "$cmd --merge-base with multiple merge bases and two commits" '
+		test_must_fail git $cmd --merge-base master br1 2>err &&
+		test_i18ngrep "fatal: multiple merge bases found" err
+	'
+done
+
+test_expect_success 'diff-tree --merge-base with one commit' '
+	test_must_fail git diff-tree --merge-base master 2>err &&
+	test_i18ngrep "fatal: --merge-base only works with two commits" err
+'
+
+test_expect_success 'diff --merge-base with range' '
+	test_must_fail git diff --merge-base br2..br3 2>err &&
+	test_i18ngrep "fatal: --merge-base does not work with ranges" err
+'
+
 test_done
Junio C Hamano Sept. 21, 2020, 9:09 p.m. UTC | #7
Denton Liu <liu.denton@gmail.com> writes:

> @@ -165,7 +175,12 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
>  	case 2:
>  		tree1 = opt->pending.objects[0].item;
>  		tree2 = opt->pending.objects[1].item;
> -		if (tree2->flags & UNINTERESTING) {
> +		if (merge_base) {
> +			struct object_id oid;
> +
> +			diff_get_merge_base(opt, &oid);
> +			tree1 = lookup_object(the_repository, &oid);
> +		} else if (tree2->flags & UNINTERESTING) {
>  			SWAP(tree2, tree1);
>  		}
>  		diff_tree_oid(&tree1->oid, &tree2->oid, "", &opt->diffopt);

OK.  Handling this in that "case 2" does make sense.

However.

The above code as-is will allow something like

    git diff --merge-base A..B

and it will be taken the same as

    git diff --merge-base A B

But let's step back and think why we bother with SWAP() in the
normal case.  This is due to the possibility that A..B, which
currently is left in the pending.objects[] array as ^A B, might
someday be stored as B ^A.  If we leave that code to protect us from
the possibility, shouldn't we be protecting us from the same
"someday" for the new code, too?  

That is "git diff --merge-base A..B", when the control reaches this
part of the code, may have tree1=B tree2=^A

Which suggests that a consistently written code would look like so:

	tree1 = opt->pending.objects[0].item;
	tree2 = opt->pending.objects[1].item;

	if (tree2->flags & UNINTERESTING)
		/*
		 * A..B currently becomes ^A B but it is perfectly
		 * ok for revision parser to leave us B ^A; detect
		 * and swap them in the original order.
		 */
		SWAP(tree2, tree1);

	if (merge_base) {
		struct object_id oid;

		diff_get_merge_base(opt, &oid);
		tree1 = lookup_object(the_repository, &oid);
	}
	diff_tree_oid(&tree1->oid, &tree2->oid, "", &opt->diffopt);
	log_tree_diff_flush(opt);

Another possibility is to error out when "--merge-base A..B" is
given, which might be simpler.  Then the code would look more like


	tree1 = ...
	tree2 = ...

	if (merge_base) {
		if ((tree1->flags | tree2->flags) & UNINTERESTING)
			die(_("use of --merge-base with A..B forbidden"));
		... get merge base and assign it to tree1 ...
	} else if (tree2->flags & UNINTERESTING) {
		SWAP();
	}

While we are at it, what happens when "--merge-base A...B" is given?

In the original code without "--merge-base", "git diff-tree A...B"
places the merge base between A and B in pending.objects[0] and B in
pending.objects[1], I think.  "git diff-tree --merge-base A...B"
would further compute the merge base between these two objects, but
luckily $(git merge-base $(merge-base A B) B) is the same as $(git
merge-base A B), so you won't get an incorrect answer from such a
request.  Is this something we want to diagnose as an error?  I am
inclined to say we should allow it (and if it hurts the user can
stop doing so) as there is no harm done.
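
The idempotence claim above can be checked on a toy model. The sketch below assumes a simplified history in which every commit has at most one parent, so the merge base is just the nearest common ancestor; real Git histories contain merges and can have several merge bases, which this model ignores. `toy_depth` and `toy_merge_base` are hypothetical helpers, not Git functions.

```c
/* Toy history: parent[i] is the single parent of commit i, or -1 for a root. */
int toy_depth(const int *parent, int c)
{
	int d = 0;

	while (parent[c] >= 0) {
		c = parent[c];
		d++;
	}
	return d;
}

/* Nearest common ancestor of a and b, or -1 if the histories are unrelated. */
int toy_merge_base(const int *parent, int a, int b)
{
	int da = toy_depth(parent, a);
	int db = toy_depth(parent, b);

	/* bring both commits to the same depth */
	while (da > db) {
		a = parent[a];
		da--;
	}
	while (db > da) {
		b = parent[b];
		db--;
	}

	/* walk up in lockstep until the two paths meet */
	while (a != b) {
		if (parent[a] < 0)	/* two different roots: no common ancestor */
			return -1;
		a = parent[a];
		b = parent[b];
	}
	return a;
}
```

In this model, if `mb = toy_merge_base(parent, A, B)` is a valid commit, then `toy_merge_base(parent, mb, B)` returns `mb` again, because `mb` is already an ancestor of `B`.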

Thanks.
Junio C Hamano Sept. 21, 2020, 9:19 p.m. UTC | #8
Junio C Hamano <gitster@pobox.com> writes:

> Another possibility is to error out when "--merge-base A..B" is
> given, which might be simpler.  Then the code would look more like
> ...
>
> While we are at it, what happens when "--merge-base A...B" is given?
>
> ...  Is this something we want to diagnose as an error?  I am
> inclined to say we should allow it (and if it hurts the user can
> stop doing so) as there is no harm done.

My recommendation is to allow both "git diff --merge-base A..B" and
"git diff --merge-base A...B".  The discussion about A..B and SWAP() would
equally apply to builtin/diff part of the patch.  The posted patch
ignores the swap logic when --merge-base is given, but we should
apply the swap logic first and then make sure the merge_base logic
will have the oid[0] and oid[1] in the correct order.
Denton Liu Sept. 21, 2020, 9:54 p.m. UTC | #9
Hi Junio,

On Mon, Sep 21, 2020 at 02:09:24PM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > @@ -165,7 +175,12 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix)
> >  	case 2:
> >  		tree1 = opt->pending.objects[0].item;
> >  		tree2 = opt->pending.objects[1].item;
> > -		if (tree2->flags & UNINTERESTING) {
> > +		if (merge_base) {
> > +			struct object_id oid;
> > +
> > +			diff_get_merge_base(opt, &oid);
> > +			tree1 = lookup_object(the_repository, &oid);
> > +		} else if (tree2->flags & UNINTERESTING) {
> >  			SWAP(tree2, tree1);
> >  		}
> >  		diff_tree_oid(&tree1->oid, &tree2->oid, "", &opt->diffopt);
> 
> OK.  Handling this in that "case 2" does make sense.
> 
> However.
> 
> The above code as-is will allow something like
> 
>     git diff --merge-base A..B
> 
> and it will be taken the same as
> 
>     git diff --merge-base A B

This does not happen because at the top of diff_get_merge_base(), we
have

	for (i = 0; i < revs->pending.nr; i++) {
		struct object *obj = revs->pending.objects[i].item;
		if (obj->flags)
			die(_("--merge-base does not work with ranges"));
		if (obj->type != OBJ_COMMIT)
			die(_("--merge-base only works with commits"));
	}

which ensures that we don't accept any ranges at all. This is why I
considered the SWAP and merge_base cases to be mutually exclusive.
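
As a rough model of why that check makes the two cases mutually exclusive, here is a self-contained C sketch. It assumes, as the message above describes, that any pending object produced by range syntax carries a nonzero flags word. The names `toy_pending`, `toy_mb_status` and `toy_check_pending` are hypothetical, and the sketch returns a status code where the real loop calls die().

```c
enum toy_mb_status {
	TOY_MB_OK = 0,
	TOY_MB_IS_RANGE,	/* models "--merge-base does not work with ranges" */
	TOY_MB_NOT_COMMIT	/* models "--merge-base only works with commits" */
};

enum toy_type { TOY_COMMIT, TOY_TREE };

struct toy_pending {
	unsigned flags;		/* assumed nonzero when the argument came from a range */
	enum toy_type type;
};

/* Check every pending object in order; report the first problem found. */
enum toy_mb_status toy_check_pending(const struct toy_pending *objs, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (objs[i].flags)
			return TOY_MB_IS_RANGE;
		if (objs[i].type != TOY_COMMIT)
			return TOY_MB_NOT_COMMIT;
	}
	return TOY_MB_OK;
}
```

Once this returns `TOY_MB_OK`, no object has the flag that would trigger the swap, so in this model the swap branch can never be needed alongside the merge-base branch.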

> Another possibility is to error out when "--merge-base A..B" is
> given, which might be simpler.  Then the code would look more like
> 
> 
> 	tree1 = ...
> 	tree2 = ...
> 
> 	if (merge_base) {
> 		if ((tree1->flags | tree2->flags) & UNINTERESTING)
> 			die(_("use of --merge-base with A..B forbidden"));
> 		... get merge base and assign it to tree1 ...
> 	} else if (tree2->flags & UNINTERESTING) {
> 		SWAP();
> 	}

This is the route I picked, although the logic for this is in
diff_get_merge_base().

> While we are at it, what happens when "--merge-base A...B" is given?
> 
> In the original code without "--merge-base", "git diff-tree A...B"
> places the merge base between A and B in pending.objects[0] and B in
> pending.objects[1], I think.  "git diff-tree --merge-base A...B"
> would further compute the merge base between these two objects, but
> luckily $(git merge-base $(merge-base A B) B) is the same as $(git
> merge-base A B), so you won't get an incorrect answer from such a
> request.  Is this something we want to diagnose as an error?  I am
> inclined to say we should allow it (and if it hurts the user can
> stop doing so) as there is no harm done.

I think that we should error out for all ranges because this option
semantically only really makes sense on two endpoints, not a range of
commits. Since the check is cheap to protect users from themselves, we
might as well actually do it.

Worst case, if someone has a legitimate use case for --merge-base and
ranges, we can allow it in the future, which would be easier than
removing this feature.

Thanks,
Denton
Junio C Hamano Sept. 21, 2020, 10:18 p.m. UTC | #10
Denton Liu <liu.denton@gmail.com> writes:

> This does not happen because at the top of diff_get_merge_base(), we
> have
>
> 	for (i = 0; i < revs->pending.nr; i++) {
> 		struct object *obj = revs->pending.objects[i].item;
> 		if (obj->flags)
> 			die(_("--merge-base does not work with ranges"));
> 		if (obj->type != OBJ_COMMIT)
> 			die(_("--merge-base only works with commits"));
> 	}
>
> which ensures that we don't accept any ranges at all.

I think we should lose that loop, or at least the first test.

If we are not removing the support for "A..B" notation and still
accept "diff A..B" happily, not accepting "diff --merge-base A..B"
would appear inconsistent to the users.  

The same applies to "A...B".

Thanks.
Denton Liu Sept. 23, 2020, 9:47 a.m. UTC | #11
Hi Junio,

On Mon, Sep 21, 2020 at 03:18:06PM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > This does not happen because at the top of diff_get_merge_base(), we
> > have
> >
> > 	for (i = 0; i < revs->pending.nr; i++) {
> > 		struct object *obj = revs->pending.objects[i].item;
> > 		if (obj->flags)
> > 			die(_("--merge-base does not work with ranges"));
> > 		if (obj->type != OBJ_COMMIT)
> > 			die(_("--merge-base only works with commits"));
> > 	}
> >
> > which ensures that we don't accept any ranges at all.
> 
> I think we should lose that loop, or at least the first test.
> 
> If we are not removing the support for "A..B" notation and still
> accept "diff A..B" happily, not accepting "diff --merge-base A..B"
> would appear inconsistent to the users.  

I disagree, in the documentation, it clearly states that this option is
only available to the diff modes that accept endpoints, not
ranges:

	'git diff' [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]::

and

	'git diff' [<options>] [--merge-base] <commit> <commit> [--] [<path>...]::

so it seems perfectly consistent to me. The documentation gives the
impression that the range notations are their own separate mode anyway.

And worst case scenario, if we receive user reports that they believe
the feature is inconsistent, it's 100x easier to change it to allow
ranges than attempting to remove support for ranges in the future.

Thanks,
Denton
Junio C Hamano Sept. 25, 2020, 9:02 p.m. UTC | #12
Denton Liu <liu.denton@gmail.com> writes:

> And worst case scenario, if we receive user reports that they believe
> the feature is inconsistent, it's 100x easier to change it to allow
> ranges than attempting to remove support for ranges in the future.

If we allow ranges from day one, we do not even have to worry about
it, no?
Denton Liu Sept. 26, 2020, 1:52 a.m. UTC | #13
Hi Junio,

On Fri, Sep 25, 2020 at 02:02:11PM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > And worst case scenario, if we receive user reports that they believe
> > the feature is inconsistent, it's 100x easier to change it to allow
> > ranges than attempting to remove support for ranges in the future.
> 
> If we allow ranges from day one, we do not even have to worry about
> it, no?

Yes, but I'm worried that being able to mix --merge-base with ranges
might cause more confusion for users since, in my opinion, it only
really makes sense for endpoints. That's why I restricted it in the
first place.

I think that since we're in disagreement, it makes more sense to take
the safer option where we can implement functionality later whereas if
we implement it and we want to remove it later, it'll be a much harder
time.

Thanks,
Denton

Patch

diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
index 5c8a2a5e97..2fc24c542f 100644
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -10,7 +10,7 @@  SYNOPSIS
 --------
 [verse]
 'git diff-tree' [--stdin] [-m] [-s] [-v] [--no-commit-id] [--pretty]
-	      [-t] [-r] [-c | --cc] [--combined-all-paths] [--root]
+	      [-t] [-r] [-c | --cc] [--combined-all-paths] [--root] [--merge-base]
 	      [<common diff options>] <tree-ish> [<tree-ish>] [<path>...]
 
 DESCRIPTION
@@ -43,6 +43,11 @@  include::diff-options.txt[]
 	When `--root` is specified the initial commit will be shown as a big
 	creation event. This is equivalent to a diff against the NULL tree.
 
+--merge-base::
+	Instead of comparing the <tree-ish>s directly, use the merge
+	base between the two <tree-ish>s as the "before" side.  There
+	must be two <tree-ish>s given and they must both be commits.
+
 --stdin::
 	When `--stdin` is specified, the command does not take
 	<tree-ish> arguments from the command line.  Instead, it
diff --git a/Documentation/git-diff.txt b/Documentation/git-diff.txt
index 762ee6d074..d3b526e00a 100644
--- a/Documentation/git-diff.txt
+++ b/Documentation/git-diff.txt
@@ -11,8 +11,7 @@  SYNOPSIS
 [verse]
 'git diff' [<options>] [<commit>] [--] [<path>...]
 'git diff' [<options>] --cached [--merge-base] [<commit>] [--] [<path>...]
-'git diff' [<options>] <commit> [<commit>...] <commit> [--] [<path>...]
-'git diff' [<options>] <commit>...<commit> [--] [<path>...]
+'git diff' [<options>] [--merge-base] <commit> [<commit>...] <commit> [--] [<path>...]
 'git diff' [<options>] <blob> <blob>
 'git diff' [<options>] --no-index [--] <path> <path>
 
@@ -62,10 +61,14 @@  of <commit> and HEAD.  `git diff --merge-base A` is equivalent to
 	branch name to compare with the tip of a different
 	branch.
 
-'git diff' [<options>] <commit> <commit> [--] [<path>...]::
+'git diff' [<options>] [--merge-base] <commit> <commit> [--] [<path>...]::
 
 	This is to view the changes between two arbitrary
 	<commit>.
++
+If --merge-base is given, use the merge base of the two commits for the
+"before" side.  `git diff --merge-base A B` is equivalent to
+`git diff $(git merge-base A B) B`.
 
 'git diff' [<options>] <commit> <commit>... <commit> [--] [<path>...]::
 
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 802363d0a2..823d6678e5 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -111,6 +111,7 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 	struct setup_revision_opt s_r_opt;
 	struct userformat_want w;
 	int read_stdin = 0;
+	int merge_base = 0;
 
 	if (argc == 2 && !strcmp(argv[1], "-h"))
 		usage(diff_tree_usage);
@@ -143,9 +144,26 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 			read_stdin = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--merge-base")) {
+			merge_base = 1;
+			continue;
+		}
 		usage(diff_tree_usage);
 	}
 
+	if (read_stdin && merge_base)
+		die(_("--stdin and --merge-base are mutually exclusive"));
+
+	if (merge_base) {
+		struct object_id oid;
+
+		if (opt->pending.nr != 2)
+			die(_("--merge-base only works with two commits"));
+
+		diff_get_merge_base(opt, &oid);
+		opt->pending.objects[0].item = lookup_object(the_repository, &oid);
+	}
+
 	/*
 	 * NOTE!  We expect "a..b" to expand to "^a b" but it is
 	 * perfectly valid for revision range parser to yield "b ^a",
diff --git a/builtin/diff.c b/builtin/diff.c
index 1baea18ae0..ad78bc89b3 100644
--- a/builtin/diff.c
+++ b/builtin/diff.c
@@ -26,8 +26,7 @@ 
 static const char builtin_diff_usage[] =
 "git diff [<options>] [<commit>] [--] [<path>...]\n"
 "   or: git diff [<options>] --cached [<commit>] [--] [<path>...]\n"
-"   or: git diff [<options>] <commit> [<commit>...] <commit> [--] [<path>...]\n"
-"   or: git diff [<options>] <commit>...<commit>] [--] [<path>...]\n"
+"   or: git diff [<options>] <commit> [--merge-base] [<commit>...] <commit> [--] [<path>...]\n"
 "   or: git diff [<options>] <blob> <blob>]\n"
 "   or: git diff [<options>] --no-index [--] <path> <path>]\n"
 COMMON_DIFF_OPTIONS_HELP;
@@ -172,19 +171,34 @@  static int builtin_diff_tree(struct rev_info *revs,
 			     struct object_array_entry *ent1)
 {
 	const struct object_id *(oid[2]);
-	int swap = 0;
+	struct object_id mb_oid;
+	int merge_base = 0;
 
-	if (argc > 1)
-		usage(builtin_diff_usage);
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--merge-base"))
+			merge_base = 1;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
 
-	/*
-	 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
-	 * swap them.
-	 */
-	if (ent1->item->flags & UNINTERESTING)
-		swap = 1;
-	oid[swap] = &ent0->item->oid;
-	oid[1 - swap] = &ent1->item->oid;
+	if (merge_base) {
+		diff_get_merge_base(revs, &mb_oid);
+		oid[0] = &mb_oid;
+		oid[1] = &revs->pending.objects[1].item->oid;
+	} else {
+		int swap = 0;
+
+		/*
+		 * We saw two trees, ent0 and ent1.  If ent1 is uninteresting,
+		 * swap them.
+		 */
+		if (ent1->item->flags & UNINTERESTING)
+			swap = 1;
+		oid[swap] = &ent0->item->oid;
+		oid[1 - swap] = &ent1->item->oid;
+	}
 	diff_tree_oid(oid[0], oid[1], "", &revs->diffopt);
 	log_tree_diff_flush(revs);
 	return 0;
diff --git a/t/t4068-diff-symmetric-merge-base.sh b/t/t4068-diff-symmetric-merge-base.sh
index 49432379cb..03487cc945 100755
--- a/t/t4068-diff-symmetric-merge-base.sh
+++ b/t/t4068-diff-symmetric-merge-base.sh
@@ -156,4 +156,38 @@  do
 	'
 done
 
+for cmd in diff-tree diff
+do
+	test_expect_success "$cmd --merge-base with two commits" '
+		git $cmd commit-C master >expect &&
+		git $cmd --merge-base br2 master >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "$cmd --merge-base commit and non-commit" '
+		test_must_fail git $cmd --merge-base br2 master^{tree} 2>err &&
+		test_i18ngrep "fatal: --merge-base only works with commits" err
+	'
+
+	test_expect_success "$cmd --merge-base with no merge bases and two commits" '
+		test_must_fail git $cmd --merge-base br2 br3 2>err &&
+		test_i18ngrep "fatal: no merge base found" err
+	'
+
+	test_expect_success "$cmd --merge-base with multiple merge bases and two commits" '
+		test_must_fail git $cmd --merge-base master br1 2>err &&
+		test_i18ngrep "fatal: multiple merge bases found" err
+	'
+done
+
+test_expect_success 'diff-tree --merge-base with one commit' '
+	test_must_fail git diff-tree --merge-base master 2>err &&
+	test_i18ngrep "fatal: --merge-base only works with two commits" err
+'
+
+test_expect_success 'diff --merge-base with range' '
+	test_must_fail git diff --merge-base br2..br3 2>err &&
+	test_i18ngrep "fatal: --merge-base does not work with ranges" err
+'
+
 test_done