diff mbox series

[v3,06/10] diff-lib: define diff_get_merge_base()

Message ID 6aac57ca022963fb41d93905e41dff36dccd5969.1600328335.git.liu.denton@gmail.com (mailing list archive)
State Superseded
Headers show
Series builtin/diff: learn --merge-base | expand

Commit Message

Denton Liu Sept. 17, 2020, 7:44 a.m. UTC
In a future commit, we will be using this function to implement
--merge-base functionality in various diff commands.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
---
 diff-lib.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 diff.h     |  2 ++
 2 files changed, 50 insertions(+)

Comments

Junio C Hamano Sept. 17, 2020, 5:16 p.m. UTC | #1
Denton Liu <liu.denton@gmail.com> writes:

> +void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb)
> +{
> +	int i;
> +	struct commit *mb_child[2] = {0};
> +	struct commit_list *merge_bases;
> +
> +	for (i = 0; i < revs->pending.nr; i++) {
> +		struct object *obj = revs->pending.objects[i].item;
> +		if (obj->flags)
> +			die(_("--merge-base does not work with ranges"));
> +		if (obj->type != OBJ_COMMIT)
> +			die(_("--merge-base only works with commits"));
> +	}

This is the first use of die() in this file, that is designed to
keep a set of reusable library functions so that the caller(s) can
do their own die().  They may want to become

	return error(_(...));

The same comment applies to all die()s the patch adds.

> +	/*
> +	 * This check must go after the for loop above because A...B
> +	 * ranges produce three pending commits, resulting in a
> +	 * misleading error message.
> +	 */

Should "git diff --merge-base A...B" be forbidden, or does it just
ask the same thing twice and is not a die-worthy offence?

> +	if (revs->pending.nr > ARRAY_SIZE(mb_child))
> +		die(_("--merge-base does not work with more than two commits"));

Compare with hardcoded '2' in the condition.  The requirement for
mb_child[] is that it can hold at least 2 (changes in the future may
find it convenient if its size gets increased to 3 to hold NULL sentinel
to signal end-of-list, for example).   Comparison with ARRAY_SIZE() may
be appropriate in different situations but not here where the code knows
it wants to reject more than two no matter how big mb_child[] is.

> +	for (i = 0; i < revs->pending.nr; i++)
> +		mb_child[i] = lookup_commit_reference(the_repository, &revs->pending.objects[i].item->oid);
> +	if (revs->pending.nr < ARRAY_SIZE(mb_child)) {
> +		struct object_id oid;
> +
> +		if (revs->pending.nr != 1)
> +			BUG("unexpected revs->pending.nr: %d", revs->pending.nr);

This is an obviously impossible condition as we will not take more
than 2.

> +		if (get_oid("HEAD", &oid))
> +			die(_("unable to get HEAD"));
> +		mb_child[1] = lookup_commit_reference(the_repository, &oid);
> +	}
> +
> +	merge_bases = repo_get_merge_bases(the_repository, mb_child[0], mb_child[1]);
> +	if (!merge_bases)
> +		die(_("no merge base found"));
> +	if (merge_bases->next)
> +		die(_("multiple merge bases found"));
> +
> +	oidcpy(mb, &merge_bases->item->object.oid);
> +
> +	free_commit_list(merge_bases);
> +}


OK.

>  int run_diff_index(struct rev_info *revs, unsigned int option)
>  {
>  	struct object_array_entry *ent;
> diff --git a/diff.h b/diff.h
> index 0cc874f2d5..ae2bb7001a 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -580,6 +580,8 @@ void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
>   */
>  const char *diff_aligned_abbrev(const struct object_id *sha1, int);
>  
> +void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb);

Without an actual caller, we cannot judge if this interface is
designed right, but we'll see soon in the next steps ;-)

Looking good so far.

Thanks.

> +
>  /* do not report anything on removed paths */
>  #define DIFF_SILENT_ON_REMOVED 01
>  /* report racily-clean paths as modified */
Denton Liu Sept. 18, 2020, 10:34 a.m. UTC | #2
Hi Junio,

On Thu, Sep 17, 2020 at 10:16:51AM -0700, Junio C Hamano wrote:
> Denton Liu <liu.denton@gmail.com> writes:
> 
> > +void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb)
> > +{
> > +	int i;
> > +	struct commit *mb_child[2] = {0};
> > +	struct commit_list *merge_bases;
> > +
> > +	for (i = 0; i < revs->pending.nr; i++) {
> > +		struct object *obj = revs->pending.objects[i].item;
> > +		if (obj->flags)
> > +			die(_("--merge-base does not work with ranges"));
> > +		if (obj->type != OBJ_COMMIT)
> > +			die(_("--merge-base only works with commits"));
> > +	}
> 
> This is the first use of die() in this file, that is designed to
> keep a set of reusable library functions so that the caller(s) can
> do their own die().  They may want to become

Although this is the first instance of die(), run_diff_index() has an
exit(128), which is basically a die() in disguise.

> 	return error(_(...));
> 
> The same comment applies to all die()s the patch adds.

I applied this change but then each callsite of diff_get_merge_base()
became something like

	if (diff_get_merge_base(revs, &oid))
		exit(128);

so I do agree with the spirit of the change but in reality, it just
creates more busywork for the callers.

> > +	/*
> > +	 * This check must go after the for loop above because A...B
> > +	 * ranges produce three pending commits, resulting in a
> > +	 * misleading error message.
> > +	 */
> 
> Should "git diff --merge-base A...B" be forbidden, or does it just
> ask the same thing twice and is not a die-worthy offence?

I think that it should be die-worthy because it's a logic error for a
user to do this. I can't think of any situation where it wouldn't be
more desirable error early to correct a user's thinking. Plus, we're
trying to move away from the `...` notation anyway ;)

> > +	for (i = 0; i < revs->pending.nr; i++)
> > +		mb_child[i] = lookup_commit_reference(the_repository, &revs->pending.objects[i].item->oid);
> > +	if (revs->pending.nr < ARRAY_SIZE(mb_child)) {
> > +		struct object_id oid;
> > +
> > +		if (revs->pending.nr != 1)
> > +			BUG("unexpected revs->pending.nr: %d", revs->pending.nr);
> 
> This is an obviously impossible condition as we will not take more
> than 2.

We also want to ensure that revs->pending.nr isn't 0 here. That being
said, I can explicitly check earlier in the function that the number of
pending is 1 or 2 so that it's more clearly written.

Thanks,
Denton
Junio C Hamano Sept. 19, 2020, 12:33 a.m. UTC | #3
Denton Liu <liu.denton@gmail.com> writes:

> became something like
>
> 	if (diff_get_merge_base(revs, &oid))
> 		exit(128);
>
> so I do agree with the spirit of the change but in reality, it just
> creates more busywork for the callers.

OK, then lets keep them as die()s.

> I think that it should be die-worthy because it's a logic error for a
> user to do this. I can't think of any situation where it wouldn't be
> more desirable error early to correct a user's thinking. Plus, we're
> trying to move away from the `...` notation anyway ;)

I do not think so.  We are *NOT* trying to move away from A...B;
what is mistake is A..B and that is what we want to move away from.
Luckily, there is no need to introduce a new option there, because
the user can just stop typing .. and instead type SP.

The primary value of the new option you are adding is that it allows
us to compare the index and the working tree with the merge base.
The current A...B notation can only be used to compare two trees.
diff mbox series

Patch

diff --git a/diff-lib.c b/diff-lib.c
index 0a0e69113c..e01c3f0612 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -13,6 +13,7 @@ 
 #include "submodule.h"
 #include "dir.h"
 #include "fsmonitor.h"
+#include "commit-reach.h"
 
 /*
  * diff-files
@@ -518,6 +519,53 @@  static int diff_cache(struct rev_info *revs,
 	return unpack_trees(1, &t, &opts);
 }
 
+void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb)
+{
+	int i;
+	struct commit *mb_child[2] = {0};
+	struct commit_list *merge_bases;
+
+	for (i = 0; i < revs->pending.nr; i++) {
+		struct object *obj = revs->pending.objects[i].item;
+		if (obj->flags)
+			die(_("--merge-base does not work with ranges"));
+		if (obj->type != OBJ_COMMIT)
+			die(_("--merge-base only works with commits"));
+	}
+
+	/*
+	 * This check must go after the for loop above because A...B
+	 * ranges produce three pending commits, resulting in a
+	 * misleading error message.
+	 */
+	if (revs->pending.nr > ARRAY_SIZE(mb_child))
+		die(_("--merge-base does not work with more than two commits"));
+
+	for (i = 0; i < revs->pending.nr; i++)
+		mb_child[i] = lookup_commit_reference(the_repository, &revs->pending.objects[i].item->oid);
+	if (revs->pending.nr < ARRAY_SIZE(mb_child)) {
+		struct object_id oid;
+
+		if (revs->pending.nr != 1)
+			BUG("unexpected revs->pending.nr: %d", revs->pending.nr);
+
+		if (get_oid("HEAD", &oid))
+			die(_("unable to get HEAD"));
+
+		mb_child[1] = lookup_commit_reference(the_repository, &oid);
+	}
+
+	merge_bases = repo_get_merge_bases(the_repository, mb_child[0], mb_child[1]);
+	if (!merge_bases)
+		die(_("no merge base found"));
+	if (merge_bases->next)
+		die(_("multiple merge bases found"));
+
+	oidcpy(mb, &merge_bases->item->object.oid);
+
+	free_commit_list(merge_bases);
+}
+
 int run_diff_index(struct rev_info *revs, unsigned int option)
 {
 	struct object_array_entry *ent;
diff --git a/diff.h b/diff.h
index 0cc874f2d5..ae2bb7001a 100644
--- a/diff.h
+++ b/diff.h
@@ -580,6 +580,8 @@  void diff_warn_rename_limit(const char *varname, int needed, int degraded_cc);
  */
 const char *diff_aligned_abbrev(const struct object_id *sha1, int);
 
+void diff_get_merge_base(const struct rev_info *revs, struct object_id *mb);
+
 /* do not report anything on removed paths */
 #define DIFF_SILENT_ON_REMOVED 01
 /* report racily-clean paths as modified */