diff mbox series

[v2,3/3] merge-ort: implement merge_incore_recursive()

Message ID d8f79450a40e5a91b401ccbbedc7326cfe8a33d6.1608097965.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series merge-ort: implement recursive merges | expand

Commit Message

Elijah Newren Dec. 16, 2020, 5:52 a.m. UTC
From: Elijah Newren <newren@gmail.com>

Implement merge_incore_recursive(), mostly through the use of a new
helper function, merge_ort_internal(), which itself is based off
merge_recursive_internal() from merge-recursive.c.

This drops the number of failures in the testsuite when run under
GIT_TEST_MERGE_ALGORITHM=ort from around 1500 to 647.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 merge-ort.h |  10 +++++
 2 files changed, 115 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Dec. 16, 2020, 6:09 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +	/*
> +	 * merge_incore_nonrecursive() exists for cases where we always
> +	 * know there is a well-defined single merge base.  However,
> +	 * despite a similar structure, merge-recursive.c noted that some
> +	 * callers preferred to call the recursive logic anyway and still
> +	 * set a special name for opt->ancestor that would appear in
> +	 * merge.conflictStyle=diff3 output.

Sorry, I do not understand the comment, especially the "anyway"
part.  There is no such thing as nonrecursive variant of
merge-recursive, is it?  If somebody wanted to perform a merge of
two trees with a designated single common ancestor ("am -3" would
want to do so using a fabricated tree, "cherry-pick" would want to
do so using the parent commit of what gets picked), it would be
natural to call "git merge-recursive O -- A B" if it is a scripted
Porcelain, or would call merge_recursive() with the single merge
base on the merge_bases commit_list parameter if it is written in C,
I would think.

> +	 * git-am was one such example (it wanted to set opt->ancestor to
> +	 * "constructed merge base", since it created a fake merge base);
> +	 * it called the recursive merge logic through a special
> +	 * merge_recursive_generic() wrapper.
> +	 *
> +	 * Allow the same kind of special case here.
> +	 */

In any case, the mention of "constructed merge base" may explain
very well why the assert in the previous iteration checked for the
string, but it does not seem relevant after the condition changed.

> +	int num_merge_bases_is_1 = (merge_bases && !merge_bases->next);
> +	assert(opt->ancestor == NULL || num_merge_bases_is_1);

The above comment may have explained why some callers that call the
machinery with a single merge base want to use its own diff3 label,
but the assert the comment applies to is stricter than that.

In other words, it is unclear why the caller is forbidden from
giving the diff3 label, when the recursive merge needs to synthesize
the virtual ancestor using all the given merge bases?

The answer could be a simple "opt->ancestor field is managed by the
recursive machinery itself when it needs to create virtual ancestor,
and must start as NULL, but when we do not create virtual ancestor,
it is allowed to start with any value, as the machinery itself does
not assign any new value to the field", but I cannot read if that is
the case from the comments in the patch.

> +
> +	merge_start(opt, result);
> +	merge_ort_internal(opt, merge_bases, side1, side2, result);
>  }
> diff --git a/merge-ort.h b/merge-ort.h
> index 55ae7ee865d..d53a0a339f3 100644
> --- a/merge-ort.h
> +++ b/merge-ort.h
> @@ -34,6 +34,16 @@ struct merge_result {
>  /*
>   * rename-detecting three-way merge with recursive ancestor consolidation.
>   * working tree and index are untouched.
> + *
> + * merge_bases will be consumed (emptied) so make a copy if you need it.
> + *
> + * NOTE: empirically, the recursive algorithm will perform better if you
> + *       pass the merge_bases in the order of oldest commit to the
> + *       newest[1][2].
> + *
> + *       [1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.1907252055500.21907@tvgsbejvaqbjf.bet/
> + *       [2] commit 8918b0c9c2 ("merge-recur: try to merge older merge bases
> + *           first", 2006-08-09)
>   */

I initially thought that this was a bit out-of-place for the comment
that explains why the merge bases list gets reversed in the code, but
it is actually the right place---it guides the callers that hand a
list of merge_bases to the function.

>  void merge_incore_recursive(struct merge_options *opt,
>  			    struct commit_list *merge_bases,

Thanks.
Elijah Newren Dec. 16, 2020, 6:37 p.m. UTC | #2
On Wed, Dec 16, 2020 at 10:09 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > +     /*
> > +      * merge_incore_nonrecursive() exists for cases where we always
> > +      * know there is a well-defined single merge base.  However,
> > +      * despite a similar structure, merge-recursive.c noted that some
> > +      * callers preferred to call the recursive logic anyway and still
> > +      * set a special name for opt->ancestor that would appear in
> > +      * merge.conflictStyle=diff3 output.
>
> Sorry, I do not understand the comment, especially the "anyway"
> part.  There is no such thing as nonrecursive variant of
> merge-recursive, is it?  If somebody wanted to perform a merge of

There is absolutely a nonrecursive variant; it's called merge_trees()
in merge-recursive.[ch] and merge_incore_nonrecursive() in
merge-ort.[ch].

Note that I called out the nonrecursive variant at the beginning of
the comment.  Perhaps I should have said "recursive variant" rather
than "recursive logic"?

> two trees with a designated single common ancestor ("am -3" would
> want to do so using a fabricated tree, "cherry-pick" would want to
> do so using the parent commit of what gets picked), it would be
> natural to call "git merge-recursive O -- A B" if it is a scripted
> Porcelain, or would call merge_recursive() with the single merge
> base on the merge_bases commit_list parameter if it is written in C,
> I would think.

cherry-pick/sequencer uses merge_trees() or
merge_incore_nonrecursive().  In other words, the non-recursive API.

git-am uses merge_recursive_generic(), a weird wrapper that calls
merge_recursive() [whose analogue in merge-ort would be
merge_incore_recursive()].  It always passes in exactly one merge
base, though.  And the label for the "ancestor" commit in diff3 output
would default to showing some fake commit ID, if it weren't for the
ability to override the ancestor label in that case.

> > +      * git-am was one such example (it wanted to set opt->ancestor to
> > +      * "constructed merge base", since it created a fake merge base);
> > +      * it called the recursive merge logic through a special
> > +      * merge_recursive_generic() wrapper.
> > +      *
> > +      * Allow the same kind of special case here.
> > +      */
>
> In any case, the mention of "constructed merge base" may explain
> very well why the assert in the previous iteration checked for the
> string, but it does not seem relevant after the condition changed.

Sure, the question in the last iteration was "why check for this
string", but there's a new question with this iteration: who would
pass a special opt->ancestor and with what kind of meaning.  Providing
an example thus provides an answer to that question and gives people
an easy way to search for more information.

> > +     int num_merge_bases_is_1 = (merge_bases && !merge_bases->next);
> > +     assert(opt->ancestor == NULL || num_merge_bases_is_1);
>
> The above comment may have explained why some callers that call the
> machinery with a single merge base want to use its own diff3 label,
> but the assert the comment applies to is stricter than that.
>
> In other words, it is unclear why the caller is forbidden from
> giving the diff3 label, when the recursive merge needs to synthesize
> the virtual ancestor using all the given merge bases?
>
> The answer could be a simple "opt->ancestor field is managed by the
> recursive machinery itself when it needs to create virtual ancestor,
> and must start as NULL, but when we do not create virtual ancestor,
> it is allowed to start with any value, as the machinery itself does
> not assign any new value to the field", but I cannot read if that is
> the case from the comments in the patch.

Yeah, you're making me lean towards thinking that
merge_recursive_generic() is just a broken API that I shouldn't port
over (even as a wrapper), and that I further shouldn't support using
the merge_ort_recursive() API in a fashion wanted by that function.

I'll just toss the single-merge-base check along with the laborious comment.

To clarify, though, your comment above is mostly correct but is off in
the case where there is no need to create a virtual ancestor.
merge_incore_recursive() does assign a label in that case (if one is
not already assigned) -- and that label is just a unique abbreviation
of the relevant commit.  That's a great label when the merge base is a
"real" commit, less so when it's fake/constructed.

But we already have a merge_incore_nonrecursive() that allows (in
fact, requires) setting the ancestor label, so we should probably just
shift merge_recursive_generic() callers who have a special ancestor to
go through that API instead.

> > +
> > +     merge_start(opt, result);
> > +     merge_ort_internal(opt, merge_bases, side1, side2, result);
> >  }
> > diff --git a/merge-ort.h b/merge-ort.h
> > index 55ae7ee865d..d53a0a339f3 100644
> > --- a/merge-ort.h
> > +++ b/merge-ort.h
> > @@ -34,6 +34,16 @@ struct merge_result {
> >  /*
> >   * rename-detecting three-way merge with recursive ancestor consolidation.
> >   * working tree and index are untouched.
> > + *
> > + * merge_bases will be consumed (emptied) so make a copy if you need it.
> > + *
> > + * NOTE: empirically, the recursive algorithm will perform better if you
> > + *       pass the merge_bases in the order of oldest commit to the
> > + *       newest[1][2].
> > + *
> > + *       [1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.1907252055500.21907@tvgsbejvaqbjf.bet/
> > + *       [2] commit 8918b0c9c2 ("merge-recur: try to merge older merge bases
> > + *           first", 2006-08-09)
> >   */
>
> I initially thought that this was a bit out-of-place for the comment
> that explains why the merge bases list gets reversed in the code, but
> it is actually the right place---it guides the callers that hand a
> list of merge_bases to the function.
>
> >  void merge_incore_recursive(struct merge_options *opt,
> >                           struct commit_list *merge_bases,
>
> Thanks.
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index 10a97e944c4..de9585b3179 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -1476,6 +1476,91 @@  static void merge_ort_nonrecursive_internal(struct merge_options *opt,
 	}
 }
 
+/*
+ * Originally from merge_recursive_internal(); somewhat adapted, though.
+ */
+static void merge_ort_internal(struct merge_options *opt,
+			       struct commit_list *merge_bases,
+			       struct commit *h1,
+			       struct commit *h2,
+			       struct merge_result *result)
+{
+	struct commit_list *iter;
+	struct commit *merged_merge_bases;
+	const char *ancestor_name;
+	struct strbuf merge_base_abbrev = STRBUF_INIT;
+
+	if (!merge_bases) {
+		merge_bases = get_merge_bases(h1, h2);
+		/* See merge-ort.h:merge_incore_recursive() declaration NOTE */
+		merge_bases = reverse_commit_list(merge_bases);
+	}
+
+	merged_merge_bases = pop_commit(&merge_bases);
+	if (merged_merge_bases == NULL) {
+		/* if there is no common ancestor, use an empty tree */
+		struct tree *tree;
+
+		tree = lookup_tree(opt->repo, opt->repo->hash_algo->empty_tree);
+		merged_merge_bases = make_virtual_commit(opt->repo, tree,
+							 "ancestor");
+		ancestor_name = "empty tree";
+	} else if (opt->ancestor && !opt->priv->call_depth) {
+		ancestor_name = opt->ancestor;
+	} else if (merge_bases) {
+		ancestor_name = "merged common ancestors";
+	} else {
+		strbuf_add_unique_abbrev(&merge_base_abbrev,
+					 &merged_merge_bases->object.oid,
+					 DEFAULT_ABBREV);
+		ancestor_name = merge_base_abbrev.buf;
+	}
+
+	for (iter = merge_bases; iter; iter = iter->next) {
+		const char *saved_b1, *saved_b2;
+		struct commit *prev = merged_merge_bases;
+
+		opt->priv->call_depth++;
+		/*
+		 * When the merge fails, the result contains files
+		 * with conflict markers. The cleanness flag is
+		 * ignored (unless indicating an error), it was never
+		 * actually used, as result of merge_trees has always
+		 * overwritten it: the committed "conflicts" were
+		 * already resolved.
+		 */
+		saved_b1 = opt->branch1;
+		saved_b2 = opt->branch2;
+		opt->branch1 = "Temporary merge branch 1";
+		opt->branch2 = "Temporary merge branch 2";
+		merge_ort_internal(opt, NULL, prev, iter->item, result);
+		if (result->clean < 0)
+			return;
+		opt->branch1 = saved_b1;
+		opt->branch2 = saved_b2;
+		opt->priv->call_depth--;
+
+		merged_merge_bases = make_virtual_commit(opt->repo,
+							 result->tree,
+							 "merged tree");
+		commit_list_insert(prev, &merged_merge_bases->parents);
+		commit_list_insert(iter->item,
+				   &merged_merge_bases->parents->next);
+
+		clear_or_reinit_internal_opts(opt->priv, 1);
+	}
+
+	opt->ancestor = ancestor_name;
+	merge_ort_nonrecursive_internal(opt,
+					repo_get_commit_tree(opt->repo,
+							     merged_merge_bases),
+					repo_get_commit_tree(opt->repo, h1),
+					repo_get_commit_tree(opt->repo, h2),
+					result);
+	strbuf_release(&merge_base_abbrev);
+	opt->ancestor = NULL;  /* avoid accidental re-use of opt->ancestor */
+}
+
 void merge_incore_nonrecursive(struct merge_options *opt,
 			       struct tree *merge_base,
 			       struct tree *side1,
@@ -1493,7 +1578,24 @@  void merge_incore_recursive(struct merge_options *opt,
 			    struct commit *side2,
 			    struct merge_result *result)
 {
-	(void)reverse_commit_list;
-	(void)make_virtual_commit;
-	die("Not yet implemented");
+	/*
+	 * merge_incore_nonrecursive() exists for cases where we always
+	 * know there is a well-defined single merge base.  However,
+	 * despite a similar structure, merge-recursive.c noted that some
+	 * callers preferred to call the recursive logic anyway and still
+	 * set a special name for opt->ancestor that would appear in
+	 * merge.conflictStyle=diff3 output.
+	 *
+	 * git-am was one such example (it wanted to set opt->ancestor to
+	 * "constructed merge base", since it created a fake merge base);
+	 * it called the recursive merge logic through a special
+	 * merge_recursive_generic() wrapper.
+	 *
+	 * Allow the same kind of special case here.
+	 */
+	int num_merge_bases_is_1 = (merge_bases && !merge_bases->next);
+	assert(opt->ancestor == NULL || num_merge_bases_is_1);
+
+	merge_start(opt, result);
+	merge_ort_internal(opt, merge_bases, side1, side2, result);
 }
diff --git a/merge-ort.h b/merge-ort.h
index 55ae7ee865d..d53a0a339f3 100644
--- a/merge-ort.h
+++ b/merge-ort.h
@@ -34,6 +34,16 @@  struct merge_result {
 /*
  * rename-detecting three-way merge with recursive ancestor consolidation.
  * working tree and index are untouched.
+ *
+ * merge_bases will be consumed (emptied) so make a copy if you need it.
+ *
+ * NOTE: empirically, the recursive algorithm will perform better if you
+ *       pass the merge_bases in the order of oldest commit to the
+ *       newest[1][2].
+ *
+ *       [1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.1907252055500.21907@tvgsbejvaqbjf.bet/
+ *       [2] commit 8918b0c9c2 ("merge-recur: try to merge older merge bases
+ *           first", 2006-08-09)
  */
 void merge_incore_recursive(struct merge_options *opt,
 			    struct commit_list *merge_bases,