diff mbox series

[04/12] merge-tree: implement real merges

Message ID 05bd17686e1404c81542b6bbf69dcd3decb83c5b.1642888562.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series RFC: In-core git merge-tree ("Server side merges") | expand

Commit Message

Elijah Newren Jan. 22, 2022, 9:55 p.m. UTC
From: Elijah Newren <newren@gmail.com>

This adds the ability to perform real merges rather than just trivial
merges (meaning handling three way content merges, recursive ancestor
consolidation, renames, proper directory/file conflict handling, and so
forth).  However, unlike `git merge`, the working tree and index are
left alone and no branch is updated.

The only output is:
  - the toplevel resulting tree printed on stdout
  - exit status of 0 (clean) or 1 (conflicts present)

This output is meant to be used by some higher level script, perhaps in
a sequence of steps like this:

   NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
   test $? -eq 0 || die "There were conflicts..."
   NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
   git update-ref $BRANCH1 $NEWCOMMIT

Note that higher level scripts may also want to access the
conflict/warning messages normally output during a merge, or have quick
access to a list of files with conflicts.  That is not available in this
preliminary implementation, but subsequent commits will add that
ability.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-merge-tree.txt | 74 ++++++++++++++++++++++-----
 builtin/merge-tree.c             | 55 +++++++++++++++++++-
 t/t4301-merge-tree-real.sh       | 87 ++++++++++++++++++++++++++++++++
 3 files changed, 203 insertions(+), 13 deletions(-)
 create mode 100755 t/t4301-merge-tree-real.sh

Comments

Ævar Arnfjörð Bjarmason Jan. 24, 2022, 9:51 a.m. UTC | #1
On Sat, Jan 22 2022, Elijah Newren via GitGitGadget wrote:

> +	/*
> +	 * TODO: Support subtree and other -X options?
> +	if (use_strategies_nr == 1 &&
> +	    !strcmp(use_strategies[0]->name, "subtree"))
> +		opt.subtree_shift = "";
> +	for (x = 0; x < xopts_nr; x++)
> +		if (parse_merge_opt(&opt, xopts[x]))

Better omitted WIP code in a non-RFC series?

> +			die(_("Unknown strategy option: -X%s"), xopts[x]);

As a general issue with this series, die(), BUG() etc. messages should
start with a non-capital letter.

> +	printf("%s\n", oid_to_hex(&result.tree->object.oid));

And for both this...

> +		printf(_("Conflicts!\n"));

... and this we can just use puts(). For the former it's just less code,
but for the latter translators also don't need to see the always-there
\n in the translated message.

> +# This test is ort-specific
> +test "${GIT_TEST_MERGE_ALGORITHM:-ort}" = ort || {

Is this ${} trickery really needed? We're not testing with "set -u". So just:
	
	if test "$GIT_..." != "ort"
	then
		...
	fi

> +test_expect_success 'Barf on too many arguments' '
> +	test_expect_code 129 git merge-tree --write-tree side1 side2 side3 2>expect &&
> +
> +	grep "^usage: git merge-tree" expect
> +'

Nit: In most other tests these usage assertions are at the top of the
test, and for those we also make do with just testing the 129 exit code,
which is probably enough here too...
Elijah Newren Jan. 24, 2022, 5:12 p.m. UTC | #2
On Mon, Jan 24, 2022 at 1:55 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Jan 22 2022, Elijah Newren via GitGitGadget wrote:
>
> > +     /*
> > +      * TODO: Support subtree and other -X options?
> > +     if (use_strategies_nr == 1 &&
> > +         !strcmp(use_strategies[0]->name, "subtree"))
> > +             opt.subtree_shift = "";
> > +     for (x = 0; x < xopts_nr; x++)
> > +             if (parse_merge_opt(&opt, xopts[x]))
>
> Better omitted WIP code in a non-RFC series?

It's RFC: https://lore.kernel.org/git/pull.1122.git.1642888562.gitgitgadget@gmail.com/

But yeah, I should drop it.  Previous rounds of this RFC submission
got me feedback that the other commented-out code bit I used to have
was something folks wanted in the initial version of the series so I
uncommented and cleaned it up.  The fact that no one has commented on
this part suggests these options don't need to be supported from the
start.

> > +                     die(_("Unknown strategy option: -X%s"), xopts[x]);
>
> As a general issue with this series, die(), BUG() etc. messages should
> start with a non-capital letter.

Right, thanks for the reminder.  I'll go through and try to fix up.

> > +     printf("%s\n", oid_to_hex(&result.tree->object.oid));
>
> And for both this...
>
> > +             printf(_("Conflicts!\n"));
>
> ... and this we can just use puts(). For the former it's just less code,
> but for the latter translators also don't need to see the always-there
> \n in the translated message.

Makes sense.

> > +# This test is ort-specific
> > +test "${GIT_TEST_MERGE_ALGORITHM:-ort}" = ort || {
>
> Is this ${} trickery really needed? We're not testing with "set -u". So just:
>
>         if test "$GIT_..." != "ort"
>         then
>                 ...
>         fi

Ah, that would be simpler; thanks.

> > +test_expect_success 'Barf on too many arguments' '
> > +     test_expect_code 129 git merge-tree --write-tree side1 side2 side3 2>expect &&
> > +
> > +     grep "^usage: git merge-tree" expect
> > +'
>
> Nit: In most other tests these usage assertions are at the top of the
> test, and for those we also make do with just testing the 129 exit code,
> which is probably enough here too...

I see a fair number of counterexamples searching for 129 in the test
suite, and I've been bitten enough times seeing tests expect an error
but get a different kind of error than the commit message stated they
were expecting that I prefer the extra check beyond the error code.
Anyway, I'll leave this piece as-is.
Johannes Schindelin Jan. 25, 2022, 5:07 p.m. UTC | #3
Hi Elijah,

On Sat, 22 Jan 2022, Elijah Newren via GitGitGadget wrote:

> +The second form is deprecated; it is kept for backward compatibility
> +reasons but may be deleted in the future.  It will only do a trivial
> +merge.  It reads three tree-ish, and outputs trivial merge results and
> +conflicting stages to the standard output in a semi-diff format.
> +Since this was designed for higher level scripts to consume and merge
> +the results back into the index, it omits entries that match
> +<branch1>.  The result of this second form is is similar to what

There is a double "is" in this line. Taking a step back, I would suggest
to not only remove this paragraph, but to mark the `[--trivial-merge]`
option clearly as `(DEPRECATED)`.

> +three-way 'git read-tree -m' does, but instead of storing the results
> +in the index, the command outputs the entries to the standard output.
> +This form not only has limited applicability, the output format is
> +also difficult to work with, and it will generally be less performant
> +than the first form even on successful merges (especially if working
> +in large repositories).  The remainder of this manual will only
> +discuss the first form.

Thank you,
Dscho
Christian Couder Jan. 26, 2022, 9:44 a.m. UTC | #4
On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> This adds the ability to perform real merges rather than just trivial
> merges (meaning handling three way content merges, recursive ancestor
> consolidation, renames, proper directory/file conflict handling, and so
> forth).  However, unlike `git merge`, the working tree and index are
> left alone and no branch is updated.
>
> The only output is:
>   - the toplevel resulting tree printed on stdout
>   - exit status of 0 (clean) or 1 (conflicts present)

The exit status can now actually be something other than 0 and 1
according to the doc and code below.

> +Performs a merge, but does not make any new commits and does not read
> +from or write to either the working tree or index.
> +
> +The first form will merge the two branches, doing a full recursive
> +merge with rename detection.

Maybe this could already tell that the first form will also write a
tree with the result of the merge (even in case of conflict) as this
could help understand the reason why the associated option is called
'--write-tree'. It could also help to say that we call such a merge a
'real' merge.

> The rest of this manual (other than the
> +next paragraph) describes the first form in more detail -- including
> +options, output format, exit status, and usage notes.

>  static int real_merge(struct merge_tree_options *o,
>                       const char *branch1, const char *branch2)
>  {
> -       die(_("real merges are not yet implemented"));
> +       struct commit *parent1, *parent2;
> +       struct commit_list *common;
> +       struct commit_list *merge_bases = NULL;
> +       struct commit_list *j;
> +       struct merge_options opt;
> +       struct merge_result result = { 0 };
> +
> +       parent1 = get_merge_parent(branch1);
> +       if (!parent1)
> +               help_unknown_ref(branch1, "merge",
> +                                _("not something we can merge"));

The second argument is supposed to be the command (it's called "cmd"),
so maybe "merge-tree" instead of "merge".

> +       parent2 = get_merge_parent(branch2);
> +       if (!parent2)
> +               help_unknown_ref(branch2, "merge",
> +                                _("not something we can merge"));

idem

> +       opt.show_rename_progress = 0;
> +
> +       opt.branch1 = merge_remote_util(parent1)->name; /* or just branch1? */
> +       opt.branch2 = merge_remote_util(parent2)->name; /* or just branch2? */

I think just:

       opt.branch1 = branch1
       opt.branch2 = branch2

might be better for users as it should show the name as it was passed
to the command.

> +       merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
> +       printf("%s\n", oid_to_hex(&result.tree->object.oid));

I wonder if we can actually always output a valid tree when
result.clean < 0. In case we might not, the printing should go a few
lines below.

> +       if (result.clean < 0)
> +               die(_("failure to merge"));
> +       else if (!result.clean)

The "else" is not necessary above.

> +               printf(_("Conflicts!\n"));
> +       merge_finalize(&opt, &result);
> +       return !result.clean; /* result.clean < 0 handled above */
>  }

> diff --git a/t/t4301-merge-tree-real.sh b/t/t4301-merge-tree-real.sh
> new file mode 100755
> index 00000000000..e03688515c5
> --- /dev/null
> +++ b/t/t4301-merge-tree-real.sh

I wonder if it would be better named 't4301-merge-tree-write-tree.sh'...

> @@ -0,0 +1,87 @@
> +#!/bin/sh
> +
> +test_description='git merge-tree --write-tree'

... especially given this description.
Elijah Newren Jan. 29, 2022, 4:09 a.m. UTC | #5
On Wed, Jan 26, 2022 at 1:45 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Sat, Jan 22, 2022 at 10:56 PM Elijah Newren via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Elijah Newren <newren@gmail.com>
> >
> > This adds the ability to perform real merges rather than just trivial
> > merges (meaning handling three way content merges, recursive ancestor
> > consolidation, renames, proper directory/file conflict handling, and so
> > forth).  However, unlike `git merge`, the working tree and index are
> > left alone and no branch is updated.
> >
> > The only output is:
> >   - the toplevel resulting tree printed on stdout
> >   - exit status of 0 (clean) or 1 (conflicts present)
>
> The exit status can now actually be something other than 0 and 1
> according to the doc and code below.

Thanks for catching; will fix.

> > +Performs a merge, but does not make any new commits and does not read
> > +from or write to either the working tree or index.
> > +
> > +The first form will merge the two branches, doing a full recursive
> > +merge with rename detection.
>
> Maybe this could already tell that the first form will also write a
> tree with the result of the merge (even in case of conflict) as this
> could help understand the reason why the associated option is called
> '--write-tree'. It could also help to say that we call such a merge a
> 'real' merge.

Makes sense.

> > The rest of this manual (other than the
> > +next paragraph) describes the first form in more detail -- including
> > +options, output format, exit status, and usage notes.
>
> >  static int real_merge(struct merge_tree_options *o,
> >                       const char *branch1, const char *branch2)
> >  {
> > -       die(_("real merges are not yet implemented"));
> > +       struct commit *parent1, *parent2;
> > +       struct commit_list *common;
> > +       struct commit_list *merge_bases = NULL;
> > +       struct commit_list *j;
> > +       struct merge_options opt;
> > +       struct merge_result result = { 0 };
> > +
> > +       parent1 = get_merge_parent(branch1);
> > +       if (!parent1)
> > +               help_unknown_ref(branch1, "merge",
> > +                                _("not something we can merge"));
>
> The second argument is supposed to be the command (it's called "cmd"),
> so maybe "merge-tree" instead of "merge".

Oh, good catch; thanks for pointing this out.

>
> > +       parent2 = get_merge_parent(branch2);
> > +       if (!parent2)
> > +               help_unknown_ref(branch2, "merge",
> > +                                _("not something we can merge"));
>
> idem
>
> > +       opt.show_rename_progress = 0;
> > +
> > +       opt.branch1 = merge_remote_util(parent1)->name; /* or just branch1? */
> > +       opt.branch2 = merge_remote_util(parent2)->name; /* or just branch2? */
>
> I think just:
>
>        opt.branch1 = branch1
>        opt.branch2 = branch2
>
> might be better for users as it should show the name as it was passed
> to the command.

After digging for a bit, I think in this case there actually isn't a
difference to users because both will give the same result.  But, if
that's the case, the simpler code is warranted.

> > +       merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
> > +       printf("%s\n", oid_to_hex(&result.tree->object.oid));
>
> I wonder if we can actually always output a valid tree when
> result.clean < 0. In case we might not, the printing should go a few
> lines below.

Yeah, I caught that and fixed it, but got it squashed into a later
commit.  I'll fix it up.

> > +       if (result.clean < 0)
> > +               die(_("failure to merge"));
> > +       else if (!result.clean)
>
> The "else" is not necessary above.
>
> > +               printf(_("Conflicts!\n"));

Yes, and the else clause should just be ripped out.

> > +       merge_finalize(&opt, &result);
> > +       return !result.clean; /* result.clean < 0 handled above */
> >  }
>
> > diff --git a/t/t4301-merge-tree-real.sh b/t/t4301-merge-tree-real.sh
> > new file mode 100755
> > index 00000000000..e03688515c5
> > --- /dev/null
> > +++ b/t/t4301-merge-tree-real.sh
>
> I wonder if it would be better named 't4301-merge-tree-write-tree.sh'...
>
> > @@ -0,0 +1,87 @@
> > +#!/bin/sh
> > +
> > +test_description='git merge-tree --write-tree'
>
> ... especially given this description.

Makes sense; will rename.
diff mbox series

Patch

diff --git a/Documentation/git-merge-tree.txt b/Documentation/git-merge-tree.txt
index 58731c19422..b900bc1362c 100644
--- a/Documentation/git-merge-tree.txt
+++ b/Documentation/git-merge-tree.txt
@@ -3,26 +3,76 @@  git-merge-tree(1)
 
 NAME
 ----
-git-merge-tree - Show three-way merge without touching index
+git-merge-tree - Perform merge without touching index or working tree
 
 
 SYNOPSIS
 --------
 [verse]
-'git merge-tree' <base-tree> <branch1> <branch2>
+'git merge-tree' [--write-tree] <branch1> <branch2>
+'git merge-tree' [--trivial-merge] <base-tree> <branch1> <branch2>
 
 DESCRIPTION
 -----------
-Reads three tree-ish, and output trivial merge results and
-conflicting stages to the standard output.  This is similar to
-what three-way 'git read-tree -m' does, but instead of storing the
-results in the index, the command outputs the entries to the
-standard output.
-
-This is meant to be used by higher level scripts to compute
-merge results outside of the index, and stuff the results back into the
-index.  For this reason, the output from the command omits
-entries that match the <branch1> tree.
+
+Performs a merge, but does not make any new commits and does not read
+from or write to either the working tree or index.
+
+The first form will merge the two branches, doing a full recursive
+merge with rename detection.  The rest of this manual (other than the
+next paragraph) describes the first form in more detail -- including
+options, output format, exit status, and usage notes.
+
+The second form is deprecated; it is kept for backward compatibility
+reasons but may be deleted in the future.  It will only do a trivial
+merge.  It reads three tree-ish, and outputs trivial merge results and
+conflicting stages to the standard output in a semi-diff format.
+Since this was designed for higher level scripts to consume and merge
+the results back into the index, it omits entries that match
+<branch1>.  The result of this second form is is similar to what
+three-way 'git read-tree -m' does, but instead of storing the results
+in the index, the command outputs the entries to the standard output.
+This form not only has limited applicability, the output format is
+also difficult to work with, and it will generally be less performant
+than the first form even on successful merges (especially if working
+in large repositories).  The remainder of this manual will only
+discuss the first form.
+
+OUTPUT
+------
+
+For either a successful or conflicted merge, the output from
+git-merge-tree is simply one line:
+
+	<OID of toplevel tree>
+
+The printed tree object corresponds to what would be checked out in
+the working tree at the end of `git merge`, and thus may have files
+with conflict markers in them.
+
+EXIT STATUS
+-----------
+
+For a successful, non-conflicted merge, the exit status is 0.  When the
+merge has conflicts, the exit status is 1.  If the merge is not able to
+complete (or start) due to some kind of error, the exit status is
+something other than 0 or 1.
+
+USAGE NOTES
+-----------
+
+git-merge-tree was written to be low-level plumbing, similar to
+hash-object, mktree, commit-tree, update-ref, and mktag.  Thus, it could
+be used as a part of a series of steps such as
+
+       NEWTREE=$(git merge-tree --write-tree $BRANCH1 $BRANCH2)
+       test $? -eq 0 || die "There were conflicts..."
+       NEWCOMMIT=$(git commit-tree $NEWTREE -p $BRANCH1 -p $BRANCH2)
+       git update-ref $BRANCH1 $NEWCOMMIT
+
+However, it does not quite fit into the same category of low-level
+plumbing commands since the possibility of merge conflicts give it a
+much higher chance of the command not succeeding.
 
 GIT
 ---
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index 33e47cc1534..0c19639594d 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -2,6 +2,9 @@ 
 #include "builtin.h"
 #include "tree-walk.h"
 #include "xdiff-interface.h"
+#include "help.h"
+#include "commit-reach.h"
+#include "merge-ort.h"
 #include "object-store.h"
 #include "parse-options.h"
 #include "repository.h"
@@ -393,7 +396,57 @@  struct merge_tree_options {
 static int real_merge(struct merge_tree_options *o,
 		      const char *branch1, const char *branch2)
 {
-	die(_("real merges are not yet implemented"));
+	struct commit *parent1, *parent2;
+	struct commit_list *common;
+	struct commit_list *merge_bases = NULL;
+	struct commit_list *j;
+	struct merge_options opt;
+	struct merge_result result = { 0 };
+
+	parent1 = get_merge_parent(branch1);
+	if (!parent1)
+		help_unknown_ref(branch1, "merge",
+				 _("not something we can merge"));
+
+	parent2 = get_merge_parent(branch2);
+	if (!parent2)
+		help_unknown_ref(branch2, "merge",
+				 _("not something we can merge"));
+
+	init_merge_options(&opt, the_repository);
+	/*
+	 * TODO: Support subtree and other -X options?
+	if (use_strategies_nr == 1 &&
+	    !strcmp(use_strategies[0]->name, "subtree"))
+		opt.subtree_shift = "";
+	for (x = 0; x < xopts_nr; x++)
+		if (parse_merge_opt(&opt, xopts[x]))
+			die(_("Unknown strategy option: -X%s"), xopts[x]);
+	*/
+
+	opt.show_rename_progress = 0;
+
+	opt.branch1 = merge_remote_util(parent1)->name; /* or just branch1? */
+	opt.branch2 = merge_remote_util(parent2)->name; /* or just branch2? */
+
+	/*
+	 * Get the merge bases, in reverse order; see comment above
+	 * merge_incore_recursive in merge-ort.h
+	 */
+	common = get_merge_bases(parent1, parent2);
+	if (!common)
+		die(_("refusing to merge unrelated histories"));
+	for (j = common; j; j = j->next)
+		commit_list_insert(j->item, &merge_bases);
+
+	merge_incore_recursive(&opt, merge_bases, parent1, parent2, &result);
+	printf("%s\n", oid_to_hex(&result.tree->object.oid));
+	if (result.clean < 0)
+		die(_("failure to merge"));
+	else if (!result.clean)
+		printf(_("Conflicts!\n"));
+	merge_finalize(&opt, &result);
+	return !result.clean; /* result.clean < 0 handled above */
 }
 
 int cmd_merge_tree(int argc, const char **argv, const char *prefix)
diff --git a/t/t4301-merge-tree-real.sh b/t/t4301-merge-tree-real.sh
new file mode 100755
index 00000000000..e03688515c5
--- /dev/null
+++ b/t/t4301-merge-tree-real.sh
@@ -0,0 +1,87 @@ 
+#!/bin/sh
+
+test_description='git merge-tree --write-tree'
+
+. ./test-lib.sh
+
+# This test is ort-specific
+test "${GIT_TEST_MERGE_ALGORITHM:-ort}" = ort || {
+	skip_all="GIT_TEST_MERGE_ALGORITHM != ort"
+	test_done
+}
+
+test_expect_success setup '
+	test_write_lines 1 2 3 4 5 >numbers &&
+	echo hello >greeting &&
+	echo foo >whatever &&
+	git add numbers greeting whatever &&
+	test_tick &&
+	git commit -m initial &&
+
+	git branch side1 &&
+	git branch side2 &&
+
+	git checkout side1 &&
+	test_write_lines 1 2 3 4 5 6 >numbers &&
+	echo hi >greeting &&
+	echo bar >whatever &&
+	git add numbers greeting whatever &&
+	test_tick &&
+	git commit -m modify-stuff &&
+
+	git checkout side2 &&
+	test_write_lines 0 1 2 3 4 5 >numbers &&
+	echo yo >greeting &&
+	git rm whatever &&
+	mkdir whatever &&
+	>whatever/empty &&
+	git add numbers greeting whatever/empty &&
+	test_tick &&
+	git commit -m other-modifications
+'
+
+test_expect_success 'Content merge and a few conflicts' '
+	git checkout side1^0 &&
+	test_must_fail git merge side2 &&
+	expected_tree=$(cat .git/AUTO_MERGE) &&
+
+	# We will redo the merge, while we are still in a conflicted state!
+	test_when_finished "git reset --hard" &&
+
+	test_expect_code 1 git merge-tree --write-tree side1 side2 >RESULT &&
+	actual_tree=$(head -n 1 RESULT) &&
+
+	# Due to differences of e.g. "HEAD" vs "side1", the results will not
+	# exactly match.  Dig into individual files.
+
+	# Numbers should have three-way merged cleanly
+	test_write_lines 0 1 2 3 4 5 6 >expect &&
+	git show ${actual_tree}:numbers >actual &&
+	test_cmp expect actual &&
+
+	# whatever and whatever~<branch> should have same HASHES
+	git rev-parse ${expected_tree}:whatever ${expected_tree}:whatever~HEAD >expect &&
+	git rev-parse ${actual_tree}:whatever ${actual_tree}:whatever~side1 >actual &&
+	test_cmp expect actual &&
+
+	# greeting should have a merge conflict
+	git show ${expected_tree}:greeting >tmp &&
+	cat tmp | sed -e s/HEAD/side1/ >expect &&
+	git show ${actual_tree}:greeting >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'Barf on misspelled option, with exit code other than 0 or 1' '
+	# Mis-spell with single "s" instead of double "s"
+	test_expect_code 129 git merge-tree --write-tree --mesages FOOBAR side1 side2 2>expect &&
+
+	grep "error: unknown option.*mesages" expect
+'
+
+test_expect_success 'Barf on too many arguments' '
+	test_expect_code 129 git merge-tree --write-tree side1 side2 side3 2>expect &&
+
+	grep "^usage: git merge-tree" expect
+'
+
+test_done