diff mbox series

[1/2] Change default merge backend from recursive to ort

Message ID 8f6af8d494e0924aef4ae6963b8dca2228dad9b1.1627776462.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Switch default merge backend from recursive to ort | expand

Commit Message

Elijah Newren Aug. 1, 2021, 12:07 a.m. UTC
From: Elijah Newren <newren@gmail.com>

There are a few reasons to switch the default:
  * Correctness
  * Extensibility
  * Performance

I'll provide some summaries about each.

=== Correctness ===

The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design.  The success with this goal
is perhaps most easily demonstrated by running the following:

  $ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
  $ git grep test_expect_merge_algorithm.failure.success t/
  $ git grep test_expect_merge_algorithm.success.failure t/

In order, these greps show:

  * Seven sets of submodule tests (10 total tests) that fail with
    recursive but succeed with ort
  * 22 other tests that fail with recursive, but succeed with ort
  * 0 tests that pass with recursive, but fail with ort

=== Extensibility ===

Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:

  * Merging, cherry-picking, rebasing, reverting in bare repositories...
    or just on branches that aren't checked out.

  * `git diff AUTO_MERGE` -- ability to see what changes the user has
    made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
    write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)

  * A --remerge-diff option for log/show, used to show diffs for merges
    that display the difference between what an automatic merge would
    have created and what was recorded in the merge.  (This option will
    often result in an empty diff because many merges are clean, but for
    the non-clean ones it will show how conflicts were fixed including
    the removal of conflict markers, and also show additional changes
    made outside of conflict regions to e.g. fix semantic conflicts.)

  * A --remerge-diff-only option for log/show, similar to --remerge-diff
    but also showing how cherry-picks or reverts differed from what an
    automatic cherry-pick or revert would provide.

The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.

=== Performance ===

I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
few-renames:      18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
few-renames:        1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565

And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/merge.c  | 10 ++++++++--
 builtin/rebase.c |  2 +-
 sequencer.c      |  4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

Comments

Ævar Arnfjörð Bjarmason Aug. 2, 2021, 3:55 p.m. UTC | #1
On Sun, Aug 01 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> [...]
> @@ -3968,7 +3968,7 @@ static int do_merge(struct repository *r,
>  	o.branch2 = ref_name.buf;
>  	o.buffer_output = 2;
>  
> -	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +	if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
>  		/*
>  		 * TODO: Should use merge_incore_recursive() and
>  		 * merge_switch_to_result(), skipping the call to

I might spot more tiny issues, but it looks like our error messaging
needs updating for 14c4586c2df (merge,rebase,revert: select ort or
recursive by config or environment, 2020-11-02).

I.e. we die on "Unknown option for merge-recursive", presumably that
should be updated to indicate that we might call one of
merge_recursive() or merge_ort_recursive() now.

And perhaps this in sequencer.c:

    that represents the "current" state for merge-recursive[...]
Elijah Newren Aug. 2, 2021, 4:33 p.m. UTC | #2
On Mon, Aug 2, 2021 at 9:56 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Sun, Aug 01 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> > @@ -3968,7 +3968,7 @@ static int do_merge(struct repository *r,
> >       o.branch2 = ref_name.buf;
> >       o.buffer_output = 2;
> >
> > -     if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> > +     if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
> >               /*
> >                * TODO: Should use merge_incore_recursive() and
> >                * merge_switch_to_result(), skipping the call to
>
> I might spot more tiny issues, but it looks like our error messaging
> needs updating for 14c4586c2df (merge,rebase,revert: select ort or
> recursive by config or environment, 2020-11-02).
>
> I.e. we die on "Unknown option for merge-recursive", presumably that
> should be updated to indicate that we might call one of
> merge_recursive() or merge_ort_recursive() now.

Ooh, good catch.  I think I'd prefer to reword this to "Unknown
strategy option: -X%s"

> And perhaps this in sequencer.c:
>
>     that represents the "current" state for merge-recursive[...]

Yeah, it's just a comment but it should still be updated.  I'll
s/merge-recursive/the merge machinery/ for this one.

I tried to look for other error messages or comments similar to these
two but didn't find anything.  I might have missed something, though.

I'll get these fixed up with the next submission.
Johannes Schindelin Aug. 2, 2021, 10:46 p.m. UTC | #3
Hi Elijah,

On Sun, 1 Aug 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> There are a few reasons to switch the default:
> [...]

I think it would be really fantastic to change to the new default right
after v2.33.0.

As to the patch, I only struggled slightly with the changes to
`sequencer.c`:

> diff --git a/sequencer.c b/sequencer.c
> index 0bec01cf38e..a98de9a8d15 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -636,7 +636,7 @@ static int do_recursive_merge(struct repository *r,
>  	for (i = 0; i < opts->xopts_nr; i++)
>  		parse_merge_opt(&o, opts->xopts[i]);
>
> -	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +	if (!opts->strategy || strcmp(opts->strategy, "recursive")) {

At this stage, we're in `do_recursive_merge()`, and there is only one
caller, `do_pick_commit()`, and the caller is guarded by the following
condition:

        else if (!opts->strategy ||
                 !strcmp(opts->strategy, "recursive") ||
                 !strcmp(opts->strategy, "ort") ||
                 command == TODO_REVERT) {

The issue I see is with `git revert` allowing custom merge strategies. I
_think_ we need a slightly different patch here, something like this:

-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {

>  		memset(&result, 0, sizeof(result));
>  		merge_incore_nonrecursive(&o, base_tree, head_tree, next_tree,
>  					    &result);
> @@ -3968,7 +3968,7 @@ static int do_merge(struct repository *r,
>  	o.branch2 = ref_name.buf;
>  	o.buffer_output = 2;
>
> -	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +	if (!opts->strategy || strcmp(opts->strategy, "recursive")) {

It took me a while to convince myself that this is correct. At least now I
_think_ it is correct: `do_merge()` defines:

        const char *strategy = !opts->xopts_nr &&
                (!opts->strategy ||
                 !strcmp(opts->strategy, "recursive") ||
                 !strcmp(opts->strategy, "ort")) ?
                NULL : opts->strategy;

and then hands off to `git merge -s <strategy>` if `strategy` is set,
_before_ this hunk. Therefore we can be pretty certain that
`opts->strategy` is either not set, or "ort", or "recursive" at that
stage.

However, I think we could use the same idea I outlined in the previous
hunk, to make things more obvious:

-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {

Thank you,
Dscho

>  		/*
>  		 * TODO: Should use merge_incore_recursive() and
>  		 * merge_switch_to_result(), skipping the call to
> --
> gitgitgadget
>
>
Elijah Newren Aug. 3, 2021, 1:04 a.m. UTC | #4
On Mon, Aug 2, 2021 at 4:46 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Sun, 1 Aug 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > There are a few reasons to switch the default:
> > [...]
>
> I think it would be really fantastic to change to the new default right
> after v2.33.0.
>
> As to the patch, I only struggled slightly with the changes to
> `sequencer.c`:
>
> > diff --git a/sequencer.c b/sequencer.c
> > index 0bec01cf38e..a98de9a8d15 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -636,7 +636,7 @@ static int do_recursive_merge(struct repository *r,
> >       for (i = 0; i < opts->xopts_nr; i++)
> >               parse_merge_opt(&o, opts->xopts[i]);
> >
> > -     if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> > +     if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
>
> At this stage, we're in `do_recursive_merge()`, and there is only one
> caller, `do_pick_commit()`, and the caller is guarded by the following
> condition:
>
>         else if (!opts->strategy ||
>                  !strcmp(opts->strategy, "recursive") ||
>                  !strcmp(opts->strategy, "ort") ||
>                  command == TODO_REVERT) {
>
> The issue I see is with `git revert` allowing custom merge strategies. I
> _think_ we need a slightly different patch here, something like this:
>
> -       if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +       if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
>
> >               memset(&result, 0, sizeof(result));
> >               merge_incore_nonrecursive(&o, base_tree, head_tree, next_tree,
> >                                           &result);
> > @@ -3968,7 +3968,7 @@ static int do_merge(struct repository *r,
> >       o.branch2 = ref_name.buf;
> >       o.buffer_output = 2;
> >
> > -     if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> > +     if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
>
> It took me a while to convince myself that this is correct. At least now I
> _think_ it is correct: `do_merge()` defines:
>
>         const char *strategy = !opts->xopts_nr &&
>                 (!opts->strategy ||
>                  !strcmp(opts->strategy, "recursive") ||
>                  !strcmp(opts->strategy, "ort")) ?
>                 NULL : opts->strategy;
>
> and then hands off to `git merge -s <strategy>` if `strategy` is set,
> _before_ this hunk. Therefore we can be pretty certain that
> `opts->strategy` is either not set, or "ort", or "recursive" at that
> stage.
>
> However, I think we could use the same idea I outlined in the previous
> hunk, to make things more obvious:
>
> -       if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +       if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
>
> Thank you,
> Dscho
>
> >               /*
> >                * TODO: Should use merge_incore_recursive() and
> >                * merge_switch_to_result(), skipping the call to
> > --
> > gitgitgadget

I'll include both suggestions in my next re-roll.  Thanks for the feedback!
Philippe Blain Aug. 3, 2021, 2:56 a.m. UTC | #5
Hi Elijah,

Le 2021-07-31 à 20:07, Elijah Newren via GitGitGadget a écrit :
> From: Elijah Newren <newren@gmail.com>
> 
> 
>    * `git diff AUTO_MERGE` -- ability to see what changes the user has
>      made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
>      write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)
> 
>
> The last three have been implemented already (though only one has been
> submitted upstream so far; 

 From what I could find this indeed only refers to your 5291828df8 (merge-ort:
write $GIT_DIR/AUTO_MERGE whenever we hit a conflict, 2021-03-20).
This is a very nice improvement, but I noticed it is not mentioned in the doc.
Do you plan to update the 'git diff' doc to mention that special ref ?
(And maybe also gitrevisions(5), where most of the special refs are listed ?)

Do you plan to implement a new '--auto-merge' option to 'git diff' as a shortcut
to 'git diff AUTO_MERGE', in order to hide a bit the special ref from users ?

Thanks a lot for your work,

Philippe.
Elijah Newren Aug. 3, 2021, 3:39 a.m. UTC | #6
Hi Philippe,

On Mon, Aug 2, 2021 at 8:56 PM Philippe Blain
<levraiphilippeblain@gmail.com> wrote:
>
> Hi Elijah,
>
> Le 2021-07-31 à 20:07, Elijah Newren via GitGitGadget a écrit :
> > From: Elijah Newren <newren@gmail.com>
> >
> >
> >    * `git diff AUTO_MERGE` -- ability to see what changes the user has
> >      made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
> >      write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)
> >
> >
> > The last three have been implemented already (though only one has been
> > submitted upstream so far;
>
>  From what I could find this indeed only refers to your 5291828df8 (merge-ort:
> write $GIT_DIR/AUTO_MERGE whenever we hit a conflict, 2021-03-20).
> This is a very nice improvement, but I noticed it is not mentioned in the doc.
> Do you plan to update the 'git diff' doc to mention that special ref ?
> (And maybe also gitrevisions(5), where most of the special refs are listed ?)
>
> Do you plan to implement a new '--auto-merge' option to 'git diff' as a shortcut
> to 'git diff AUTO_MERGE', in order to hide a bit the special ref from users ?

Fair point, it probably does deserve to be documented somewhere, at
least once ort is the default merge algorithm.

I don't think it'd make sense to include a reference to it in
gitrevisions(5), since $GIT_DIR/AUTO_MERGE is a reference to a tree
rather than to a revision.  But documenting that special ref in
Documentation/git-diff.txt, and perhaps linking to it from other
conflict-related options (e.g. --base, --ours, --theirs) may make
sense.  Your --auto-merge idea may also make sense, and it'd be
somewhat similar to how git-rebase has a --show-current-patch option
that is shorthand for `git show REBASE_HEAD` and is documented as
such.  However, it might be confusing to users how to combine
--auto-merge with paths, whereas `git diff AUTO_MERGE -- pathname` is
pretty clear once you know that AUTO_MERGE is a tree you can diff
against.  Hmmm....
diff mbox series

Patch

diff --git a/builtin/merge.c b/builtin/merge.c
index a8a843b1f54..217c5061eb6 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -88,9 +88,9 @@  static int autostash;
 static int no_verify;
 
 static struct strategy all_strategy[] = {
-	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
+	{ "recursive",  NO_TRIVIAL },
 	{ "octopus",    DEFAULT_OCTOPUS },
-	{ "ort",        NO_TRIVIAL },
+	{ "ort",        DEFAULT_TWOHEAD | NO_TRIVIAL },
 	{ "resolve",    0 },
 	{ "ours",       NO_FAST_FORWARD | NO_TRIVIAL },
 	{ "subtree",    NO_FAST_FORWARD | NO_TRIVIAL },
@@ -1484,6 +1484,12 @@  int cmd_merge(int argc, const char **argv, const char *prefix)
 			fast_forward = FF_NO;
 	}
 
+	if (!use_strategies && !pull_twohead &&
+	    remoteheads && !remoteheads->next) {
+		char *default_strategy = getenv("GIT_TEST_MERGE_ALGORITHM");
+		if (default_strategy)
+			append_strategy(get_strategy(default_strategy));
+	}
 	if (!use_strategies) {
 		if (!remoteheads)
 			; /* already up-to-date */
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 12f093121d9..587a2f1d299 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1713,7 +1713,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 		int i;
 
 		if (!options.strategy)
-			options.strategy = "recursive";
+			options.strategy = "ort";
 
 		strbuf_reset(&buf);
 		for (i = 0; i < strategy_options.nr; i++)
diff --git a/sequencer.c b/sequencer.c
index 0bec01cf38e..a98de9a8d15 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -636,7 +636,7 @@  static int do_recursive_merge(struct repository *r,
 	for (i = 0; i < opts->xopts_nr; i++)
 		parse_merge_opt(&o, opts->xopts[i]);
 
-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
 		memset(&result, 0, sizeof(result));
 		merge_incore_nonrecursive(&o, base_tree, head_tree, next_tree,
 					    &result);
@@ -3968,7 +3968,7 @@  static int do_merge(struct repository *r,
 	o.branch2 = ref_name.buf;
 	o.buffer_output = 2;
 
-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || strcmp(opts->strategy, "recursive")) {
 		/*
 		 * TODO: Should use merge_incore_recursive() and
 		 * merge_switch_to_result(), skipping the call to