diff mbox series

[1/2] Change default merge backend from recursive to ort

Message ID 4a0f088f3669a95c7f75e885d06c0a3bdaf31f42.1628055482.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Switch default merge backend from recursive to ort | expand

Commit Message

Elijah Newren Aug. 4, 2021, 5:38 a.m. UTC
From: Elijah Newren <newren@gmail.com>

There are a few reasons to switch the default:
  * Correctness
  * Extensibility
  * Performance

I'll provide some summaries about each.

=== Correctness ===

The original impetus for a new merge backend was to fix issues that were
difficult to fix within recursive's design.  The success with this goal
is perhaps most easily demonstrated by running the following:

  $ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
  $ git grep test_expect_merge_algorithm.failure.success t/
  $ git grep test_expect_merge_algorithm.success.failure t/

In order, these greps show:

  * Seven sets of submodule tests (10 total tests) that fail with
    recursive but succeed with ort
  * 22 other tests that fail with recursive, but succeed with ort
  * 0 tests that pass with recursive, but fail with ort

=== Extensibility ===

Being able to perform merges without touching the working tree or index
makes it possible to create new features that were difficult with the
old backend:

  * Merging, cherry-picking, rebasing, reverting in bare repositories...
    or just on branches that aren't checked out.

  * `git diff AUTO_MERGE` -- ability to see what changes the user has
    made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
    write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)

  * A --remerge-diff option for log/show, used to show diffs for merges
    that display the difference between what an automatic merge would
    have created and what was recorded in the merge.  (This option will
    often result in an empty diff because many merges are clean, but for
    the non-clean ones it will show how conflicts were fixed including
    the removal of conflict markers, and also show additional changes
    made outside of conflict regions to e.g. fix semantic conflicts.)

  * A --remerge-diff-only option for log/show, similar to --remerge-diff
    but also showing how cherry-picks or reverts differed from what an
    automatic cherry-pick or revert would provide.

The last three have been implemented already (though only one has been
submitted upstream so far; the others were waiting for performance work
to complete), and I still plan to implement the first one.

=== Performance ===

I'll quote from the summary of my final optimization for merge-ort
(while fixing the testcase name from 'no-renames' to 'few-renames'):

                               Timings

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename    merge-ort
                 v2.30.0      current     detection     current
                ----------   ---------   -----------   ---------
few-renames:      18.912 s    18.030 s     11.699 s     198.3 ms
mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms

                           Speedup factors

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
few-renames:        1           1.05         1.6           95
mega-renames:       1          16.5         29           9012
just-one-mega:      1          13.6         20            565

And, for partial clone users:

             Factor reduction in number of objects needed

                                          Infinite
                 merge-       merge-     Parallelism
                recursive    recursive    of rename
                 v2.30.0      current     detection    merge-ort
                ----------   ---------   -----------   ---------
mega-renames:       1            1            1          181.3

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/merge.c  | 10 ++++++++--
 builtin/rebase.c |  2 +-
 sequencer.c      |  4 ++--
 3 files changed, 11 insertions(+), 5 deletions(-)

Comments

Phillip Wood Aug. 9, 2021, 5:38 p.m. UTC | #1
Hi Elijah

On 04/08/2021 06:38, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> There are a few reasons to switch the default:
>    * Correctness
>    * Extensibility
>    * Performance
> 
> I'll provide some summaries about each.
> 
> === Correctness ===
> 
> The original impetus for a new merge backend was to fix issues that were
> difficult to fix within recursive's design.  The success with this goal
> is perhaps most easily demonstrated by running the following:
> 
>    $ git grep -2 KNOWN_FAILURE t/ | grep -A 4 GIT_TEST_MERGE_ALGORITHM
>    $ git grep test_expect_merge_algorithm.failure.success t/
>    $ git grep test_expect_merge_algorithm.success.failure t/
> 
> In order, these greps show:
> 
>    * Seven sets of submodule tests (10 total tests) that fail with
>      recursive but succeed with ort
>    * 22 other tests that fail with recursive, but succeed with ort
>    * 0 tests that pass with recursive, but fail with ort

I've not followed your patches too closely but I did notice you seem to 
have put a lot of effort into improving the test coverage which is 
fantastic.

> === Extensibility ===
> 
> Being able to perform merges without touching the working tree or index
> makes it possible to create new features that were difficult with the
> old backend:
> 
>    * Merging, cherry-picking, rebasing, reverting in bare repositories...
>      or just on branches that aren't checked out.

Rebasing without updating the worktree for each commit is exciting.

I agree with others that this should be merged into next sooner rather 
than later. I also agree with Peff's point about moving it into master 
to get more people using it rather than sitting in next for too long.

I think the sequencer changes below are easier to follow in this 
version. One thing I did wonder is whether there needs to be any change 
to the CI scripts to ensure we keep testing both merge implementations.

Best wishes and congratulations on an impressive achievement

Phillip

>    * `git diff AUTO_MERGE` -- ability to see what changes the user has
>      made to resolve conflicts so far (see commit 5291828df8 ("merge-ort:
>      write $GIT_DIR/AUTO_MERGE whenever we hit a conflict", 2021-03-20)
> 
>    * A --remerge-diff option for log/show, used to show diffs for merges
>      that display the difference between what an automatic merge would
>      have created and what was recorded in the merge.  (This option will
>      often result in an empty diff because many merges are clean, but for
>      the non-clean ones it will show how conflicts were fixed including
>      the removal of conflict markers, and also show additional changes
>      made outside of conflict regions to e.g. fix semantic conflicts.)
> 
>    * A --remerge-diff-only option for log/show, similar to --remerge-diff
>      but also showing how cherry-picks or reverts differed from what an
>      automatic cherry-pick or revert would provide.
> 
> The last three have been implemented already (though only one has been
> submitted upstream so far; the others were waiting for performance work
> to complete), and I still plan to implement the first one.
> 
> === Performance ===
> 
> I'll quote from the summary of my final optimization for merge-ort
> (while fixing the testcase name from 'no-renames' to 'few-renames'):
> 
>                                 Timings
> 
>                                            Infinite
>                   merge-       merge-     Parallelism
>                  recursive    recursive    of rename    merge-ort
>                   v2.30.0      current     detection     current
>                  ----------   ---------   -----------   ---------
> few-renames:      18.912 s    18.030 s     11.699 s     198.3 ms
> mega-renames:   5964.031 s   361.281 s    203.886 s     661.8 ms
> just-one-mega:   149.583 s    11.009 s      7.553 s     264.6 ms
> 
>                             Speedup factors
> 
>                                            Infinite
>                   merge-       merge-     Parallelism
>                  recursive    recursive    of rename
>                   v2.30.0      current     detection    merge-ort
>                  ----------   ---------   -----------   ---------
> few-renames:        1           1.05         1.6           95
> mega-renames:       1          16.5         29           9012
> just-one-mega:      1          13.6         20            565
> 
> And, for partial clone users:
> 
>               Factor reduction in number of objects needed
> 
>                                            Infinite
>                   merge-       merge-     Parallelism
>                  recursive    recursive    of rename
>                   v2.30.0      current     detection    merge-ort
>                  ----------   ---------   -----------   ---------
> mega-renames:       1            1            1          181.3
> 
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>   builtin/merge.c  | 10 ++++++++--
>   builtin/rebase.c |  2 +-
>   sequencer.c      |  4 ++--
>   3 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/merge.c b/builtin/merge.c
> index d7b14bf4a7f..fbe21d413c1 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -88,9 +88,9 @@ static int autostash;
>   static int no_verify;
>   
>   static struct strategy all_strategy[] = {
> -	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
> +	{ "recursive",  NO_TRIVIAL },
>   	{ "octopus",    DEFAULT_OCTOPUS },
> -	{ "ort",        NO_TRIVIAL },
> +	{ "ort",        DEFAULT_TWOHEAD | NO_TRIVIAL },
>   	{ "resolve",    0 },
>   	{ "ours",       NO_FAST_FORWARD | NO_TRIVIAL },
>   	{ "subtree",    NO_FAST_FORWARD | NO_TRIVIAL },
> @@ -1484,6 +1484,12 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
>   			fast_forward = FF_NO;
>   	}
>   
> +	if (!use_strategies && !pull_twohead &&
> +	    remoteheads && !remoteheads->next) {
> +		char *default_strategy = getenv("GIT_TEST_MERGE_ALGORITHM");
> +		if (default_strategy)
> +			append_strategy(get_strategy(default_strategy));
> +	}
>   	if (!use_strategies) {
>   		if (!remoteheads)
>   			; /* already up-to-date */
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 12f093121d9..587a2f1d299 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -1713,7 +1713,7 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>   		int i;
>   
>   		if (!options.strategy)
> -			options.strategy = "recursive";
> +			options.strategy = "ort";
>   
>   		strbuf_reset(&buf);
>   		for (i = 0; i < strategy_options.nr; i++)
> diff --git a/sequencer.c b/sequencer.c
> index a4e5c43fcf2..d08ddae52fa 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -636,7 +636,7 @@ static int do_recursive_merge(struct repository *r,
>   	for (i = 0; i < opts->xopts_nr; i++)
>   		parse_merge_opt(&o, opts->xopts[i]);
>   
> -	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
>   		memset(&result, 0, sizeof(result));
>   		merge_incore_nonrecursive(&o, base_tree, head_tree, next_tree,
>   					    &result);
> @@ -3988,7 +3988,7 @@ static int do_merge(struct repository *r,
>   	o.branch2 = ref_name.buf;
>   	o.buffer_output = 2;
>   
> -	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
> +	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
>   		/*
>   		 * TODO: Should use merge_incore_recursive() and
>   		 * merge_switch_to_result(), skipping the call to
>
Elijah Newren Aug. 9, 2021, 9:01 p.m. UTC | #2
On Mon, Aug 9, 2021 at 10:38 AM Phillip Wood <phillip.wood123@gmail.com> wrote:
...
> > === Extensibility ===
> >
> > Being able to perform merges without touching the working tree or index
> > makes it possible to create new features that were difficult with the
> > old backend:
> >
> >    * Merging, cherry-picking, rebasing, reverting in bare repositories...
> >      or just on branches that aren't checked out.
>
> Rebasing without updating the worktree for each commit is exciting.

Yeah, there appear to be some interesting twists from my initial
investigations so far.  I may need to start some interesting
discussions in the next cycle or two about what we should do here (and
also for merges in bare repositories.)

> I agree with others that this should be merged into next sooner rather
> than later. I also agree with Peff's point about moving it into master
> to get more people using it rather than sitting in next for too long.

:-)

> I think the sequencer changes below are easier to follow in this
> version.

Thanks for taking a look.

> One thing I did wonder is whether there needs to be any change
> to the CI scripts to ensure we keep testing both merge implementations.

There did need to be such a change, but it was made previously in
commit f3b964a07e ("Add testing with merge-ort merge strategy",
2021-03-20).  That commit changed the default merge backend *for
testing* to be merge-ort, and modified the linux-gcc job to explicitly
specify recursive.  This commit doesn't change the testing story, so
the recursive backend will still continue to be tested with the
linux-gcc tests, or whenever someone requests it with
GIT_TEST_MERGE_ALGORITHM=recursive, and otherwise merge-ort will be
used.

> Best wishes and congratulations on an impressive achievement

Thanks!
diff mbox series

Patch

diff --git a/builtin/merge.c b/builtin/merge.c
index d7b14bf4a7f..fbe21d413c1 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -88,9 +88,9 @@  static int autostash;
 static int no_verify;
 
 static struct strategy all_strategy[] = {
-	{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
+	{ "recursive",  NO_TRIVIAL },
 	{ "octopus",    DEFAULT_OCTOPUS },
-	{ "ort",        NO_TRIVIAL },
+	{ "ort",        DEFAULT_TWOHEAD | NO_TRIVIAL },
 	{ "resolve",    0 },
 	{ "ours",       NO_FAST_FORWARD | NO_TRIVIAL },
 	{ "subtree",    NO_FAST_FORWARD | NO_TRIVIAL },
@@ -1484,6 +1484,12 @@  int cmd_merge(int argc, const char **argv, const char *prefix)
 			fast_forward = FF_NO;
 	}
 
+	if (!use_strategies && !pull_twohead &&
+	    remoteheads && !remoteheads->next) {
+		char *default_strategy = getenv("GIT_TEST_MERGE_ALGORITHM");
+		if (default_strategy)
+			append_strategy(get_strategy(default_strategy));
+	}
 	if (!use_strategies) {
 		if (!remoteheads)
 			; /* already up-to-date */
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 12f093121d9..587a2f1d299 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -1713,7 +1713,7 @@  int cmd_rebase(int argc, const char **argv, const char *prefix)
 		int i;
 
 		if (!options.strategy)
-			options.strategy = "recursive";
+			options.strategy = "ort";
 
 		strbuf_reset(&buf);
 		for (i = 0; i < strategy_options.nr; i++)
diff --git a/sequencer.c b/sequencer.c
index a4e5c43fcf2..d08ddae52fa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -636,7 +636,7 @@  static int do_recursive_merge(struct repository *r,
 	for (i = 0; i < opts->xopts_nr; i++)
 		parse_merge_opt(&o, opts->xopts[i]);
 
-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
 		memset(&result, 0, sizeof(result));
 		merge_incore_nonrecursive(&o, base_tree, head_tree, next_tree,
 					    &result);
@@ -3988,7 +3988,7 @@  static int do_merge(struct repository *r,
 	o.branch2 = ref_name.buf;
 	o.buffer_output = 2;
 
-	if (opts->strategy && !strcmp(opts->strategy, "ort")) {
+	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
 		/*
 		 * TODO: Should use merge_incore_recursive() and
 		 * merge_switch_to_result(), skipping the call to