diff mbox series

[v7,09/15] merge-resolve: rewrite in C

Message ID 20210317204939.17890-10-alban.gruin@gmail.com (mailing list archive)
State New, archived
Headers show
Series Rewrite the remaining merge strategies from shell to C | expand

Commit Message

Alban Gruin March 17, 2021, 8:49 p.m. UTC
This rewrites `git merge-resolve' from shell to C.  As for `git
merge-one-file', this port is not completely straightforward and removes
calls to external processes to avoid reading and writing the index over
and over again.

 - The call to `update-index -q --refresh' is replaced by a call to
   refresh_index().

 - The call to `read-tree' is replaced by a call to unpack_trees() (and
   all the setup needed).

 - The call to `write-tree' is replaced by a call to
   write_index_as_tree().

 - The call to `merge-index', needed to invoke `git merge-one-file', is
   replaced by a call to the new merge_all_index() function.

The index is read in cmd_merge_resolve(), and is wrote back by
merge_strategies_resolve().

The parameters of merge_strategies_resolve() will be surprising at first
glance: why using a commit list for `bases' and `remote', where we could
use an oid array, and a pointer to an oid?  Because, in a later commit,
try_merge_strategy() will be able to call merge_strategies_resolve()
directly, and it already uses a commit list for `bases' (`common') and
`remote' (`remoteheads'), and a string for `head_arg'.  To reduce
frictions later, merge_strategies_resolve() takes the same types of
parameters.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---
 Makefile                |  2 +-
 builtin.h               |  1 +
 builtin/merge-resolve.c | 74 ++++++++++++++++++++++++++++++++
 git-merge-resolve.sh    | 54 -----------------------
 git.c                   |  1 +
 merge-strategies.c      | 95 +++++++++++++++++++++++++++++++++++++++++
 merge-strategies.h      |  5 +++
 7 files changed, 177 insertions(+), 55 deletions(-)
 create mode 100644 builtin/merge-resolve.c
 delete mode 100755 git-merge-resolve.sh

Comments

Johannes Schindelin March 23, 2021, 10:21 p.m. UTC | #1
Hi Alban,

On Wed, 17 Mar 2021, Alban Gruin wrote:

> diff --git a/merge-strategies.c b/merge-strategies.c
> index 2717af51fd..a51700dae5 100644
> --- a/merge-strategies.c
> +++ b/merge-strategies.c
> @@ -272,3 +275,95 @@ int merge_all_index(struct index_state *istate, int oneshot, int quiet,
>
>  	return err;
>  }
> +
> +static int fast_forward(struct repository *r, struct tree_desc *t,
> +			int nr, int aggressive)
> +{
> +	struct unpack_trees_options opts;
> +	struct lock_file lock = LOCK_INIT;
> +
> +	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
> +	repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);

Shouldn't we lock the index first, and _then_ refresh it? I guess not,
seeing as we don't do that either in `cmd_status()`: there, we also
refresh the index and _then_ lock it.

> +
> +	memset(&opts, 0, sizeof(opts));
> +	opts.head_idx = 1;
> +	opts.src_index = r->index;
> +	opts.dst_index = r->index;
> +	opts.merge = 1;
> +	opts.update = 1;
> +	opts.aggressive = aggressive;
> +
> +	if (nr == 1)
> +		opts.fn = oneway_merge;
> +	else if (nr == 2) {
> +		opts.fn = twoway_merge;
> +		opts.initial_checkout = is_index_unborn(r->index);
> +	} else if (nr >= 3) {
> +		opts.fn = threeway_merge;
> +		opts.head_idx = nr - 1;
> +	}

Given the function's name `fast_forward()`, I have to admit that I
somewhat stumbled over these merges.
> +
> +	if (unpack_trees(nr, t, &opts))
> +		return -1;
> +
> +	if (write_locked_index(r->index, &lock, COMMIT_LOCK))
> +		return error(_("unable to write new index file"));
> +
> +	return 0;
> +}
> +
> +static int add_tree(struct tree *tree, struct tree_desc *t)
> +{
> +	if (parse_tree(tree))
> +		return -1;
> +
> +	init_tree_desc(t, tree->buffer, tree->size);
> +	return 0;
> +}

This is a really trivial helper, but it is used a couple times below, so
it makes sense to have it encapsulated in a separate function.

> +
> +int merge_strategies_resolve(struct repository *r,
> +			     struct commit_list *bases, const char *head_arg,
> +			     struct commit_list *remote)

Since it is a list, and since the original variable in the shell script
had been named in the plural form, let's do the same here: `remotes`.

> +{
> +	struct tree_desc t[MAX_UNPACK_TREES];
> +	struct object_id head, oid;
> +	struct commit_list *i;
> +	int nr = 0;
> +
> +	if (head_arg)
> +		get_oid(head_arg, &head);
> +
> +	puts(_("Trying simple merge."));

Good. Usually I would recommend to print this to `stderr`, but the
original script prints it to `stdout`, so we should do that here, too.

> +
> +	for (i = bases; i && i->item; i = i->next) {
> +		if (add_tree(repo_get_commit_tree(r, i->item), t + (nr++)))
> +			return 2;

Since we're talking about a library function, not a `cmd_*()` function,
the return value on error should probably be negative.

Even better would be to let the function return an `enum` that contains
labels with more intuitive meaning than "2".

It _is_ the expected exit code when calling `git merge-resolve`, of course
(because of the `|| exit 2` after that `read-tree` call), but I wonder
whether a better layer for that `2` would be the `cmd_merge_resolve()`
function, letting `merge_strategies_resolve()` report failures in a more
fine-grained fashion.

> +	}
> +
> +	if (head_arg) {

It would probably be easier to read if the `if (head_arg)` clause above
was merged into this here clause.

> +		struct tree *tree = parse_tree_indirect(&head);
> +		if (add_tree(tree, t + (nr++)))
> +			return 2;
> +	}
> +
> +	if (remote && add_tree(repo_get_commit_tree(r, remote->item), t + (nr++)))
> +		return 2;

You get away with assuming that `remotes` only contains at most a single
entry because `cmd_merge_resolve()` verified it.

However, as the intention is to use this as a library function, I think
the input validation needs to be moved here instead of relying on all
callers to verify that they send at most one "remote" ref.

Other than that, this patch looks good to me.

Thanks,
Dscho

> +
> +	if (fast_forward(r, t, nr, 1))
> +		return 2;
> +
> +	if (write_index_as_tree(&oid, r->index, r->index_file,
> +				WRITE_TREE_SILENT, NULL)) {
> +		int ret;
> +		struct lock_file lock = LOCK_INIT;
> +
> +		puts(_("Simple merge failed, trying Automatic merge."));
> +		repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
> +		ret = merge_all_index(r->index, 1, 0, merge_one_file_func, NULL);
> +
> +		write_locked_index(r->index, &lock, COMMIT_LOCK);
> +		return !!ret;
> +	}
> +
> +	return 0;
> +}
> diff --git a/merge-strategies.h b/merge-strategies.h
> index 8705a550ca..bba4bf999c 100644
> --- a/merge-strategies.h
> +++ b/merge-strategies.h
> @@ -1,6 +1,7 @@
>  #ifndef MERGE_STRATEGIES_H
>  #define MERGE_STRATEGIES_H
>
> +#include "commit.h"
>  #include "object.h"
>
>  int merge_three_way(struct index_state *istate,
> @@ -28,4 +29,8 @@ int merge_index_path(struct index_state *istate, int oneshot, int quiet,
>  int merge_all_index(struct index_state *istate, int oneshot, int quiet,
>  		    merge_fn fn, void *data);
>
> +int merge_strategies_resolve(struct repository *r,
> +			     struct commit_list *bases, const char *head_arg,
> +			     struct commit_list *remote);
> +
>  #endif /* MERGE_STRATEGIES_H */
> --
> 2.31.0
>
>
Alban Gruin April 10, 2021, 2:17 p.m. UTC | #2
Hi Johannes,

Le 23/03/2021 à 23:21, Johannes Schindelin a écrit :
> Hi Alban,
> 
> On Wed, 17 Mar 2021, Alban Gruin wrote:
> 
>> diff --git a/merge-strategies.c b/merge-strategies.c
>> index 2717af51fd..a51700dae5 100644
>> --- a/merge-strategies.c
>> +++ b/merge-strategies.c
>> @@ -272,3 +275,95 @@ int merge_all_index(struct index_state *istate, int oneshot, int quiet,
>>
>>  	return err;
>>  }
>> +
>> +static int fast_forward(struct repository *r, struct tree_desc *t,
>> +			int nr, int aggressive)
>> +{
>> +	struct unpack_trees_options opts;
>> +	struct lock_file lock = LOCK_INIT;
>> +
>> +	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
>> +	repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
> 
> Shouldn't we lock the index first, and _then_ refresh it? I guess not,
> seeing as we don't do that either in `cmd_status()`: there, we also
> refresh the index and _then_ lock it.
> 

Yeah, I don't think I saw a lock/refresh sequence, but I may be wrong.

>> +
>> +	memset(&opts, 0, sizeof(opts));
>> +	opts.head_idx = 1;
>> +	opts.src_index = r->index;
>> +	opts.dst_index = r->index;
>> +	opts.merge = 1;
>> +	opts.update = 1;
>> +	opts.aggressive = aggressive;
>> +
>> +	if (nr == 1)
>> +		opts.fn = oneway_merge;
>> +	else if (nr == 2) {
>> +		opts.fn = twoway_merge;
>> +		opts.initial_checkout = is_index_unborn(r->index);
>> +	} else if (nr >= 3) {
>> +		opts.fn = threeway_merge;
>> +		opts.head_idx = nr - 1;
>> +	}
> 
> Given the function's name `fast_forward()`, I have to admit that I
> somewhat stumbled over these merges.
>> +
>> +	if (unpack_trees(nr, t, &opts))
>> +		return -1;
>> +

I just noticed that the lock is not released if there is an error here.

>> +	if (write_locked_index(r->index, &lock, COMMIT_LOCK))
>> +		return error(_("unable to write new index file"));
>> +
>> +	return 0;
>> +}
>> +
>> +static int add_tree(struct tree *tree, struct tree_desc *t)
>> +{
>> +	if (parse_tree(tree))
>> +		return -1;
>> +
>> +	init_tree_desc(t, tree->buffer, tree->size);
>> +	return 0;
>> +}
> 
> This is a really trivial helper, but it is used a couple times below, so
> it makes sense to have it encapsulated in a separate function.
> 
>> +
>> +int merge_strategies_resolve(struct repository *r,
>> +			     struct commit_list *bases, const char *head_arg,
>> +			     struct commit_list *remote)
> 
> Since it is a list, and since the original variable in the shell script
> had been named in the plural form, let's do the same here: `remotes`.
> 

This one is supposed to contain only one commit, so I'm not really
conviced that this parameter should be in the plural form.

>> +{
>> +	struct tree_desc t[MAX_UNPACK_TREES];
>> +	struct object_id head, oid;
>> +	struct commit_list *i;
>> +	int nr = 0;
>> +
>> +	if (head_arg)
>> +		get_oid(head_arg, &head);
>> +
>> +	puts(_("Trying simple merge."));
> 
> Good. Usually I would recommend to print this to `stderr`, but the
> original script prints it to `stdout`, so we should do that here, too.
> 
>> +
>> +	for (i = bases; i && i->item; i = i->next) {
>> +		if (add_tree(repo_get_commit_tree(r, i->item), t + (nr++)))
>> +			return 2;
> 
> Since we're talking about a library function, not a `cmd_*()` function,
> the return value on error should probably be negative.
> 
> Even better would be to let the function return an `enum` that contains
> labels with more intuitive meaning than "2".
> 
> It _is_ the expected exit code when calling `git merge-resolve`, of course
> (because of the `|| exit 2` after that `read-tree` call), but I wonder
> whether a better layer for that `2` would be the `cmd_merge_resolve()`
> function, letting `merge_strategies_resolve()` report failures in a more
> fine-grained fashion.
> 

Right -- I'll see what I can do here.

>> +	}
>> +
>> +	if (head_arg) {
> 
> It would probably be easier to read if the `if (head_arg)` clause above
> was merged into this here clause.
> 
>> +		struct tree *tree = parse_tree_indirect(&head);
>> +		if (add_tree(tree, t + (nr++)))
>> +			return 2;
>> +	}
>> +
>> +	if (remote && add_tree(repo_get_commit_tree(r, remote->item), t + (nr++)))
>> +		return 2;
> 
> You get away with assuming that `remotes` only contains at most a single
> entry because `cmd_merge_resolve()` verified it.
> 
> However, as the intention is to use this as a library function, I think
> the input validation needs to be moved here instead of relying on all
> callers to verify that they send at most one "remote" ref.
> 
> Other than that, this patch looks good to me.
> 
Well, this condition checks that there is one commit, and if so, uses it
to call add_tree().  I don't see the mistake here.

Cheers,
Alban

> Thanks,
> Dscho
>
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index e2e4389f76..8fccc38006 100644
--- a/Makefile
+++ b/Makefile
@@ -600,7 +600,6 @@  SCRIPT_SH += git-bisect.sh
 SCRIPT_SH += git-difftool--helper.sh
 SCRIPT_SH += git-filter-branch.sh
 SCRIPT_SH += git-merge-octopus.sh
-SCRIPT_SH += git-merge-resolve.sh
 SCRIPT_SH += git-mergetool.sh
 SCRIPT_SH += git-quiltimport.sh
 SCRIPT_SH += git-request-pull.sh
@@ -1102,6 +1101,7 @@  BUILTIN_OBJS += builtin/merge-index.o
 BUILTIN_OBJS += builtin/merge-one-file.o
 BUILTIN_OBJS += builtin/merge-ours.o
 BUILTIN_OBJS += builtin/merge-recursive.o
+BUILTIN_OBJS += builtin/merge-resolve.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/merge.o
 BUILTIN_OBJS += builtin/mktag.o
diff --git a/builtin.h b/builtin.h
index 227c133036..c3029cef46 100644
--- a/builtin.h
+++ b/builtin.h
@@ -181,6 +181,7 @@  int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 int cmd_merge_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_one_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
+int cmd_merge_resolve(int argc, const char **argv, const char *prefix);
 int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 int cmd_mktag(int argc, const char **argv, const char *prefix);
 int cmd_mktree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/merge-resolve.c b/builtin/merge-resolve.c
new file mode 100644
index 0000000000..0f2e487c4d
--- /dev/null
+++ b/builtin/merge-resolve.c
@@ -0,0 +1,74 @@ 
+/*
+ * Builtin "git merge-resolve"
+ *
+ * Copyright (c) 2020 Alban Gruin
+ *
+ * Based on git-merge-resolve.sh, written by Linus Torvalds and Junio C
+ * Hamano.
+ *
+ * Resolve two trees, using enhanced multi-base read-tree.
+ */
+
+#include "cache.h"
+#include "builtin.h"
+#include "merge-strategies.h"
+
+static const char builtin_merge_resolve_usage[] =
+	"git merge-resolve <bases>... -- <head> <remote>";
+
+int cmd_merge_resolve(int argc, const char **argv, const char *prefix)
+{
+	int i, sep_seen = 0;
+	const char *head = NULL;
+	struct commit_list *bases = NULL, *remote = NULL;
+	struct commit_list **next_base = &bases;
+	struct repository *r = the_repository;
+
+	if (argc < 5)
+		usage(builtin_merge_resolve_usage);
+
+	setup_work_tree();
+	if (repo_read_index(r) < 0)
+		die("invalid index");
+
+	/*
+	 * The first parameters up to -- are merge bases; the rest are
+	 * heads.
+	 */
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "--"))
+			sep_seen = 1;
+		else if (!strcmp(argv[i], "-h"))
+			usage(builtin_merge_resolve_usage);
+		else if (sep_seen && !head)
+			head = argv[i];
+		else {
+			struct object_id oid;
+			struct commit *commit;
+
+			if (get_oid(argv[i], &oid))
+				die("object %s not found.", argv[i]);
+
+			commit = oideq(&oid, r->hash_algo->empty_tree) ?
+				NULL : lookup_commit_or_die(&oid, argv[i]);
+
+			if (sep_seen)
+				commit_list_insert(commit, &remote);
+			else
+				next_base = commit_list_append(commit, next_base);
+		}
+	}
+
+	/*
+	 * Give up if we are given two or more remotes.  Not handling
+	 * octopus.
+	 */
+	if (remote && remote->next)
+		return 2;
+
+	/* Give up if this is a baseless merge. */
+	if (!bases)
+		return 2;
+
+	return merge_strategies_resolve(r, bases, head, remote);
+}
diff --git a/git-merge-resolve.sh b/git-merge-resolve.sh
deleted file mode 100755
index 0b4fc88b61..0000000000
--- a/git-merge-resolve.sh
+++ /dev/null
@@ -1,54 +0,0 @@ 
-#!/bin/sh
-#
-# Copyright (c) 2005 Linus Torvalds
-# Copyright (c) 2005 Junio C Hamano
-#
-# Resolve two trees, using enhanced multi-base read-tree.
-
-# The first parameters up to -- are merge bases; the rest are heads.
-bases= head= remotes= sep_seen=
-for arg
-do
-	case ",$sep_seen,$head,$arg," in
-	*,--,)
-		sep_seen=yes
-		;;
-	,yes,,*)
-		head=$arg
-		;;
-	,yes,*)
-		remotes="$remotes$arg "
-		;;
-	*)
-		bases="$bases$arg "
-		;;
-	esac
-done
-
-# Give up if we are given two or more remotes -- not handling octopus.
-case "$remotes" in
-?*' '?*)
-	exit 2 ;;
-esac
-
-# Give up if this is a baseless merge.
-if test '' = "$bases"
-then
-	exit 2
-fi
-
-git update-index -q --refresh
-git read-tree -u -m --aggressive $bases $head $remotes || exit 2
-echo "Trying simple merge."
-if result_tree=$(git write-tree 2>/dev/null)
-then
-	exit 0
-else
-	echo "Simple merge failed, trying Automatic merge."
-	if git merge-index -o --use=merge-one-file -a
-	then
-		exit 0
-	else
-		exit 1
-	fi
-fi
diff --git a/git.c b/git.c
index 95eb74efe1..ce1f237369 100644
--- a/git.c
+++ b/git.c
@@ -548,6 +548,7 @@  static struct cmd_struct commands[] = {
 	{ "merge-recursive", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
+	{ "merge-resolve", cmd_merge_resolve, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-tree", cmd_merge_tree, RUN_SETUP | NO_PARSEOPT },
 	{ "mktag", cmd_mktag, RUN_SETUP | NO_PARSEOPT },
diff --git a/merge-strategies.c b/merge-strategies.c
index 2717af51fd..a51700dae5 100644
--- a/merge-strategies.c
+++ b/merge-strategies.c
@@ -1,6 +1,9 @@ 
 #include "cache.h"
+#include "cache-tree.h"
 #include "dir.h"
+#include "lockfile.h"
 #include "merge-strategies.h"
+#include "unpack-trees.h"
 #include "xdiff-interface.h"
 
 static int add_merge_result_to_index(struct index_state *istate, unsigned int mode,
@@ -272,3 +275,95 @@  int merge_all_index(struct index_state *istate, int oneshot, int quiet,
 
 	return err;
 }
+
+static int fast_forward(struct repository *r, struct tree_desc *t,
+			int nr, int aggressive)
+{
+	struct unpack_trees_options opts;
+	struct lock_file lock = LOCK_INIT;
+
+	refresh_index(r->index, REFRESH_QUIET, NULL, NULL, NULL);
+	repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+
+	memset(&opts, 0, sizeof(opts));
+	opts.head_idx = 1;
+	opts.src_index = r->index;
+	opts.dst_index = r->index;
+	opts.merge = 1;
+	opts.update = 1;
+	opts.aggressive = aggressive;
+
+	if (nr == 1)
+		opts.fn = oneway_merge;
+	else if (nr == 2) {
+		opts.fn = twoway_merge;
+		opts.initial_checkout = is_index_unborn(r->index);
+	} else if (nr >= 3) {
+		opts.fn = threeway_merge;
+		opts.head_idx = nr - 1;
+	}
+
+	if (unpack_trees(nr, t, &opts))
+		return -1;
+
+	if (write_locked_index(r->index, &lock, COMMIT_LOCK))
+		return error(_("unable to write new index file"));
+
+	return 0;
+}
+
+static int add_tree(struct tree *tree, struct tree_desc *t)
+{
+	if (parse_tree(tree))
+		return -1;
+
+	init_tree_desc(t, tree->buffer, tree->size);
+	return 0;
+}
+
+int merge_strategies_resolve(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remote)
+{
+	struct tree_desc t[MAX_UNPACK_TREES];
+	struct object_id head, oid;
+	struct commit_list *i;
+	int nr = 0;
+
+	if (head_arg)
+		get_oid(head_arg, &head);
+
+	puts(_("Trying simple merge."));
+
+	for (i = bases; i && i->item; i = i->next) {
+		if (add_tree(repo_get_commit_tree(r, i->item), t + (nr++)))
+			return 2;
+	}
+
+	if (head_arg) {
+		struct tree *tree = parse_tree_indirect(&head);
+		if (add_tree(tree, t + (nr++)))
+			return 2;
+	}
+
+	if (remote && add_tree(repo_get_commit_tree(r, remote->item), t + (nr++)))
+		return 2;
+
+	if (fast_forward(r, t, nr, 1))
+		return 2;
+
+	if (write_index_as_tree(&oid, r->index, r->index_file,
+				WRITE_TREE_SILENT, NULL)) {
+		int ret;
+		struct lock_file lock = LOCK_INIT;
+
+		puts(_("Simple merge failed, trying Automatic merge."));
+		repo_hold_locked_index(r, &lock, LOCK_DIE_ON_ERROR);
+		ret = merge_all_index(r->index, 1, 0, merge_one_file_func, NULL);
+
+		write_locked_index(r->index, &lock, COMMIT_LOCK);
+		return !!ret;
+	}
+
+	return 0;
+}
diff --git a/merge-strategies.h b/merge-strategies.h
index 8705a550ca..bba4bf999c 100644
--- a/merge-strategies.h
+++ b/merge-strategies.h
@@ -1,6 +1,7 @@ 
 #ifndef MERGE_STRATEGIES_H
 #define MERGE_STRATEGIES_H
 
+#include "commit.h"
 #include "object.h"
 
 int merge_three_way(struct index_state *istate,
@@ -28,4 +29,8 @@  int merge_index_path(struct index_state *istate, int oneshot, int quiet,
 int merge_all_index(struct index_state *istate, int oneshot, int quiet,
 		    merge_fn fn, void *data);
 
+int merge_strategies_resolve(struct repository *r,
+			     struct commit_list *bases, const char *head_arg,
+			     struct commit_list *remote);
+
 #endif /* MERGE_STRATEGIES_H */