diff mbox series

[RFC] sequencer - tipped merge strategy

Message ID 20230303145311.513960-1-eantoranz@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] sequencer - tipped merge strategy | expand

Commit Message

Edmundo Carmona Antoranz March 3, 2023, 2:53 p.m. UTC
When rebasing merge commits and dealing with conflicts, having the
original merge commit as a reference can help us avoid some of
them.

With this patch, we leverage the original merge commit to handle the most
obvious case:
- HEAD tree has to match the tree of the first parent of the original merge
  commit.
- MERGE_HEAD tree has to match the tree of the second parent of the original
  merge commit.
- At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
  a tree in the merge bases of the parent commits of the original merge
  commit.

If all of those conditions are met, we can safely use the tree of the
original merge commit as the resulting tree of this merge that is being
attempted at the time.

Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com>
---
 .gitignore          |   1 +
 Makefile            |   1 +
 builtin.h           |   1 +
 builtin/merge-tms.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
 git.c               |   1 +
 sequencer.c         |  36 ++++++++++-
 6 files changed, 187 insertions(+), 1 deletion(-)
 create mode 100644 builtin/merge-tms.c

Comments

Junio C Hamano March 3, 2023, 4:45 p.m. UTC | #1
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> When rebasing merge commits and dealing with conflicts, having the
> original merge commit as a reference can help us avoid some of
> them.
>
> With this patch, we leverage the original merge commit to handle the most
> obvious case:
> - HEAD tree has to match the tree of the first parent of the original merge
>   commit.
> - MERGE_HEAD tree has to match the tree of the second parent of the original
>   merge commit.
> - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
>   a tree in the merge bases of the parent commits of the original merge
>   commit.

The first two conditions look a bit too restrictive for the purpose
of reusing previous conflict resolution, while I am not sure ...

> If all of those conditions are met, we can safely use the tree of the
> original merge commit as the resulting tree of this merge that is being
> attempted at the time.

...if the "at least one" in the last condition is a safe and
sensible loosening; if it introduces a mismerge by ignoring bases
that are different from the original, then it is a bit too bold to
declare that we can safely use the tree of the original.

What was the motivating usecase behind this new feature?  Was it
more about reusing the structural merge conflict resolution, or
about the textual merge conflict resolution?  For the latter, after
doing the usual three-way file-level merge and seeing a conflicted
textual merge, requiring the match of the blob objects for only these
conflicted paths and taking the previous merge result may give you a
safe way to raising the chance to find more reusable merges.

> +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases)
> +{
> +	struct commit_list *i;
> +	for (i = bases; i; i = i->next)

Using 'i' for a pointer looks novel.  Don't.

> +static int find_oid(const struct object_id *oid,
> +		void *data)

The result of unfolding these two lines would not be overly long, I suspect?

> +{
> +	struct oid_array *other_list = (struct oid_array *) data;
> +	int pos = oid_array_lookup(other_list, oid);
> +	return pos >= 0 ? 1 : 0;

That's an unusual way to say 

	return pos >= 0;

or even

	return 0 <= oid_array_lookup(other_list, oid);

without otherwise unused variable.

> +static int run_tms_merge(struct tms_options *options)
> +{
> +	struct commit *head, *merge_head, *tip;
> +	struct commit_list *i;
> +
> +	head = lookup_commit_reference_by_name("HEAD");
> +	merge_head = lookup_commit_reference_by_name(options->merge_head);
> +	tip = lookup_commit_reference_by_name(options->tip);
> +
> +	if (!(head && merge_head && tip)) {
> +		return 2;
> +	}

Unnecessary {} around a single statement block.

> +	if (commit_list_count(tip->parents) != 2)
> +		return 2;
> +
> +	for (i = tip->parents; i; i = i->next)
> +		parse_commit(i->item);
> +	if (!oideq(get_commit_tree_oid(head),
> +		   get_commit_tree_oid(tip->parents->item)))
> +		return 2;
> +	if (!oideq(get_commit_tree_oid(merge_head),
> +		   get_commit_tree_oid(tip->parents->next->item)))
> +		return 2;
> +
> +	if (!base_match(tip, head, merge_head))
> +		return 2;
> +
> +	if (restore(tip))
> +		return 2;

I somehow thought that reverting the trashed working tree and the
index to their original state was not the responsibility for a merge
strategy but for the caller?  Shouldn't this restoration be on the
caller's side?

Oh, has this code even touched anything in the working tree and the
index at this point?  It looks more like everything we did above in
order to punt by returning 2 was to see if the condition for us to
reuse the resulting tree holds and nothing else.

Ah, "restore()" is misnamed, perhaps?  I thought it was about "oh,
we made a mess and need to go back to the state that was given to us
before failing", but is this the real "ok, the condition holds and
we can just reuse the tree from the previous merge"?  Then it makes
sense for the code to attempt to check out that tree and return 2
when it fails.  Only the function name was misleading.

> +	if (!opts->strategy || !strcmp(opts->strategy, "tms")) {
> +		rollback_lock_file(&lock);
> +		ret = try_tms_merge(opts, commit, to_merge->item);
> +		if (ret) {
> +			discard_index(r->index);
> +			if (repo_read_index(r) < 0) {
> +				ret = error(_("could not read index"));
> +				goto leave_merge;
> +			}
> +			goto ran_merge;
> +		}
> +		// regain lock to go into recursive

No // comments here.

> +		if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
Edmundo Carmona Antoranz March 4, 2023, 11:45 a.m. UTC | #2
On Fri, Mar 3, 2023 at 5:45 PM Junio C Hamano <gitster@pobox.com> wrote:
> > With this patch, we leverage the original merge commit to handle the most
> > obvious case:
> > - HEAD tree has to match the tree of the first parent of the original merge
> >   commit.
> > - MERGE_HEAD tree has to match the tree of the second parent of the original
> >   merge commit.
> > - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
> >   a tree in the merge bases of the parent commits of the original merge
> >   commit.
>
> The first two conditions look a bit too restrictive for the purpose
> of reusing previous conflict resolution, while I am not sure ...
>
> > If all of those conditions are met, we can safely use the tree of the
> > original merge commit as the resulting tree of this merge that is being
> > attempted at the time.
>
> ...if the "at least one" in the last condition is a safe and
> sensible loosening; if it introduces a mismerge by ignoring bases
> that are different from the original, then it is a bit too bold to
> declare that we can safely use the tree of the original.
>
I think the conditions hold _but_ I will think it through or perhaps
create a few scenarios that we could talk about. Will come back to it
in a few days.

I agree that the current restrictions make it too narrow. Very
restricted scenarios would match at the moment. I will start working
on making this a little bit more accepting to widen the scope.

> What was the motivating usecase behind this new feature?  Was it
> more about reusing the structural merge conflict resolution, or
> about the textual merge conflict resolution?  For the latter, after
> doing the usual three-way file-level merge and seeing a conflicted
> textual merge, requiring the match of the blob objects for only these
> conflicted paths and taking the previous merge result may give you a
> safe way to raising the chance to find more reusable merges.

Usercase can be at the moment trying to rebase (with merges) on top of
an exact base copy. In cases like this, git just crashes on merge
commits. An easy example:

git checkout v2.38.0
git commit --amend --no-edit
git rebase --rebase-merges --onto HEAD v2.38.0 v2.39.0

>
> > +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases)
> > +{
> > +     struct commit_list *i;
> > +     for (i = bases; i; i = i->next)
>
> Using 'i' for a pointer looks novel.  Don't.
>

Thanks for the comments on code. At least it doesn't sound like I
messed up big time.... so far.

>
> I somehow thought that reverting the trashed working tree and the
> index to their original state was not the responsibility for a merge
> strategy but for the caller?  Shouldn't this restoration be on the
> caller's side?
>
> Oh, has this code even touched anything in the working tree and the
> index at this point?  It looks more like everything we did above in
> order to punt by returning 2 was to see if the condition for us to
> reuse the resulting tree holds and nothing else.
>
> Ah, "restore()" is misnamed, perhaps?  I thought it was about "oh,
> we made a mess and need to go back to the state that was given to us
> before failing", but is this the real "ok, the condition holds and
> we can just reuse the tree from the previous merge"?  Then it makes
> sense for the code to attempt to check out that tree and return 2
> when it fails.  Only the function name was misleading.
>

calling _git restore_ to do that hence _restore_ :-) But it's ok. I
can give it a better name.
Edmundo Carmona Antoranz March 4, 2023, 11:47 a.m. UTC | #3
On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> Usercase can be at the moment trying to rebase (with merges) on top of
> an exact base copy. In cases like this, git just crashes on merge
> commits. An easy example:

I should have said _crashes on merge commits where there was a conflict_.
Elijah Newren March 4, 2023, 8:31 p.m. UTC | #4
Hi Edmundo,

On Fri, Mar 3, 2023 at 7:43 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> When rebasing merge commits and dealing with conflicts, having the
> original merge commit as a reference can help us avoid some of
> them.
>
>
> With this patch, we leverage the original merge commit to handle the most
> obvious case:
> - HEAD tree has to match the tree of the first parent of the original merge
>   commit.
> - MERGE_HEAD tree has to match the tree of the second parent of the original
>   merge commit.
> - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
>   a tree in the merge bases of the parent commits of the original merge
>   commit.
>
> If all of those conditions are met, we can safely use the tree of the
> original merge commit as the resulting tree of this merge that is being
> attempted at the time.

The conditions are quite specific, making one wonder what you are
trying to do, and yet also leave an obvious open hole that seems to be
inviting bugs.

Having read the rest of this thread, I notice you pointed out to Junio
that you want to amend a commit in the history of the merge,
suggesting you are just modifying the commit message (or maybe
author/committer info).  More generally, I _think_ your usecase and
justification for this patch could be worded something like:

"""
We often rebase with `--rebase-merges`, `--interactive`, and
`--keep-base` (or equivalent command line flags) and only modify
commit metadata during the rebase.  Since we do not modify any files,
we would like the rebase to proceed without conflicts.  However, since
--rebase-merges currently does not rebase merges but recreates them
from scratch (ignoring everything but the commit metadata of the old
merge), it forces users to redo conflict resolution.  Since the trees
of all relevant commits have not changed, this conflict resolution
feels unnecessary.  In this patch we do not try to solve the general
problem of rebasing merges, but instead introduce a narrow hack
specific to this particular scenario: we check that the trees of all
relevant commits involved in the new merge are the same as for the old
merge, and when that holds, use the tree from the original merge as
the merge resolution.  In more detail:
"""

which would be followed immediately by your text after "handle the
most obvious case:"

Am I reading your motivation correctly?

Also, as Junio highlighted, I don't believe it's safe to only require
that one tree of the merge bases match.  You should require having
both the same number of merge bases and that the set of trees for the
merge bases of both the old and new merge commits exactly match.

As a high level review, I personally tend to dislike piecemeal
solutions that only work for very specific cases.  However, others on
the list may decide to include it, so long as it doesn't actively hurt
other cases.  One request I would make, if it is to be included, is
that we design it such that we can easily jettison this code (& any
documentation it needs) later when we gain a more general solution for
rebasing merges.

> Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com>
> ---
>  .gitignore          |   1 +
>  Makefile            |   1 +
>  builtin.h           |   1 +
>  builtin/merge-tms.c | 148 ++++++++++++++++++++++++++++++++++++++++++++
>  git.c               |   1 +
>  sequencer.c         |  36 ++++++++++-
>  6 files changed, 187 insertions(+), 1 deletion(-)

I understand waiting to make documentation updates while your proposal
is just an RFC, but I think tests might help showcase how the strategy
is meant to be used and verify whether the behavior is sane when your
new strategy doesn't apply.

>  create mode 100644 builtin/merge-tms.c
>
> diff --git a/.gitignore b/.gitignore
> index e875c59054..8b534f98e6 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -103,6 +103,7 @@
>  /git-merge-recursive
>  /git-merge-resolve
>  /git-merge-subtree
> +/git-merge-tms
>  /git-mergetool
>  /git-mergetool--lib
>  /git-mktag
> diff --git a/Makefile b/Makefile
> index 50ee51fde3..10a3167c50 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1264,6 +1264,7 @@ BUILTIN_OBJS += builtin/merge-file.o
>  BUILTIN_OBJS += builtin/merge-index.o
>  BUILTIN_OBJS += builtin/merge-ours.o
>  BUILTIN_OBJS += builtin/merge-recursive.o
> +BUILTIN_OBJS += builtin/merge-tms.o
>  BUILTIN_OBJS += builtin/merge-tree.o
>  BUILTIN_OBJS += builtin/merge.o
>  BUILTIN_OBJS += builtin/mktag.o
> diff --git a/builtin.h b/builtin.h
> index 46cc789789..94dcb73f85 100644
> --- a/builtin.h
> +++ b/builtin.h
> @@ -180,6 +180,7 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix);
>  int cmd_merge_ours(int argc, const char **argv, const char *prefix);
>  int cmd_merge_file(int argc, const char **argv, const char *prefix);
>  int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
> +int cmd_merge_tms(int argc, const char **argv, const char *prefix);
>  int cmd_merge_tree(int argc, const char **argv, const char *prefix);
>  int cmd_mktag(int argc, const char **argv, const char *prefix);
>  int cmd_mktree(int argc, const char **argv, const char *prefix);
> diff --git a/builtin/merge-tms.c b/builtin/merge-tms.c
> new file mode 100644
> index 0000000000..37a2427757
> --- /dev/null
> +++ b/builtin/merge-tms.c
> @@ -0,0 +1,148 @@
> +/*
> + * Copyright (c) 2023 Edmundo Carmona Antoranz

I dislike having these in our files.  Despite the file being modified
later, they are virtually never updated to list the new authors and
new years.  Needing to always update these lines of code would
obviously be onerous, and is why we don't do that, but that just means
these lines always eventually become inaccurate and perhaps even
wildly misleading.  People can look at log/blame/etc. to find out the
_correct_ information, so what purpose does a line like this serve?

(I'm not a lawyer or anything close; I previously suggested ripping
these lines out of our codebase but people pointed out we can't rip
them out once put in.  But it'd be nice to avoid spreading the
problem.  Or maybe there are still lawyer-ish reasons for including
these that I'm not aware of?)

> + * Released under the terms of GPL2

That's automatic for anything in the project that doesn't explicitly
state otherwise; I'd rather not have this included for every file.

Tangential question for the list: If folks do want their contributions
to also be available under an alternate license (e.g. as done for
sha1dc or as done with
https://github.com/libgit2/libgit2/blob/main/git.git-authors), is
there a scheme for doing so?  Incidentally, my employer told me that
any new entire files or subsystems that I contributed should be
licensed under MIT (like sha1dc is).  I didn't want to have to deal
with that headache, but luckily I was copying a bunch of
merge-recursive.[ch] code into merge-ort.[ch], so I just did that in
the initial commits and side-stepped the whole question.  :-)  (But
really, if anyone wants any of the code I've contributed to Git under
MIT during my time at Palantir, that's what my employer
wanted/intended so you've got permission to do so.)

...and with these two first lines out of the way, I can stop
commenting on things I have a tenuous grasp on (copyright & licenses)
to things that I actually know something about...

> + *
> + * Tipped merge strategy.... a.k.a. fortune-teller merge strategy
> + *
> + * In cases like rebases, merge commits offer us the advantage of knowing
> + * _before hand_ what the previous result of the _original_ branches
> + * involved was.
> + *
> + * This merge strategy tries to leverage this knowledge so that we can
> + * avoid at least the most obvious conflicts that have been solved in the
> + * original merge commit.

Except it doesn't solve "the most obvious conflicts", it either solves
all of the conflicts or none of them.  Perhaps this wording can be fixed up?

> + *
> + * In the current state, the strategy works based on exact matches of the trees
> + * involved:
> + * - HEAD tree has to match the tree of the first parent of the original merge
> + *   commit.
> + * - MERGE_HEAD tree has to match the tree of the second parent of the original
> + *   merge commit.
> + * - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
> + *   a tree in the merge bases of the parent commits of the original merge
> + *   commit.
> + * If all of those conditions are met, we can safely use the tree of the
> + * original merge commit as the resulting tree of this merge that is being
> + * attempted at the time.
> + */
> +
> +#include "builtin.h"
> +#include "commit-reach.h"
> +#include "oid-array.h"
> +#include "parse-options.h"
> +#include "run-command.h"
> +
> +
> +struct tms_options {
> +       const char *tip;
> +       const char *merge_head;
> +} tms_options;
> +
> +static int restore(struct commit *commit)
> +{
> +       struct child_process cmd = CHILD_PROCESS_INIT;
> +
> +       strvec_push(&cmd.args, "restore");
> +       strvec_push(&cmd.args, "--worktree");
> +       strvec_push(&cmd.args, "--stage");
> +       strvec_pushf(&cmd.args, "--source=%s",
> +                    oid_to_hex(&commit->object.oid));
> +       strvec_push(&cmd.args, "--");
> +       strvec_push(&cmd.args, ".");
> +       cmd.git_cmd = 1;
> +       return run_command(&cmd);

We fork subprocesses a lot in git, but it was a horrible mistake.  It
hurts performance, it *really* hurts on Windows (or so I hear), it
hurts debuggability/maintainability in general, and we should be
removing this from our codebase rather than adding more.  As someone
who has put time into slowly eradicating this from our codebase,
please make library functions you can call instead.  (Actually, first
look and see if there are relevant library functions.
Rebase/sequencer probably already needed this and already wrote such a
function.)

> +}
> +
> +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases)
> +{
> +       struct commit_list *i;
> +
> +       for (i = bases; i; i = i->next)
> +               oid_array_append(oids, get_commit_tree_oid(i->item));
> +}
> +
> +static int find_oid(const struct object_id *oid,
> +               void *data)
> +{
> +       struct oid_array *other_list = (struct oid_array *) data;
> +       int pos = oid_array_lookup(other_list, oid);
> +       return pos >= 0 ? 1 : 0;
> +}
> +
> +static int base_match(struct commit *rebase_head,
> +               struct commit *head,
> +               struct commit *merge_head)
> +{
> +       struct commit_list *bases_current, *bases_tip;
> +       struct oid_array trees_current = OID_ARRAY_INIT;
> +       struct oid_array trees_tip = OID_ARRAY_INIT;
> +       int oid_match;
> +
> +       bases_current = get_merge_bases(head, merge_head);
> +       bases_tip = get_merge_bases(rebase_head->parents->item,
> +                                   rebase_head->parents->next->item);
> +       load_tree_oids(&trees_current, bases_current);
> +       load_tree_oids(&trees_tip, bases_tip);
> +
> +       oid_match = oid_array_for_each(&trees_current, find_oid, &trees_tip);
> +
> +       oid_array_clear(&trees_current);
> +       oid_array_clear(&trees_tip);
> +
> +       return oid_match;
> +}
> +
> +static int run_tms_merge(struct tms_options *options)
> +{
> +       struct commit *head, *merge_head, *tip;
> +       struct commit_list *i;
> +
> +       head = lookup_commit_reference_by_name("HEAD");
> +       merge_head = lookup_commit_reference_by_name(options->merge_head);
> +       tip = lookup_commit_reference_by_name(options->tip);
> +
> +       if (!(head && merge_head && tip)) {
> +               return 2;
> +       }
> +       if (commit_list_count(tip->parents) != 2)
> +               return 2;
> +
> +       for (i = tip->parents; i; i = i->next)
> +               parse_commit(i->item);
> +       if (!oideq(get_commit_tree_oid(head),
> +                  get_commit_tree_oid(tip->parents->item)))
> +               return 2;
> +       if (!oideq(get_commit_tree_oid(merge_head),
> +                  get_commit_tree_oid(tip->parents->next->item)))
> +               return 2;
> +
> +       if (!base_match(tip, head, merge_head))
> +               return 2;
> +
> +       if (restore(tip))
> +               return 2;

So six different cases where this merge strategy can currently bail
and do nothing; I'll discuss this more below.

> +
> +       return 0;
> +}
> +
> +int cmd_merge_tms(int argc, const char **argv, const char *prefix)
> +{
> +
> +       struct option mt_options[] = {
> +               OPT_STRING(0, "tip", &tms_options.tip,
> +                           N_("tip-merge-commit"),
> +                           N_("merge commit being rebased used as a tip for conflict resolution.")),
> +               OPT_END()
> +       };
> +       argc = parse_options(argc, argv, NULL, mt_options,
> +                            NULL, 0);
> +
> +       if (argc != 1)
> +               return 2;
> +       tms_options.merge_head = argv[0];
> +
> +       if (!tms_options.tip)
> +               return 2;

By creating a builtin named "merge-<foo>", you implicitly create a
`--strategy <foo>` option for git-merge, not just something for
git-rebase.  If we publish this new merge strategy in a released
version of Git, we'll have to support it for ~forever.  For something
that is meant as a short-term hack, I'd rather avoid that.  Further,
your merge strategy does not accept the normal arguments that a merge
strategy accepts, and will bail with an exit status of 2 if a "--tip"
is not specified.  Luckily, my comments elsewhere to make library
calls instead of forking subprocesses would also solve these issues
here.

I will also point out that this main function adds two more cases
where we can bail and do nothing, returning a status of 2, so now
we're up to eight cases.  Again, I'll discuss these more below.

> +
> +       return run_tms_merge(&tms_options);
> +}
> diff --git a/git.c b/git.c
> index 96b0a2837d..2e843731f1 100644
> --- a/git.c
> +++ b/git.c
> @@ -544,6 +544,7 @@ static struct cmd_struct commands[] = {
>         { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
>         { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
>         { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
> +       { "merge-tms", cmd_merge_tms, RUN_SETUP },
>         { "merge-tree", cmd_merge_tree, RUN_SETUP },
>         { "mktag", cmd_mktag, RUN_SETUP },
>         { "mktree", cmd_mktree, RUN_SETUP },
> diff --git a/sequencer.c b/sequencer.c
> index 65a34f9676..559169814b 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3833,6 +3833,21 @@ static int do_reset(struct repository *r,
>         return ret;
>  }
>
> +static int try_tms_merge(struct replay_opts *opts,
> +                        struct commit *rebase_head,
> +                        struct commit *merge_commit)
> +{
> +       struct child_process cmd = CHILD_PROCESS_INIT;
> +
> +       strvec_push(&cmd.args, "merge-tms");
> +       strvec_push(&cmd.args, "--tip");
> +       strvec_pushf(&cmd.args, "%s", oid_to_hex(&rebase_head->object.oid));
> +       strvec_pushf(&cmd.args, "%s", oid_to_hex(&merge_commit->object.oid));
> +
> +       cmd.git_cmd = 1;
> +       return run_command(&cmd) ? 0 : 1;

Again, let's please not fork subprocesses for builtin code.

> +}
> +
>  static int do_merge(struct repository *r,
>                     struct commit *commit,
>                     const char *arg, int arg_len,
> @@ -3846,7 +3861,8 @@ static int do_merge(struct repository *r,
>         const char *strategy = !opts->xopts_nr &&
>                 (!opts->strategy ||
>                  !strcmp(opts->strategy, "recursive") ||
> -                !strcmp(opts->strategy, "ort")) ?
> +                !strcmp(opts->strategy, "ort") ||
> +                !strcmp(opts->strategy, "tms")) ?

So folks trigger this by passing `--strategy tms` to rebase.
Typically, all the --strategy options to rebase are also ones that
merge accepts.  So if we trigger via that method, we may need to
expose the strategy to merge as well...or update the documentation in
various places that talks about merge strategies to be more specific
about which ones apply where.

Also, what if the merge strategy is inappropriate for the case in
question?  "ort" and "recursive" are appropriate for all non-octopus
merges, and the code only gets to this point in sequencer if we have a
non-octopus merge to do.  But your strategy is inappropriate other
than in very specific circumstances that haven't yet been checked.
What will the code do if we're under one of the many cases where "tms"
isn't applicable?  More on this question below.

>                 NULL : opts->strategy;
>         struct merge_options o;
>         int merge_arg_len, oneline_offset, can_fast_forward, ret, k;
> @@ -4086,6 +4102,23 @@ static int do_merge(struct repository *r,
>         o.branch2 = ref_name.buf;
>         o.buffer_output = 2;
>
> +       if (!opts->strategy || !strcmp(opts->strategy, "tms")) {
> +               rollback_lock_file(&lock);
> +               ret = try_tms_merge(opts, commit, to_merge->item);
> +               if (ret) {
> +                       discard_index(r->index);
> +                       if (repo_read_index(r) < 0) {
> +                               ret = error(_("could not read index"));
> +                               goto leave_merge;
> +                       }
> +                       goto ran_merge;

We "goto ran_merge", which could be very misleading.  Under the 8
conditions given, we did not run a merge; we bailed early saying the
merge strategy was inappropriate for the conditions at hand.  For
those 8 cases, ret will be 2 and we will jump to...

> +               }
> +               // regain lock to go into recursive
> +               if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
> +                       ret = -1;
> +                       goto leave_merge;
> +               }
> +       }
>         if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
>                 /*
>                  * TODO: Should use merge_incore_recursive() and
> @@ -4100,6 +4133,7 @@ static int do_merge(struct repository *r,
>                 ret = merge_recursive(&o, head_commit, merge_commit, bases,
>                                       &i);
>         }
> +ran_merge:
>         if (ret <= 0)
>                 fputs(o.obuf.buf, stdout);
>         strbuf_release(&o.obuf);

...here.  The code following this includes a check for ret < 0, and then runs:

    /*
     * The return value of merge_recursive() is 1 on clean, and 0 on
     * unclean merge.
     *
     * Let's reverse that, so that do_merge() returns 0 upon success and
     * 1 upon failed merge (keeping the return value -1 for the cases where
     * we will want to reschedule the `merge` command).
     */
    ret = !ret;

    if (r->index->cache_changed &&
        write_locked_index(r->index, &lock, COMMIT_LOCK)) {
        ret = error(_("merge: Unable to write new index file"));
        goto leave_merge;
    }

So, if I'm reading correctly (please double check me), since your
merge strategy made no changes to the index or working tree, and
returned a status of 2, and since !2 == !1 == 0, we'll treat this the
same as a successful merge and commit the "results", i.e. the tree of
the first parent.  Doesn't this tipped merge strategy thus behave the
same as a `--strategy ours` merge when its preconditions are not
satisfied?  If it does, that would be horrifying.
Elijah Newren March 4, 2023, 8:36 p.m. UTC | #5
On Sat, Mar 4, 2023 at 3:57 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz
> <eantoranz@gmail.com> wrote:
> >
> > Usercase can be at the moment trying to rebase (with merges) on top of
> > an exact base copy. In cases like this, git just crashes on merge
> > commits. An easy example:
>
> I should have said _crashes on merge commits where there was a conflict_.

Crash tends to mean the program has done something not allowed by the
operating system.  I don't think that's what's happening here.  Do you
mean Git stops with conflicts?
Edmundo Carmona Antoranz March 5, 2023, 1:40 p.m. UTC | #6
On Sat, Mar 4, 2023 at 9:36 PM Elijah Newren <newren@gmail.com> wrote:
>
> On Sat, Mar 4, 2023 at 3:57 AM Edmundo Carmona Antoranz
> <eantoranz@gmail.com> wrote:
> >
> > On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz
> > <eantoranz@gmail.com> wrote:
> > >
> > > Usercase can be at the moment trying to rebase (with merges) on top of
> > > an exact base copy. In cases like this, git just crashes on merge
> > > commits. An easy example:
> >
> > I should have said _crashes on merge commits where there was a conflict_.
>
> Crash tends to mean the program has done something not allowed by the
> operating system.  I don't think that's what's happening here.  Do you
> mean Git stops with conflicts?

Ok, sorry for the sloppy wording. Yes, it stops with conflicts.
Edmundo Carmona Antoranz March 5, 2023, 2 p.m. UTC | #7
Hey, Elijah!

Thank you for all of that feedback. Really good Will need to go
through it which will take me some time to digest but I did want to go
over one of the subjects.


> So, if I'm reading correctly (please double check me), since your
> merge strategy made no changes to the index or working tree, and
> returned a status of 2, and since !2 == !1 == 0, we'll treat this the
> same as a successful merge and commit the "results", i.e. the tree of
> the first parent.  Doesn't this tipped merge strategy thus behave the
> same as a `--strategy ours` merge when its preconditions are not
> satisfied?  If it does, that would be horrifying.

I think that is not correct. The possible values that come out of
try_tms_merge are 0 if nothing happened and 1 if the merge was
successful and we changed the index and the working tree. Then I think
I wrote this correctly following the call:

if (ret) {
  discard_index(r->index);
  if (repo_read_index(r) < 0) {
    ret = error(_("could not read index"));
    goto leave_merge;
  }
  goto ran_merge;
}
// regain lock to go into recursive
if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
  ret = -1;
  goto leave_merge;
}

Actually, if we will be switching to using a library, then this won't
be that important because we might be able to pull it off without
having to release the lock given that we would be running in-process,
but I wanted to clear up what the intended flow is there, just in
case.

Ok.... more questions or comments will be coming in the following
days. And thank you, again.

BR!
Junio C Hamano March 6, 2023, 5:32 p.m. UTC | #8
Elijah Newren <newren@gmail.com> writes:

> Having read the rest of this thread, I notice you pointed out to Junio
> that you want to amend a commit in the history of the merge,
> suggesting you are just modifying the commit message (or maybe
> author/committer info).  More generally, I _think_ your usecase and
> justification for this patch could be worded something like:
>
> """
> We often rebase with `--rebase-merges`, `--interactive`, and
> `--keep-base` (or equivalent command line flags) and only modify
> commit metadata during the rebase.  Since we do not modify any files,
> we would like the rebase to proceed without conflicts.

It makes very much sense to focus on this narrow but useful use
case, and I view it a very natural extension to already existing "if
we just pick without any user interaction a commit on top of its
current base, the we do not do anything, fast-forward and just
pretend we picked it".  IOW, shouldn't it something the sequencer
machinery should be able to do natively without forcing the user to
specify a new merge strategy?
diff mbox series

Patch

diff --git a/.gitignore b/.gitignore
index e875c59054..8b534f98e6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -103,6 +103,7 @@ 
 /git-merge-recursive
 /git-merge-resolve
 /git-merge-subtree
+/git-merge-tms
 /git-mergetool
 /git-mergetool--lib
 /git-mktag
diff --git a/Makefile b/Makefile
index 50ee51fde3..10a3167c50 100644
--- a/Makefile
+++ b/Makefile
@@ -1264,6 +1264,7 @@  BUILTIN_OBJS += builtin/merge-file.o
 BUILTIN_OBJS += builtin/merge-index.o
 BUILTIN_OBJS += builtin/merge-ours.o
 BUILTIN_OBJS += builtin/merge-recursive.o
+BUILTIN_OBJS += builtin/merge-tms.o
 BUILTIN_OBJS += builtin/merge-tree.o
 BUILTIN_OBJS += builtin/merge.o
 BUILTIN_OBJS += builtin/mktag.o
diff --git a/builtin.h b/builtin.h
index 46cc789789..94dcb73f85 100644
--- a/builtin.h
+++ b/builtin.h
@@ -180,6 +180,7 @@  int cmd_merge_index(int argc, const char **argv, const char *prefix);
 int cmd_merge_ours(int argc, const char **argv, const char *prefix);
 int cmd_merge_file(int argc, const char **argv, const char *prefix);
 int cmd_merge_recursive(int argc, const char **argv, const char *prefix);
+int cmd_merge_tms(int argc, const char **argv, const char *prefix);
 int cmd_merge_tree(int argc, const char **argv, const char *prefix);
 int cmd_mktag(int argc, const char **argv, const char *prefix);
 int cmd_mktree(int argc, const char **argv, const char *prefix);
diff --git a/builtin/merge-tms.c b/builtin/merge-tms.c
new file mode 100644
index 0000000000..37a2427757
--- /dev/null
+++ b/builtin/merge-tms.c
@@ -0,0 +1,148 @@ 
+/*
+ * Copyright (c) 2023 Edmundo Carmona Antoranz
+ * Released under the terms of GPL2
+ *
+ * Tipped merge strategy.... a.k.a. fortune-teller merge strategy
+ *
+ * In cases like rebases, merge commits offer us the advantage of knowing
+ * _before hand_ what the previous result of the _original_ branches
+ * involved was.
+ *
+ * This merge strategy tries to leverage this knowledge so that we can
+ * avoid at least the most obvious conflicts that have been solved in the
+ * original merge commit.
+ *
+ * In the current state, the strategy works based on exact matches of the trees
+ * involved:
+ * - HEAD tree has to match the tree of the first parent of the original merge
+ *   commit.
+ * - MERGE_HEAD tree has to match the tree of the second parent of the original
+ *   merge commit.
+ * - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match
+ *   a tree in the merge bases of the parent commits of the original merge
+ *   commit.
+ * If all of those conditions are met, we can safely use the tree of the
+ * original merge commit as the resulting tree of this merge that is being
+ * attempted at the time.
+ */
+
+#include "builtin.h"
+#include "commit-reach.h"
+#include "oid-array.h"
+#include "parse-options.h"
+#include "run-command.h"
+
+
+struct tms_options {
+	const char *tip;
+	const char *merge_head;
+} tms_options;
+
+static int restore(struct commit *commit)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+
+	strvec_push(&cmd.args, "restore");
+	strvec_push(&cmd.args, "--worktree");
+	strvec_push(&cmd.args, "--stage");
+	strvec_pushf(&cmd.args, "--source=%s",
+		     oid_to_hex(&commit->object.oid));
+	strvec_push(&cmd.args, "--");
+	strvec_push(&cmd.args, ".");
+	cmd.git_cmd = 1;
+	return run_command(&cmd);
+}
+
+static void load_tree_oids(struct oid_array *oids, struct commit_list *bases)
+{
+	struct commit_list *i;
+
+	for (i = bases; i; i = i->next)
+		oid_array_append(oids, get_commit_tree_oid(i->item));
+}
+
+static int find_oid(const struct object_id *oid,
+		void *data)
+{
+	struct oid_array *other_list = (struct oid_array *) data;
+	int pos = oid_array_lookup(other_list, oid);
+	return pos >= 0 ? 1 : 0;
+}
+
+static int base_match(struct commit *rebase_head,
+		struct commit *head,
+		struct commit *merge_head)
+{
+	struct commit_list *bases_current, *bases_tip;
+	struct oid_array trees_current = OID_ARRAY_INIT;
+	struct oid_array trees_tip = OID_ARRAY_INIT;
+	int oid_match;
+
+	bases_current = get_merge_bases(head, merge_head);
+	bases_tip = get_merge_bases(rebase_head->parents->item,
+				    rebase_head->parents->next->item);
+	load_tree_oids(&trees_current, bases_current);
+	load_tree_oids(&trees_tip, bases_tip);
+
+	oid_match = oid_array_for_each(&trees_current, find_oid, &trees_tip);
+
+	oid_array_clear(&trees_current);
+	oid_array_clear(&trees_tip);
+
+	return oid_match;
+}
+
+static int run_tms_merge(struct tms_options *options)
+{
+	struct commit *head, *merge_head, *tip;
+	struct commit_list *i;
+
+	head = lookup_commit_reference_by_name("HEAD");
+	merge_head = lookup_commit_reference_by_name(options->merge_head);
+	tip = lookup_commit_reference_by_name(options->tip);
+
+	if (!(head && merge_head && tip)) {
+		return 2;
+	}
+	if (commit_list_count(tip->parents) != 2)
+		return 2;
+
+	for (i = tip->parents; i; i = i->next)
+		parse_commit(i->item);
+	if (!oideq(get_commit_tree_oid(head),
+		   get_commit_tree_oid(tip->parents->item)))
+		return 2;
+	if (!oideq(get_commit_tree_oid(merge_head),
+		   get_commit_tree_oid(tip->parents->next->item)))
+		return 2;
+
+	if (!base_match(tip, head, merge_head))
+		return 2;
+
+	if (restore(tip))
+		return 2;
+
+	return 0;
+}
+
+int cmd_merge_tms(int argc, const char **argv, const char *prefix)
+{
+
+	struct option mt_options[] = {
+		OPT_STRING(0, "tip", &tms_options.tip,
+			    N_("tip-merge-commit"),
+			    N_("merge commit being rebased used as a tip for conflict resolution.")),
+		OPT_END()
+	};
+	argc = parse_options(argc, argv, NULL, mt_options,
+			     NULL, 0);
+
+	if (argc != 1)
+		return 2;
+	tms_options.merge_head = argv[0];
+
+	if (!tms_options.tip)
+		return 2;
+
+	return run_tms_merge(&tms_options);
+}
diff --git a/git.c b/git.c
index 96b0a2837d..2e843731f1 100644
--- a/git.c
+++ b/git.c
@@ -544,6 +544,7 @@  static struct cmd_struct commands[] = {
 	{ "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
 	{ "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT },
+	{ "merge-tms", cmd_merge_tms, RUN_SETUP },
 	{ "merge-tree", cmd_merge_tree, RUN_SETUP },
 	{ "mktag", cmd_mktag, RUN_SETUP },
 	{ "mktree", cmd_mktree, RUN_SETUP },
diff --git a/sequencer.c b/sequencer.c
index 65a34f9676..559169814b 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3833,6 +3833,21 @@  static int do_reset(struct repository *r,
 	return ret;
 }
 
+static int try_tms_merge(struct replay_opts *opts,
+			 struct commit *rebase_head,
+			 struct commit *merge_commit)
+{
+	struct child_process cmd = CHILD_PROCESS_INIT;
+
+	strvec_push(&cmd.args, "merge-tms");
+	strvec_push(&cmd.args, "--tip");
+	strvec_pushf(&cmd.args, "%s", oid_to_hex(&rebase_head->object.oid));
+	strvec_pushf(&cmd.args, "%s", oid_to_hex(&merge_commit->object.oid));
+
+	cmd.git_cmd = 1;
+	return run_command(&cmd) ? 0 : 1;
+}
+
 static int do_merge(struct repository *r,
 		    struct commit *commit,
 		    const char *arg, int arg_len,
@@ -3846,7 +3861,8 @@  static int do_merge(struct repository *r,
 	const char *strategy = !opts->xopts_nr &&
 		(!opts->strategy ||
 		 !strcmp(opts->strategy, "recursive") ||
-		 !strcmp(opts->strategy, "ort")) ?
+		 !strcmp(opts->strategy, "ort") ||
+		 !strcmp(opts->strategy, "tms")) ?
 		NULL : opts->strategy;
 	struct merge_options o;
 	int merge_arg_len, oneline_offset, can_fast_forward, ret, k;
@@ -4086,6 +4102,23 @@  static int do_merge(struct repository *r,
 	o.branch2 = ref_name.buf;
 	o.buffer_output = 2;
 
+	if (!opts->strategy || !strcmp(opts->strategy, "tms")) {
+		rollback_lock_file(&lock);
+		ret = try_tms_merge(opts, commit, to_merge->item);
+		if (ret) {
+			discard_index(r->index);
+			if (repo_read_index(r) < 0) {
+				ret = error(_("could not read index"));
+				goto leave_merge;
+			}
+			goto ran_merge;
+		}
+		// regain lock to go into recursive
+		if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
+			ret = -1;
+			goto leave_merge;
+		}
+	}
 	if (!opts->strategy || !strcmp(opts->strategy, "ort")) {
 		/*
 		 * TODO: Should use merge_incore_recursive() and
@@ -4100,6 +4133,7 @@  static int do_merge(struct repository *r,
 		ret = merge_recursive(&o, head_commit, merge_commit, bases,
 				      &i);
 	}
+ran_merge:
 	if (ret <= 0)
 		fputs(o.obuf.buf, stdout);
 	strbuf_release(&o.obuf);