Message ID | 20230303145311.513960-1-eantoranz@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] sequencer - tipped merge strategy | expand |
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes: > When rebasing merge commits and dealing with conflicts, having the > original merge commit as a reference can help us avoid some of > them. > > With this patch, we leverage the original merge commit to handle the most > obvious case: > - HEAD tree has to match the tree of the first parent of the original merge > commit. > - MERGE_HEAD tree has to match the tree of the second parent of the original > merge commit. > - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match > a tree in the merge bases of the parent commits of the original merge > commit. The first two conditions look a bit too restrictive for the purpose of reusing previous conflict resolution, while I am not sure ... > If all of those conditions are met, we can safely use the tree of the > original merge commit as the resulting tree of this merge that is being > attempted at the time. ...if the "at least one" in the last condition is a safe and sensible loosening; if it introduces a mismerge by ignoring bases that are different from the original, then it is a bit too bold to declare that we can safely use the tree of the original. What was the motivating usecase behind this new feature? Was it more about reusing the structural merge conflict resolution, or about the textual merge conflict resolution? For the latter, after doing the usual three-way file-level merge and seeing a conflicted textual merge, requiring the match of the blob objects for only these conflicted paths and taking the previous merge result may give you a safe way to raising the chance to find more reusable merges. > +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases) > +{ > + struct commit_list *i; > + for (i = bases; i; i = i->next) Using 'i' for a pointer looks novel. Don't. > +static int find_oid(const struct object_id *oid, > + void *data) The result of unfolding these two lines would not be overly long, I suspect? > +{ > + struct oid_array *other_list = (struct oid_array *) data; > + int pos = oid_array_lookup(other_list, oid); > + return pos >= 0 ? 1 : 0; That's an unusual way to say return pos >= 0; or even return 0 <= oid_array_lookup(other_list, oid); without otherwise unused variable. > +static int run_tms_merge(struct tms_options *options) > +{ > + struct commit *head, *merge_head, *tip; > + struct commit_list *i; > + > + head = lookup_commit_reference_by_name("HEAD"); > + merge_head = lookup_commit_reference_by_name(options->merge_head); > + tip = lookup_commit_reference_by_name(options->tip); > + > + if (!(head && merge_head && tip)) { > + return 2; > + } Unnecessary {} around a single statement block. > + if (commit_list_count(tip->parents) != 2) > + return 2; > + > + for (i = tip->parents; i; i = i->next) > + parse_commit(i->item); > + if (!oideq(get_commit_tree_oid(head), > + get_commit_tree_oid(tip->parents->item))) > + return 2; > + if (!oideq(get_commit_tree_oid(merge_head), > + get_commit_tree_oid(tip->parents->next->item))) > + return 2; > + > + if (!base_match(tip, head, merge_head)) > + return 2; > + > + if (restore(tip)) > + return 2; I somehow thought that reverting the trashed working tree and the index to their original state was not the responsibility for a merge strategy but for the caller? Shouldn't this restoration be on the caller's side? Oh, has this code even touched anything in the working tree and the index at this point? It looks more like everything we did above in order to punt by returning 2 was to see if the condition for us to reuse the resulting tree holds and nothing else. Ah, "restore()" is misnamed, perhaps? I thought it was about "oh, we made a mess and need to go back to the state that was given to us before failing", but is this the real "ok, the condition holds and we can just reuse the tree from the previous merge"? Then it makes sense for the code to attempt to check out that tree and return 2 when it fails. Only the function name was misleading. > + if (!opts->strategy || !strcmp(opts->strategy, "tms")) { > + rollback_lock_file(&lock); > + ret = try_tms_merge(opts, commit, to_merge->item); > + if (ret) { > + discard_index(r->index); > + if (repo_read_index(r) < 0) { > + ret = error(_("could not read index")); > + goto leave_merge; > + } > + goto ran_merge; > + } > + // regain lock to go into recursive No // comments here. > + if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) {
On Fri, Mar 3, 2023 at 5:45 PM Junio C Hamano <gitster@pobox.com> wrote: > > With this patch, we leverage the original merge commit to handle the most > > obvious case: > > - HEAD tree has to match the tree of the first parent of the original merge > > commit. > > - MERGE_HEAD tree has to match the tree of the second parent of the original > > merge commit. > > - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match > > a tree in the merge bases of the parent commits of the original merge > > commit. > > The first two conditions look a bit too restrictive for the purpose > of reusing previous conflict resolution, while I am not sure ... > > > If all of those conditions are met, we can safely use the tree of the > > original merge commit as the resulting tree of this merge that is being > > attempted at the time. > > ...if the "at least one" in the last condition is a safe and > sensible loosening; if it introduces a mismerge by ignoring bases > that are different from the original, then it is a bit too bold to > declare that we can safely use the tree of the original. > I think the conditions hold _but_ I will think it through or perhaps create a few scenarios that we could talk about. Will come back to it in a few days. I agree that the current restrictions make it too narrow. Very restricted scenarios would match at the moment. I will start working on making this a little bit more accepting to widen the scope. > What was the motivating usecase behind this new feature? Was it > more about reusing the structural merge conflict resolution, or > about the textual merge conflict resolution? For the latter, after > doing the usual three-way file-level merge and seeing a conflicted > textual merge, requiring the match of the blob objects for only these > conflicted paths and taking the previous merge result may give you a > safe way to raising the chance to find more reusable merges. Usercase can be at the moment trying to rebase (with merges) on top of an exact base copy. In cases like this, git just crashes on merge commits. An easy example: git checkout v2.38.0 git commit --amend --no-edit git rebase --rebase-merges --onto HEAD v2.38.0 v2.39.0 > > > +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases) > > +{ > > + struct commit_list *i; > > + for (i = bases; i; i = i->next) > > Using 'i' for a pointer looks novel. Don't. > Thanks for the comments on code. At least it doesn't sound like I messed up big time.... so far. > > I somehow thought that reverting the trashed working tree and the > index to their original state was not the responsibility for a merge > strategy but for the caller? Shouldn't this restoration be on the > caller's side? > > Oh, has this code even touched anything in the working tree and the > index at this point? It looks more like everything we did above in > order to punt by returning 2 was to see if the condition for us to > reuse the resulting tree holds and nothing else. > > Ah, "restore()" is misnamed, perhaps? I thought it was about "oh, > we made a mess and need to go back to the state that was given to us > before failing", but is this the real "ok, the condition holds and > we can just reuse the tree from the previous merge"? Then it makes > sense for the code to attempt to check out that tree and return 2 > when it fails. Only the function name was misleading. > calling _git restore_ to do that hence _restore_ :-) But it's ok. I can give it a better name.
On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz <eantoranz@gmail.com> wrote: > > Usercase can be at the moment trying to rebase (with merges) on top of > an exact base copy. In cases like this, git just crashes on merge > commits. An easy example: I should have said _crashes on merge commits where there was a conflict_.
Hi Edmundo, On Fri, Mar 3, 2023 at 7:43 AM Edmundo Carmona Antoranz <eantoranz@gmail.com> wrote: > > When rebasing merge commits and dealing with conflicts, having the > original merge commit as a reference can help us avoid some of > them. > > > With this patch, we leverage the original merge commit to handle the most > obvious case: > - HEAD tree has to match the tree of the first parent of the original merge > commit. > - MERGE_HEAD tree has to match the tree of the second parent of the original > merge commit. > - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match > a tree in the merge bases of the parent commits of the original merge > commit. > > If all of those conditions are met, we can safely use the tree of the > original merge commit as the resulting tree of this merge that is being > attempted at the time. The conditions are quite specific, making one wonder what you are trying to do, and yet also leave an obvious open hole that seems to be inviting bugs. Having read the rest of this thread, I notice you pointed out to Junio that you want to amend a commit in the history of the merge, suggesting you are just modifying the commit message (or maybe author/committer info). More generally, I _think_ your usecase and justification for this patch could be worded something like: """ We often rebase with `--rebase-merges`, `--interactive`, and `--keep-base` (or equivalent command line flags) and only modify commit metadata during the rebase. Since we do not modify any files, we would like the rebase to proceed without conflicts. However, since --rebase-merges currently does not rebase merges but recreates them from scratch (ignoring everything but the commit metadata of the old merge), it forces users to redo conflict resolution. Since the trees of all relevant commits have not changed, this conflict resolution feels unnecessary. In this patch we do not try to solve the general problem of rebasing merges, but instead introduce a narrow hack specific to this particular scenario: we check that the trees of all relevant commits involved in the new merge are the same as for the old merge, and when that holds, use the tree from the original merge as the merge resolution. In more detail: """ which would be followed immediately by your text after "handle the most obvious case:" Am I reading your motivation correctly? Also, as Junio highlighted, I don't believe it's safe to only require that one tree of the merge bases match. You should require having both the same number of merge bases and that the set of trees for the merge bases of both the old and new merge commits exactly match. As a high level review, I personally tend to dislike piecemeal solutions that only work for very specific cases. However, others on the list may decide to include it, so long as it doesn't actively hurt other cases. One request I would make, if it is to be included, is that we design it such that we can easily jettison this code (& any documentation it needs) later when we gain a more general solution for rebasing merges. > Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com> > --- > .gitignore | 1 + > Makefile | 1 + > builtin.h | 1 + > builtin/merge-tms.c | 148 ++++++++++++++++++++++++++++++++++++++++++++ > git.c | 1 + > sequencer.c | 36 ++++++++++- > 6 files changed, 187 insertions(+), 1 deletion(-) I understand waiting to make documentation updates while your proposal is just an RFC, but I think tests might help showcase how the strategy is meant to be used and verify whether the behavior is sane when your new strategy doesn't apply. > create mode 100644 builtin/merge-tms.c > > diff --git a/.gitignore b/.gitignore > index e875c59054..8b534f98e6 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -103,6 +103,7 @@ > /git-merge-recursive > /git-merge-resolve > /git-merge-subtree > +/git-merge-tms > /git-mergetool > /git-mergetool--lib > /git-mktag > diff --git a/Makefile b/Makefile > index 50ee51fde3..10a3167c50 100644 > --- a/Makefile > +++ b/Makefile > @@ -1264,6 +1264,7 @@ BUILTIN_OBJS += builtin/merge-file.o > BUILTIN_OBJS += builtin/merge-index.o > BUILTIN_OBJS += builtin/merge-ours.o > BUILTIN_OBJS += builtin/merge-recursive.o > +BUILTIN_OBJS += builtin/merge-tms.o > BUILTIN_OBJS += builtin/merge-tree.o > BUILTIN_OBJS += builtin/merge.o > BUILTIN_OBJS += builtin/mktag.o > diff --git a/builtin.h b/builtin.h > index 46cc789789..94dcb73f85 100644 > --- a/builtin.h > +++ b/builtin.h > @@ -180,6 +180,7 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix); > int cmd_merge_ours(int argc, const char **argv, const char *prefix); > int cmd_merge_file(int argc, const char **argv, const char *prefix); > int cmd_merge_recursive(int argc, const char **argv, const char *prefix); > +int cmd_merge_tms(int argc, const char **argv, const char *prefix); > int cmd_merge_tree(int argc, const char **argv, const char *prefix); > int cmd_mktag(int argc, const char **argv, const char *prefix); > int cmd_mktree(int argc, const char **argv, const char *prefix); > diff --git a/builtin/merge-tms.c b/builtin/merge-tms.c > new file mode 100644 > index 0000000000..37a2427757 > --- /dev/null > +++ b/builtin/merge-tms.c > @@ -0,0 +1,148 @@ > +/* > + * Copyright (c) 2023 Edmundo Carmona Antoranz I dislike having these in our files. Despite the file being modified later, they are virtually never updated to list the new authors and new years. Needing to always update these lines of code would obviously be onerous, and is why we don't do that, but that just means these lines always eventually become inaccurate and perhaps even wildly misleading. People can look at log/blame/etc. to find out the _correct_ information, so what purpose does a line like this serve? (I'm not a lawyer or anything close; I previously suggested ripping these lines out of our codebase but people pointed out we can't rip them out once put in. But it'd be nice to avoid spreading the problem. Or maybe there are still lawyer-ish reasons for including these that I'm not aware of?) > + * Released under the terms of GPL2 That's automatic for anything in the project that doesn't explicitly state otherwise; I'd rather not have this included for every file. Tangential question for the list: If folks do want their contributions to also be available under an alternate license (e.g. as done for sha1dc or as done with https://github.com/libgit2/libgit2/blob/main/git.git-authors), is there a scheme for doing so? Incidentally, my employer told me that any new entire files or subsystems that I contributed should be licensed under MIT (like sha1dc is). I didn't want to have to deal with that headache, but luckily I was copying a bunch of merge-recursive.[ch] code into merge-ort.[ch], so I just did that in the initial commits and side-stepped the whole question. :-) (But really, if anyone wants any of the code I've contributed to Git under MIT during my time at Palantir, that's what my employer wanted/intended so you've got permission to do so.) ...and with these two first lines out of the way, I can stop commenting on things I have a tenuous grasp on (copyright & licenses) to things that I actually know something about... > + * > + * Tipped merge strategy.... a.k.a. fortune-teller merge strategy > + * > + * In cases like rebases, merge commits offer us the advantage of knowing > + * _before hand_ what the previous result of the _original_ branches > + * involved was. > + * > + * This merge strategy tries to leverage this knowledge so that we can > + * avoid at least the most obvious conflicts that have been solved in the > + * original merge commit. Except it doesn't solve "the most obvious conflicts", it either solves all of the conflicts or none of them. Perhaps this wording can be fixed up? > + * > + * In the current state, the strategy works based on exact matches of the trees > + * involved: > + * - HEAD tree has to match the tree of the first parent of the original merge > + * commit. > + * - MERGE_HEAD tree has to match the tree of the second parent of the original > + * merge commit. > + * - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match > + * a tree in the merge bases of the parent commits of the original merge > + * commit. > + * If all of those conditions are met, we can safely use the tree of the > + * original merge commit as the resulting tree of this merge that is being > + * attempted at the time. > + */ > + > +#include "builtin.h" > +#include "commit-reach.h" > +#include "oid-array.h" > +#include "parse-options.h" > +#include "run-command.h" > + > + > +struct tms_options { > + const char *tip; > + const char *merge_head; > +} tms_options; > + > +static int restore(struct commit *commit) > +{ > + struct child_process cmd = CHILD_PROCESS_INIT; > + > + strvec_push(&cmd.args, "restore"); > + strvec_push(&cmd.args, "--worktree"); > + strvec_push(&cmd.args, "--stage"); > + strvec_pushf(&cmd.args, "--source=%s", > + oid_to_hex(&commit->object.oid)); > + strvec_push(&cmd.args, "--"); > + strvec_push(&cmd.args, "."); > + cmd.git_cmd = 1; > + return run_command(&cmd); We fork subprocesses a lot in git, but it was a horrible mistake. It hurts performance, it *really* hurts on Windows (or so I hear), it hurts debuggability/maintainability in general, and we should be removing this from our codebase rather than adding more. As someone who has put time into slowly eradicating this from our codebase, please make library functions you can call instead. (Actually, first look and see if there are relevant library functions. Rebase/sequencer probably already needed this and already wrote such a function.) > +} > + > +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases) > +{ > + struct commit_list *i; > + > + for (i = bases; i; i = i->next) > + oid_array_append(oids, get_commit_tree_oid(i->item)); > +} > + > +static int find_oid(const struct object_id *oid, > + void *data) > +{ > + struct oid_array *other_list = (struct oid_array *) data; > + int pos = oid_array_lookup(other_list, oid); > + return pos >= 0 ? 1 : 0; > +} > + > +static int base_match(struct commit *rebase_head, > + struct commit *head, > + struct commit *merge_head) > +{ > + struct commit_list *bases_current, *bases_tip; > + struct oid_array trees_current = OID_ARRAY_INIT; > + struct oid_array trees_tip = OID_ARRAY_INIT; > + int oid_match; > + > + bases_current = get_merge_bases(head, merge_head); > + bases_tip = get_merge_bases(rebase_head->parents->item, > + rebase_head->parents->next->item); > + load_tree_oids(&trees_current, bases_current); > + load_tree_oids(&trees_tip, bases_tip); > + > + oid_match = oid_array_for_each(&trees_current, find_oid, &trees_tip); > + > + oid_array_clear(&trees_current); > + oid_array_clear(&trees_tip); > + > + return oid_match; > +} > + > +static int run_tms_merge(struct tms_options *options) > +{ > + struct commit *head, *merge_head, *tip; > + struct commit_list *i; > + > + head = lookup_commit_reference_by_name("HEAD"); > + merge_head = lookup_commit_reference_by_name(options->merge_head); > + tip = lookup_commit_reference_by_name(options->tip); > + > + if (!(head && merge_head && tip)) { > + return 2; > + } > + if (commit_list_count(tip->parents) != 2) > + return 2; > + > + for (i = tip->parents; i; i = i->next) > + parse_commit(i->item); > + if (!oideq(get_commit_tree_oid(head), > + get_commit_tree_oid(tip->parents->item))) > + return 2; > + if (!oideq(get_commit_tree_oid(merge_head), > + get_commit_tree_oid(tip->parents->next->item))) > + return 2; > + > + if (!base_match(tip, head, merge_head)) > + return 2; > + > + if (restore(tip)) > + return 2; So six different cases where this merge strategy can currently bail and do nothing; I'll discuss this more below. > + > + return 0; > +} > + > +int cmd_merge_tms(int argc, const char **argv, const char *prefix) > +{ > + > + struct option mt_options[] = { > + OPT_STRING(0, "tip", &tms_options.tip, > + N_("tip-merge-commit"), > + N_("merge commit being rebased used as a tip for conflict resolution.")), > + OPT_END() > + }; > + argc = parse_options(argc, argv, NULL, mt_options, > + NULL, 0); > + > + if (argc != 1) > + return 2; > + tms_options.merge_head = argv[0]; > + > + if (!tms_options.tip) > + return 2; By creating a builtin named "merge-<foo>", you implicitly create a `--strategy <foo>` option for git-merge, not just something for git-rebase. If we publish this new merge strategy in a released version of Git, we'll have to support it for ~forever. For something that is meant as a short-term hack, I'd rather avoid that. Further, your merge strategy does not accept the normal arguments that a merge strategy accepts, and will bail with an exit status of 2 if a "--tip" is not specified. Luckily, my comments elsewhere to make library calls instead of forking subprocesses would also solve these issues here. I will also point out that this main function adds two more cases where we can bail and do nothing, returning a status of 2, so now we're up to eight cases. Again, I'll discuss these more below. > + > + return run_tms_merge(&tms_options); > +} > diff --git a/git.c b/git.c > index 96b0a2837d..2e843731f1 100644 > --- a/git.c > +++ b/git.c > @@ -544,6 +544,7 @@ static struct cmd_struct commands[] = { > { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, > { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, > { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, > + { "merge-tms", cmd_merge_tms, RUN_SETUP }, > { "merge-tree", cmd_merge_tree, RUN_SETUP }, > { "mktag", cmd_mktag, RUN_SETUP }, > { "mktree", cmd_mktree, RUN_SETUP }, > diff --git a/sequencer.c b/sequencer.c > index 65a34f9676..559169814b 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3833,6 +3833,21 @@ static int do_reset(struct repository *r, > return ret; > } > > +static int try_tms_merge(struct replay_opts *opts, > + struct commit *rebase_head, > + struct commit *merge_commit) > +{ > + struct child_process cmd = CHILD_PROCESS_INIT; > + > + strvec_push(&cmd.args, "merge-tms"); > + strvec_push(&cmd.args, "--tip"); > + strvec_pushf(&cmd.args, "%s", oid_to_hex(&rebase_head->object.oid)); > + strvec_pushf(&cmd.args, "%s", oid_to_hex(&merge_commit->object.oid)); > + > + cmd.git_cmd = 1; > + return run_command(&cmd) ? 0 : 1; Again, let's please not fork subprocesses for builtin code. > +} > + > static int do_merge(struct repository *r, > struct commit *commit, > const char *arg, int arg_len, > @@ -3846,7 +3861,8 @@ static int do_merge(struct repository *r, > const char *strategy = !opts->xopts_nr && > (!opts->strategy || > !strcmp(opts->strategy, "recursive") || > - !strcmp(opts->strategy, "ort")) ? > + !strcmp(opts->strategy, "ort") || > + !strcmp(opts->strategy, "tms")) ? So folks trigger this by passing `--strategy tms` to rebase. Typically, all the --strategy options to rebase are also ones that merge accepts. So if we trigger via that method, we may need to expose the strategy to merge as well...or update the documentation in various places that talks about merge strategies to be more specific about which ones apply where. Also, what if the merge strategy is inappropriate for the case in question? "ort" and "recursive" are appropriate for all non-octopus merges, and the code only gets to this point in sequencer if we have a non-octopus merge to do. But your strategy is inappropriate other than in very specific circumstances that haven't yet been checked. What will the code do if we're under one of the many cases where "tms" isn't applicable? More on this question below. > NULL : opts->strategy; > struct merge_options o; > int merge_arg_len, oneline_offset, can_fast_forward, ret, k; > @@ -4086,6 +4102,23 @@ static int do_merge(struct repository *r, > o.branch2 = ref_name.buf; > o.buffer_output = 2; > > + if (!opts->strategy || !strcmp(opts->strategy, "tms")) { > + rollback_lock_file(&lock); > + ret = try_tms_merge(opts, commit, to_merge->item); > + if (ret) { > + discard_index(r->index); > + if (repo_read_index(r) < 0) { > + ret = error(_("could not read index")); > + goto leave_merge; > + } > + goto ran_merge; We "goto ran_merge", which could be very misleading. Under the 8 conditions given, we did not run a merge; we bailed early saying the merge strategy was inappropriate for the conditions at hand. For those 8 cases, ret will be 2 and we will jump to... > + } > + // regain lock to go into recursive > + if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) { > + ret = -1; > + goto leave_merge; > + } > + } > if (!opts->strategy || !strcmp(opts->strategy, "ort")) { > /* > * TODO: Should use merge_incore_recursive() and > @@ -4100,6 +4133,7 @@ static int do_merge(struct repository *r, > ret = merge_recursive(&o, head_commit, merge_commit, bases, > &i); > } > +ran_merge: > if (ret <= 0) > fputs(o.obuf.buf, stdout); > strbuf_release(&o.obuf); ...here. The code following this includes a check for ret < 0, and then runs: /* * The return value of merge_recursive() is 1 on clean, and 0 on * unclean merge. * * Let's reverse that, so that do_merge() returns 0 upon success and * 1 upon failed merge (keeping the return value -1 for the cases where * we will want to reschedule the `merge` command). */ ret = !ret; if (r->index->cache_changed && write_locked_index(r->index, &lock, COMMIT_LOCK)) { ret = error(_("merge: Unable to write new index file")); goto leave_merge; } So, if I'm reading correctly (please double check me), since your merge strategy made no changes to the index or working tree, and returned a status of 2, and since !2 == !1 == 0, we'll treat this the same as a successful merge and commit the "results", i.e. the tree of the first parent. Doesn't this tipped merge strategy thus behave the same as a `--strategy ours` merge when its preconditions are not satisfied? If it does, that would be horrifying.
On Sat, Mar 4, 2023 at 3:57 AM Edmundo Carmona Antoranz <eantoranz@gmail.com> wrote: > > On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz > <eantoranz@gmail.com> wrote: > > > > Usercase can be at the moment trying to rebase (with merges) on top of > > an exact base copy. In cases like this, git just crashes on merge > > commits. An easy example: > > I should have said _crashes on merge commits where there was a conflict_. Crash tends to mean the program has done something not allowed by the operating system. I don't think that's what's happening here. Do you mean Git stops with conflicts?
On Sat, Mar 4, 2023 at 9:36 PM Elijah Newren <newren@gmail.com> wrote: > > On Sat, Mar 4, 2023 at 3:57 AM Edmundo Carmona Antoranz > <eantoranz@gmail.com> wrote: > > > > On Sat, Mar 4, 2023 at 12:45 PM Edmundo Carmona Antoranz > > <eantoranz@gmail.com> wrote: > > > > > > Usercase can be at the moment trying to rebase (with merges) on top of > > > an exact base copy. In cases like this, git just crashes on merge > > > commits. An easy example: > > > > I should have said _crashes on merge commits where there was a conflict_. > > Crash tends to mean the program has done something not allowed by the > operating system. I don't think that's what's happening here. Do you > mean Git stops with conflicts? Ok, sorry for the sloppy wording. Yes, it stops with conflicts.
Hey, Elijah! Thank you for all of that feedback. Really good Will need to go through it which will take me some time to digest but I did want to go over one of the subjects. > So, if I'm reading correctly (please double check me), since your > merge strategy made no changes to the index or working tree, and > returned a status of 2, and since !2 == !1 == 0, we'll treat this the > same as a successful merge and commit the "results", i.e. the tree of > the first parent. Doesn't this tipped merge strategy thus behave the > same as a `--strategy ours` merge when its preconditions are not > satisfied? If it does, that would be horrifying. I think that is not correct. The possible values that come out of try_tms_merge are 0 if nothing happened and 1 if the merge was successful and we changed the index and the working tree. Then I think I wrote this correctly following the call: if (ret) { discard_index(r->index); if (repo_read_index(r) < 0) { ret = error(_("could not read index")); goto leave_merge; } goto ran_merge; } // regain lock to go into recursive if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) { ret = -1; goto leave_merge; } Actually, if we will be switching to using a library, then this won't be that important because we might be able to pull it off without having to release the lock given that we would be running in-process, but I wanted to clear up what the intended flow is there, just in case. Ok.... more questions or comments will be coming in the following days. And thank you, again. BR!
Elijah Newren <newren@gmail.com> writes: > Having read the rest of this thread, I notice you pointed out to Junio > that you want to amend a commit in the history of the merge, > suggesting you are just modifying the commit message (or maybe > author/committer info). More generally, I _think_ your usecase and > justification for this patch could be worded something like: > > """ > We often rebase with `--rebase-merges`, `--interactive`, and > `--keep-base` (or equivalent command line flags) and only modify > commit metadata during the rebase. Since we do not modify any files, > we would like the rebase to proceed without conflicts. It makes very much sense to focus on this narrow but useful use case, and I view it a very natural extension to already existing "if we just pick without any user interaction a commit on top of its current base, the we do not do anything, fast-forward and just pretend we picked it". IOW, shouldn't it something the sequencer machinery should be able to do natively without forcing the user to specify a new merge strategy?
diff --git a/.gitignore b/.gitignore index e875c59054..8b534f98e6 100644 --- a/.gitignore +++ b/.gitignore @@ -103,6 +103,7 @@ /git-merge-recursive /git-merge-resolve /git-merge-subtree +/git-merge-tms /git-mergetool /git-mergetool--lib /git-mktag diff --git a/Makefile b/Makefile index 50ee51fde3..10a3167c50 100644 --- a/Makefile +++ b/Makefile @@ -1264,6 +1264,7 @@ BUILTIN_OBJS += builtin/merge-file.o BUILTIN_OBJS += builtin/merge-index.o BUILTIN_OBJS += builtin/merge-ours.o BUILTIN_OBJS += builtin/merge-recursive.o +BUILTIN_OBJS += builtin/merge-tms.o BUILTIN_OBJS += builtin/merge-tree.o BUILTIN_OBJS += builtin/merge.o BUILTIN_OBJS += builtin/mktag.o diff --git a/builtin.h b/builtin.h index 46cc789789..94dcb73f85 100644 --- a/builtin.h +++ b/builtin.h @@ -180,6 +180,7 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix); int cmd_merge_ours(int argc, const char **argv, const char *prefix); int cmd_merge_file(int argc, const char **argv, const char *prefix); int cmd_merge_recursive(int argc, const char **argv, const char *prefix); +int cmd_merge_tms(int argc, const char **argv, const char *prefix); int cmd_merge_tree(int argc, const char **argv, const char *prefix); int cmd_mktag(int argc, const char **argv, const char *prefix); int cmd_mktree(int argc, const char **argv, const char *prefix); diff --git a/builtin/merge-tms.c b/builtin/merge-tms.c new file mode 100644 index 0000000000..37a2427757 --- /dev/null +++ b/builtin/merge-tms.c @@ -0,0 +1,148 @@ +/* + * Copyright (c) 2023 Edmundo Carmona Antoranz + * Released under the terms of GPL2 + * + * Tipped merge strategy.... a.k.a. fortune-teller merge strategy + * + * In cases like rebases, merge commits offer us the advantage of knowing + * _before hand_ what the previous result of the _original_ branches + * involved was. + * + * This merge strategy tries to leverage this knowledge so that we can + * avoid at least the most obvious conflicts that have been solved in the + * original merge commit. + * + * In the current state, the strategy works based on exact matches of the trees + * involved: + * - HEAD tree has to match the tree of the first parent of the original merge + * commit. + * - MERGE_HEAD tree has to match the tree of the second parent of the original + * merge commit. + * - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match + * a tree in the merge bases of the parent commits of the original merge + * commit. + * If all of those conditions are met, we can safely use the tree of the + * original merge commit as the resulting tree of this merge that is being + * attempted at the time. + */ + +#include "builtin.h" +#include "commit-reach.h" +#include "oid-array.h" +#include "parse-options.h" +#include "run-command.h" + + +struct tms_options { + const char *tip; + const char *merge_head; +} tms_options; + +static int restore(struct commit *commit) +{ + struct child_process cmd = CHILD_PROCESS_INIT; + + strvec_push(&cmd.args, "restore"); + strvec_push(&cmd.args, "--worktree"); + strvec_push(&cmd.args, "--stage"); + strvec_pushf(&cmd.args, "--source=%s", + oid_to_hex(&commit->object.oid)); + strvec_push(&cmd.args, "--"); + strvec_push(&cmd.args, "."); + cmd.git_cmd = 1; + return run_command(&cmd); +} + +static void load_tree_oids(struct oid_array *oids, struct commit_list *bases) +{ + struct commit_list *i; + + for (i = bases; i; i = i->next) + oid_array_append(oids, get_commit_tree_oid(i->item)); +} + +static int find_oid(const struct object_id *oid, + void *data) +{ + struct oid_array *other_list = (struct oid_array *) data; + int pos = oid_array_lookup(other_list, oid); + return pos >= 0 ? 1 : 0; +} + +static int base_match(struct commit *rebase_head, + struct commit *head, + struct commit *merge_head) +{ + struct commit_list *bases_current, *bases_tip; + struct oid_array trees_current = OID_ARRAY_INIT; + struct oid_array trees_tip = OID_ARRAY_INIT; + int oid_match; + + bases_current = get_merge_bases(head, merge_head); + bases_tip = get_merge_bases(rebase_head->parents->item, + rebase_head->parents->next->item); + load_tree_oids(&trees_current, bases_current); + load_tree_oids(&trees_tip, bases_tip); + + oid_match = oid_array_for_each(&trees_current, find_oid, &trees_tip); + + oid_array_clear(&trees_current); + oid_array_clear(&trees_tip); + + return oid_match; +} + +static int run_tms_merge(struct tms_options *options) +{ + struct commit *head, *merge_head, *tip; + struct commit_list *i; + + head = lookup_commit_reference_by_name("HEAD"); + merge_head = lookup_commit_reference_by_name(options->merge_head); + tip = lookup_commit_reference_by_name(options->tip); + + if (!(head && merge_head && tip)) { + return 2; + } + if (commit_list_count(tip->parents) != 2) + return 2; + + for (i = tip->parents; i; i = i->next) + parse_commit(i->item); + if (!oideq(get_commit_tree_oid(head), + get_commit_tree_oid(tip->parents->item))) + return 2; + if (!oideq(get_commit_tree_oid(merge_head), + get_commit_tree_oid(tip->parents->next->item))) + return 2; + + if (!base_match(tip, head, merge_head)) + return 2; + + if (restore(tip)) + return 2; + + return 0; +} + +int cmd_merge_tms(int argc, const char **argv, const char *prefix) +{ + + struct option mt_options[] = { + OPT_STRING(0, "tip", &tms_options.tip, + N_("tip-merge-commit"), + N_("merge commit being rebased used as a tip for conflict resolution.")), + OPT_END() + }; + argc = parse_options(argc, argv, NULL, mt_options, + NULL, 0); + + if (argc != 1) + return 2; + tms_options.merge_head = argv[0]; + + if (!tms_options.tip) + return 2; + + return run_tms_merge(&tms_options); +} diff --git a/git.c b/git.c index 96b0a2837d..2e843731f1 100644 --- a/git.c +++ b/git.c @@ -544,6 +544,7 @@ static struct cmd_struct commands[] = { { "merge-recursive-ours", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-recursive-theirs", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, { "merge-subtree", cmd_merge_recursive, RUN_SETUP | NEED_WORK_TREE | NO_PARSEOPT }, + { "merge-tms", cmd_merge_tms, RUN_SETUP }, { "merge-tree", cmd_merge_tree, RUN_SETUP }, { "mktag", cmd_mktag, RUN_SETUP }, { "mktree", cmd_mktree, RUN_SETUP }, diff --git a/sequencer.c b/sequencer.c index 65a34f9676..559169814b 100644 --- a/sequencer.c +++ b/sequencer.c @@ -3833,6 +3833,21 @@ static int do_reset(struct repository *r, return ret; } +static int try_tms_merge(struct replay_opts *opts, + struct commit *rebase_head, + struct commit *merge_commit) +{ + struct child_process cmd = CHILD_PROCESS_INIT; + + strvec_push(&cmd.args, "merge-tms"); + strvec_push(&cmd.args, "--tip"); + strvec_pushf(&cmd.args, "%s", oid_to_hex(&rebase_head->object.oid)); + strvec_pushf(&cmd.args, "%s", oid_to_hex(&merge_commit->object.oid)); + + cmd.git_cmd = 1; + return run_command(&cmd) ? 0 : 1; +} + static int do_merge(struct repository *r, struct commit *commit, const char *arg, int arg_len, @@ -3846,7 +3861,8 @@ static int do_merge(struct repository *r, const char *strategy = !opts->xopts_nr && (!opts->strategy || !strcmp(opts->strategy, "recursive") || - !strcmp(opts->strategy, "ort")) ? + !strcmp(opts->strategy, "ort") || + !strcmp(opts->strategy, "tms")) ? NULL : opts->strategy; struct merge_options o; int merge_arg_len, oneline_offset, can_fast_forward, ret, k; @@ -4086,6 +4102,23 @@ static int do_merge(struct repository *r, o.branch2 = ref_name.buf; o.buffer_output = 2; + if (!opts->strategy || !strcmp(opts->strategy, "tms")) { + rollback_lock_file(&lock); + ret = try_tms_merge(opts, commit, to_merge->item); + if (ret) { + discard_index(r->index); + if (repo_read_index(r) < 0) { + ret = error(_("could not read index")); + goto leave_merge; + } + goto ran_merge; + } + // regain lock to go into recursive + if (repo_hold_locked_index(r, &lock, LOCK_REPORT_ON_ERROR) < 0) { + ret = -1; + goto leave_merge; + } + } if (!opts->strategy || !strcmp(opts->strategy, "ort")) { /* * TODO: Should use merge_incore_recursive() and @@ -4100,6 +4133,7 @@ static int do_merge(struct repository *r, ret = merge_recursive(&o, head_commit, merge_commit, bases, &i); } +ran_merge: if (ret <= 0) fputs(o.obuf.buf, stdout); strbuf_release(&o.obuf);
When rebasing merge commits and dealing with conflicts, having the original merge commit as a reference can help us avoid some of them. With this patch, we leverage the original merge commit to handle the most obvious case: - HEAD tree has to match the tree of the first parent of the original merge commit. - MERGE_HEAD tree has to match the tree of the second parent of the original merge commit. - At least one tree in the merge bases of HEAD/MERGE_HEAD has to match a tree in the merge bases of the parent commits of the original merge commit. If all of those conditions are met, we can safely use the tree of the original merge commit as the resulting tree of this merge that is being attempted at the time. Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com> --- .gitignore | 1 + Makefile | 1 + builtin.h | 1 + builtin/merge-tms.c | 148 ++++++++++++++++++++++++++++++++++++++++++++ git.c | 1 + sequencer.c | 36 ++++++++++- 6 files changed, 187 insertions(+), 1 deletion(-) create mode 100644 builtin/merge-tms.c