diff mbox series

[RFC] introducing git replay

Message ID 20220413164336.101390-1-eantoranz@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] introducing git replay | expand

Commit Message

Edmundo Carmona Antoranz April 13, 2022, 4:43 p.m. UTC
Let me explain with an easy-to-follow example:

$ git checkout v2.35.0
.
.
.
HEAD is now at 89bece5c8c Git 2.35
$ git commit --amend --no-edit
[detached HEAD c58a5e5621] Git 2.35
 Author: Junio C Hamano <someone@somewhere>
 Date: Mon Jan 24 09:25:25 2022 -0800
 2 files changed, 11 insertions(+), 1 deletion(-)
$ git rebase --rebase-merges --onto HEAD v2.35.0 v2.36.0-rc1
Auto-merging GIT-VERSION-GEN
CONFLICT (content): Merge conflict in GIT-VERSION-GEN
CONFLICT (content): Merge conflict in RelNotes
error: could not apply 4c53a8c20f... Git 2.35.1
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 4c53a8c20f... Git 2.35.1

If HEAD and v2.35.0 share the same tree, it _should_ be possible
to recreate the commits that make up the range v2.35.0..v2.36.0-rc1
on top of HEAD without requiring any real "rebasing". Just creating
new revisions with the same information except for different parents
(and possibly a committer?).

This is what git replay does. To achieve the same in this example:

$ git checkout v2.35.0
.
.
.
HEAD is now at 89bece5c8c Git 2.35
$ git commit --amend --no-edit
[detached HEAD c682d8a22e] Git 2.35
 Author: Junio C Hamano <someone@somewhere>
 Date: Mon Jan 24 09:25:25 2022 -0800
 2 files changed, 11 insertions(+), 1 deletion(-)
$ git replay HEAD v2.35.0 v2.36.0-rc1
8312ecf6404ab1bacd5521a2d8681a2410d13ede

The ID that is printed out is the equivalent
of V2.36.0-rc1 after replaying.

This is a RFC because:
- Perhaps it is already possible to do it with git rebase
  to achieve the same? But I haven't seen a recipe that
  gets it done in stackoverflow, at least.
- If it is not possible, I think getting this logic in rebase
  (with a flag, for example --replay) makes sense.

Let me know what you think.
Interesting? Not?
Keep it as a builtin (polishing it, it's just a rough cut
at the time) or get it into rebase?
---
 Makefile         |   1 +
 builtin.h        |   1 +
 builtin/replay.c | 148 +++++++++++++++++++++++++++++++++++++++++++++++
 git.c            |   1 +
 4 files changed, 151 insertions(+)
 create mode 100644 builtin/replay.c

Comments

Junio C Hamano April 13, 2022, 5:05 p.m. UTC | #1
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> This is a RFC because:
> - Perhaps it is already possible to do it with git rebase
>   to achieve the same? But I haven't seen a recipe that
>   gets it done in stackoverflow, at least.

Without thinking about it too much, out of gut reaction, it looks
like a better target for fast-export piped to fast-import than
rebase or amend, if all it can do is to replay on _identical_ state
and nothing else.

> Let me know what you think.
> Interesting? Not?

If this _were_ to allow some slight deviations of the base and carry
the differences forward, then it definitely belongs to rebase, and
perhaps "rebase --replay-merges" should be taught to behave better
without introducing a new option.  But otherwise, I do not think it
is all that useful.  

Also, if this _were_ to allow recreating the shape of the history,
using updated tips of branches that were merged in the original
history, perhaps taking hints from "Merge branch X into Y" in the
original merge commit's log messages, that would be quite useful
addition to the rebase mechanism, but this is not that.

So, not really, to me at least.
Randall S. Becker April 13, 2022, 5:26 p.m. UTC | #2
On April 13, 2022 12:44 PM, Edmundo Carmona Antoranz wrote:
>Let me explain with an easy-to-follow example:
>
>$ git checkout v2.35.0
>.
>.
>.
>HEAD is now at 89bece5c8c Git 2.35
>$ git commit --amend --no-edit
>[detached HEAD c58a5e5621] Git 2.35
> Author: Junio C Hamano <someone@somewhere>
> Date: Mon Jan 24 09:25:25 2022 -0800
> 2 files changed, 11 insertions(+), 1 deletion(-) $ git rebase
--rebase-merges --onto
>HEAD v2.35.0 v2.36.0-rc1 Auto-merging GIT-VERSION-GEN CONFLICT (content):
>Merge conflict in GIT-VERSION-GEN CONFLICT (content): Merge conflict in
>RelNotes
>error: could not apply 4c53a8c20f... Git 2.35.1
>hint: Resolve all conflicts manually, mark them as resolved with
>hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
>hint: You can instead skip this commit: run "git rebase --skip".
>hint: To abort and get back to the state before "git rebase", run "git
rebase --
>abort".
>Could not apply 4c53a8c20f... Git 2.35.1
>
>If HEAD and v2.35.0 share the same tree, it _should_ be possible to
recreate the
>commits that make up the range v2.35.0..v2.36.0-rc1 on top of HEAD without
>requiring any real "rebasing". Just creating new revisions with the same
>information except for different parents (and possibly a committer?).
>
>This is what git replay does. To achieve the same in this example:
>
>$ git checkout v2.35.0
>.
>.
>.
>HEAD is now at 89bece5c8c Git 2.35
>$ git commit --amend --no-edit
>[detached HEAD c682d8a22e] Git 2.35
> Author: Junio C Hamano <someone@somewhere>
> Date: Mon Jan 24 09:25:25 2022 -0800
> 2 files changed, 11 insertions(+), 1 deletion(-) $ git replay HEAD v2.35.0
v2.36.0-rc1
>8312ecf6404ab1bacd5521a2d8681a2410d13ede
>
>The ID that is printed out is the equivalent of V2.36.0-rc1 after
replaying.
>
>This is a RFC because:
>- Perhaps it is already possible to do it with git rebase
>  to achieve the same? But I haven't seen a recipe that
>  gets it done in stackoverflow, at least.
>- If it is not possible, I think getting this logic in rebase
>  (with a flag, for example --replay) makes sense.
>
>Let me know what you think.
>Interesting? Not?
>Keep it as a builtin (polishing it, it's just a rough cut at the time) or
get it into
>rebase?
>---
> Makefile         |   1 +
> builtin.h        |   1 +
> builtin/replay.c | 148
>+++++++++++++++++++++++++++++++++++++++++++++++
> git.c            |   1 +
> 4 files changed, 151 insertions(+)
> create mode 100644 builtin/replay.c
>
>diff --git a/Makefile b/Makefile
>index f8bccfab5e..71924d0c43 100644
>--- a/Makefile
>+++ b/Makefile
>@@ -1194,6 +1194,7 @@ BUILTIN_OBJS += builtin/remote-fd.o  BUILTIN_OBJS +=
>builtin/remote.o  BUILTIN_OBJS += builtin/repack.o  BUILTIN_OBJS +=
>builtin/replace.o
>+BUILTIN_OBJS += builtin/replay.o
> BUILTIN_OBJS += builtin/rerere.o
> BUILTIN_OBJS += builtin/reset.o
> BUILTIN_OBJS += builtin/rev-list.o
>diff --git a/builtin.h b/builtin.h
>index 40e9ecc848..0c1915c4c9 100644
>--- a/builtin.h
>+++ b/builtin.h
>@@ -207,6 +207,7 @@ int cmd_remote(int argc, const char **argv, const char
>*prefix);  int cmd_remote_ext(int argc, const char **argv, const char
*prefix);  int
>cmd_remote_fd(int argc, const char **argv, const char *prefix);  int
>cmd_repack(int argc, const char **argv, const char *prefix);
>+int cmd_replay(int argc, const char **argv, const char *prefix);
> int cmd_rerere(int argc, const char **argv, const char *prefix);  int
cmd_reset(int
>argc, const char **argv, const char *prefix);  int cmd_restore(int argc,
const char
>**argv, const char *prefix); diff --git a/builtin/replay.c
b/builtin/replay.c new file
>mode 100644 index 0000000000..ed970fa057
>--- /dev/null
>+++ b/builtin/replay.c
>@@ -0,0 +1,148 @@
>+/*
>+ * "git replay" builtin command
>+ *
>+ * Copyright (c) 2022 Edmundo Carmona Antoranz
>+ * Released under the terms of GPLv2
>+ */
>+
>+#include "builtin.h"
>+#include "revision.h"
>+#include "commit.h"
>+#include "cache.h"
>+
>+static struct commit **new_commits;
>+static unsigned long mappings_size = 0; static struct commit_list
>+*old_commits = NULL;
>+
>+static unsigned int replay_indexof(struct commit *commit,
>+				   struct commit_list *list)
>+{
>+	int res;
>+	if (list == NULL)
>+		return -1;
>+	if (!oidcmp(&list->item->object.oid,
>+		    &commit->object.oid))
>+		return 0;
>+	res = replay_indexof(commit, list->next);
>+	return res < 0 ? res : res + 1;
>+}
>+
>+static struct commit *replay_find_commit(const char *name) {
>+	struct commit *commit = lookup_commit_reference_by_name(name);
>+	if (!commit)
>+		die(_("no such branch/commit '%s'"), name);
>+	return commit;
>+}
>+
>+static struct commit* replay_commit(struct commit * orig_commit) {
>+	struct pretty_print_context ctx = {0};
>+	struct strbuf body = STRBUF_INIT;
>+	struct strbuf author = STRBUF_INIT;
>+	struct strbuf committer = STRBUF_INIT;
>+	struct object_id new_commit_oid;
>+	struct commit *new_commit;
>+
>+	struct commit_list *new_parents_head = NULL;
>+	struct commit_list **new_parents = &new_parents_head;
>+	struct commit_list *parents = orig_commit->parents;
>+	while (parents) {
>+		struct commit *parent = parents->item;
>+		int commit_index;
>+		struct commit *new_parent;
>+
>+		commit_index = replay_indexof(parent, old_commits);
>+
>+		if (commit_index < 0)
>+			 // won't be replayed, use the original parent
>+			new_parent = parent;
>+		else {
>+			// it might have been translated already
>+			if (!new_commits[commit_index])
>+				new_commits[commit_index] =
>replay_commit(parent);
>+			new_parent = new_commits[commit_index];
>+		}
>+		new_parents = commit_list_append(new_parent, new_parents);
>+		parents = parents->next;
>+	}
>+
>+	format_commit_message(orig_commit, "%B", &body, &ctx);
>+	// TODO timezones
>+	format_commit_message(orig_commit, "%an <%ae> %at +0000",
>&author, &ctx);
>+	// TODO consider committer (control with an option)
>+	format_commit_message(orig_commit, "%cn <%ce> %ct +0000",
>&committer,
>+&ctx);
>+
>+	commit_tree_extended(body.buf,
>+			     body.len,
>+			     get_commit_tree_oid(orig_commit),
>+			     new_parents_head,
>+			     &new_commit_oid,
>+			     author.buf,
>+			     committer.buf,
>+			     NULL, NULL);
>+
>+	new_commit = lookup_commit_or_die(&new_commit_oid,
>+					  "new commit");
>+
>+	strbuf_release(&author);
>+	strbuf_release(&body);
>+	strbuf_release(&committer);
>+
>+	return new_commit;
>+}
>+
>+static struct commit* replay(struct commit *new_base, struct commit
*old_base,
>+		      struct commit *tip)
>+{
>+	struct rev_info revs;
>+	struct commit *commit;
>+
>+	init_revisions(&revs, NULL);
>+
>+	old_base->object.flags |= UNINTERESTING;
>+	add_pending_object(&revs, &old_base->object, "old-base");
>+	add_pending_object(&revs, &tip->object, "tip");
>+
>+	if (prepare_revision_walk(&revs))
>+		die("Could not get revisions to replay");
>+
>+	while ((commit = get_revision(&revs)) != NULL)
>+		commit_list_insert(commit, &old_commits);
>+
>+	// save the mapping between the new and the old base
>+	commit_list_insert(old_base, &old_commits);
>+	mappings_size = commit_list_count(old_commits);
>+	new_commits = calloc(mappings_size, sizeof(struct commit));
>+	new_commits[replay_indexof(old_base, old_commits)] = new_base;
>+
>+	return replay_commit(tip);
>+}
>+
>+
>+int cmd_replay(int argc, const char **argv, const char *prefix) {
>+	struct commit *new_base;
>+	struct commit *old_base;
>+	struct commit *tip;
>+	struct commit *new_tip;
>+
>+	if (argc < 4) {
>+		die("Not enough parameters");
>+	}
>+
>+	new_base = replay_find_commit(argv[1]);
>+	old_base = replay_find_commit(argv[2]);
>+	tip = replay_find_commit(argv[3]);
>+
>+	if (oidcmp(get_commit_tree_oid(new_base),
>+		   get_commit_tree_oid(old_base)))
>+		die("The old and the new base do not have the same tree");
>+
>+	// get the list of revisions between old_base and tip
>+	new_tip = replay(new_base, old_base, tip);
>+
>+	printf("%s\n", oid_to_hex(&new_tip->object.oid));
>+
>+	return 0;
>+}
>diff --git a/git.c b/git.c
>index 3d8e48cf55..14de8d666f 100644
>--- a/git.c
>+++ b/git.c
>@@ -590,6 +590,7 @@ static struct cmd_struct commands[] = {
> 	{ "remote-fd", cmd_remote_fd, NO_PARSEOPT },
> 	{ "repack", cmd_repack, RUN_SETUP },
> 	{ "replace", cmd_replace, RUN_SETUP },
>+	{ "replay", cmd_replay, RUN_SETUP },
> 	{ "rerere", cmd_rerere, RUN_SETUP },
> 	{ "reset", cmd_reset, RUN_SETUP },
> 	{ "restore", cmd_restore, RUN_SETUP | NEED_WORK_TREE },
>--
>2.35.1

I'm sorry if I'm missing something here but how is this different from
cherry-pick A..B?
--Randall
Edmundo Carmona Antoranz April 13, 2022, 5:30 p.m. UTC | #3
On Wed, Apr 13, 2022 at 7:26 PM <rsbecker@nexbridge.com> wrote:
>
> >2.35.1
>
> I'm sorry if I'm missing something here but how is this different from
> cherry-pick A..B?
> --Randall
>

Good question, but cherry-pick has troubles with merges:

(same example, after amending):
$ git cherry-pick v2.35.0..v2.36.0-rc1
error: commit bb4921cf45e11d063e7bbe55f594adf8f0077d5d is a merge but
no -m option was given.
fatal: cherry-pick failed
Phillip Susi April 13, 2022, 5:44 p.m. UTC | #4
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> If HEAD and v2.35.0 share the same tree, it _should_ be possible
> to recreate the commits that make up the range v2.35.0..v2.36.0-rc1
> on top of HEAD without requiring any real "rebasing". Just creating

Isn't that literally the definition of rebase?
Edmundo Carmona Antoranz April 13, 2022, 5:44 p.m. UTC | #5
On Wed, Apr 13, 2022 at 7:30 PM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> On Wed, Apr 13, 2022 at 7:26 PM <rsbecker@nexbridge.com> wrote:
> >
> > >2.35.1
> >
> > I'm sorry if I'm missing something here but how is this different from
> > cherry-pick A..B?
> > --Randall
> >
>
> Good question, but cherry-pick has troubles with merges:
>
> (same example, after amending):
> $ git cherry-pick v2.35.0..v2.36.0-rc1
> error: commit bb4921cf45e11d063e7bbe55f594adf8f0077d5d is a merge but
> no -m option was given.
> fatal: cherry-pick failed

By the way, Randall, it's not just _merges_. Correct me if I'm wrong
but cherry-pick (or rebase) will run merges (in all their glory) to get code
cherry-picked. What I want to do is skip merges/conflicts altogether
by using the information of existing revisions (the ones we want to
cherry-pick) just adjusting their parents to get the new revisions.

Let me know!
Junio C Hamano April 13, 2022, 5:48 p.m. UTC | #6
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

I've already gave my take on "is this interesting?" question with a
"not really", but let's look at the code, as the expertise will
translate to your future contributions easily, even if this
particular code may turn out not to be used in the project.

> +static unsigned int replay_indexof(struct commit *commit,
> +				   struct commit_list *list)
> +{
> +	int res;
> +	if (list == NULL)
> +		return -1;

We encourage to have a blank line between the end of the decls and
the beginning of the statements.  Add one after the line that
declares the variable "res".  

In an existing code that consistently uses the abbreviation, it is a
different story, but "result" is not too long to spell out, and
because you do not use the variable that often anyway, you would
avoid unnecessary friction on the readers not to abbreviate it to
"res" here.

> +	if (!oidcmp(&list->item->object.oid,
> +		    &commit->object.oid))

	if (!oidcmp(&list->item->object.oid, &commit->object.oid))

is 66 columns wide and this line is wrapped too short, without an
apparent upside to make the result any easier to read.

> +		return 0;
> +	res = replay_indexof(commit, list->next);

Do we need to go recursive here?  It feels wasteful, compared to
iteratively doing this.  FWIW, is the singly chained commit_list the
best data structure if you have "a collection of commits, among
which you'd need to find an existing one, if any"?  You may want to
consider using a hashset, perhaps, as the only thing you seem to be
getting out of the data structure is "is this among the old_commits
set?"  Another possibility, if you do not call APIs that use the
object flags, may be to allocate a flag bit and mark these commits
in the original history you discover via get_revision(), instead of
placing them in a singly chained commit_list structure.  Then the
loop in replay_commit() can just see if parent->object.flags has
that "in the original history?" bit set to decide.

> +	return res < 0 ? res : res + 1;
> +}
> +
> +static struct commit *replay_find_commit(const char *name)
> +{
> +	struct commit *commit = lookup_commit_reference_by_name(name);
> +	if (!commit)
> +		die(_("no such branch/commit '%s'"), name);
> +	return commit;
> +}
> +
> +static struct commit* replay_commit(struct commit * orig_commit)

In our codebase, an asterisk sticks to the identifier, not the type.

> +{
> +	struct pretty_print_context ctx = {0};
> +	struct strbuf body = STRBUF_INIT;
> +	struct strbuf author = STRBUF_INIT;
> +	struct strbuf committer = STRBUF_INIT;
> +	struct object_id new_commit_oid;
> +	struct commit *new_commit;
> +
> +	struct commit_list *new_parents_head = NULL;
> +	struct commit_list **new_parents = &new_parents_head;
> +	struct commit_list *parents = orig_commit->parents;
> +	while (parents) {

It is not _wrong_ to have a blank line inside the decl block if
there is a logical separation between the groups.  I am not sure if
the one in the above after "struct commit *new_commit" qualifies as
one.

Regardless, after such a large decl block, we want a blank line
before the first statement.

> +		struct commit *parent = parents->item;
> +		int commit_index;
> +		struct commit *new_parent;
> +
> +		commit_index = replay_indexof(parent, old_commits);
> +
> +		if (commit_index < 0)
> +			 // won't be replayed, use the original parent

We still frown upon // comments in this codebase.

> +			new_parent = parent;
> +		else {
> +			// it might have been translated already
> +			if (!new_commits[commit_index])
> +				new_commits[commit_index] = replay_commit(parent);
> +			new_parent = new_commits[commit_index];
> +		}
> +		new_parents = commit_list_append(new_parent, new_parents);
> +		parents = parents->next;
> +	}
> +
> +	format_commit_message(orig_commit, "%B", &body, &ctx);
> +	// TODO timezones
> +	format_commit_message(orig_commit, "%an <%ae> %at +0000", &author, &ctx);
> +	// TODO consider committer (control with an option)
> +	format_commit_message(orig_commit, "%cn <%ce> %ct +0000", &committer, &ctx);

Yuck.  Shouldn't this code, whose purpose is to replay the messages
and metadata of the commit as faithfully as possible while
reparenting them, be just reusing the original commit object?

Perhaps turning save_commit_buffer on, reading the original commit
object in the raw format, rewriting the parent pointers (and nothing
else) and finally calling write_object_file() to create the new
commit object is what this code should be doing instead.

And that would be another reason why this is probably better done as
fast-export piped to fast-import, but this message is not about the
design of the "feature" itself, so let me stop talking about that.
Edmundo Carmona Antoranz April 13, 2022, 5:49 p.m. UTC | #7
On Wed, Apr 13, 2022 at 7:45 PM Phillip Susi <phill@thesusis.net> wrote:
>
>
> Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:
>
> > If HEAD and v2.35.0 share the same tree, it _should_ be possible
> > to recreate the commits that make up the range v2.35.0..v2.36.0-rc1
> > on top of HEAD without requiring any real "rebasing". Just creating
>
> Isn't that literally the definition of rebase?
>

Well, yeah. :-) What I mean is to skip the rebase _engine_. No
merging/cherry-picking/conflicts along the way of recreating the
new revisions. Say, clone the exact same revisions that we want to
_rebase_ and adjust their parents, nothing else (or little else, like adjusting
the committer).
Edmundo Carmona Antoranz April 13, 2022, 5:56 p.m. UTC | #8
On Wed, Apr 13, 2022 at 7:48 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:
>
> I've already gave my take on "is this interesting?" question with a
> "not really", but let's look at the code, as the expertise will
> translate to your future contributions easily, even if this
> particular code may turn out not to be used in the project.
>

Actually, thanks for taking the time. I knew there would be better
ways to do things so thanks for the tips.
Ævar Arnfjörð Bjarmason April 13, 2022, 7:07 p.m. UTC | #9
On Wed, Apr 13 2022, Edmundo Carmona Antoranz wrote:

> On Wed, Apr 13, 2022 at 7:45 PM Phillip Susi <phill@thesusis.net> wrote:
>>
>>
>> Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:
>>
>> > If HEAD and v2.35.0 share the same tree, it _should_ be possible
>> > to recreate the commits that make up the range v2.35.0..v2.36.0-rc1
>> > on top of HEAD without requiring any real "rebasing". Just creating
>>
>> Isn't that literally the definition of rebase?
>>
>
> Well, yeah. :-) What I mean is to skip the rebase _engine_. No
> merging/cherry-picking/conflicts along the way of recreating the
> new revisions. Say, clone the exact same revisions that we want to
> _rebase_ and adjust their parents, nothing else (or little else, like adjusting
> the committer).

Yeah I think this is fundimentally a good idea to pursue, and it's been
discussed at various times in the past, and indeed, it seems best to
pursue it as a rebase optimization.

I.e. given a history that has say files A.txt and B.txt, and a fork from
A adding X.txt and Y.txt (and nothing else) we should be able to do a
"light rebase" in moving that X & Y forward to it has B as the parent.

Right now we do a rebase in all its glory to do that, with index
updating along the way (I forget how much that's been optimized, if at
all) etc.

But if we can detect that we say only have additions of new files we
could just munge the headers as we go along, and the rest should all be
happening essentially as fast as we can SHA-1 the commit objects, which
is basically what this built-in does, right?
Eric Sunshine April 13, 2022, 8:06 p.m. UTC | #10
On Wed, Apr 13, 2022 at 3:27 PM Junio C Hamano <gitster@pobox.com> wrote:
> Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:
> > +static unsigned int replay_indexof(struct commit *commit,
> > +                                struct commit_list *list)
> > +{
> > +     int res;
> > +     if (list == NULL)
> > +             return -1;
>
> We encourage to have a blank line between the end of the decls and
> the beginning of the statements.  Add one after the line that
> declares the variable "res".
>
> In an existing code that consistently uses the abbreviation, it is a
> different story, but "result" is not too long to spell out, and
> because you do not use the variable that often anyway, you would
> avoid unnecessary friction on the readers not to abbreviate it to
> "res" here.

One other minor point... in Git source code, comparison against NULL is spelled:

    if (!list)
        ...

(And the inverse case of `if (list != NULL)` would just be `if (list)`.)
Edmundo Carmona Antoranz April 15, 2022, 6:46 p.m. UTC | #11
On Wed, Apr 13, 2022 at 7:05 PM Junio C Hamano <gitster@pobox.com> wrote:
>
>
> Also, if this _were_ to allow recreating the shape of the history,
> using updated tips of branches that were merged in the original
> history, perhaps taking hints from "Merge branch X into Y" in the
> original merge commit's log messages, that would be quite useful
> addition to the rebase mechanism, but this is not that.

Ok.... I have taken some time to read it through in calm and this part
is making my head scratch because it sounds like the stuff that I had
in mind in replay (or if we move  it into rebase or any other place).
So, to clarify what you are asking, when replaying _a given commit_:

- If a parent is included in the history of the _old_ base, you would
like to see it included as is (no changes) as a parent of the replayed
commit.
- If a parent is not included in the history of the _old base_, it
means it has been replayed and we need to take the equivalent replayed
commit of the parent.

So, coming back to the example:

git replay HEAD v2.35.0 v2.36.0-rc1

We would replay all commits that make up the range v2.35.0..v2.36.0-rc1

Now, for any given commit being replayed, if a parent is in that
range, it would use the replayed commit of the parent. If the parent
is not in that range (which means that it's part of the history of
v2.35.0, which is not being replayed) then we need to take it as-is
straight no changes which means that _the original_ parent commit will
be  linked in the resulting replayed commit that we are working on at
the time therefore linking the original history. (sorry if it sounds a
little bit reiterative).

Is that what you mean? Because, if that is the case, that is what
replay does as of this patch.

If you gave it a try like in the eample, you should end up with a
commit (the one reported at the end of the execution) that has the
_exact same_ history shape of v2.36.0-rc1 (even linking to  commits
that are part of the history of v2.35.0).

But I am probably wrong in terms of what I understand that you meant.
Can you expand a little bit, if you don't mind?
Junio C Hamano April 15, 2022, 8:33 p.m. UTC | #12
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> But I am probably wrong in terms of what I understand that you meant.
> Can you expand a little bit, if you don't mind?

What I had in mind is what I have to do every day multiple times.
'master'..'seen' is a series of merges of tips of different topic
branches.

                 ---T topic
                     \
       \    \    \    \
  --M---o----o----o----S seen
    ^
    master


Some of the topic branches may get updated and 'master' may gain
more commits by merging some topics.  Now it is time to update the
'master'..'seen' chain.

                 ---T---P topic (updated)
                     \
       \    \    \    \
  --M---o----o----o----S seen
     \
      o
       \
        N
       master

It would be wonderful if a single command like replay can be used to
say "In the old history master..seen I have bunch of merges.  master
used to be M but now it is at N.  Rebuild M..S on top of N _but_
with a bit of twist.  Some of the topics in M...S may have been
merged to 'master' between M..N and the replayed history on top of N
does not want to have a merge from such 'already graduated' topics.
Many topics are updated, either by adding a new commit on top or
completely rewritten, and we want an updated tip of these topic
branches, not the old tip that I merged when I created M..S chain,
when replaying the history on top of N."

That kind of operation is quite different from what "rebase" does,
and deserves to be under a different name.

Compared to that, "replay exactly the same set of commits in the
same shape on top of a different commit whose tree happens to be the
same as the original", is a mere special case of "rebase" that is
not all that interesting.  It may be a worthwhile thing to do to
teach "rebase" capable of doing so reliably and more efficiently,
but that still falls into "improving rebase" category, not meriting
a separate command.
Edmundo Carmona Antoranz April 16, 2022, 5:35 a.m. UTC | #13
On Fri, Apr 15, 2022 at 10:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>
>
> It would be wonderful if a single command like replay can be used to
> say "In the old history master..seen I have bunch of merges.  master
> used to be M but now it is at N.  Rebuild M..S on top of N _but_
> with a bit of twist.  Some of the topics in M...S may have been
> merged to 'master' between M..N and the replayed history on top of N
> does not want to have a merge from such 'already graduated' topics.
> Many topics are updated, either by adding a new commit on top or
> completely rewritten, and we want an updated tip of these topic
> branches, not the old tip that I merged when I created M..S chain,
> when replaying the history on top of N."
>
> That kind of operation is quite different from what "rebase" does,
> and deserves to be under a different name.
>

Let me work a little bit on your workflow to see what I can do. Tip:
It will probably come out in the shape of a script. We can talk about
what to do with it later.

> Compared to that, "replay exactly the same set of commits in the
> same shape on top of a different commit whose tree happens to be the
> same as the original", is a mere special case of "rebase" that is
> not all that interesting.  It may be a worthwhile thing to do to
> teach "rebase" capable of doing so reliably and more efficiently,
> but that still falls into "improving rebase" category, not meriting
> a separate command.
>
>

I agree that it might not require a full separate command. I'll see if
I am able to get it into rebase.

Thanks for the feedback.
Junio C Hamano April 16, 2022, 6:39 a.m. UTC | #14
Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:

> Let me work a little bit on your workflow to see what I can do. Tip:
> It will probably come out in the shape of a script. We can talk about
> what to do with it later.

Heh, I have a script to do what I have to do every day multiple
times already, can be viewed in my 'todo' branch.
Edmundo Carmona Antoranz April 16, 2022, 7:02 a.m. UTC | #15
On Sat, Apr 16, 2022 at 8:39 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Edmundo Carmona Antoranz <eantoranz@gmail.com> writes:
>
> > Let me work a little bit on your workflow to see what I can do. Tip:
> > It will probably come out in the shape of a script. We can talk about
> > what to do with it later.
>
> Heh, I have a script to do what I have to do every day multiple
> times already, can be viewed in my 'todo' branch.

:-D I thought you were in desperate need.
Elijah Newren April 17, 2022, 5:05 a.m. UTC | #16
On Fri, Apr 15, 2022 at 10:41 PM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> On Fri, Apr 15, 2022 at 10:33 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > It would be wonderful if a single command like replay can be used to
> > say "In the old history master..seen I have bunch of merges.  master
> > used to be M but now it is at N.  Rebuild M..S on top of N _but_
> > with a bit of twist.  Some of the topics in M...S may have been
> > merged to 'master' between M..N and the replayed history on top of N
> > does not want to have a merge from such 'already graduated' topics.
> > Many topics are updated, either by adding a new commit on top or
> > completely rewritten, and we want an updated tip of these topic
> > branches, not the old tip that I merged when I created M..S chain,
> > when replaying the history on top of N."
> >
> > That kind of operation is quite different from what "rebase" does,
> > and deserves to be under a different name.
> >
>
> Let me work a little bit on your workflow to see what I can do.

Replaying merges is something I've put a little thought into, so allow
me to provide some pointers that may help.  Merges need special
handling for replaying, and in my opinion, doing either just a new
merge of the new trees (what rebase --rebase-merges does), or just
reusing existing trees (what you proposed to start this thread) are
both suboptimal, though the former is likely to just be annoying and
require potentially unnecessary user refixing, whereas the latter can
silently discard changes or reintroduce discarded changes and could be
dangerous.  More details on both of these...

An important part about merges is they may have resolved conflicts --
both textual (the standard conflict markers people have to resolve)
and semantic (e.g. one person changes the API of some function, and
the other branch being merged adds a caller of that function, so the
merge has to modify the new caller to use the new API).  We do not
just want to do a new merge and re-use the commit message (as rebase
--rebase-merges does), for two reasons: (1) the user either has to
re-resolve the textual conflict resolutions by hand, or use rerere
which requires a working tree (and we'd like replays to proceed
without a working tree where possible), and (2) it tosses semantic
merge conflict resolutions entirely.  We also do not just want to use
existing trees as-is (as you started with in your patch), for three
reasons: (1) when we move to a new base the new merge needs to include
the changes from the newer base, (2) the topic might have additional
changes added (or removed) during the "rebase" which need to be
reflected in the merge as well, and (3) the merge may have had
additional changes stuffed directly into it to solve semantic
conflicts which we want "ported" to the new merge commit.    So, for
handling merges, we should avoid both of these overly simplistic
mechanisms, and do something that tries to handle forward-porting
these conflict resolutions.  I have outlined steps to do so at
https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/

> Tip: It will probably come out in the shape of a script. We can talk about
> what to do with it later.

Note that we've worked hard to replace scripts with builtins in git,
especially in the case of rebase.  Scripts are great for prototyping,
and I fully support that, but I'd rather that scripts remained a
prototype.  I'd be sad to see us regress and return to scripts for
rebase.

> > Compared to that, "replay exactly the same set of commits in the
> > same shape on top of a different commit whose tree happens to be the
> > same as the original", is a mere special case of "rebase" that is
> > not all that interesting.  It may be a worthwhile thing to do to
> > teach "rebase" capable of doing so reliably and more efficiently,
> > but that still falls into "improving rebase" category, not meriting
> > a separate command.
>
> I agree that it might not require a full separate command. I'll see if
> I am able to get it into rebase.

If you'd like to try out some of the ideas in the link above for
handling replaying of merges, feel free.  I've done a little bit of
playing with this idea, and would be especially interested to learn of
any new testcases or challenges you come up with.  I think a full
implementation requires changes to the merge machinery (at each of the
merge-ort.c, ll-merge.c, and maybe even xdiff/ levels), on top of the
merge machinery changes being driven by the merge-tree changes.

[1] https://lore.kernel.org/git/CABPp-BGW39_5r8Lbt3ymR+F_=hWJcf=2e7O75vFNJ=3CEL5s=g@mail.gmail.com/

I'd also like to mention that I have a git-replay command, as
mentioned previously at various places (see [2,3,4,5,6,7] and probably
elsewhere).  It's far from complete (I was busy with sparse-checkout,
merge-tree, etc., and then disappeared for over a month on top of
that), but I still intend to complete it.  I would like it to cover
several usecases, including better replaying of merges, and I think
usescases like the one Junio points out here should be in scope.  If
you'd like to play around with my git-replay (not even alpha quality
yet, though it can do some things), feel free to take a look:
https://github.com/newren/git/tree/replay

[2] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110211147490.56@tvgsbejvaqbjf.bet/
[3] https://lore.kernel.org/git/CABPp-BH_TiJaDpn2+VVjCb83NEFjL9teSk06+YiZyFGiTu8Lpg@mail.gmail.com/
[4] https://lore.kernel.org/git/CABPp-BHpK8hPsiuHoYsf5D_rjcGLSW-_faL3ODoh56pG_2Luwg@mail.gmail.com/
[5] https://lore.kernel.org/git/CABPp-BFQv+mrWj8iH0Vo5Pr5L922v=ZsVthFjofy5pm1Sx8x5Q@mail.gmail.com/
[6] https://lore.kernel.org/git/CABPp-BE+DaBkis0r7pqs-kaChCvFhCEsyDg=gs3=QjWOPERaXQ@mail.gmail.com/
[7] https://lore.kernel.org/git/pull.1122.v5.git.1645340082.gitgitgadget@gmail.com/
Edmundo Carmona Antoranz April 17, 2022, 5:37 a.m. UTC | #17
On Sun, Apr 17, 2022 at 7:05 AM Elijah Newren <newren@gmail.com> wrote:
>
>
> Replaying merges is something I've put a little thought into, so allow
> me to provide some pointers that may help.  Merges need special
> handling for replaying, and in my opinion, doing either just a new
> merge of the new trees (what rebase --rebase-merges does), or just
> reusing existing trees (what you proposed to start this thread) are
> both suboptimal, though the former is likely to just be annoying and
> require potentially unnecessary user refixing, whereas the latter can
> silently discard changes or reintroduce discarded changes and could be
> dangerous.  More details on both of these...
>
> An important part about merges is they may have resolved conflicts --
> both textual (the standard conflict markers people have to resolve)
> and semantic (e.g. one person changes the API of some function, and
> the other branch being merged adds a caller of that function, so the
> merge has to modify the new caller to use the new API).  We do not
> just want to do a new merge and re-use the commit message (as rebase
> --rebase-merges does), for two reasons: (1) the user either has to
> re-resolve the textual conflict resolutions by hand, or use rerere
> which requires a working tree (and we'd like replays to proceed
> without a working tree where possible), and (2) it tosses semantic
> merge conflict resolutions entirely.  We also do not just want to use
> existing trees as-is (as you started with in your patch), for three
> reasons: (1) when we move to a new base the new merge needs to include
> the changes from the newer base, (2) the topic might have additional
> changes added (or removed) during the "rebase" which need to be
> reflected in the merge as well, and (3) the merge may have had
> additional changes stuffed directly into it to solve semantic
> conflicts which we want "ported" to the new merge commit.    So, for
> handling merges, we should avoid both of these overly simplistic
> mechanisms, and do something that tries to handle forward-porting
> these conflict resolutions.  I have outlined steps to do so at
> https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/
>

Hey, Elijah! Thanks for taking the time and the feedback.

Forget about me introducing replay as a separate command as a "real"
proposal. My intent (and which I saw most simple to be able to show
it) was to present the idea of an optimization (if you will) to the
rebase mechanism under certain rather narrow conditions:

git rebase --onto A B C

if A^{tree} == B^{tree} that means that we could create an equivalent
commit for the segment B..C on top of A without much hassle by reusing
the same trees from that segment (no need to calculate new trees...and
no need to move along the working tree as we are creating those
commits).

My impression from reading your feedback is that you have a much
broader scope in terms of what you want to achieve.So, for the time
being, I will work on trying to get the optimization in rebase and see
how far I am able to move it forward.... and you are able to keep
replay as a separate command if that is your will for the
not-so-distant future. :-)

BR!

PS I will be snooping around all of that material you are linking as I
am sure there will be interesting stuff in there. And thanks, again!
Martin von Zweigbergk April 17, 2022, 5:22 p.m. UTC | #18
On Sun, Apr 17, 2022 at 5:30 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> On Sun, Apr 17, 2022 at 7:05 AM Elijah Newren <newren@gmail.com> wrote:
> >
> >
> > Replaying merges is something I've put a little thought into, so allow
> > me to provide some pointers that may help.  Merges need special
> > handling for replaying, and in my opinion, doing either just a new
> > merge of the new trees (what rebase --rebase-merges does), or just
> > reusing existing trees (what you proposed to start this thread) are
> > both suboptimal, though the former is likely to just be annoying and
> > require potentially unnecessary user refixing, whereas the latter can
> > silently discard changes or reintroduce discarded changes and could be
> > dangerous.  More details on both of these...
> >
> > An important part about merges is they may have resolved conflicts --
> > both textual (the standard conflict markers people have to resolve)
> > and semantic (e.g. one person changes the API of some function, and
> > the other branch being merged adds a caller of that function, so the
> > merge has to modify the new caller to use the new API).  We do not
> > just want to do a new merge and re-use the commit message (as rebase
> > --rebase-merges does), for two reasons: (1) the user either has to
> > re-resolve the textual conflict resolutions by hand, or use rerere
> > which requires a working tree (and we'd like replays to proceed
> > without a working tree where possible), and (2) it tosses semantic
> > merge conflict resolutions entirely.  We also do not just want to use
> > existing trees as-is (as you started with in your patch), for three
> > reasons: (1) when we move to a new base the new merge needs to include
> > the changes from the newer base, (2) the topic might have additional
> > changes added (or removed) during the "rebase" which need to be
> > reflected in the merge as well, and (3) the merge may have had
> > additional changes stuffed directly into it to solve semantic
> > conflicts which we want "ported" to the new merge commit.    So, for
> > handling merges, we should avoid both of these overly simplistic
> > mechanisms, and do something that tries to handle forward-porting
> > these conflict resolutions.  I have outlined steps to do so at
> > https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/
> >
>
> Hey, Elijah! Thanks for taking the time and the feedback.
>
> Forget about me introducing replay as a separate command as a "real"
> proposal. My intent (and which I saw most simple to be able to show
> it) was to present the idea of an optimization (if you will) to the
> rebase mechanism under certain rather narrow conditions:
>
> git rebase --onto A B C
>
> if A^{tree} == B^{tree} that means that we could create an equivalent
> commit for the segment B..C on top of A without much hassle by reusing
> the same trees from that segment (no need to calculate new trees...and
> no need to move along the working tree as we are creating those
> commits).
>
> My impression from reading your feedback is that you have a much
> broader scope in terms of what you want to achieve.So, for the time
> being, I will work on trying to get the optimization in rebase and see
> how far I am able to move it forward.... and you are able to keep
> replay as a separate command if that is your will for the
> not-so-distant future. :-)

My (Git-compatible) VCS [1] is very relevant to this thread. It always
treats the contents of a merge commit as the diff compared to the
re-merge (auto-merged) parents. That applies to diffs (like
--remerge-diff) and rebases (what Elijah suggested in that link above)
. An important part of the solution I went with is to store
information about conflicts in the commits. Note that it's a more
high-level representation of the conflicts - not conflict *markers* -
that's stored in the commits [2]. Adding a new kind of object type is
obviously a huge step to take for Git, but perhaps you can consider it
as long as these objects are not exchanged. Also, as you have probably
noticed with your `git replay` command, this kind of rebasing without
touching the working copy or trees can get pretty fast. I didn't see
any performance numbers in your original message, but you are probably
able to rebase >1k commits per second in the git.git repo [3].

[1] https://github.com/martinvonz/jj
[2] https://github.com/martinvonz/jj/blob/main/docs/technical/conflicts.md
[3] https://github.com/martinvonz/jj/discussions/49
Edmundo Carmona Antoranz April 18, 2022, 7:04 a.m. UTC | #19
On Sun, Apr 17, 2022 at 7:23 PM Martin von Zweigbergk
<martinvonz@gmail.com> wrote:
>

>
> My (Git-compatible) VCS [1] is very relevant to this thread. It always
> treats the contents of a merge commit as the diff compared to the
> re-merge (auto-merged) parents. That applies to diffs (like
> --remerge-diff) and rebases (what Elijah suggested in that link above)
> . An important part of the solution I went with is to store
> information about conflicts in the commits. Note that it's a more
> high-level representation of the conflicts - not conflict *markers* -
> that's stored in the commits [2]. Adding a new kind of object type is
> obviously a huge step to take for Git, but perhaps you can consider it
> as long as these objects are not exchanged. Also, as you have probably
> noticed with your `git replay` command, this kind of rebasing without
> touching the working copy or trees can get pretty fast. I didn't see
> any performance numbers in your original message, but you are probably
> able to rebase >1k commits per second in the git.git repo [3].
>

Thanks for all your input (I mean, everybody who has jumped into the
conversation). In my last experiment to get this into rebase (and
still without moving the working tree), the 900+ commits that make up
the segment v2.35.0..v2.36.0-rc1 are converted in some 143 ms in my
computer, which is an aging dinosaur.
Sergey Organov April 18, 2022, 7:29 a.m. UTC | #20
Elijah Newren <newren@gmail.com> writes:

> On Fri, Apr 15, 2022 at 10:41 PM Edmundo Carmona Antoranz
> <eantoranz@gmail.com> wrote:
>>
>> On Fri, Apr 15, 2022 at 10:33 PM Junio C Hamano <gitster@pobox.com> wrote:
>> >
>> > It would be wonderful if a single command like replay can be used to
>> > say "In the old history master..seen I have bunch of merges.  master
>> > used to be M but now it is at N.  Rebuild M..S on top of N _but_
>> > with a bit of twist.  Some of the topics in M...S may have been
>> > merged to 'master' between M..N and the replayed history on top of N
>> > does not want to have a merge from such 'already graduated' topics.
>> > Many topics are updated, either by adding a new commit on top or
>> > completely rewritten, and we want an updated tip of these topic
>> > branches, not the old tip that I merged when I created M..S chain,
>> > when replaying the history on top of N."
>> >
>> > That kind of operation is quite different from what "rebase" does,
>> > and deserves to be under a different name.
>> >
>>
>> Let me work a little bit on your workflow to see what I can do.
>
> Replaying merges is something I've put a little thought into, so allow
> me to provide some pointers that may help.  Merges need special
> handling for replaying, and in my opinion, doing either just a new
> merge of the new trees (what rebase --rebase-merges does), or just
> reusing existing trees (what you proposed to start this thread) are
> both suboptimal, though the former is likely to just be annoying and
> require potentially unnecessary user refixing,

It silently drops user changes as well, and that's the worst thing about
it, not annoyance.

> whereas the latter can silently discard changes or reintroduce
> discarded changes and could be dangerous. More details on both of
> these...

Please consider yet another option:

https://public-inbox.org/git/87r2oxe3o1.fsf@javad.com/

that at least is safe with respect to user changes.

-- Sergey Organov
Elijah Newren April 18, 2022, 4:27 p.m. UTC | #21
Hi Sergey,

On Mon, Apr 18, 2022 at 12:30 AM Sergey Organov <sorganov@gmail.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
[...]
> > Replaying merges is something I've put a little thought into, so allow
> > me to provide some pointers that may help.  Merges need special
> > handling for replaying, and in my opinion, doing either just a new
> > merge of the new trees (what rebase --rebase-merges does), or just
> > reusing existing trees (what you proposed to start this thread) are
> > both suboptimal, though the former is likely to just be annoying and
> > require potentially unnecessary user refixing,
>
> It silently drops user changes as well, and that's the worst thing about
> it, not annoyance.

Yes, I mentioned that later in the email, but omitted it in the
summary you highlight here just because the fixed-tree case was so
much more likely to do it.  Anyway, sorry for the inaccuracy in the
summarized version.

> > whereas the latter can silently discard changes or reintroduce
> > discarded changes and could be dangerous. More details on both of
> > these...
>
> Please consider yet another option:

I linked to where I had given another option.

> https://public-inbox.org/git/87r2oxe3o1.fsf@javad.com/
>
> that at least is safe with respect to user changes.

If you read the suggestion I made (which I'll reinclude here at [1]),
you'll note that I read the old thread you link to with both your and
Phillips' suggestions.  I dug into them with some examples, and came
to the conclusion that we needed something better, as I briefly
commented when proposing my suggested alternative (at [1]).  I
appreciate your suggestion and the time you put into it, but based on
my earlier investigation, I believe my suggestion would be a better
way of preserving user changes in merges and I'll be implementing it.
The fact that Martin (in this thread) independently came up with the
same basic idea and implemented it in jj (though he apparently has
some further tweaks around the object model) and it works well
suggests to me that the idea has some real world testing too that
gives me further confidence in the idea.

[1] https://lore.kernel.org/git/CABPp-BGW39_5r8Lbt3ymR+F_=hWJcf=2e7O75vFNJ=3CEL5s=g@mail.gmail.com/
Sergey Organov April 18, 2022, 5:33 p.m. UTC | #22
Elijah Newren <newren@gmail.com> writes:

> Hi Sergey,
>
> On Mon, Apr 18, 2022 at 12:30 AM Sergey Organov <sorganov@gmail.com> wrote:
>>
>> Elijah Newren <newren@gmail.com> writes:
>>
> [...]
>> > Replaying merges is something I've put a little thought into, so allow
>> > me to provide some pointers that may help.  Merges need special
>> > handling for replaying, and in my opinion, doing either just a new
>> > merge of the new trees (what rebase --rebase-merges does), or just
>> > reusing existing trees (what you proposed to start this thread) are
>> > both suboptimal, though the former is likely to just be annoying and
>> > require potentially unnecessary user refixing,
>>
>> It silently drops user changes as well, and that's the worst thing about
>> it, not annoyance.
>
> Yes, I mentioned that later in the email, but omitted it in the
> summary you highlight here just because the fixed-tree case was so
> much more likely to do it.  Anyway, sorry for the inaccuracy in the
> summarized version.
>
>> > whereas the latter can silently discard changes or reintroduce
>> > discarded changes and could be dangerous. More details on both of
>> > these...
>>
>> Please consider yet another option:
>
> I linked to where I had given another option.
>
>> https://public-inbox.org/git/87r2oxe3o1.fsf@javad.com/
>>
>> that at least is safe with respect to user changes.
>
> If you read the suggestion I made (which I'll reinclude here at [1]),
> you'll note that I read the old thread you link to with both your and
> Phillips' suggestions.  I dug into them with some examples, and came
> to the conclusion that we needed something better, as I briefly
> commented when proposing my suggested alternative (at [1]).  I
> appreciate your suggestion and the time you put into it, but based on
> my earlier investigation, I believe my suggestion would be a better
> way of preserving user changes in merges and I'll be implementing it.
> The fact that Martin (in this thread) independently came up with the
> same basic idea and implemented it in jj (though he apparently has
> some further tweaks around the object model) and it works well
> suggests to me that the idea has some real world testing too that
> gives me further confidence in the idea.

Yep, whoever is going to actually implement something always wins, and
that's a good thing. I'm looking forward for the outcome of all this
with a hope.

Thanks,
-- Sergey Organov
Tao Klerks April 20, 2022, 11:27 a.m. UTC | #23
On Mon, Apr 18, 2022 at 6:28 PM Elijah Newren <newren@gmail.com> wrote:
>
> If you read the suggestion I made (which I'll reinclude here at [1]),
> you'll note that I read the old thread you link to with both your and
> Phillips' suggestions.  I dug into them with some examples, and came
> to the conclusion that we needed something better, as I briefly
> commented when proposing my suggested alternative (at [1]).  I
> appreciate your suggestion and the time you put into it, but based on
> my earlier investigation, I believe my suggestion would be a better
> way of preserving user changes in merges and I'll be implementing it.
> The fact that Martin (in this thread) independently came up with the
> same basic idea and implemented it in jj (though he apparently has
> some further tweaks around the object model) and it works well
> suggests to me that the idea has some real world testing too that
> gives me further confidence in the idea.
>
> [1] https://lore.kernel.org/git/CABPp-BGW39_5r8Lbt3ymR+F_=hWJcf=2e7O75vFNJ=3CEL5s=g@mail.gmail.com/

Thank you for the clarification, and sorry I'm clearly missing
something here - the link you provided is to a deeply threaded
conversation about "[PATCH 08/12] merge-ort: provide a
merge_get_conflicted_files() helper function", in the context of a
server-side merge support patchset... I can't figure out how to relate
that conversation to the "how can safely reusing previous merge
outcomes when rebasing a merge work well?" topic I thought you had
introduced here :(
Elijah Newren April 21, 2022, 2:33 a.m. UTC | #24
On Wed, Apr 20, 2022 at 4:27 AM Tao Klerks <tao@klerks.biz> wrote:
>
> On Mon, Apr 18, 2022 at 6:28 PM Elijah Newren <newren@gmail.com> wrote:
> >
> > If you read the suggestion I made (which I'll reinclude here at [1]),
> > you'll note that I read the old thread you link to with both your and
> > Phillips' suggestions.  I dug into them with some examples, and came
> > to the conclusion that we needed something better, as I briefly
> > commented when proposing my suggested alternative (at [1]).  I
> > appreciate your suggestion and the time you put into it, but based on
> > my earlier investigation, I believe my suggestion would be a better
> > way of preserving user changes in merges and I'll be implementing it.
> > The fact that Martin (in this thread) independently came up with the
> > same basic idea and implemented it in jj (though he apparently has
> > some further tweaks around the object model) and it works well
> > suggests to me that the idea has some real world testing too that
> > gives me further confidence in the idea.
> >
> > [1] https://lore.kernel.org/git/CABPp-BGW39_5r8Lbt3ymR+F_=hWJcf=2e7O75vFNJ=3CEL5s=g@mail.gmail.com/
>
> Thank you for the clarification, and sorry I'm clearly missing
> something here - the link you provided is to a deeply threaded
> conversation about "[PATCH 08/12] merge-ort: provide a
> merge_get_conflicted_files() helper function", in the context of a
> server-side merge support patchset... I can't figure out how to relate
> that conversation to the "how can safely reusing previous merge
> outcomes when rebasing a merge work well?" topic I thought you had
> introduced here :(

Sorry, I entered the wrong link, and assumed it was right when I
copied it into the response email.  Whoops.  The link for [1] was
supposed to be https://lore.kernel.org/git/CABPp-BHp+d62dCyAaJfh1cZ8xVpGyb97mZryd02aCOX=Qn=Ltw@mail.gmail.com/

But as has been said elsewhere, what you're asking for doesn't exist
yet.  That other email was where I outlined a bunch of details about
what someone could do to implement it, and where I pointed out that I
planned to eventually implement it if Dscho didn't beat me to it.
I've since started on it.

I also linked to my tree a few times where anyone can look at what I
have done so far (which isn't useful to users yet).  If you want to
take what I've implemented and implement the rest before I can, go
ahead.  If you want to take the steps I outlined on how to do it in
the email link I provided above, and implement it from scratch then go
ahead.  But otherwise, as Junio already pointed out in this thread, it
just doesn't exist today.
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index f8bccfab5e..71924d0c43 100644
--- a/Makefile
+++ b/Makefile
@@ -1194,6 +1194,7 @@  BUILTIN_OBJS += builtin/remote-fd.o
 BUILTIN_OBJS += builtin/remote.o
 BUILTIN_OBJS += builtin/repack.o
 BUILTIN_OBJS += builtin/replace.o
+BUILTIN_OBJS += builtin/replay.o
 BUILTIN_OBJS += builtin/rerere.o
 BUILTIN_OBJS += builtin/reset.o
 BUILTIN_OBJS += builtin/rev-list.o
diff --git a/builtin.h b/builtin.h
index 40e9ecc848..0c1915c4c9 100644
--- a/builtin.h
+++ b/builtin.h
@@ -207,6 +207,7 @@  int cmd_remote(int argc, const char **argv, const char *prefix);
 int cmd_remote_ext(int argc, const char **argv, const char *prefix);
 int cmd_remote_fd(int argc, const char **argv, const char *prefix);
 int cmd_repack(int argc, const char **argv, const char *prefix);
+int cmd_replay(int argc, const char **argv, const char *prefix);
 int cmd_rerere(int argc, const char **argv, const char *prefix);
 int cmd_reset(int argc, const char **argv, const char *prefix);
 int cmd_restore(int argc, const char **argv, const char *prefix);
diff --git a/builtin/replay.c b/builtin/replay.c
new file mode 100644
index 0000000000..ed970fa057
--- /dev/null
+++ b/builtin/replay.c
@@ -0,0 +1,148 @@ 
+/*
+ * "git replay" builtin command
+ *
+ * Copyright (c) 2022 Edmundo Carmona Antoranz
+ * Released under the terms of GPLv2
+ */
+
+#include "builtin.h"
+#include "revision.h"
+#include "commit.h"
+#include "cache.h"
+
+static struct commit **new_commits;
+static unsigned long mappings_size = 0;
+static struct commit_list *old_commits = NULL;
+
+static unsigned int replay_indexof(struct commit *commit,
+				   struct commit_list *list)
+{
+	int res;
+	if (list == NULL)
+		return -1;
+	if (!oidcmp(&list->item->object.oid,
+		    &commit->object.oid))
+		return 0;
+	res = replay_indexof(commit, list->next);
+	return res < 0 ? res : res + 1;
+}
+
+static struct commit *replay_find_commit(const char *name)
+{
+	struct commit *commit = lookup_commit_reference_by_name(name);
+	if (!commit)
+		die(_("no such branch/commit '%s'"), name);
+	return commit;
+}
+
+static struct commit* replay_commit(struct commit * orig_commit)
+{
+	struct pretty_print_context ctx = {0};
+	struct strbuf body = STRBUF_INIT;
+	struct strbuf author = STRBUF_INIT;
+	struct strbuf committer = STRBUF_INIT;
+	struct object_id new_commit_oid;
+	struct commit *new_commit;
+
+	struct commit_list *new_parents_head = NULL;
+	struct commit_list **new_parents = &new_parents_head;
+	struct commit_list *parents = orig_commit->parents;
+	while (parents) {
+		struct commit *parent = parents->item;
+		int commit_index;
+		struct commit *new_parent;
+
+		commit_index = replay_indexof(parent, old_commits);
+
+		if (commit_index < 0)
+			 // won't be replayed, use the original parent
+			new_parent = parent;
+		else {
+			// it might have been translated already
+			if (!new_commits[commit_index])
+				new_commits[commit_index] = replay_commit(parent);
+			new_parent = new_commits[commit_index];
+		}
+		new_parents = commit_list_append(new_parent, new_parents);
+		parents = parents->next;
+	}
+
+	format_commit_message(orig_commit, "%B", &body, &ctx);
+	// TODO timezones
+	format_commit_message(orig_commit, "%an <%ae> %at +0000", &author, &ctx);
+	// TODO consider committer (control with an option)
+	format_commit_message(orig_commit, "%cn <%ce> %ct +0000", &committer, &ctx);
+
+	commit_tree_extended(body.buf,
+			     body.len,
+			     get_commit_tree_oid(orig_commit),
+			     new_parents_head,
+			     &new_commit_oid,
+			     author.buf,
+			     committer.buf,
+			     NULL, NULL);
+
+	new_commit = lookup_commit_or_die(&new_commit_oid,
+					  "new commit");
+
+	strbuf_release(&author);
+	strbuf_release(&body);
+	strbuf_release(&committer);
+
+	return new_commit;
+}
+
+static struct commit* replay(struct commit *new_base, struct commit *old_base,
+		      struct commit *tip)
+{
+	struct rev_info revs;
+	struct commit *commit;
+
+	init_revisions(&revs, NULL);
+
+	old_base->object.flags |= UNINTERESTING;
+	add_pending_object(&revs, &old_base->object, "old-base");
+	add_pending_object(&revs, &tip->object, "tip");
+
+	if (prepare_revision_walk(&revs))
+		die("Could not get revisions to replay");
+
+	while ((commit = get_revision(&revs)) != NULL)
+		commit_list_insert(commit, &old_commits);
+
+	// save the mapping between the new and the old base
+	commit_list_insert(old_base, &old_commits);
+	mappings_size = commit_list_count(old_commits);
+	new_commits = calloc(mappings_size, sizeof(struct commit));
+	new_commits[replay_indexof(old_base, old_commits)] = new_base;
+
+	return replay_commit(tip);
+}
+
+
+int cmd_replay(int argc, const char **argv, const char *prefix)
+{
+	struct commit *new_base;
+	struct commit *old_base;
+	struct commit *tip;
+	struct commit *new_tip;
+
+	if (argc < 4) {
+		die("Not enough parameters");
+	}
+
+	new_base = replay_find_commit(argv[1]);
+	old_base = replay_find_commit(argv[2]);
+	tip = replay_find_commit(argv[3]);
+
+	if (oidcmp(get_commit_tree_oid(new_base),
+		   get_commit_tree_oid(old_base)))
+		die("The old and the new base do not have the same tree");
+
+	// get the list of revisions between old_base and tip
+	new_tip = replay(new_base, old_base, tip);
+
+	printf("%s\n", oid_to_hex(&new_tip->object.oid));
+
+	return 0;
+}
diff --git a/git.c b/git.c
index 3d8e48cf55..14de8d666f 100644
--- a/git.c
+++ b/git.c
@@ -590,6 +590,7 @@  static struct cmd_struct commands[] = {
 	{ "remote-fd", cmd_remote_fd, NO_PARSEOPT },
 	{ "repack", cmd_repack, RUN_SETUP },
 	{ "replace", cmd_replace, RUN_SETUP },
+	{ "replay", cmd_replay, RUN_SETUP },
 	{ "rerere", cmd_rerere, RUN_SETUP },
 	{ "reset", cmd_reset, RUN_SETUP },
 	{ "restore", cmd_restore, RUN_SETUP | NEED_WORK_TREE },