diff mbox series

[v2,1/4] merge-ort: barebones API of new merge strategy with empty implementation

Message ID b9e73975eab1f349be678779ff57155feb4c3501.1603731448.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series Beginning of new merge strategy: New API, empty implementation | expand

Commit Message

Elijah Newren Oct. 26, 2020, 4:57 p.m. UTC
From: Elijah Newren <newren@gmail.com>

This is the beginning of a new merge strategy.  While there are some API
differences, and the implementation has some differences in behavior, it
is essentially meant as an eventual drop-in replacement for
merge-recursive.c.  However, it is being built to exist side-by-side
with merge-recursive so that we have plenty of time to find out how
those differences pan out in the real world while people can still fall
back to merge-recursive.  (Also, I intend to avoid modifying
merge-recursive during this process, to keep it stable.)

The primary difference noticable here is that the updating of the
working tree and index is not done simultaneously with the merge
algorithm, but is a separate post-processing step.  The new API is
designed so that one can do repeated merges (e.g. during a rebase or
cherry-pick) and only update the index and working tree one time at the
end instead of updating it with every intermediate result.  Also, one
can perform a merge between two branches, neither of which match the
index or the working tree, without clobbering the index or working tree.

The next three commits will demonstrate various uses of this new API.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Makefile    |  1 +
 merge-ort.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 merge-ort.h | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 102 insertions(+)
 create mode 100644 merge-ort.c
 create mode 100644 merge-ort.h

Comments

Junio C Hamano Oct. 26, 2020, 8:45 p.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> + *   git merge [-s recursive]
> + *
> + * with
> + *
> + *   git merge -s ort
> + *
> + * Note: git's parser allows the space between '-s' and its argument to be
> + * missing.  (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo",
> + * "cale", "peedy", or "ins" instead of "ort"?)

One thing that is quite unpleasant is "git grep ort" gives us too
many hits already, and it will be hard to locate ort related changes
with "git log --grep=ort", as the name is too short to serve as an
effective way to limit the search.

> diff --git a/merge-ort.h b/merge-ort.h
> new file mode 100644
> index 0000000000..47d30cf538
> --- /dev/null
> +++ b/merge-ort.h
> @@ -0,0 +1,49 @@
> +#ifndef MERGE_ORT_H
> +#define MERGE_ORT_H
> +
> +#include "merge-recursive.h"
> +
> +struct commit;
> +struct tree;
> +
> +struct merge_result {
> +	/* whether the merge is clean */
> +	int clean;
> +
> +	/* Result of merge.  If !clean, represents what would go in worktree */
> +	struct tree *tree;

Curious.  Because there is no way for "struct tree" to hold an
in-core pointer to a "struct blob" (iow, for a blob to be in a
"struct tree", it has to have been assigned an object name), unless
we are using the "pretend" mechanism, which has its own downsides,
we are committed to create a throw-away blob objects with conflict
markers in them, and write them to the object store.

If we were writing a new merge machinery from scratch, I would have
preferred a truly in-core implementation that does not have to write
out to the object store but if this makes the implementation simpler,
perhaps it is a small enough price to pay.

> +	/*
> +	 * Additional metadata used by merge_switch_to_result() or future calls
> +	 * to merge_inmemory_*().  Not for external use.
> +	 */
> +	void *priv;
> +	unsigned ate;

I'd prefer to see this named not so cute.  Will we hang random
variations of things, or would this be better to be made into a
pointer to union, with an enum that tells us which kind it is in
use?

> +};


> +/* rename-detecting three-way merge with recursive ancestor consolidation. */
> +void merge_inmemory_recursive(struct merge_options *opt,
> +			      struct commit_list *merge_bases,
> +			      struct commit *side1,
> +			      struct commit *side2,
> +			      struct merge_result *result);

I've seen "incore" spelled as a squashed-into-a-single-word, but not
"in_memory".

> +/* rename-detecting three-way merge, no recursion. */
> +void merge_inmemory_nonrecursive(struct merge_options *opt,
> +				 struct tree *merge_base,
> +				 struct tree *side1,
> +				 struct tree *side2,
> +				 struct merge_result *result);
> +
> +/* Update the working tree and index from head to result after inmemory merge */
> +void merge_switch_to_result(struct merge_options *opt,
> +			    struct tree *head,
> +			    struct merge_result *result,
> +			    int update_worktree_and_index,
> +			    int display_update_msgs);

To those who have known how our merge works, a natural expectation
for an "in-core" merge is that when the "in-core" merge finishes,
the index would hold the higher stages for the conflicted paths, and
cleanly merged paths would have the result at stage 0, and there is
an extra thing that we haven't had that represents what the working
tree files for conflicted paths should look like (historically we
wrote out the conflicted result to the working tree files---being
in-core operation we cannot afford to), so that (1) cleanly merged
paths can be externalized by writing from their stage 0 entries and
(2) contents with conflicts can be externalized by that "extra
thing".

But this helper says "working tree and index" are both updated, so
the "in-core" merge it expects must have not just the working tree
result (in result->tree, as the comment in the structure says) but
also how the higher stages of the index should look like somewhere
in the result structure.  How the latter is done is not at all clear
at this point in the mock-up.  Leaving it opaque is fine, but the
function, and the result structure, deserve clarification to avoid
confusing readers by highlighting how it is different from the
traditional ways (e.g. "we don't touch the index at all---instead we
store that in the priv/ate fields", if that is what is going on).

Thanks.
Elijah Newren Oct. 26, 2020, 9:18 p.m. UTC | #2
On Mon, Oct 26, 2020 at 1:45 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > + *   git merge [-s recursive]
> > + *
> > + * with
> > + *
> > + *   git merge -s ort
> > + *
> > + * Note: git's parser allows the space between '-s' and its argument to be
> > + * missing.  (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo",
> > + * "cale", "peedy", or "ins" instead of "ort"?)
>
> One thing that is quite unpleasant is "git grep ort" gives us too
> many hits already, and it will be hard to locate ort related changes
> with "git log --grep=ort", as the name is too short to serve as an
> effective way to limit the search.

Suggestions for an alternative name?  merge-pandemic.c since it was
mostly written during the pandemic?

I'm really not good at naming things...

> > diff --git a/merge-ort.h b/merge-ort.h
> > new file mode 100644
> > index 0000000000..47d30cf538
> > --- /dev/null
> > +++ b/merge-ort.h
> > @@ -0,0 +1,49 @@
> > +#ifndef MERGE_ORT_H
> > +#define MERGE_ORT_H
> > +
> > +#include "merge-recursive.h"
> > +
> > +struct commit;
> > +struct tree;
> > +
> > +struct merge_result {
> > +     /* whether the merge is clean */
> > +     int clean;
> > +
> > +     /* Result of merge.  If !clean, represents what would go in worktree */
> > +     struct tree *tree;
>
> Curious.  Because there is no way for "struct tree" to hold an
> in-core pointer to a "struct blob" (iow, for a blob to be in a
> "struct tree", it has to have been assigned an object name), unless
> we are using the "pretend" mechanism, which has its own downsides,
> we are committed to create a throw-away blob objects with conflict
> markers in them, and write them to the object store.

This is something merge-recursive already does (and I've copied some
of that code over, around merge_3way() and the call to
write_object_file() with the results).  I thought the reasoning behind
this was memory -- we're okay assuming any given file fits in memory
(and perhaps up to three copies of it so we can do a three-way merge),
but we're not okay assuming all (changed) files from a commit
simultaneously fit in memory.

> If we were writing a new merge machinery from scratch, I would have
> preferred a truly in-core implementation that does not have to write
> out to the object store but if this makes the implementation simpler,
> perhaps it is a small enough price to pay.

I thought about that early on, but I was worried about out-of-memory
situations if we attempt to do truly in-memory, at least for large
changes in large repositories.

And as you have seen above, I do rely on being able to create trees.

> > +     /*
> > +      * Additional metadata used by merge_switch_to_result() or future calls
> > +      * to merge_inmemory_*().  Not for external use.
> > +      */
> > +     void *priv;
> > +     unsigned ate;
>
> I'd prefer to see this named not so cute.  Will we hang random
> variations of things, or would this be better to be made into a
> pointer to union, with an enum that tells us which kind it is in
> use?

I don't understand the union suggestion.  Both fields are used.

Would you object if 'ate' was named '_'?  That was my original name,
but Taylor didn't like it.  It is used on about 4 lines of code, I'm
99.9% sure it will never be used in additional locations, and callers
shouldn't mess with it.  I just don't have a good name for it.  I
guess maybe I should just call it "properly_initialized" or something.

> > +};
>
>
> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */
> > +void merge_inmemory_recursive(struct merge_options *opt,
> > +                           struct commit_list *merge_bases,
> > +                           struct commit *side1,
> > +                           struct commit *side2,
> > +                           struct merge_result *result);
>
> I've seen "incore" spelled as a squashed-into-a-single-word, but not
> "in_memory".

I can add an underscore.  Or switch to incore.  Preference?

> > +/* rename-detecting three-way merge, no recursion. */
> > +void merge_inmemory_nonrecursive(struct merge_options *opt,
> > +                              struct tree *merge_base,
> > +                              struct tree *side1,
> > +                              struct tree *side2,
> > +                              struct merge_result *result);
> > +
> > +/* Update the working tree and index from head to result after inmemory merge */
> > +void merge_switch_to_result(struct merge_options *opt,
> > +                         struct tree *head,
> > +                         struct merge_result *result,
> > +                         int update_worktree_and_index,
> > +                         int display_update_msgs);
>
> To those who have known how our merge works, a natural expectation
> for an "in-core" merge is that when the "in-core" merge finishes,
> the index would hold the higher stages for the conflicted paths, and
> cleanly merged paths would have the result at stage 0, and there is
> an extra thing that we haven't had that represents what the working
> tree files for conflicted paths should look like (historically we
> wrote out the conflicted result to the working tree files---being
> in-core operation we cannot afford to), so that (1) cleanly merged
> paths can be externalized by writing from their stage 0 entries and
> (2) contents with conflicts can be externalized by that "extra
> thing".
>
> But this helper says "working tree and index" are both updated, so
> the "in-core" merge it expects must have not just the working tree
> result (in result->tree, as the comment in the structure says) but
> also how the higher stages of the index should look like somewhere
> in the result structure.  How the latter is done is not at all clear
> at this point in the mock-up.  Leaving it opaque is fine, but the
> function, and the result structure, deserve clarification to avoid
> confusing readers by highlighting how it is different from the
> traditional ways (e.g. "we don't touch the index at all---instead we
> store that in the priv/ate fields", if that is what is going on).

Yes, your reading is correct.  We don't touch the index (or any index,
or any cache_entry) at all.  Among other things, data that can be used
to update the index are in the "priv" field.

I'll try to add some notes to the file.
Junio C Hamano Oct. 26, 2020, 10:10 p.m. UTC | #3
Elijah Newren <newren@gmail.com> writes:

>> > +     /*
>> > +      * Additional metadata used by merge_switch_to_result() or future calls
>> > +      * to merge_inmemory_*().  Not for external use.
>> > +      */
>> > +     void *priv;
>> > +     unsigned ate;
>>
>> I'd prefer to see this named not so cute.  Will we hang random
>> variations of things, or would this be better to be made into a
>> pointer to union, with an enum that tells us which kind it is in
>> use?
>
> I don't understand the union suggestion.  Both fields are used.

I thought "priv" shouldn't be "anything goes, so it is 'void *'" but
is probably a "union { ... } priv;" with associated enum next to it
that tells which one of the possibilities in the union is in effect.

> Would you object if 'ate' was named '_'?

Either is horrible name.

>> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */
>> > +void merge_inmemory_recursive(struct merge_options *opt,
>> > +                           struct commit_list *merge_bases,
>> > +                           struct commit *side1,
>> > +                           struct commit *side2,
>> > +                           struct merge_result *result);
>>
>> I've seen "incore" spelled as a squashed-into-a-single-word, but not
>> "in_memory".
>
> I can add an underscore.  Or switch to incore.  Preference?

Anything shorter would get my vote.

> Yes, your reading is correct.  We don't touch the index (or any index,
> or any cache_entry) at all.  Among other things, data that can be used
> to update the index are in the "priv" field.
>
> I'll try to add some notes to the file.

Sounds good.
Elijah Newren Oct. 26, 2020, 10:28 p.m. UTC | #4
On Mon, Oct 26, 2020 at 3:10 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:

> >> > +     /*
> >> > +      * Additional metadata used by merge_switch_to_result() or future calls
> >> > +      * to merge_inmemory_*().  Not for external use.
> >> > +      */
> >> > +     void *priv;
> >> > +     unsigned ate;
> >>
> >> I'd prefer to see this named not so cute.  Will we hang random
> >> variations of things, or would this be better to be made into a
> >> pointer to union, with an enum that tells us which kind it is in
> >> use?
> >
> > I don't understand the union suggestion.  Both fields are used.
>
> I thought "priv" shouldn't be "anything goes, so it is 'void *'" but
> is probably a "union { ... } priv;" with associated enum next to it
> that tells which one of the possibilities in the union is in effect.

I guess I'm still not following where these "possibilities" come from
or what I could have said that would have suggested possibilities.

priv is a pointer to a private data structure.  The data structure is
defined and used only in merge-ort.c and not exposed to callers.  That
data does need to be passed from merge_incore_*() to
merge_switch_to_result() (or to merge_finalize()).  Since those two
are separate function calls invoked by the caller, and since callers
shouldn't touch any of this data in priv, it's passed as an opaque
field within merge_result.  Thus void*.

> > Would you object if 'ate' was named '_'?
>
> Either is horrible name.

I'll just rip it out, for now.  It isn't relevant to early versions of
merge-ort anyway, and when I re-introduce it with yet another horrible
name, there will at least be more context for others to suggest a
better name for me.

Besides, the variable isn't at all necessary for the algorithm; it
exists solely as a way to catch a small gotcha in API usage of later
versions of merge-ort, which I otherwise didn't have a clean way of
detecting.

> >> > +/* rename-detecting three-way merge with recursive ancestor consolidation. */
> >> > +void merge_inmemory_recursive(struct merge_options *opt,
> >> > +                           struct commit_list *merge_bases,
> >> > +                           struct commit *side1,
> >> > +                           struct commit *side2,
> >> > +                           struct merge_result *result);
> >>
> >> I've seen "incore" spelled as a squashed-into-a-single-word, but not
> >> "in_memory".
> >
> > I can add an underscore.  Or switch to incore.  Preference?
>
> Anything shorter would get my vote.

Sounds good, will do.

> > Yes, your reading is correct.  We don't touch the index (or any index,
> > or any cache_entry) at all.  Among other things, data that can be used
> > to update the index are in the "priv" field.
> >
> > I'll try to add some notes to the file.
>
> Sounds good.

:-)
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 95571ee3fc..088770c2ae 100644
--- a/Makefile
+++ b/Makefile
@@ -921,6 +921,7 @@  LIB_OBJS += mailmap.o
 LIB_OBJS += match-trees.o
 LIB_OBJS += mem-pool.o
 LIB_OBJS += merge-blobs.o
+LIB_OBJS += merge-ort.o
 LIB_OBJS += merge-recursive.o
 LIB_OBJS += merge.o
 LIB_OBJS += mergesort.o
diff --git a/merge-ort.c b/merge-ort.c
new file mode 100644
index 0000000000..5230364a8d
--- /dev/null
+++ b/merge-ort.c
@@ -0,0 +1,52 @@ 
+/*
+ * "Ostensibly Recursive's Twin" merge strategy, or "ort" for short.  Meant
+ * as a drop-in replacement for the "recursive" merge strategy, allowing one
+ * to replace
+ *
+ *   git merge [-s recursive]
+ *
+ * with
+ *
+ *   git merge -s ort
+ *
+ * Note: git's parser allows the space between '-s' and its argument to be
+ * missing.  (Should I have backronymed "ham", "alsa", "kip", "nap, "alvo",
+ * "cale", "peedy", or "ins" instead of "ort"?)
+ */
+
+#include "cache.h"
+#include "merge-ort.h"
+
+void merge_switch_to_result(struct merge_options *opt,
+			    struct tree *head,
+			    struct merge_result *result,
+			    int update_worktree_and_index,
+			    int display_update_msgs)
+{
+	die("Not yet implemented");
+	merge_finalize(opt, result);
+}
+
+void merge_finalize(struct merge_options *opt,
+		    struct merge_result *result)
+{
+	die("Not yet implemented");
+}
+
+void merge_inmemory_nonrecursive(struct merge_options *opt,
+				 struct tree *merge_base,
+				 struct tree *side1,
+				 struct tree *side2,
+				 struct merge_result *result)
+{
+	die("Not yet implemented");
+}
+
+void merge_inmemory_recursive(struct merge_options *opt,
+			      struct commit_list *merge_bases,
+			      struct commit *side1,
+			      struct commit *side2,
+			      struct merge_result *result)
+{
+	die("Not yet implemented");
+}
diff --git a/merge-ort.h b/merge-ort.h
new file mode 100644
index 0000000000..47d30cf538
--- /dev/null
+++ b/merge-ort.h
@@ -0,0 +1,49 @@ 
+#ifndef MERGE_ORT_H
+#define MERGE_ORT_H
+
+#include "merge-recursive.h"
+
+struct commit;
+struct tree;
+
+struct merge_result {
+	/* whether the merge is clean */
+	int clean;
+
+	/* Result of merge.  If !clean, represents what would go in worktree */
+	struct tree *tree;
+
+	/*
+	 * Additional metadata used by merge_switch_to_result() or future calls
+	 * to merge_inmemory_*().  Not for external use.
+	 */
+	void *priv;
+	unsigned ate;
+};
+
+/* rename-detecting three-way merge with recursive ancestor consolidation. */
+void merge_inmemory_recursive(struct merge_options *opt,
+			      struct commit_list *merge_bases,
+			      struct commit *side1,
+			      struct commit *side2,
+			      struct merge_result *result);
+
+/* rename-detecting three-way merge, no recursion. */
+void merge_inmemory_nonrecursive(struct merge_options *opt,
+				 struct tree *merge_base,
+				 struct tree *side1,
+				 struct tree *side2,
+				 struct merge_result *result);
+
+/* Update the working tree and index from head to result after inmemory merge */
+void merge_switch_to_result(struct merge_options *opt,
+			    struct tree *head,
+			    struct merge_result *result,
+			    int update_worktree_and_index,
+			    int display_update_msgs);
+
+/* Do needed cleanup when not calling merge_switch_to_result() */
+void merge_finalize(struct merge_options *opt,
+		    struct merge_result *result);
+
+#endif