diff mbox series

[v2,07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree

Message ID 20201102204344.342633-8-newren@gmail.com (mailing list archive)
State New, archived
Headers show
Series fundamentals of merge-ort implementation | expand

Commit Message

Elijah Newren Nov. 2, 2020, 8:43 p.m. UTC
Three-way merges, by their nature, are going to often have two or more
trees match at a given subdirectory.  We can avoid calling
fill_tree_descriptor() on the same tree by checking when these trees
match.  Noting when various oids match will also be useful in other
calculations and optimizations as well.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Derrick Stolee Nov. 11, 2020, 2:51 p.m. UTC | #1
On 11/2/2020 3:43 PM, Elijah Newren wrote:
> @@ -99,6 +99,15 @@ static int collect_merge_info_callback(int n,
>  	unsigned mbase_null = !(mask & 1);
>  	unsigned side1_null = !(mask & 2);
>  	unsigned side2_null = !(mask & 4);
> +	unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
> +					names[0].mode == names[1].mode &&
> +					oideq(&names[0].oid, &names[1].oid));
> +	unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
> +					names[0].mode == names[2].mode &&
> +					oideq(&names[0].oid, &names[2].oid));
> +	unsigned sides_match = (!side1_null && !side2_null &&
> +				names[1].mode == names[2].mode &&
> +				oideq(&names[1].oid, &names[2].oid));

If the *_null values were in an array, instead, then all of these
lines could be grouped as a macro:

	unsigned null_oid[3] = {
		!(mask & 1),
		!(mask & 2),
		!(mask & 4)
	};

	#define trivial_merge(i,j) (!null_oid[i] && !null_oid[j] && \
				    names[i].mode == names[j].mode && \
				    oideq(&names[i].oid, &names[j].oid))

	unsigned side1_matches_mbase = trivial_merge(0, 1);
	unsigned side2_matches_mbase = trivial_merge(0, 2);
	unsigned sides_match = trivial_merge(1, 2);

I briefly considered making these last three an array, as well,
except the loop below doesn't use 'i' in a symmetrical way:

> +			if (i == 1 && side1_matches_mbase)
> +				t[1] = t[0];
> +			else if (i == 2 && side2_matches_mbase)
> +				t[2] = t[0];
> +			else if (i == 2 && sides_match)
> +				t[2] = t[1];

Since the 'i == 2' case has two possible options, it wouldn't be
possible to just have 'side_matches[i]' here.

> +			else {
> +				const struct object_id *oid = NULL;
> +				if (dirmask & 1)
> +					oid = &names[i].oid;
> +				buf[i] = fill_tree_descriptor(opt->repo,
> +							      t + i, oid);
> +			}

I do appreciate the reduced recursion here!

Thanks,
-Stolee
Elijah Newren Nov. 11, 2020, 5:13 p.m. UTC | #2
On Wed, Nov 11, 2020 at 6:51 AM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 11/2/2020 3:43 PM, Elijah Newren wrote:
> > @@ -99,6 +99,15 @@ static int collect_merge_info_callback(int n,
> >       unsigned mbase_null = !(mask & 1);
> >       unsigned side1_null = !(mask & 2);
> >       unsigned side2_null = !(mask & 4);
> > +     unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
> > +                                     names[0].mode == names[1].mode &&
> > +                                     oideq(&names[0].oid, &names[1].oid));
> > +     unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
> > +                                     names[0].mode == names[2].mode &&
> > +                                     oideq(&names[0].oid, &names[2].oid));
> > +     unsigned sides_match = (!side1_null && !side2_null &&
> > +                             names[1].mode == names[2].mode &&
> > +                             oideq(&names[1].oid, &names[2].oid));
>
> If the *_null values were in an array, instead, then all of these
> lines could be grouped as a macro:
>
>         unsigned null_oid[3] = {
>                 !(mask & 1),
>                 !(mask & 2),
>                 !(mask & 4)
>         };
>
>         #define trivial_merge(i,j) (!null_oid[i] && !null_oid[j] && \
>                                     names[i].mode == names[j].mode && \
>                                     oideq(&names[i].oid, &names[j].oid))
>
>         unsigned side1_matches_mbase = trivial_merge(0, 1);
>         unsigned side2_matches_mbase = trivial_merge(0, 2);
>         unsigned sides_match = trivial_merge(1, 2);

Hmm, I like it.  I think I'll rename trivial_merge() to
non_null_match() (trivial merge suggests it can immediately be
resolved which is not necessarily true if rename detection is on), but
otherwise I'll use this.

> I briefly considered making these last three an array, as well,
> except the loop below doesn't use 'i' in a symmetrical way:
>
> > +                     if (i == 1 && side1_matches_mbase)
> > +                             t[1] = t[0];
> > +                     else if (i == 2 && side2_matches_mbase)
> > +                             t[2] = t[0];
> > +                     else if (i == 2 && sides_match)
> > +                             t[2] = t[1];
>
> Since the 'i == 2' case has two possible options, it wouldn't be
> possible to just have 'side_matches[i]' here.
>
> > +                     else {
> > +                             const struct object_id *oid = NULL;
> > +                             if (dirmask & 1)
> > +                                     oid = &names[i].oid;
> > +                             buf[i] = fill_tree_descriptor(opt->repo,
> > +                                                           t + i, oid);
> > +                     }
>
> I do appreciate the reduced recursion here!

Technically, not my own optimization; I just copied from
unpack-trees.c:traverse_trees_recursive() -- though the code looks
slightly different because I didn't want to compare oids multiple
times (I use the side match variables earlier in the function as
well).
Eric Sunshine Nov. 11, 2020, 5:21 p.m. UTC | #3
On Wed, Nov 11, 2020 at 12:13 PM Elijah Newren <newren@gmail.com> wrote:
> On Wed, Nov 11, 2020 at 6:51 AM Derrick Stolee <stolee@gmail.com> wrote:
> > If the *_null values were in an array, instead, then all of these
> > lines could be grouped as a macro:
> >
> >         unsigned null_oid[3] = {
> >                 !(mask & 1),
> >                 !(mask & 2),
> >                 !(mask & 4)
> >         };
>
> Hmm, I like it.  I think I'll rename trivial_merge() to
> non_null_match() (trivial merge suggests it can immediately be
> resolved which is not necessarily true if rename detection is on), but
> otherwise I'll use this.

Are we allowing non-constant array initializers in the codebase these
days? I don't see anything in CodingGuidelines suggesting the use of
them.
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index 626eb9713e..d3c1d00fc6 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -99,6 +99,15 @@  static int collect_merge_info_callback(int n,
 	unsigned mbase_null = !(mask & 1);
 	unsigned side1_null = !(mask & 2);
 	unsigned side2_null = !(mask & 4);
+	unsigned side1_matches_mbase = (!side1_null && !mbase_null &&
+					names[0].mode == names[1].mode &&
+					oideq(&names[0].oid, &names[1].oid));
+	unsigned side2_matches_mbase = (!side2_null && !mbase_null &&
+					names[0].mode == names[2].mode &&
+					oideq(&names[0].oid, &names[2].oid));
+	unsigned sides_match = (!side1_null && !side2_null &&
+				names[1].mode == names[2].mode &&
+				oideq(&names[1].oid, &names[2].oid));
 
 	/* n = 3 is a fundamental assumption. */
 	if (n != 3)
@@ -154,10 +163,19 @@  static int collect_merge_info_callback(int n,
 		newinfo.pathlen = st_add3(newinfo.pathlen, p->pathlen, 1);
 
 		for (i = 0; i < 3; i++, dirmask >>= 1) {
-			const struct object_id *oid = NULL;
-			if (dirmask & 1)
-				oid = &names[i].oid;
-			buf[i] = fill_tree_descriptor(opt->repo, t + i, oid);
+			if (i == 1 && side1_matches_mbase)
+				t[1] = t[0];
+			else if (i == 2 && side2_matches_mbase)
+				t[2] = t[0];
+			else if (i == 2 && sides_match)
+				t[2] = t[1];
+			else {
+				const struct object_id *oid = NULL;
+				if (dirmask & 1)
+					oid = &names[i].oid;
+				buf[i] = fill_tree_descriptor(opt->repo,
+							      t + i, oid);
+			}
 		}
 
 		original_dir_name = opti->current_dir_name;