diff mbox series

[07/11] merge-ort: add implementation of both sides renaming differently

Message ID d4595397052568ea6ea0cf46190374c74140fed7.1607542887.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series merge-ort: add basic rename detection | expand

Commit Message

Elijah Newren Dec. 9, 2020, 7:41 p.m. UTC
From: Elijah Newren <newren@gmail.com>

Implement rename/rename(1to2) handling, i.e. both sides of history
renaming a file and rename it differently.  This code replaces the
following from merge-recurisve.c:

  * all the 1to2 code in process_renames()
  * the RENAME_ONE_FILE_TO_TWO case of process_entry()
  * handle_rename_rename_1to2()

Also, there is some shared code from merge-recursive.c for multiple
different rename cases which we will no longer need for this case (or
other rename cases):

  * handle_file_collision()
  * setup_rename_conflict_info()

The consolidation of five separate codepaths into one is made possible
by a change in design: process_renames() tweaks the conflict_info
entries within opt->priv->paths such that process_entry() can then
handle all the non-rename conflict types (directory/file, modify/delete,
etc.) orthogonally.  This means we're much less likely to miss special
implementation of some kind of combination of conflict types (see
commits brought in by 66c62eaec6 ("Merge branch 'en/merge-tests'",
2020-11-18), especially commit ef52778708 ("merge tests: expect improved
directory/file conflict handling in ort", 2020-10-26) for more details).
That, together with letting worktree/index updating be handled
orthogonally in the merge_switch_to_result() function, dramatically
simplifies the code for various special rename cases.

To be fair, there is a _slight_ tweak to process_entry() here to make
sure that the two different paths aren't marked as clean but are left in
a conflicted state.  So process_renames() and process_entry() aren't
quite entirely orthogonal, but they are pretty close.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 55 insertions(+), 3 deletions(-)

Comments

Derrick Stolee Dec. 11, 2020, 3:39 a.m. UTC | #1
On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Implement rename/rename(1to2) handling, i.e. both sides of history
> renaming a file and rename it differently.  This code replaces the
> following from merge-recurisve.c:
> 
>   * all the 1to2 code in process_renames()
>   * the RENAME_ONE_FILE_TO_TWO case of process_entry()
>   * handle_rename_rename_1to2()
> 
> Also, there is some shared code from merge-recursive.c for multiple
> different rename cases which we will no longer need for this case (or
> other rename cases):
> 
>   * handle_file_collision()
>   * setup_rename_conflict_info()
> 
> The consolidation of five separate codepaths into one is made possible
> by a change in design:

Excellent!

>  			/* This is a rename/rename(1to2) */
> -			die("Not yet implemented");
> +			clean_merge = handle_content_merge(opt,
> +							   pair->one->path,
> +							   &base->stages[0],
> +							   &side1->stages[1],
> +							   &side2->stages[2],
> +							   pathnames,
> +							   1 + 2 * opt->priv->call_depth,
> +							   &merged);

(this method currently die()s. ok)

> +			if (!clean_merge &&
> +			    merged.mode == side1->stages[1].mode &&
> +			    oideq(&merged.oid, &side1->stages[1].oid)) {
> +				was_binary_blob = 1;
> +			}

nit: Extraneous braces?

> +			memcpy(&side1->stages[1], &merged, sizeof(merged));
> +			if (was_binary_blob) {
> +				/*
> +				 * Getting here means we were attempting to
> +				 * merge a binary blob.
> +				 *
> +				 * Since we can't merge binaries,
> +				 * handle_content_merge() just takes one
> +				 * side.  But we don't want to copy the
> +				 * contents of one side to both paths.  We
> +				 * used the contents of side1 above for
> +				 * side1->stages, let's use the contents of
> +				 * side2 for side2->stages below.
> +				 */
> +				oidcpy(&merged.oid, &side2->stages[2].oid);
> +				merged.mode = side2->stages[2].mode;
> +			}
> +			memcpy(&side2->stages[2], &merged, sizeof(merged));
> +
> +			side1->path_conflict = 1;
> +			side2->path_conflict = 1;
> +			/*
> +			 * TODO: For renames we normally remove the path at the
> +			 * old name.  It would thus seem consistent to do the
> +			 * same for rename/rename(1to2) cases, but we haven't
> +			 * done so traditionally and a number of the regression
> +			 * tests now encode an expectation that the file is
> +			 * left there at stage 1.  If we ever decide to change
> +			 * this, add the following two lines here:
> +			 *    base->merged.is_null = 1;
> +			 *    base->merged.clean = 1;
> +			 * and remove the setting of base->path_conflict to 1.
> +			 */
> +			base->path_conflict = 1;

I'm getting the point of the review/evening where I'm starting to gloss
over these important details. Time to take a break (after this patch).

> +			path_msg(opt, oldpath, 0,
> +				 _("CONFLICT (rename/rename): %s renamed to "
> +				   "%s in %s and to %s in %s."),
> +				 pathnames[0],
> +				 pathnames[1], opt->branch1,
> +				 pathnames[2], opt->branch2);

This output differs a bit from handle_rename_rename_1to2() in
merge-recursive.c:

	output(opt, 1, _("CONFLICT (rename/rename): "
	       "Rename \"%s\"->\"%s\" in branch \"%s\" "
	       "rename \"%s\"->\"%s\" in \"%s\"%s"),
	       o->path, a->path, ci->ren1->branch,
	       o->path, b->path, ci->ren2->branch,
	       opt->priv->call_depth ? _(" (left unresolved)") : "");

How much do we want to have _exact_ output matches between the
two strategies, at least in the short term?

> @@ -1257,13 +1309,13 @@ static void process_entry(struct merge_options *opt,
>  		int side = (ci->filemask == 4) ? 2 : 1;
>  		ci->merged.result.mode = ci->stages[side].mode;
>  		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
> -		ci->merged.clean = !ci->df_conflict;
> +		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
>  	} else if (ci->filemask == 1) {
>  		/* Deleted on both sides */
>  		ci->merged.is_null = 1;
>  		ci->merged.result.mode = 0;
>  		oidcpy(&ci->merged.result.oid, &null_oid);
> -		ci->merged.clean = 1;
> +		ci->merged.clean = !ci->path_conflict;

These exist because this is the first time we assign path_conflict.
Sure.

Thanks,
-Stolee
Elijah Newren Dec. 11, 2020, 9:56 p.m. UTC | #2
On Thu, Dec 10, 2020 at 7:39 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 12/9/2020 2:41 PM, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > Implement rename/rename(1to2) handling, i.e. both sides of history
> > renaming a file and rename it differently.  This code replaces the
> > following from merge-recurisve.c:
> >
> >   * all the 1to2 code in process_renames()
> >   * the RENAME_ONE_FILE_TO_TWO case of process_entry()
> >   * handle_rename_rename_1to2()
> >
> > Also, there is some shared code from merge-recursive.c for multiple
> > different rename cases which we will no longer need for this case (or
> > other rename cases):
> >
> >   * handle_file_collision()
> >   * setup_rename_conflict_info()
> >
> > The consolidation of five separate codepaths into one is made possible
> > by a change in design:
>
> Excellent!
>
> >                       /* This is a rename/rename(1to2) */
> > -                     die("Not yet implemented");
> > +                     clean_merge = handle_content_merge(opt,
> > +                                                        pair->one->path,
> > +                                                        &base->stages[0],
> > +                                                        &side1->stages[1],
> > +                                                        &side2->stages[2],
> > +                                                        pathnames,
> > +                                                        1 + 2 * opt->priv->call_depth,
> > +                                                        &merged);
>
> (this method currently die()s. ok)
>
> > +                     if (!clean_merge &&
> > +                         merged.mode == side1->stages[1].mode &&
> > +                         oideq(&merged.oid, &side1->stages[1].oid)) {
> > +                             was_binary_blob = 1;
> > +                     }
>
> nit: Extraneous braces?

Yeah.

> > +                     memcpy(&side1->stages[1], &merged, sizeof(merged));
> > +                     if (was_binary_blob) {
> > +                             /*
> > +                              * Getting here means we were attempting to
> > +                              * merge a binary blob.
> > +                              *
> > +                              * Since we can't merge binaries,
> > +                              * handle_content_merge() just takes one
> > +                              * side.  But we don't want to copy the
> > +                              * contents of one side to both paths.  We
> > +                              * used the contents of side1 above for
> > +                              * side1->stages, let's use the contents of
> > +                              * side2 for side2->stages below.
> > +                              */
> > +                             oidcpy(&merged.oid, &side2->stages[2].oid);
> > +                             merged.mode = side2->stages[2].mode;
> > +                     }
> > +                     memcpy(&side2->stages[2], &merged, sizeof(merged));
> > +
> > +                     side1->path_conflict = 1;
> > +                     side2->path_conflict = 1;
> > +                     /*
> > +                      * TODO: For renames we normally remove the path at the
> > +                      * old name.  It would thus seem consistent to do the
> > +                      * same for rename/rename(1to2) cases, but we haven't
> > +                      * done so traditionally and a number of the regression
> > +                      * tests now encode an expectation that the file is
> > +                      * left there at stage 1.  If we ever decide to change
> > +                      * this, add the following two lines here:
> > +                      *    base->merged.is_null = 1;
> > +                      *    base->merged.clean = 1;
> > +                      * and remove the setting of base->path_conflict to 1.
> > +                      */
> > +                     base->path_conflict = 1;
>
> I'm getting the point of the review/evening where I'm starting to gloss
> over these important details. Time to take a break (after this patch).

I'm surprised you didn't take a break after giving a talk at GitHub
Universe earlier yesterday.  I don't think I got anything done the
rest of the day after I gave my talk at GitMerge 2020.  Nice job on
your talk, by the way; I'm sending it along to some others to watch.

> > +                     path_msg(opt, oldpath, 0,
> > +                              _("CONFLICT (rename/rename): %s renamed to "
> > +                                "%s in %s and to %s in %s."),
> > +                              pathnames[0],
> > +                              pathnames[1], opt->branch1,
> > +                              pathnames[2], opt->branch2);
>
> This output differs a bit from handle_rename_rename_1to2() in
> merge-recursive.c:
>
>         output(opt, 1, _("CONFLICT (rename/rename): "
>                "Rename \"%s\"->\"%s\" in branch \"%s\" "
>                "rename \"%s\"->\"%s\" in \"%s\"%s"),
>                o->path, a->path, ci->ren1->branch,
>                o->path, b->path, ci->ren2->branch,
>                opt->priv->call_depth ? _(" (left unresolved)") : "");
>
> How much do we want to have _exact_ output matches between the
> two strategies, at least in the short term?

Good question.  I started with such a goal, and discarded it when I
discovered it was unrealistic and even actively harmful -- at least in
the extreme of exact matching in all cases.  I've already updated the
regression tests to expect differences in output and behavior in a few
different series that merged down months ago[1][2][3].
   [1] https://lore.kernel.org/git/pull.827.v3.git.git.1597098559.gitgitgadget@gmail.com/
   [2] https://lore.kernel.org/git/pull.769.v2.git.1603731704.gitgitgadget@gmail.com/
   [3] https://lore.kernel.org/git/pull.879.git.git.1602794790.gitgitgadget@gmail.com/

 Of particular note from those series, at least as far as output
messages, are the following commits:
   1f3c9ba707 ("t6425: be more flexible with rename/delete conflict
messages", 2020-08-10)
   2a7c16c980 ("t6422, t6426: be more flexible for add/add conflicts
involving renames", 2020-08-10)
   c8c35f6a02 ("merge tests: expect slight differences in output for
recursive vs. ort", 2020-10-26)

Summarizing those, I split some conflict messages into multiple
messages, changed the name of some conflict types that get reported,
avoided having some messages go to stdout while others go to stderr
(based on who added them rather than differences in severity of the
message), and even just changed some messages (because once you've
accepted the other changes, exact matching just doesn't matter).
Other commits in the series also note how I changed the merge behavior
in addition to the output in various cases.

The rename/delete conflict commit message is particularly
illustrative.  It demonstrates why exact matching of output messages
is unachievable short of keeping a completely busted code design.

We can certainly tweak individual messages if we feel something will
make them clearer, but matching isn't a goal for me.

> > @@ -1257,13 +1309,13 @@ static void process_entry(struct merge_options *opt,
> >               int side = (ci->filemask == 4) ? 2 : 1;
> >               ci->merged.result.mode = ci->stages[side].mode;
> >               oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
> > -             ci->merged.clean = !ci->df_conflict;
> > +             ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
> >       } else if (ci->filemask == 1) {
> >               /* Deleted on both sides */
> >               ci->merged.is_null = 1;
> >               ci->merged.result.mode = 0;
> >               oidcpy(&ci->merged.result.oid, &null_oid);
> > -             ci->merged.clean = 1;
> > +             ci->merged.clean = !ci->path_conflict;
>
> These exist because this is the first time we assign path_conflict.
> Sure.
>
> Thanks,
> -Stolee


Thanks for all your detailed reviews!
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index 085e81196a5..75e638a23eb 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -680,7 +680,59 @@  static int process_renames(struct merge_options *opt,
 			}
 
 			/* This is a rename/rename(1to2) */
-			die("Not yet implemented");
+			clean_merge = handle_content_merge(opt,
+							   pair->one->path,
+							   &base->stages[0],
+							   &side1->stages[1],
+							   &side2->stages[2],
+							   pathnames,
+							   1 + 2 * opt->priv->call_depth,
+							   &merged);
+			if (!clean_merge &&
+			    merged.mode == side1->stages[1].mode &&
+			    oideq(&merged.oid, &side1->stages[1].oid)) {
+				was_binary_blob = 1;
+			}
+			memcpy(&side1->stages[1], &merged, sizeof(merged));
+			if (was_binary_blob) {
+				/*
+				 * Getting here means we were attempting to
+				 * merge a binary blob.
+				 *
+				 * Since we can't merge binaries,
+				 * handle_content_merge() just takes one
+				 * side.  But we don't want to copy the
+				 * contents of one side to both paths.  We
+				 * used the contents of side1 above for
+				 * side1->stages, let's use the contents of
+				 * side2 for side2->stages below.
+				 */
+				oidcpy(&merged.oid, &side2->stages[2].oid);
+				merged.mode = side2->stages[2].mode;
+			}
+			memcpy(&side2->stages[2], &merged, sizeof(merged));
+
+			side1->path_conflict = 1;
+			side2->path_conflict = 1;
+			/*
+			 * TODO: For renames we normally remove the path at the
+			 * old name.  It would thus seem consistent to do the
+			 * same for rename/rename(1to2) cases, but we haven't
+			 * done so traditionally and a number of the regression
+			 * tests now encode an expectation that the file is
+			 * left there at stage 1.  If we ever decide to change
+			 * this, add the following two lines here:
+			 *    base->merged.is_null = 1;
+			 *    base->merged.clean = 1;
+			 * and remove the setting of base->path_conflict to 1.
+			 */
+			base->path_conflict = 1;
+			path_msg(opt, oldpath, 0,
+				 _("CONFLICT (rename/rename): %s renamed to "
+				   "%s in %s and to %s in %s."),
+				 pathnames[0],
+				 pathnames[1], opt->branch1,
+				 pathnames[2], opt->branch2);
 
 			i++; /* We handled both renames, i.e. i+1 handled */
 			continue;
@@ -1257,13 +1309,13 @@  static void process_entry(struct merge_options *opt,
 		int side = (ci->filemask == 4) ? 2 : 1;
 		ci->merged.result.mode = ci->stages[side].mode;
 		oidcpy(&ci->merged.result.oid, &ci->stages[side].oid);
-		ci->merged.clean = !ci->df_conflict;
+		ci->merged.clean = !ci->df_conflict && !ci->path_conflict;
 	} else if (ci->filemask == 1) {
 		/* Deleted on both sides */
 		ci->merged.is_null = 1;
 		ci->merged.result.mode = 0;
 		oidcpy(&ci->merged.result.oid, &null_oid);
-		ci->merged.clean = 1;
+		ci->merged.clean = !ci->path_conflict;
 	}
 
 	/*