diff mbox series

[v2,3/3] merge-ort: fix issue with dual rename and add/add conflict

Message ID da3ae38e390df8acf86e910389d1620569a95a87.1656572226.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Fix dual rename into each other plus conflicting adds | expand

Commit Message

Elijah Newren June 30, 2022, 6:57 a.m. UTC
From: Elijah Newren <newren@gmail.com>

There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename.  (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 merge-ort.c                         | 7 +++++++
 t/t6423-merge-rename-directories.sh | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Ævar Arnfjörð Bjarmason June 30, 2022, 10:31 a.m. UTC | #1
On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> There is code in both merge-recursive and merge-ort for avoiding doubly
> transitive renames (i.e. one side renames directory A/ -> B/, and the
> other side renames directory B/ -> C/), because this combination would
> otherwise make a mess for new files added to A/ on the first side and
> wondering which directory they end up in -- especially if there were
> even more renames such as the first side renaming C/ -> D/.  In such
> cases, it just turns "off" directory rename detection for the higher
> order transitive cases.
>
> The testcases added in t6423 a couple commits ago are slightly different
> but similar in principle.  They involve a similar case of paired
> renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> a leading directory of B/ to C/.  And both sides add a new file
> somewhere under the directory that the other side will rename.  While
> the new files added start within different directories and thus could
> logically end up within different directories, it is weird for a file
> on one side to end up where the other one started and not move along
> with it.  So, let's just turn off directory rename detection in this
> case as well.
>
> Another way to look at this is that if the source name involved in a
> directory rename on one side is the target name of a directory rename
> operation for a file from the other side, then we avoid the doubly
> transitive rename.  (More concretely, if a directory rename on side D
> wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
> already had a file named NEW_NAME, and a directory rename on side E
> wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
> directory rename detection for NEW_NAME to prevent the
> NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
> conflict on NEW_NAME.)
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  merge-ort.c                         | 7 +++++++
>  t/t6423-merge-rename-directories.sh | 4 ++--
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/merge-ort.c b/merge-ort.c
> index fa6667de18c..17db4c30e5b 100644
> --- a/merge-ort.c
> +++ b/merge-ort.c
> @@ -2292,9 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
>  	struct strmap_entry *rename_info;
>  	struct strmap_entry *otherinfo = NULL;
>  	const char *new_dir;
> +	int other_side = 3 - side_index;
>  
> +	/*
> +	 * Cases where we don't have or don't want a directory rename for
> +	 * this path, so we return NULL.
> +	 */
>  	if (strmap_empty(dir_renames))
>  		return new_path;
> +	if (strmap_get(&collisions[other_side], path))
> +		return new_path;

I realize from looking at merge-recursive.c that this is carrying
forward some legacy debt, but I find this code and the need for a
comment more complex than it needs to. On top of master this will work
just as well:
	
	diff --git a/merge-ort.c b/merge-ort.c
	index b5015b9afd4..f5a02b1ff6f 100644
	--- a/merge-ort.c
	+++ b/merge-ort.c
	@@ -2268,16 +2268,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 					struct strmap *collisions,
	 					int *clean_merge)
	 {
	-	char *new_path = NULL;
	+	char *new_path;
	 	struct strmap_entry *rename_info;
	 	struct strmap_entry *otherinfo = NULL;
	 	const char *new_dir;
	 
	 	if (strmap_empty(dir_renames))
	-		return new_path;
	+		return NULL;
	 	rename_info = check_dir_renamed(path, dir_renames);
	 	if (!rename_info)
	-		return new_path;
	+		return NULL;
	 	/* old_dir = rename_info->key; */
	 	new_dir = rename_info->value;

I.e. we're really just making the reader squint to see that we're
actually returning NULL here, it's only later that we have an actual
"new path".

Wouldn't sticking that earlier in this series be an improvement? The
reason you need to explain "so we return NULL" is because we're carrying
forward this oddity, but if we just don't initialize it and return NULL
instead...

If you want to keep this pretty much as-is I think you should add a \n
before the (not seen in your context) "old_dir" comment seen in the
context here above, to make it visually clear that your new comment is
referring to these chains of returns. That could also be made clearer
with (again, on top of master, and could be combined with the above):
	
	diff --git a/merge-ort.c b/merge-ort.c
	index b5015b9afd4..a418f81a3eb 100644
	--- a/merge-ort.c
	+++ b/merge-ort.c
	@@ -2278,8 +2278,6 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 	rename_info = check_dir_renamed(path, dir_renames);
	 	if (!rename_info)
	 		return new_path;
	-	/* old_dir = rename_info->key; */
	-	new_dir = rename_info->value;
	 
	 	/*
	 	 * This next part is a little weird.  We do not want to do an
	@@ -2305,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
	 	 * As it turns out, this also prevents N-way transient rename
	 	 * confusion; See testcases 9c and 9d of t6043.
	 	 */
	+	new_dir = rename_info->value; /* old_dir = rename_info->key; */
	 	otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
	 	if (otherinfo) {
	 		path_msg(opt, rename_info->key, 1,
Elijah Newren July 1, 2022, 3:03 a.m. UTC | #2
On Thu, Jun 30, 2022 at 3:41 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Thu, Jun 30 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > There is code in both merge-recursive and merge-ort for avoiding doubly
> > transitive renames (i.e. one side renames directory A/ -> B/, and the
> > other side renames directory B/ -> C/), because this combination would
> > otherwise make a mess for new files added to A/ on the first side and
> > wondering which directory they end up in -- especially if there were
> > even more renames such as the first side renaming C/ -> D/.  In such
> > cases, it just turns "off" directory rename detection for the higher
> > order transitive cases.
> >
> > The testcases added in t6423 a couple commits ago are slightly different
> > but similar in principle.  They involve a similar case of paired
> > renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
> > a leading directory of B/ to C/.  And both sides add a new file
> > somewhere under the directory that the other side will rename.  While
> > the new files added start within different directories and thus could
> > logically end up within different directories, it is weird for a file
> > on one side to end up where the other one started and not move along
> > with it.  So, let's just turn off directory rename detection in this
> > case as well.
> >
> > Another way to look at this is that if the source name involved in a
> > directory rename on one side is the target name of a directory rename
> > operation for a file from the other side, then we avoid the doubly
> > transitive rename.  (More concretely, if a directory rename on side D
> > wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
> > already had a file named NEW_NAME, and a directory rename on side E
> > wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
> > directory rename detection for NEW_NAME to prevent the
> > NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
> > conflict on NEW_NAME.)
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  merge-ort.c                         | 7 +++++++
> >  t/t6423-merge-rename-directories.sh | 4 ++--
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/merge-ort.c b/merge-ort.c
> > index fa6667de18c..17db4c30e5b 100644
> > --- a/merge-ort.c
> > +++ b/merge-ort.c
> > @@ -2292,9 +2292,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
> >       struct strmap_entry *rename_info;
> >       struct strmap_entry *otherinfo = NULL;
> >       const char *new_dir;
> > +     int other_side = 3 - side_index;
> >
> > +     /*
> > +      * Cases where we don't have or don't want a directory rename for
> > +      * this path, so we return NULL.
> > +      */
> >       if (strmap_empty(dir_renames))
> >               return new_path;
> > +     if (strmap_get(&collisions[other_side], path))
> > +             return new_path;
>
> I realize from looking at merge-recursive.c that this is carrying
> forward some legacy debt

...which was introduced by me as well...

> , but I find this code and the need for a
> comment more complex than it needs to. On top of master this will work
> just as well:
>
>         diff --git a/merge-ort.c b/merge-ort.c
>         index b5015b9afd4..f5a02b1ff6f 100644
>         --- a/merge-ort.c
>         +++ b/merge-ort.c
>         @@ -2268,16 +2268,16 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                                                 struct strmap *collisions,
>                                                 int *clean_merge)
>          {
>         -       char *new_path = NULL;
>         +       char *new_path;
>                 struct strmap_entry *rename_info;
>                 struct strmap_entry *otherinfo = NULL;
>                 const char *new_dir;
>
>                 if (strmap_empty(dir_renames))
>         -               return new_path;
>         +               return NULL;
>                 rename_info = check_dir_renamed(path, dir_renames);
>                 if (!rename_info)
>         -               return new_path;
>         +               return NULL;
>                 /* old_dir = rename_info->key; */
>                 new_dir = rename_info->value;

You know, while making this patch series I was asking myself, "Why
didn't I just explicitly return NULL here?"  I don't know why I did it
this way, but it now seems slightly odd to me too.  I decided to not
fix it up and just provide a more targeted fix, but since it tripped
you up, I'm happy to add this as a preparatory cleanup.

> I.e. we're really just making the reader squint to see that we're
> actually returning NULL here, it's only later that we have an actual
> "new path".
>
> Wouldn't sticking that earlier in this series be an improvement? The
> reason you need to explain "so we return NULL" is because we're carrying
> forward this oddity, but if we just don't initialize it and return NULL
> instead...

I still think the comment makes sense to add, other than the "so we
return NULL" bit, even after this change.  But yeah, we can strike the
"so we return NULL" part of it after this cleanup.

> If you want to keep this pretty much as-is I think you should add a \n
> before the (not seen in your context) "old_dir" comment seen in the
> context here above, to make it visually clear that your new comment is
> referring to these chains of returns. That could also be made clearer
> with (again, on top of master, and could be combined with the above):
>
>         diff --git a/merge-ort.c b/merge-ort.c
>         index b5015b9afd4..a418f81a3eb 100644
>         --- a/merge-ort.c
>         +++ b/merge-ort.c
>         @@ -2278,8 +2278,6 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                 rename_info = check_dir_renamed(path, dir_renames);
>                 if (!rename_info)
>                         return new_path;
>         -       /* old_dir = rename_info->key; */
>         -       new_dir = rename_info->value;
>
>                 /*
>                  * This next part is a little weird.  We do not want to do an
>         @@ -2305,6 +2303,7 @@ static char *check_for_directory_rename(struct merge_options *opt,
>                  * As it turns out, this also prevents N-way transient rename
>                  * confusion; See testcases 9c and 9d of t6043.
>                  */
>         +       new_dir = rename_info->value; /* old_dir = rename_info->key; */
>                 otherinfo = strmap_get_entry(dir_rename_exclusions, new_dir);
>                 if (otherinfo) {
>                         path_msg(opt, rename_info->key, 1,

Sure, seems fine.
diff mbox series

Patch

diff --git a/merge-ort.c b/merge-ort.c
index fa6667de18c..17db4c30e5b 100644
--- a/merge-ort.c
+++ b/merge-ort.c
@@ -2292,9 +2292,16 @@  static char *check_for_directory_rename(struct merge_options *opt,
 	struct strmap_entry *rename_info;
 	struct strmap_entry *otherinfo = NULL;
 	const char *new_dir;
+	int other_side = 3 - side_index;
 
+	/*
+	 * Cases where we don't have or don't want a directory rename for
+	 * this path, so we return NULL.
+	 */
 	if (strmap_empty(dir_renames))
 		return new_path;
+	if (strmap_get(&collisions[other_side], path))
+		return new_path;
 	rename_info = check_dir_renamed(path, dir_renames);
 	if (!rename_info)
 		return new_path;
diff --git a/t/t6423-merge-rename-directories.sh b/t/t6423-merge-rename-directories.sh
index db13bb995f3..1300a01206a 100755
--- a/t/t6423-merge-rename-directories.sh
+++ b/t/t6423-merge-rename-directories.sh
@@ -5245,7 +5245,7 @@  test_setup_12l () {
 	)
 }
 
-test_expect_merge_algorithm failure failure '12l (B into A): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (B into A): Rename into each other + add/add conflict' '
 	test_setup_12l BintoA &&
 	(
 		cd 12l_BintoA &&
@@ -5273,7 +5273,7 @@  test_expect_merge_algorithm failure failure '12l (B into A): Rename into each ot
 	)
 '
 
-test_expect_merge_algorithm failure failure '12l (A into B): Rename into each other + add/add conflict' '
+test_expect_merge_algorithm failure success '12l (A into B): Rename into each other + add/add conflict' '
 	test_setup_12l AintoB &&
 	(
 		cd 12l_AintoB &&