BUG?: xdl_merge surprisingly does not recognize content conflict
diff mbox series

Message ID 20190815220303.17209-1-newren@gmail.com
State New
Headers show
Series
  • BUG?: xdl_merge surprisingly does not recognize content conflict
Related show

Commit Message

Elijah Newren Aug. 15, 2019, 10:03 p.m. UTC
It appears git.git had a case of a patch being resubmitted and both the
original (nd/checkout-m-doc-update) and new (nd/checkout-m) versions
getting applied, with the merge picking to include both versions of some
of the text rather than just one of the two.  I have a patch below to
delete the duplicate hunk, but more surprising to me was the fact that
re-running the merge in question did not show any conflicts despite the
two patches adding different text in the same location.

The "duplicate" commits:
  * commit a7256debd4b6 ("checkout.txt: note about losing staged changes
    with --merge", 2019-03-19), from nd/checkout-m-doc-update
  * commit 6eff409e8a76 ("checkout: prevent losing staged changes with
    --merge", 2019-03-22), from nd/checkout-m
(From reading the mailing list, the former was v1 and the latter was
meant as v2, but the latter was not marked as v2 and apparently both
were picked up.)

The problematic merge was commit 4a3ed2bec603 ("Merge branch
'nd/checkout-m'", 2019-04-25), but redoing that merge produces no merge
conflicts.  This can be seen at the individual file level with the
following:

  $ git show 4a3ed2bec603^1:builtin/checkout.c >ours
  $ git show 4a3ed2bec603^2:builtin/checkout.c >theirs
  $ git show $(git merge-base 4a3ed2bec603^1 4a3ed2bec603^2):builtin/checkout.c >base

At this point, we can note that ours and theirs both added similar code
at the same place; looking at ours:

  $ git diff --no-index base ours | grep -B 4 -A 7 repo_index_has_changes
  @@ -736,6 +738,13 @@ static int merge_working_tree(const struct checkout_opts *opts,
   			if (!old_branch_info->commit)
   				return 1;

  +			if (repo_index_has_changes(the_repository,
  +						   get_commit_tree(old_branch_info->commit),
  +						   &sb))
  +				warning(_("staged changes in the following files may be lost: %s"),
  +					sb.buf);
  +			strbuf_release(&sb);
  +
   			/* Do more real merge */

Looking at theirs:

  $ git diff --no-index base theirs | grep -B 4 -A 7 repo_index_has_changes
   			if (!old_branch_info->commit)
   				return 1;
  +			old_tree = get_commit_tree(old_branch_info->commit);
  +
  +			if (repo_index_has_changes(the_repository, old_tree, &sb))
  +				die(_("cannot continue with staged changes in "
  +				      "the following files:\n%s"), sb.buf);
  +			strbuf_release(&sb);

   			/* Do more real merge */

  @@ -772,7 +781,7 @@ static int merge_working_tree(const struct checkout_opts *opts,

Now, a manual merge of these files gives no conflicts, which surprises me:

  $ git merge-file ours base theirs; echo $?
  0

and this merge includes BOTH additions:

  $ git diff --no-index base ours | grep -B 4 -A 7 repo_index_has_changes
   			if (!old_branch_info->commit)
   				return 1;
  +			old_tree = get_commit_tree(old_branch_info->commit);
  +
  +			if (repo_index_has_changes(the_repository, old_tree, &sb))
  +				die(_("cannot continue with staged changes in "
  +				      "the following files:\n%s"), sb.buf);
  +			strbuf_release(&sb);
  +
  +			if (repo_index_has_changes(the_repository,
  +						   get_commit_tree(old_branch_info->commit),
  +						   &sb))
  +				warning(_("staged changes in the following files may be lost: %s"),
  +					sb.buf);
  +			strbuf_release(&sb);

   			/* Do more real merge */

I'm not that familiar with the xdl_merge stuff, but this seemed buggy
to me.  Or is there something about the content merge that I'm just not
understanding and this merge is actually correct?

Elijah

-- 8< --
Subject: checkout: remove duplicate code

Both commit a7256debd4b6 ("checkout.txt: note about losing staged
changes with --merge", 2019-03-19) from nd/checkout-m-doc-update and
commit 6eff409e8a76 ("checkout: prevent losing staged changes with
--merge", 2019-03-22) from nd/checkout-m were included in git.git
despite the fact that the latter was meant to be v2 of the former.
The merge of these two topics resulted in a redundant chunk of code;
remove it.

Signed-off-by: Elijah Newren <newren@gmail.com>


---
 builtin/checkout.c | 7 -------
 1 file changed, 7 deletions(-)

Comments

Elijah Newren Aug. 15, 2019, 10:06 p.m. UTC | #1
On Thu, Aug 15, 2019 at 3:03 PM Elijah Newren <newren@gmail.com> wrote:
>
> It appears git.git had a case of a patch being resubmitted and both the
> original (nd/checkout-m-doc-update) and new (nd/checkout-m) versions
> getting applied, with the merge picking to include both versions of some
> of the text rather than just one of the two.  I have a patch below to
> delete the duplicate hunk, but more surprising to me was the fact that
> re-running the merge in question did not show any conflicts despite the
> two patches adding different text in the same location.

Sorry, I should specify that I've tested that this behavior goes back
to at least git-1.6.0, so it certainly isn't a new regression and
doesn't need to hold up git-2.23.0 by any means.
Junio C Hamano Aug. 16, 2019, 4:51 p.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> Now, a manual merge of these files gives no conflicts, which surprises me:
>
>   $ git merge-file ours base theirs; echo $?
>   0

Indeed that is surprising.

> -- 8< --
> Subject: checkout: remove duplicate code
>
> Both commit a7256debd4b6 ("checkout.txt: note about losing staged
> changes with --merge", 2019-03-19) from nd/checkout-m-doc-update and
> commit 6eff409e8a76 ("checkout: prevent losing staged changes with
> --merge", 2019-03-22) from nd/checkout-m were included in git.git
> despite the fact that the latter was meant to be v2 of the former.
> The merge of these two topics resulted in a redundant chunk of code;
> remove it.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>

The latter half that you remove cannot be reached, because the first
half would have already died after checking the same condition, so
we were doing the same repo-index-has-changes check twice when there
is no change; in other words, we were lucky that this accidental dup
did not cause any real damange.

Thanks for spotting.  Will apply.
Jeff King Aug. 16, 2019, 6:40 p.m. UTC | #3
On Thu, Aug 15, 2019 at 03:03:03PM -0700, Elijah Newren wrote:

> The problematic merge was commit 4a3ed2bec603 ("Merge branch
> 'nd/checkout-m'", 2019-04-25), but redoing that merge produces no merge
> conflicts.  This can be seen at the individual file level with the
> following:
> [...]
> I'm not that familiar with the xdl_merge stuff, but this seemed buggy
> to me.  Or is there something about the content merge that I'm just not
> understanding and this merge is actually correct?

Interesting case. If you look at the combined diff, you can see that the
two hunks are actually separated by a single blank line from the
original:

  $ git show 4a3ed2bec603
  [...]
  diff --cc builtin/checkout.c
  index 2e72a5e5a9,7cd01f62be..ffa776c6e1
  --- a/builtin/checkout.c
  +++ b/builtin/checkout.c
  @@@ -737,14 -738,13 +738,20 @@@ static int merge_working_tree(const str
                           */
                          if (!old_branch_info->commit)
                                  return 1;
  +                       old_tree = get_commit_tree(old_branch_info->commit);
  + 
  +                       if (repo_index_has_changes(the_repository, old_tree, &sb))
  +                               die(_("cannot continue with staged changes in "
  +                                     "the following files:\n%s"), sb.buf);
  +                       strbuf_release(&sb);
    
   +                      if (repo_index_has_changes(the_repository,
   +                                                 get_commit_tree(old_branch_info->commit),
   +                                                 &sb))
   +                              warning(_("staged changes in the following files may be lost: %s"),
   +                                      sb.buf);
   +                      strbuf_release(&sb);
   +
                          /* Do more real merge */
    
                          /*

which is itself an interesting diff artifact. The original had a line at
the end, splitting the "if (!old_branch...)" conditional from the "Do
more real merge" comment.  The upper half of the hunk _didn't_ add a new
line between that old conditional and the new code. But the lower half
did, but diff reports it as adding the line at the end (which is equally
valid to adding the line at the top; who knows what the author actually
did!).

That tidbit aside, in general I'd think a single line would not be
enough to separate two hunks and consider them independent. At first I
thought that XDL_MERGE_ZEALOUS was to blame. If I do this:

diff --git a/ll-merge.c b/ll-merge.c
index 5b8d46aede..ea445dfb55 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -107,7 +107,6 @@ static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
 	}
 
 	memset(&xmp, 0, sizeof(xmp));
-	xmp.level = XDL_MERGE_ZEALOUS;
 	xmp.favor = opts->variant;
 	xmp.xpp.flags = opts->xdl_opts;
 	if (git_xmerge_style >= 0)

and re-run the merge, I get a conflict. But it's not in those lines! The
diff of the conflicted state is:

diff --cc builtin/checkout.c
index 2e72a5e5a9,7cd01f62be..0000000000
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@@ -725,9 -725,10 +725,15 @@@ static int merge_working_tree(const str
  			 */
  			struct tree *result;
  			struct tree *work;
+ 			struct tree *old_tree;
  			struct merge_options o;
++<<<<<<< HEAD
  			struct strbuf sb = STRBUF_INIT;
  
++=======
++			struct strbuf sb = STRBUF_INIT;
++
++>>>>>>> 4a3ed2bec603^2
  			if (!opts->merge)
  				return 1;
  
@@@ -737,14 -738,13 +743,20 @@@
  			 */
  			if (!old_branch_info->commit)
  				return 1;
+ 			old_tree = get_commit_tree(old_branch_info->commit);
+ 
+ 			if (repo_index_has_changes(the_repository, old_tree, &sb))
+ 				die(_("cannot continue with staged changes in "
+ 				      "the following files:\n%s"), sb.buf);
+ 			strbuf_release(&sb);
  
 +			if (repo_index_has_changes(the_repository,
 +						   get_commit_tree(old_branch_info->commit),
 +						   &sb))
 +				warning(_("staged changes in the following files may be lost: %s"),
 +					sb.buf);
 +			strbuf_release(&sb);
 +
  			/* Do more real merge */
  
  			/*

So it found another conflict (where the zealous resolution did the
_right_ thing!) but didn't do anything for the hunk in question.

I do wonder if the fact that the separating line a blank one is relevant
or not (i.e., is it tickling some heuristics in xdiff).

-Peff
Elijah Newren Aug. 16, 2019, 7:26 p.m. UTC | #4
On Fri, Aug 16, 2019 at 9:51 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > Now, a manual merge of these files gives no conflicts, which surprises me:
> >
> >   $ git merge-file ours base theirs; echo $?
> >   0
>
> Indeed that is surprising.
>
> > -- 8< --
> > Subject: checkout: remove duplicate code
> >
> > Both commit a7256debd4b6 ("checkout.txt: note about losing staged
> > changes with --merge", 2019-03-19) from nd/checkout-m-doc-update and
> > commit 6eff409e8a76 ("checkout: prevent losing staged changes with
> > --merge", 2019-03-22) from nd/checkout-m were included in git.git
> > despite the fact that the latter was meant to be v2 of the former.
> > The merge of these two topics resulted in a redundant chunk of code;
> > remove it.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
>
> The latter half that you remove cannot be reached, because the first
> half would have already died after checking the same condition, so
> we were doing the same repo-index-has-changes check twice when there
> is no change; in other words, we were lucky that this accidental dup
> did not cause any real damange.

Yes, but I am wondering if there are other cases we just don't know
about yet and for which we did not get lucky.

I dug a little further and found that even the first version of `git
merge-file` that Dscho added to git in 2006 also exhibited the same
behavior.  Since he was basically making a minimal replacement for RCS
merge, I tried out /usr/bin/merge from the rcs package and found it
has the same behavior.  I then downloaded the oldest tarball on the
GNU ftp site I could find for rcs (5.7 from June 1995) and found that
its' merge command had the same behavior.

I'm not sure if that alarms me (not only have all versions of git
potentially been mis-merging things, but all versions of rcs, cvs, and
probably any other version control system out there in current use),
or comforts me ("hey, it's been around for at least 24 years so it
must only mis-merge things in both rare and innocuous ways or else
someone would have noticed it decades ago")

I'm leaning towards "It can't be that bad; I'll look at it again later
if someone else who knows xdiff better hasn't already solved it"

Patch
diff mbox series

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6123f732a2..dc61afa066 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -730,13 +730,6 @@  static int merge_working_tree(const struct checkout_opts *opts,
 				      "the following files:\n%s"), sb.buf);
 			strbuf_release(&sb);
 
-			if (repo_index_has_changes(the_repository,
-						   get_commit_tree(old_branch_info->commit),
-						   &sb))
-				warning(_("staged changes in the following files may be lost: %s"),
-					sb.buf);
-			strbuf_release(&sb);
-
 			/* Do more real merge */
 
 			/*