diff mbox series

[1/2] xdiff: implement a zealous diff3, or "zdiff3"

Message ID b7561a67c192d4bdede47fee5b7b1cb30c44b785.1623734171.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series RFC: implement new zdiff3 conflict style | expand

Commit Message

Elijah Newren June 15, 2021, 5:16 a.m. UTC
From: Elijah Newren <newren@gmail.com>

"zdiff3" is identical to ordinary diff3 except that it allows compaction
of common lines on the two sides of history at the beginning or end of
the conflict hunk.  For example, the following diff3 conflict:

    1
    2
    3
    4
    <<<<<<
    A
    B
    C
    D
    E
    ||||||
    5
    6
    ======
    A
    X
    C
    Y
    E
    >>>>>>
    7
    8
    9

has common lines 'A', 'C', and 'E' on the two sides.  With zdiff3, one
would instead get the following conflict:

    1
    2
    3
    4
    A
    <<<<<<
    B
    C
    D
    ||||||
    5
    6
    ======
    X
    C
    Y
    >>>>>>
    E
    7
    8
    9

Note that the common lines, 'A', and 'E' were moved outside the
conflict.  Unlike with the two-way conflicts from the 'merge'
conflictStyle, the zdiff3 conflict is NOT split into multiple conflict
regions to allow the common 'C' lines to be shown outside a conflict,
because zdiff3 shows the base version too and the base version cannot be
reasonably split.

Initial-patch-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/merge-file.c                   |  2 +
 contrib/completion/git-completion.bash |  2 +-
 xdiff-interface.c                      |  2 +
 xdiff/xdiff.h                          |  1 +
 xdiff/xmerge.c                         | 52 ++++++++++++++++++++++++--
 5 files changed, 55 insertions(+), 4 deletions(-)

Comments

Junio C Hamano June 15, 2021, 6:13 a.m. UTC | #1
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

[An example to reduce
    1234<ABCDE|56=AXCYE>789
to
    1234A<BCD|56=XCY>E789
elided]

> Note that the common lines, 'A', and 'E' were moved outside the
> conflict.  Unlike with the two-way conflicts from the 'merge'
> conflictStyle, the zdiff3 conflict is NOT split into multiple conflict
> regions to allow the common 'C' lines to be shown outside a conflict,
> because zdiff3 shows the base version too and the base version cannot be
> reasonably split.

Another thing that was brought up during the original discussion is
that zdiff3 uses different meaning of "base version" from diff3.  In
diff3 output, if you remove everything enclosed in <<< and |||, and
everything enclosed in === and >>>, i.e. 123456789 in the original
example above, you will get "the common ancestor" version, which is
what is shown as the "base".  After zdiff3 munges the lines, that is
not the case, 1234A56E789 never appeared in anybody's version.  It
is a "starting from the common ancestor version, both sides agreed
to make the same uncontroversial changes, which is already included
in this base version" and zdiff3 inserts the conflict markers and
material unique to each side into it.  Being able to recover the
common ancestor version is not always necessary and that is what
makes zdiff3 a plausible solution, but to some workflows and tools
it may matter, so it would be helpful to mention it in the
documentation.

Thanks.
Felipe Contreras June 15, 2021, 9:40 a.m. UTC | #2
Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> "zdiff3" is identical to ordinary diff3 except that it allows compaction
> of common lines on the two sides of history at the beginning or end of
> the conflict hunk.

That was not the main reason behind zdiff3.

The whole point of zdiff3 was to have something closer to the "merge"
style, even if not technically correct.

Your proposal is better than diff3 in that respect, but worse than Uwe's
zdiff3.

If you have this:

  l  b  r
  =  =  =
  A  A  A

  B     b
  C     C
  D     D
  E     E
  F     F
  I     i

merge will output this:

  A

  <<<<<<< l
  B
  =======
  b
  >>>>>>> r
  C
  D
  E
  F
  <<<<<<< l
  I
  =======
  i
  >>>>>>> r

This is simple, and useful.

diff3 will output this:

  A
  <<<<<<< l

  B
  C
  D
  E
  F
  I
  ||||||| b
  =======

  b
  C
  D
  E
  F
  i
  >>>>>>> r

Not very friendly.

Your zdiff3:

  A

  <<<<<<< l
  B
  C
  D
  E
  F
  I
  ||||||| b
  =======
  b
  C
  D
  E
  F
  i
  >>>>>>> r

Just marginally better.

Uwe's zdiff3:

  A

  <<<<<<< l
  B
  ||||||| b
  =======
  b
  >>>>>>> r
  C
  D
  E
  F
  <<<<<<< l
  I
  ||||||| b
  =======
  i
  >>>>>>> r

In my view of all the *diff3's, Uwe's version is the most useful.
Elijah Newren June 15, 2021, 6:12 p.m. UTC | #3
On Tue, Jun 15, 2021 at 2:40 AM Felipe Contreras
<felipe.contreras@gmail.com> wrote:
>
> Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@gmail.com>
> >
> > "zdiff3" is identical to ordinary diff3 except that it allows compaction
> > of common lines on the two sides of history at the beginning or end of
> > the conflict hunk.
>
> That was not the main reason behind zdiff3.
>
> The whole point of zdiff3 was to have something closer to the "merge"
> style, even if not technically correct.
>
> Your proposal is better than diff3 in that respect, but worse than Uwe's
> zdiff3.
>
> If you have this:
>
>   l  b  r
>   =  =  =
>   A  A  A
>
>   B     b
>   C     C
>   D     D
>   E     E
>   F     F
>   I     i
>
> merge will output this:
>
>   A
>
>   <<<<<<< l
>   B
>   =======
>   b
>   >>>>>>> r
>   C
>   D
>   E
>   F
>   <<<<<<< l
>   I
>   =======
>   i
>   >>>>>>> r
>
> This is simple, and useful.
>
> diff3 will output this:
>
>   A
>   <<<<<<< l
>
>   B
>   C
>   D
>   E
>   F
>   I
>   ||||||| b
>   =======
>
>   b
>   C
>   D
>   E
>   F
>   i
>   >>>>>>> r
>
> Not very friendly.
>
> Your zdiff3:
>
>   A
>
>   <<<<<<< l
>   B
>   C
>   D
>   E
>   F
>   I
>   ||||||| b
>   =======
>   b
>   C
>   D
>   E
>   F
>   i
>   >>>>>>> r
>
> Just marginally better.

Your example here is one where diff3 has no original text in the
conflicted region.  Empty text is trivially easy to split, making it a
somewhat uninteresting testcase for zdiff3.  The interesting question
is what do you do when that region is non-empty?  When it's non-empty,
it's not going to match the two sides (i.e. it won't have "C D E F"
lines for your example) -- we know that because when the original also
matches the two sides, the xdiff code will start with multiple
separate conflicts instead of one big one.  So, in such a case, do you
still decide to split the conflict regions, and if so, how do you
split the non-matching original text?
Sergey Organov June 15, 2021, 6:50 p.m. UTC | #4
Elijah Newren <newren@gmail.com> writes:

> On Tue, Jun 15, 2021 at 2:40 AM Felipe Contreras
> <felipe.contreras@gmail.com> wrote:
>>
>> Elijah Newren via GitGitGadget wrote:
>> > From: Elijah Newren <newren@gmail.com>
>> >
>> > "zdiff3" is identical to ordinary diff3 except that it allows compaction
>> > of common lines on the two sides of history at the beginning or end of
>> > the conflict hunk.
>>
>> That was not the main reason behind zdiff3.
>>
>> The whole point of zdiff3 was to have something closer to the "merge"
>> style, even if not technically correct.
>>
>> Your proposal is better than diff3 in that respect, but worse than Uwe's
>> zdiff3.
>>
>> If you have this:
>>
>>   l  b  r
>>   =  =  =
>>   A  A  A
>>
>>   B     b
>>   C     C
>>   D     D
>>   E     E
>>   F     F
>>   I     i
>>
>> merge will output this:
>>
>>   A
>>
>>   <<<<<<< l
>>   B
>>   =======
>>   b
>>   >>>>>>> r
>>   C
>>   D
>>   E
>>   F
>>   <<<<<<< l
>>   I
>>   =======
>>   i
>>   >>>>>>> r
>>
>> This is simple, and useful.
>>
>> diff3 will output this:
>>
>>   A
>>   <<<<<<< l
>>
>>   B
>>   C
>>   D
>>   E
>>   F
>>   I
>>   ||||||| b
>>   =======
>>
>>   b
>>   C
>>   D
>>   E
>>   F
>>   i
>>   >>>>>>> r
>>
>> Not very friendly.

For me it's friendly enough. One key-press in Emacs and I get:

--- upper/xx.txt
+++ lower/xx.txt
@@ -1,7 +1,7 @@
 
-B
+b
 C
 D
 E
 F
-I
+i

In a separate window, that is roughly what you've get in the "merge"
output in the first place, but even more compact.

My point is that once Git provides enough context, a good interactive
tool will do the rest just fine, beneficial to the end-user.

-- Sergey Organov
diff mbox series

Patch

diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index 06a2f90c4875..e695867ee548 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -34,6 +34,8 @@  int cmd_merge_file(int argc, const char **argv, const char *prefix)
 	struct option options[] = {
 		OPT_BOOL('p', "stdout", &to_stdout, N_("send results to standard output")),
 		OPT_SET_INT(0, "diff3", &xmp.style, N_("use a diff3 based merge"), XDL_MERGE_DIFF3),
+		OPT_SET_INT(0, "zdiff3", &xmp.style, N_("use a zealous diff3 based merge"),
+				XDL_MERGE_ZEALOUS_DIFF3),
 		OPT_SET_INT(0, "ours", &xmp.favor, N_("for conflicts, use our version"),
 			    XDL_MERGE_FAVOR_OURS),
 		OPT_SET_INT(0, "theirs", &xmp.favor, N_("for conflicts, use their version"),
diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index b50c5d0ea386..859455929814 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1566,7 +1566,7 @@  _git_checkout ()
 
 	case "$cur" in
 	--conflict=*)
-		__gitcomp "diff3 merge" "" "${cur##--conflict=}"
+		__gitcomp "diff3 merge zdiff3" "" "${cur##--conflict=}"
 		;;
 	--*)
 		__gitcomp_builtin checkout
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 609615db2cd6..9977813a9d37 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -308,6 +308,8 @@  int git_xmerge_config(const char *var, const char *value, void *cb)
 			die("'%s' is not a boolean", var);
 		if (!strcmp(value, "diff3"))
 			git_xmerge_style = XDL_MERGE_DIFF3;
+		else if (!strcmp(value, "zdiff3"))
+			git_xmerge_style = XDL_MERGE_ZEALOUS_DIFF3;
 		else if (!strcmp(value, "merge"))
 			git_xmerge_style = 0;
 		/*
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 7a046051468f..8629ae287c79 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -65,6 +65,7 @@  extern "C" {
 
 /* merge output styles */
 #define XDL_MERGE_DIFF3 1
+#define XDL_MERGE_ZEALOUS_DIFF3 2
 
 typedef struct s_mmfile {
 	char *ptr;
diff --git a/xdiff/xmerge.c b/xdiff/xmerge.c
index 1659edb45393..b1dc9df7ea0a 100644
--- a/xdiff/xmerge.c
+++ b/xdiff/xmerge.c
@@ -230,7 +230,7 @@  static int fill_conflict_hunk(xdfenv_t *xe1, const char *name1,
 	size += xdl_recs_copy(xe1, m->i1, m->chg1, needs_cr, 1,
 			      dest ? dest + size : NULL);
 
-	if (style == XDL_MERGE_DIFF3) {
+	if (style == XDL_MERGE_DIFF3 || style == XDL_MERGE_ZEALOUS_DIFF3) {
 		/* Shared preimage */
 		if (!dest) {
 			size += marker_size + 1 + needs_cr + marker3_size;
@@ -327,7 +327,7 @@  static int xdl_fill_merge_buffer(xdfenv_t *xe1, const char *name1,
  * lines. Try hard to show only these few lines as conflicting.
  */
 static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
-		xpparam_t const *xpp)
+				xpparam_t const *xpp, int style)
 {
 	for (; m; m = m->next) {
 		mmfile_t t1, t2;
@@ -368,6 +368,42 @@  static int xdl_refine_conflicts(xdfenv_t *xe1, xdfenv_t *xe2, xdmerge_t *m,
 			continue;
 		}
 		x = xscr;
+		if (style == XDL_MERGE_ZEALOUS_DIFF3) {
+			int advance1 = xscr->i1, advance2 = xscr->i2;
+
+			/*
+			 * Advance m->i1 and m->i2 so that conflict for sides
+			 * 1 and 2 start after common region.  Decrement
+			 * m->chg[12] since there are now fewer conflict lines
+			 * for those sides.
+			 */
+			m->i1 += advance1;
+			m->i2 += advance2;
+			m->chg1 -= advance1;
+			m->chg2 -= advance2;
+
+			/*
+			 * Splitting conflicts due to internal common regions
+			 * on the two sides would be inappropriate since we
+			 * are also showing the merge base and have no
+			 * reasonable way to split the merge base text.
+			 */
+			while (xscr->next)
+				xscr = xscr->next;
+
+			/*
+			 * Lower the number of conflict lines to not include
+			 * the final common lines, if any.  Do this by setting
+			 * number of conflict lines to
+			 *   (line offset for start of conflict in xscr) +
+			 *   (number of lines in the conflict in xscr)
+			 */
+			m->chg1 = (xscr->i1 - advance1) + (xscr->chg1);
+			m->chg2 = (xscr->i2 - advance2) + (xscr->chg2);
+			xdl_free_env(&xe);
+			xdl_free_script(x);
+			continue;
+		}
 		m->i1 = xscr->i1 + i1;
 		m->chg1 = xscr->chg1;
 		m->i2 = xscr->i2 + i2;
@@ -482,6 +518,16 @@  static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
 	int style = xmp->style;
 	int favor = xmp->favor;
 
+	/*
+	 * XDL_MERGE_DIFF3 does not attempt to refine conflicts by looking
+	 * at common areas of sides 1 & 2, because the base (side 0) does
+	 * not match and is being shown.
+	 *
+	 * XDL_MERGE_ZEALOUS_DIFF3 will attempt to refine conflicts
+	 * looking for common areas of sides 1 & 2, despite the base
+	 * not matching and being shown, but will only look for common
+	 * areas at the beginning or ending of the conflict block.
+	 */
 	if (style == XDL_MERGE_DIFF3) {
 		/*
 		 * "diff3 -m" output does not make sense for anything
@@ -604,7 +650,7 @@  static int xdl_do_merge(xdfenv_t *xe1, xdchange_t *xscr1,
 		changes = c;
 	/* refine conflicts */
 	if (XDL_MERGE_ZEALOUS <= level &&
-	    (xdl_refine_conflicts(xe1, xe2, changes, xpp) < 0 ||
+	    (xdl_refine_conflicts(xe1, xe2, changes, xpp, style) < 0 ||
 	     xdl_simplify_non_conflicts(xe1, changes,
 					XDL_MERGE_ZEALOUS < level) < 0)) {
 		xdl_cleanup_merge(changes);