diff mbox series

ll-merge: teach ll_binary_merge() a trivial three-way merge

Message ID xmqqeebi9vd0.fsf_-_@gitster.g (mailing list archive)
State New
Headers show
Series ll-merge: teach ll_binary_merge() a trivial three-way merge | expand

Commit Message

Junio C Hamano July 28, 2021, 5:55 p.m. UTC
The low-level binary merge code assumed that the caller will not
feed trivial merges that would have been resolved at the tree level;
because of this, ll_binary_merge() assumes the ancestor is different
from either side, always failing the merge in conflict unless -Xours
or -Xtheirs is in effect.

But "git apply --3way" codepath could ask us to perform three-way
merge between two binaries A and B using A as the ancestor version.
The current code always fails such an application, but when given a
binary patch that turns A into B and asked to apply it to A, there
is no reason to fail such a request---we can trivially tell that the
result must be B.

Arguably, this fix may belong to one level higher at ll_merge()
function, which dispatches to lower-level merge drivers, possibly
even before it renormalizes the three input buffers.  But let's
first see how this goes.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
[jc: stolen new tests from Jerry's patch]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This time as a proper patch form.  I am asking Elijah's input as
   I suspect this belongs to ll_merge() layer and it may impact not
   just "apply --3way" codepath (which is the primary intended user
   of this "feature") but the merge strategies.  On the other hand,
   properly written merge strategies would not pass trivial merges
   down to the low-level backends, so it may not matter much to,
   say, "merge -sort" and friends.

 ll-merge.c                | 56 +++++++++++++++++++++++++++------------
 t/t4108-apply-threeway.sh | 45 +++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 17 deletions(-)

Comments

Elijah Newren July 28, 2021, 11:49 p.m. UTC | #1
On Wed, Jul 28, 2021 at 11:55 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> The low-level binary merge code assumed that the caller will not
> feed trivial merges that would have been resolved at the tree level;
> because of this, ll_binary_merge() assumes the ancestor is different
> from either side, always failing the merge in conflict unless -Xours
> or -Xtheirs is in effect.
>
> But "git apply --3way" codepath could ask us to perform three-way
> merge between two binaries A and B using A as the ancestor version.
> The current code always fails such an application, but when given a
> binary patch that turns A into B and asked to apply it to A, there
> is no reason to fail such a request---we can trivially tell that the
> result must be B.
>
> Arguably, this fix may belong to one level higher at ll_merge()
> function, which dispatches to lower-level merge drivers, possibly
> even before it renormalizes the three input buffers.  But let's
> first see how this goes.
>
> Signed-off-by: Jerry Zhang <jerry@skydio.com>
> [jc: stolen new tests from Jerry's patch]
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * This time as a proper patch form.  I am asking Elijah's input as
>    I suspect this belongs to ll_merge() layer and it may impact not
>    just "apply --3way" codepath (which is the primary intended user
>    of this "feature") but the merge strategies.  On the other hand,
>    properly written merge strategies would not pass trivial merges
>    down to the low-level backends, so it may not matter much to,
>    say, "merge -sort" and friends.

The patch looks correct if we choose to modify the ll_binary_merge level.

I agree that properly written merge strategies (at least both
merge-recursive and merge-ort) would not pass trivial merges down to
the low-level backends...but I think this change still matters to them
from a performance perspective.  Additional up-front full content
comparisons feel like an unnecessary performance penalty, so if we do
something like this, keeping it at the ll_binary_merge() level to
limit it to binary files would limit the penalty.  However...

It appears that try_threeway() in apply.c is already computing the
OIDs of the blobs involved, so it looks like the full content
comparison is unnecessary even in the apply --3way case.  If we moved
the trivial-merge check to that function, it could just compare the
OIDs rather than comparing the full content.

>  ll-merge.c                | 56 +++++++++++++++++++++++++++------------
>  t/t4108-apply-threeway.sh | 45 +++++++++++++++++++++++++++++++
>  2 files changed, 84 insertions(+), 17 deletions(-)
>
> diff --git a/ll-merge.c b/ll-merge.c
> index 261657578c..301e244971 100644
> --- a/ll-merge.c
> +++ b/ll-merge.c
> @@ -46,6 +46,13 @@ void reset_merge_attributes(void)
>         merge_attributes = NULL;
>  }
>
> +static int same_mmfile(mmfile_t *a, mmfile_t *b)
> +{
> +       if (a->size != b->size)
> +               return 0;
> +       return !memcmp(a->ptr, b->ptr, a->size);
> +}
> +
>  /*
>   * Built-in low-levels
>   */
> @@ -58,9 +65,18 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>                            const struct ll_merge_options *opts,
>                            int marker_size)
>  {
> +       int status;
>         mmfile_t *stolen;
>         assert(opts);
>
> +       /*
> +        * With -Xtheirs or -Xours, we have cleanly merged;
> +        * otherwise we got a conflict, unless 3way trivially
> +        * resolves.
> +        */
> +       status = (opts->variant == XDL_MERGE_FAVOR_OURS ||
> +                 opts->variant == XDL_MERGE_FAVOR_THEIRS) ? 0 : 1;
> +
>         /*
>          * The tentative merge result is the common ancestor for an
>          * internal merge.  For the final merge, it is "ours" by
> @@ -68,18 +84,30 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>          */
>         if (opts->virtual_ancestor) {
>                 stolen = orig;
> +               status = 0;
>         } else {
> -               switch (opts->variant) {
> -               default:
> -                       warning("Cannot merge binary files: %s (%s vs. %s)",
> -                               path, name1, name2);
> -                       /* fallthru */
> -               case XDL_MERGE_FAVOR_OURS:
> -                       stolen = src1;
> -                       break;
> -               case XDL_MERGE_FAVOR_THEIRS:
> +               if (same_mmfile(orig, src1)) {
>                         stolen = src2;
> -                       break;
> +                       status = 0;
> +               } else if (same_mmfile(orig, src2)) {
> +                       stolen = src1;
> +                       status = 0;
> +               } else if (same_mmfile(src1, src2)) {
> +                       stolen = src1;
> +                       status = 0;
> +               } else {
> +                       switch (opts->variant) {
> +                       default:
> +                               warning("Cannot merge binary files: %s (%s vs. %s)",
> +                                       path, name1, name2);
> +                               /* fallthru */
> +                       case XDL_MERGE_FAVOR_OURS:
> +                               stolen = src1;
> +                               break;
> +                       case XDL_MERGE_FAVOR_THEIRS:
> +                               stolen = src2;
> +                               break;
> +                       }
>                 }
>         }
>
> @@ -87,13 +115,7 @@ static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
>         result->size = stolen->size;
>         stolen->ptr = NULL;
>
> -       /*
> -        * With -Xtheirs or -Xours, we have cleanly merged;
> -        * otherwise we got a conflict.
> -        */
> -       return opts->variant == XDL_MERGE_FAVOR_OURS ||
> -              opts->variant == XDL_MERGE_FAVOR_THEIRS ?
> -              0 : 1;
> +       return status;
>  }
>
>  static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
> diff --git a/t/t4108-apply-threeway.sh b/t/t4108-apply-threeway.sh
> index 65147efdea..cc3aa3314a 100755
> --- a/t/t4108-apply-threeway.sh
> +++ b/t/t4108-apply-threeway.sh
> @@ -230,4 +230,49 @@ test_expect_success 'apply with --3way --cached and conflicts' '
>         test_cmp expect.diff actual.diff
>  '
>
> +test_expect_success 'apply binary file patch' '
> +       git reset --hard main &&
> +       cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
> +       git add bin.png &&
> +       git commit -m "add binary file" &&
> +
> +       cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
> +
> +       git diff --binary >bin.diff &&
> +       git reset --hard &&
> +
> +       # Apply must succeed.
> +       git apply bin.diff
> +'
> +
> +test_expect_success 'apply binary file patch with 3way' '
> +       git reset --hard main &&
> +       cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
> +       git add bin.png &&
> +       git commit -m "add binary file" &&
> +
> +       cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
> +
> +       git diff --binary >bin.diff &&
> +       git reset --hard &&
> +
> +       # Apply must succeed.
> +       git apply --3way --index bin.diff
> +'
> +
> +test_expect_success 'apply full-index patch with 3way' '
> +       git reset --hard main &&
> +       cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
> +       git add bin.png &&
> +       git commit -m "add binary file" &&
> +
> +       cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
> +
> +       git diff --full-index >bin.diff &&
> +       git reset --hard &&
> +
> +       # Apply must succeed.
> +       git apply --3way --index bin.diff
> +'
> +
>  test_done
> --
> 2.32.0-561-g6177dfa0d2
Junio C Hamano July 29, 2021, 1:06 a.m. UTC | #2
Elijah Newren <newren@gmail.com> writes:

> It appears that try_threeway() in apply.c is already computing the
> OIDs of the blobs involved, so it looks like the full content
> comparison is unnecessary even in the apply --3way case.  If we moved
> the trivial-merge check to that function, it could just compare the
> OIDs rather than comparing the full content.

Yeah, if we trust merge backends and only fix "apply --3way"
codepath, which I actually am OK with, I agree that it would be
vastly simpler and nicer to do it in try_threeway().

Thanks.
diff mbox series

Patch

diff --git a/ll-merge.c b/ll-merge.c
index 261657578c..301e244971 100644
--- a/ll-merge.c
+++ b/ll-merge.c
@@ -46,6 +46,13 @@  void reset_merge_attributes(void)
 	merge_attributes = NULL;
 }
 
+static int same_mmfile(mmfile_t *a, mmfile_t *b)
+{
+	if (a->size != b->size)
+		return 0;
+	return !memcmp(a->ptr, b->ptr, a->size);
+}
+
 /*
  * Built-in low-levels
  */
@@ -58,9 +65,18 @@  static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
 			   const struct ll_merge_options *opts,
 			   int marker_size)
 {
+	int status;
 	mmfile_t *stolen;
 	assert(opts);
 
+	/*
+	 * With -Xtheirs or -Xours, we have cleanly merged;
+	 * otherwise we got a conflict, unless 3way trivially
+	 * resolves.
+	 */
+	status = (opts->variant == XDL_MERGE_FAVOR_OURS ||
+		  opts->variant == XDL_MERGE_FAVOR_THEIRS) ? 0 : 1;
+
 	/*
 	 * The tentative merge result is the common ancestor for an
 	 * internal merge.  For the final merge, it is "ours" by
@@ -68,18 +84,30 @@  static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
 	 */
 	if (opts->virtual_ancestor) {
 		stolen = orig;
+		status = 0;
 	} else {
-		switch (opts->variant) {
-		default:
-			warning("Cannot merge binary files: %s (%s vs. %s)",
-				path, name1, name2);
-			/* fallthru */
-		case XDL_MERGE_FAVOR_OURS:
-			stolen = src1;
-			break;
-		case XDL_MERGE_FAVOR_THEIRS:
+		if (same_mmfile(orig, src1)) {
 			stolen = src2;
-			break;
+			status = 0;
+		} else if (same_mmfile(orig, src2)) {
+			stolen = src1;
+			status = 0;
+		} else if (same_mmfile(src1, src2)) {
+			stolen = src1;
+			status = 0;
+		} else {
+			switch (opts->variant) {
+			default:
+				warning("Cannot merge binary files: %s (%s vs. %s)",
+					path, name1, name2);
+				/* fallthru */
+			case XDL_MERGE_FAVOR_OURS:
+				stolen = src1;
+				break;
+			case XDL_MERGE_FAVOR_THEIRS:
+				stolen = src2;
+				break;
+			}
 		}
 	}
 
@@ -87,13 +115,7 @@  static int ll_binary_merge(const struct ll_merge_driver *drv_unused,
 	result->size = stolen->size;
 	stolen->ptr = NULL;
 
-	/*
-	 * With -Xtheirs or -Xours, we have cleanly merged;
-	 * otherwise we got a conflict.
-	 */
-	return opts->variant == XDL_MERGE_FAVOR_OURS ||
-	       opts->variant == XDL_MERGE_FAVOR_THEIRS ?
-	       0 : 1;
+	return status;
 }
 
 static int ll_xdl_merge(const struct ll_merge_driver *drv_unused,
diff --git a/t/t4108-apply-threeway.sh b/t/t4108-apply-threeway.sh
index 65147efdea..cc3aa3314a 100755
--- a/t/t4108-apply-threeway.sh
+++ b/t/t4108-apply-threeway.sh
@@ -230,4 +230,49 @@  test_expect_success 'apply with --3way --cached and conflicts' '
 	test_cmp expect.diff actual.diff
 '
 
+test_expect_success 'apply binary file patch' '
+	git reset --hard main &&
+	cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
+	git add bin.png &&
+	git commit -m "add binary file" &&
+
+	cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
+
+	git diff --binary >bin.diff &&
+	git reset --hard &&
+
+	# Apply must succeed.
+	git apply bin.diff
+'
+
+test_expect_success 'apply binary file patch with 3way' '
+	git reset --hard main &&
+	cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
+	git add bin.png &&
+	git commit -m "add binary file" &&
+
+	cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
+
+	git diff --binary >bin.diff &&
+	git reset --hard &&
+
+	# Apply must succeed.
+	git apply --3way --index bin.diff
+'
+
+test_expect_success 'apply full-index patch with 3way' '
+	git reset --hard main &&
+	cp "$TEST_DIRECTORY/test-binary-1.png" bin.png &&
+	git add bin.png &&
+	git commit -m "add binary file" &&
+
+	cp "$TEST_DIRECTORY/test-binary-2.png" bin.png &&
+
+	git diff --full-index >bin.diff &&
+	git reset --hard &&
+
+	# Apply must succeed.
+	git apply --3way --index bin.diff
+'
+
 test_done