diff mbox series

[v2] diff-tree: fix crash when used with --remerge-diff

Message ID pull.1771.v2.git.1723188292498.gitgitgadget@gmail.com (mailing list archive)
State Accepted
Commit a77554ea0972e4ec3518fb8fb0b2582c76cf3105
Headers show
Series [v2] diff-tree: fix crash when used with --remerge-diff | expand

Commit Message

Xing Xin Aug. 9, 2024, 7:24 a.m. UTC
From: Xing Xin <xingxin.xx@bytedance.com>

When using "git-diff-tree" to get the tree diff for merge commits with
the diff format set to `remerge`, a bug is triggered as shown below:

  $ git diff-tree -r --remerge-diff 363337e6eb
  363337e6eb812d0c0d785ed4261544f35559ff8b
  BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?

This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
added in commit 7b90ab467a (log: clean unneeded objects during log
--remerge-diff, 2022-02-02) detects the absence of `remerge_objdir` when
attempting to clean up temporary objects generated during the remerge
process.

After some further digging, I find that the remerge-related diff options
were introduced in db757e8b8d (show, log: provide a --remerge-diff
capability, 2022-02-02), which also affect the setup of `rev_info` for
"git-diff-tree", but were not accounted for in the original
implementation (inferred from the commit message).

Elijah Newren, the author of the remerge diff feature, notes that other
callers of `log-tree.c:log_tree_commit` (the only caller of
`log-tree.c:do_remerge_diff`) also exist, but:

  `builtin/am.c`: manually sets all flags; remerge_diff is not among them
  `sequencer.c`: manually sets all flags; remerge_diff is not among them

so `builtin/diff-tree.c` really is the only caller that was overlooked
when remerge-diff functionality was added.

This commit resolves the crash by adding `remerge_objdir` setup logic to
`builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
It also includes the necessary cleanup for `remerge_objdir`.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
---
    diff-tree: fix crash when used with --remerge-diff

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1771%2Fblanet%2Fxx%2Ffix-diff-tree-crash-on-remerge-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1771/blanet/xx/fix-diff-tree-crash-on-remerge-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1771

Range-diff vs v1:

 1:  f0b86faa275 ! 1:  57f0b1247d8 diff-tree: fix crash when used with --remerge-diff
     @@ Commit message
          When using "git-diff-tree" to get the tree diff for merge commits with
          the diff format set to `remerge`, a bug is triggered as shown below:
      
     -        $ git diff-tree -r --remerge-diff 363337e6eb
     -        363337e6eb812d0c0d785ed4261544f35559ff8b
     -        BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?
     +      $ git diff-tree -r --remerge-diff 363337e6eb
     +      363337e6eb812d0c0d785ed4261544f35559ff8b
     +      BUG: log-tree.c:1006: did a remerge diff without remerge_objdir?!?
      
          This bug is reported by `log-tree.c:do_remerge_diff`, where a bug check
          added in commit 7b90ab467a (log: clean unneeded objects during log
     @@ Commit message
          "git-diff-tree", but were not accounted for in the original
          implementation (inferred from the commit message).
      
     -    This commit fixes the bug by adding initialization logic for
     -    `remerge_objdir` in `builtin/diff-tree.c`, mirroring the logic in
     -    `builtin/log.c:cmd_log_walk_no_free`. A final cleanup for
     -    `remerge_objdir` is also included.
     +    Elijah Newren, the author of the remerge diff feature, notes that other
     +    callers of `log-tree.c:log_tree_commit` (the only caller of
     +    `log-tree.c:do_remerge_diff`) also exist, but:
      
     +      `builtin/am.c`: manually sets all flags; remerge_diff is not among them
     +      `sequencer.c`: manually sets all flags; remerge_diff is not among them
     +
     +    so `builtin/diff-tree.c` really is the only caller that was overlooked
     +    when remerge-diff functionality was added.
     +
     +    This commit resolves the crash by adding `remerge_objdir` setup logic to
     +    `builtin/diff-tree.c`, mirroring `builtin/log.c:cmd_log_walk_no_free`.
     +    It also includes the necessary cleanup for `remerge_objdir`.
     +
     +    Reviewed-by: Elijah Newren <newren@gmail.com>
          Signed-off-by: Xing Xin <xingxin.xx@bytedance.com>
      
       ## builtin/diff-tree.c ##
      @@
     + #include "read-cache-ll.h"
       #include "repository.h"
       #include "revision.h"
     - #include "tree.h"
      +#include "tmp-objdir.h"
     + #include "tree.h"
       
       static struct rev_info log_tree_opt;
     - 
      @@ builtin/diff-tree.c: int cmd_diff_tree(int argc, const char **argv, const char *prefix)
       
       	opt->diffopt.rotate_to_strict = 1;


 builtin/diff-tree.c     | 13 +++++++++++++
 t/t4069-remerge-diff.sh | 35 +++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+)


base-commit: 406f326d271e0bacecdb00425422c5fa3f314930

Comments

Junio C Hamano Aug. 9, 2024, 3:32 p.m. UTC | #1
"blanet via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Elijah Newren, the author of the remerge diff feature, notes that other
> callers of `log-tree.c:log_tree_commit` (the only caller of
> `log-tree.c:do_remerge_diff`) also exist, but:
>
>   `builtin/am.c`: manually sets all flags; remerge_diff is not among them
>   `sequencer.c`: manually sets all flags; remerge_diff is not among them
>
> so `builtin/diff-tree.c` really is the only caller that was overlooked
> when remerge-diff functionality was added.

That is more than OK as a band-aid, and I'll take the patch as-is,
but I have to wonder if we do even better in a future follow-up
patch.

Any time do_remerge_diff() is entered, we know that either the end
user (from the command line) or the hard-coded caller (like
am/sequencer cited above) wants us to do the remerge-diff, which in
turn requires us to have the temporary object directory rotated into
the status of the primary object store.  And there is nothing in
that object directory rotation code that requires caller-specific
customization---it is the same "create remerge-diff directory as
tmp-objdir, rotate it into the alt object store chain as the
primary" regardless of the actual caller).

So wouldn't it work well if we

 (1) at the beginning of do_remerge_diff(), only once for a rev_info
     structure:
   (1-a) lazily do the "object directory rotation"
   (1-b) set up an atexit handler to clear the temporary object
         store
 (2) remove all the "ah, we need to prepare and tear down the
     temporary object store for _this_ operation" we have sprinkled
     in different code paths (including the one added by the fix we
     are looking at).

That way, we won't have to worry about adding future remerge_diff
users, including existing hard-coded callers.

ANyway, thanks for the fix.  It is very pleasing to see contributors
working well together.
Junio C Hamano Aug. 9, 2024, 3:46 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> but I have to wonder if we do even better in a future follow-up
> patch.

"if we do" -> "if we can do".

> So wouldn't it work well if we
>
>  (1) at the beginning of do_remerge_diff(), only once for a rev_info
>      structure:
>    (1-a) lazily do the "object directory rotation"
>    (1-b) set up an atexit handler to clear the temporary object
>          store

An atexit handler may not be enough, when a program wants to start
creating a real object after we did a remerge-diff but before
exiting. So we'd probably need to allow an explicit "ok, we are
done" clean-up call for such a program, too.

And the atexit handler can call the same clean-up function if the
program hasn't called it explicitly.  For logically read-only
operations like diff-tree, they do not have to worry about rotating
the real object store back to the primary status as soon as
possible.
diff mbox series

Patch

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 0d3c611aac0..b8df1d4b79b 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -8,6 +8,7 @@ 
 #include "read-cache-ll.h"
 #include "repository.h"
 #include "revision.h"
+#include "tmp-objdir.h"
 #include "tree.h"
 
 static struct rev_info log_tree_opt;
@@ -166,6 +167,13 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 
 	opt->diffopt.rotate_to_strict = 1;
 
+	if (opt->remerge_diff) {
+		opt->remerge_objdir = tmp_objdir_create("remerge-diff");
+		if (!opt->remerge_objdir)
+			die(_("unable to create temporary object directory"));
+		tmp_objdir_replace_primary_odb(opt->remerge_objdir, 1);
+	}
+
 	/*
 	 * NOTE!  We expect "a..b" to expand to "^a b" but it is
 	 * perfectly valid for revision range parser to yield "b ^a",
@@ -230,5 +238,10 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 		diff_free(&opt->diffopt);
 	}
 
+	if (opt->remerge_diff) {
+		tmp_objdir_destroy(opt->remerge_objdir);
+		opt->remerge_objdir = NULL;
+	}
+
 	return diff_result_code(&opt->diffopt);
 }
diff --git a/t/t4069-remerge-diff.sh b/t/t4069-remerge-diff.sh
index 07323ebafe0..ca8f999caba 100755
--- a/t/t4069-remerge-diff.sh
+++ b/t/t4069-remerge-diff.sh
@@ -110,6 +110,41 @@  test_expect_success 'can filter out additional headers with pickaxe' '
 	test_must_be_empty actual
 '
 
+test_expect_success 'remerge-diff also works for git-diff-tree' '
+	# With a clean merge
+	git diff-tree -r -p --remerge-diff --no-commit-id bc_resolution >actual &&
+	test_must_be_empty actual &&
+
+	# With both a resolved conflict and an unrelated change
+	cat <<-EOF >tmp &&
+	diff --git a/numbers b/numbers
+	remerge CONFLICT (content): Merge conflict in numbers
+	index a1fb731..6875544 100644
+	--- a/numbers
+	+++ b/numbers
+	@@ -1,13 +1,9 @@
+	 1
+	 2
+	-<<<<<<< b0ed5cb (change_a)
+	-three
+	-=======
+	-tres
+	->>>>>>> 6cd3f82 (change_b)
+	+drei
+	 4
+	 5
+	 6
+	 7
+	-eight
+	+acht
+	 9
+	EOF
+	sed -e "s/[0-9a-f]\{7,\}/HASH/g" tmp >expect &&
+	git diff-tree -r -p --remerge-diff --no-commit-id ab_resolution >tmp &&
+	sed -e "s/[0-9a-f]\{7,\}/HASH/g" tmp >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'setup non-content conflicts' '
 	git switch --orphan base &&