diff mbox series

2.36 gitk/diff-tree --stdin regression fix

Message ID xmqq7d7bsu2n.fsf@gitster.g (mailing list archive)
State Accepted
Commit f8781bfda31756acdc0ae77da7e70337aedae7c9
Headers show
Series 2.36 gitk/diff-tree --stdin regression fix | expand

Commit Message

Junio C Hamano April 26, 2022, 4:11 p.m. UTC
This only surfaced as a regression after 2.36 release, but the
breakage was already there with us for at least a year.

The diff_free() call is to be used after we completely finished with
a diffopt structure.  After "git diff A B" finishes producing
output, calling it before process exit is fine.  But there are
commands that prepares diff_options struct once, compares two sets
of paths, releases resources that were used to do the comparison,
then reuses the same diff_option struct to go on to compare the next
two sets of paths, like "git log -p".  

After "git log -p" finishes showing a single commit, calling it
before it goes on to the next commit is NOT fine.  There is a
mechanism, the .no_free member in diff_options struct, to help "git
log" to avoid calling diff_free() after showing each commit and
instead call it just one.  When the mechanism was introduced in
e900d494 (diff: add an API for deferred freeing, 2021-02-11),
however, we forgot to do the same to "diff-tree --stdin", which *is*
a moral equivalent to "git log".

During 2.36 release cycle, we started clearing the pathspec in
diff_free(), so programs like gitk that runs

    git diff-tree --stdin -- <pathspec>

downstream of a pipe, processing one commit after another, started
showing irrelevant comparison outside the given <pathspec> from the
second commit.  The same commit, by forgetting to teach the .no_free
mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
it for over a year, presumably because it is so seldom used an
option.

But <pathspec> is a different story.  The breakage was very
prominently visible and was reported immediately after 2.36 was
released.

Fix this breakage by mimicking how "git log" utilizes the .no_free
member so that "diff-tree --stdin" behaves more similarly to "log".

Protect the fix with a few new tests.

Reported-by: Matthias Aßhauer <mha1993@live.de>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I feel MUCH better with this than the revert, now Phillip helped
   me to get the root cause straight.  Addition of clear_pathspec()
   to diff_tree() was *not* a mistake but is quite reasonable thing
   to do.  Not using the .no_free hack in a code path that needed it
   was.



 builtin/diff-tree.c     |  3 +++
 log-tree.c              |  1 +
 t/t4013-diff-various.sh | 14 ++++++++++++++
 3 files changed, 18 insertions(+)

Comments

René Scharfe April 27, 2022, 4:42 p.m. UTC | #1
Am 26.04.22 um 18:11 schrieb Junio C Hamano:
> This only surfaced as a regression after 2.36 release, but the
> breakage was already there with us for at least a year.
>
> The diff_free() call is to be used after we completely finished with
> a diffopt structure.  After "git diff A B" finishes producing
> output, calling it before process exit is fine.  But there are
> commands that prepares diff_options struct once, compares two sets
> of paths, releases resources that were used to do the comparison,
> then reuses the same diff_option struct to go on to compare the next
> two sets of paths, like "git log -p".
>
> After "git log -p" finishes showing a single commit, calling it
> before it goes on to the next commit is NOT fine.  There is a
> mechanism, the .no_free member in diff_options struct, to help "git
> log" to avoid calling diff_free() after showing each commit and
> instead call it just one.  When the mechanism was introduced in
> e900d494 (diff: add an API for deferred freeing, 2021-02-11),
> however, we forgot to do the same to "diff-tree --stdin", which *is*
> a moral equivalent to "git log".
>
> During 2.36 release cycle, we started clearing the pathspec in
> diff_free(), so programs like gitk that runs
>
>     git diff-tree --stdin -- <pathspec>
>
> downstream of a pipe, processing one commit after another, started
> showing irrelevant comparison outside the given <pathspec> from the
> second commit.  The same commit, by forgetting to teach the .no_free
> mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
> it for over a year, presumably because it is so seldom used an
> option.
>
> But <pathspec> is a different story.  The breakage was very
> prominently visible and was reported immediately after 2.36 was
> released.
>
> Fix this breakage by mimicking how "git log" utilizes the .no_free
> member so that "diff-tree --stdin" behaves more similarly to "log".
>
> Protect the fix with a few new tests.

We could check where reused diffopts caused a pathspec loss at runtime,
like in the patch below.  Then we "just" need to get the relevant test
coverage to 100% and we'll find them all.

With your patch on top of main, "make test" passes for me.  With the
patch below added as well I get failures in three test scripts:

t3427-rebase-subtree.sh                          (Wstat: 256 Tests: 3 Failed: 2)
  Failed tests:  2-3
  Non-zero exit status: 1
t4014-format-patch.sh                            (Wstat: 256 Tests: 190 Failed: 1)
  Failed test:  73
  Non-zero exit status: 1
t9350-fast-export.sh                             (Wstat: 256 Tests: 50 Failed: 3)
  Failed tests:  30, 32, 43
  Non-zero exit status: 1

The format-patch is a bit surprising to me because it already sets
no_free conditionally.  t4014 is successful if no_free is set in all
cases, so the condition seems to be too narrow -- but I don't understand
it.  Didn't look at the other cases.

---
 diff.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/diff.c b/diff.c
index ef7159968b..b7c837aca8 100644
--- a/diff.c
+++ b/diff.c
@@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options)

 void diff_free(struct diff_options *options)
 {
+	static struct diff_options *prev_options_with_pathspec;
+
 	if (options->no_free)
 		return;

+	if (prev_options_with_pathspec == options && !options->pathspec.nr)
+		BUG("reused struct diff_options, potentially lost pathspec");
+	if (options->pathspec.nr)
+		prev_options_with_pathspec = options;
+
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
--
2.35.3
René Scharfe April 27, 2022, 6:06 p.m. UTC | #2
Am 27.04.22 um 18:42 schrieb René Scharfe:
> Am 26.04.22 um 18:11 schrieb Junio C Hamano:
>> This only surfaced as a regression after 2.36 release, but the
>> breakage was already there with us for at least a year.
>>
>> The diff_free() call is to be used after we completely finished with
>> a diffopt structure.  After "git diff A B" finishes producing
>> output, calling it before process exit is fine.  But there are
>> commands that prepares diff_options struct once, compares two sets
>> of paths, releases resources that were used to do the comparison,
>> then reuses the same diff_option struct to go on to compare the next
>> two sets of paths, like "git log -p".
>>
>> After "git log -p" finishes showing a single commit, calling it
>> before it goes on to the next commit is NOT fine.  There is a
>> mechanism, the .no_free member in diff_options struct, to help "git
>> log" to avoid calling diff_free() after showing each commit and
>> instead call it just one.  When the mechanism was introduced in
>> e900d494 (diff: add an API for deferred freeing, 2021-02-11),
>> however, we forgot to do the same to "diff-tree --stdin", which *is*
>> a moral equivalent to "git log".
>>
>> During 2.36 release cycle, we started clearing the pathspec in
>> diff_free(), so programs like gitk that runs
>>
>>     git diff-tree --stdin -- <pathspec>
>>
>> downstream of a pipe, processing one commit after another, started
>> showing irrelevant comparison outside the given <pathspec> from the
>> second commit.  The same commit, by forgetting to teach the .no_free
>> mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed
>> it for over a year, presumably because it is so seldom used an
>> option.
>>
>> But <pathspec> is a different story.  The breakage was very
>> prominently visible and was reported immediately after 2.36 was
>> released.
>>
>> Fix this breakage by mimicking how "git log" utilizes the .no_free
>> member so that "diff-tree --stdin" behaves more similarly to "log".
>>
>> Protect the fix with a few new tests.
>
> We could check where reused diffopts caused a pathspec loss at runtime,
> like in the patch below.  Then we "just" need to get the relevant test
> coverage to 100% and we'll find them all.
>
> With your patch on top of main, "make test" passes for me.  With the
> patch below added as well I get failures in three test scripts:
>
> t3427-rebase-subtree.sh                          (Wstat: 256 Tests: 3 Failed: 2)
>   Failed tests:  2-3
>   Non-zero exit status: 1
> t4014-format-patch.sh                            (Wstat: 256 Tests: 190 Failed: 1)
>   Failed test:  73
>   Non-zero exit status: 1
> t9350-fast-export.sh                             (Wstat: 256 Tests: 50 Failed: 3)
>   Failed tests:  30, 32, 43
>   Non-zero exit status: 1
>
> The format-patch is a bit surprising to me because it already sets
> no_free conditionally.  t4014 is successful if no_free is set in all
> cases, so the condition seems to be too narrow -- but I don't understand
> it.  Didn't look at the other cases.
>
> ---
>  diff.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index ef7159968b..b7c837aca8 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options)
>
>  void diff_free(struct diff_options *options)
>  {
> +	static struct diff_options *prev_options_with_pathspec;
> +
>  	if (options->no_free)
>  		return;
>
> +	if (prev_options_with_pathspec == options && !options->pathspec.nr)
> +		BUG("reused struct diff_options, potentially lost pathspec");
> +	if (options->pathspec.nr)
> +		prev_options_with_pathspec = options;

This can report a false positive if a diffopt is reused with different
pathspecs, and one of them is empty (match all).  Which could be countered
by using a fresh diffopt every time (e.g. pushing it into a loop).

> +
>  	diff_free_file(options);
>  	diff_free_ignore_regex(options);
>  	clear_pathspec(&options->pathspec);
Junio C Hamano April 27, 2022, 8:03 p.m. UTC | #3
René Scharfe <l.s.r@web.de> writes:

>> +	if (prev_options_with_pathspec == options && !options->pathspec.nr)
>> +		BUG("reused struct diff_options, potentially lost pathspec");
>> +	if (options->pathspec.nr)
>> +		prev_options_with_pathspec = options;
>
> This can report a false positive if a diffopt is reused with different
> pathspecs, and one of them is empty (match all).  Which could be countered
> by using a fresh diffopt every time (e.g. pushing it into a loop).

The only use case to reset pathspec of a diffopt during iteration I
can think of is the hacky[*] version of "git log --follow" where the
pathspec is swapped when a rename of a single path being followed is
detected.

    Side note: hacky because the way it swaps a single pathspec upon
    seeing one rename means it does not work in a mergy-branchy
    history where one branch renames and there are still commits
    that need to be explored on the other branch that had the path
    under its original name.
diff mbox series

Patch

diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c
index 0e0ac1f167..116097a404 100644
--- a/builtin/diff-tree.c
+++ b/builtin/diff-tree.c
@@ -195,6 +195,7 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 		int saved_dcctc = 0;
 
 		opt->diffopt.rotate_to_strict = 0;
+		opt->diffopt.no_free = 1;
 		if (opt->diffopt.detect_rename) {
 			if (!the_index.cache)
 				repo_read_index(the_repository);
@@ -217,6 +218,8 @@  int cmd_diff_tree(int argc, const char **argv, const char *prefix)
 		}
 		opt->diffopt.degraded_cc_to_c = saved_dcctc;
 		opt->diffopt.needed_rename_limit = saved_nrl;
+		opt->diffopt.no_free = 0;
+		diff_free(&opt->diffopt);
 	}
 
 	return diff_result_code(&opt->diffopt, 0);
diff --git a/log-tree.c b/log-tree.c
index 25165e2a91..f8c18fd8b9 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -1098,6 +1098,7 @@  int log_tree_commit(struct rev_info *opt, struct commit *commit)
 	opt->loginfo = &log;
 	opt->diffopt.no_free = 1;
 
+	/* NEEDSWORK: no restoring of no_free?  Why? */
 	if (opt->line_level_traverse)
 		return line_log_print(opt, commit);
 
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 750aee17ea..628b01f355 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -542,6 +542,20 @@  test_expect_success 'diff-tree --stdin with log formatting' '
 	test_cmp expect actual
 '
 
+test_expect_success 'diff-tree --stdin with pathspec' '
+	cat >expect <<-EOF &&
+	Third
+
+	dir/sub
+	Second
+
+	dir/sub
+	EOF
+	git rev-list master^ |
+	git diff-tree -r --stdin --name-only --format=%s dir >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'diff -I<regex>: setup' '
 	git checkout master &&
 	test_seq 50 >file0 &&