diff mbox series

2.36 format-patch regression fix

Message ID c36896a1-6247-123b-4fa3-b7eb24af1897@web.de (mailing list archive)
State New, archived
Headers show
Series 2.36 format-patch regression fix | expand

Commit Message

René Scharfe April 30, 2022, 10:32 a.m. UTC
e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
that mechanism mandatory.

git format-patch only sets no_free when --output is given, causing it to
forget pathspecs after the first commit.  Set no_free unconditionally
instead.

The existing test was unable to detect this breakage because it checks
stderr for the absence of a certain string, but format-patch writes to
stdout.  Also the test was not checking the case of one commit modifying
multiple files and a pathspec limiting the diff.  Replace it with a more
thorough one.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/log.c           |  9 ++-------
 t/t4014-format-patch.sh | 33 +++++++++++++++++++++++++++++++--
 2 files changed, 33 insertions(+), 9 deletions(-)

--
2.35.3

Comments

Carlo Marcelo Arenas Belón April 30, 2022, 4:32 p.m. UTC | #1
On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
> that mechanism mandatory.
> 
> git format-patch only sets no_free when --output is given, causing it to
> forget pathspecs after the first commit.  Set no_free unconditionally
> instead.

I remember when I saw the first commit long ago, and thought; well that is
very round about way to reintroduce the UNLEAK removal that might have made
it visible.

Haven't looked too closely, but considering that we were warned[1] the
interface was hard to use and might cause problems later and it did.

wouldn't it a better and more secure solution to UNLEAK and remove all this
code, at least until it could be refactored cleanly, of course?

Carlo

[1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/
René Scharfe May 1, 2022, 9:35 a.m. UTC | #2
Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón:
> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
>> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
>> that mechanism mandatory.
>>
>> git format-patch only sets no_free when --output is given, causing it to
>> forget pathspecs after the first commit.  Set no_free unconditionally
>> instead.
>
> I remember when I saw the first commit long ago, and thought; well that is
> very round about way to reintroduce the UNLEAK removal that might have made
> it visible.
>
> Haven't looked too closely, but considering that we were warned[1] the
> interface was hard to use and might cause problems later and it did.
>
> wouldn't it a better and more secure solution to UNLEAK and remove all this
> code, at least until it could be refactored cleanly, of course?

Silently self-destructing pathspecs are a safety hazard indeed.

no_free also affects freeing ignore_regex and parseopts, and even
closing the output file.  I don't know about the file, but leaking the
first two is harmless.  So removing the flag is safe as long as we make
sure the output file is closed as needed.

A safe diff_free() would only be called a particular diffopt once, when
it's no longer needed.  It could check for reuse by setting a flag the
first time, like in the patch below.  1426 tests in 163 test scripts
fail for me with it applied on top of the regression fixes from this
thread.

Removing the diff_free() calls from diff.c::diff_flush() and
log-tree.c::log_tree_commit() reduces this to just one or two in t7527
(seems to be flaky).  Perhaps this is still salvageable?

>
> Carlo
>
> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/


---
 diff.c | 3 +++
 diff.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/diff.c b/diff.c
index ef7159968b..01296829b5 100644
--- a/diff.c
+++ b/diff.c
@@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options)
 	if (options->no_free)
 		return;

+	if (options->is_dead)
+		BUG("double diff_free() on %p", (void *)options);
 	diff_free_file(options);
 	diff_free_ignore_regex(options);
 	clear_pathspec(&options->pathspec);
 	FREE_AND_NULL(options->parseopts);
+	options->is_dead = 1;
 }

 void diff_flush(struct diff_options *options)
diff --git a/diff.h b/diff.h
index 8ae18e5ab1..c31d32ba19 100644
--- a/diff.h
+++ b/diff.h
@@ -398,6 +398,7 @@ struct diff_options {
 	struct strmap *additional_path_headers;

 	int no_free;
+	int is_dead;
 };

 unsigned diff_filter_bit(char status);
--
2.35.3
Ævar Arnfjörð Bjarmason May 20, 2022, 3:23 p.m. UTC | #3
On Sun, May 01 2022, René Scharfe wrote:

> Am 30.04.22 um 18:32 schrieb Carlo Marcelo Arenas Belón:
>> On Sat, Apr 30, 2022 at 12:32:44PM +0200, René Scharfe wrote:
>>> e900d494dc (diff: add an API for deferred freeing, 2021-02-11) added a
>>> way to allow reusing diffopts: the no_free bit.  244c27242f (diff.[ch]:
>>> have diff_free() call clear_pathspec(opts.pathspec), 2022-02-16) made
>>> that mechanism mandatory.
>>>
>>> git format-patch only sets no_free when --output is given, causing it to
>>> forget pathspecs after the first commit.  Set no_free unconditionally
>>> instead.
>>
>> I remember when I saw the first commit long ago, and thought; well that is
>> very round about way to reintroduce the UNLEAK removal that might have made
>> it visible.
>>
>> Haven't looked too closely, but considering that we were warned[1] the
>> interface was hard to use and might cause problems later and it did.
>>
>> wouldn't it a better and more secure solution to UNLEAK and remove all this
>> code, at least until it could be refactored cleanly, of course?
>
> Silently self-destructing pathspecs are a safety hazard indeed.
>
> no_free also affects freeing ignore_regex and parseopts, and even
> closing the output file.  I don't know about the file, but leaking the
> first two is harmless.  So removing the flag is safe as long as we make
> sure the output file is closed as needed.
>
> A safe diff_free() would only be called a particular diffopt once, when
> it's no longer needed.  It could check for reuse by setting a flag the
> first time, like in the patch below.  1426 tests in 163 test scripts
> fail for me with it applied on top of the regression fixes from this
> thread.
>
> Removing the diff_free() calls from diff.c::diff_flush() and
> log-tree.c::log_tree_commit() reduces this to just one or two in t7527
> (seems to be flaky).  Perhaps this is still salvageable?

Thanks both for handling this, and sorry that I was away at the time.

AFAICT the current status in this area is that with 2cc712324d5 (Merge
branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge
branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs
related to this have been fixed, along with 3da993f2e63 (Merge branch
'jc/diff-tree-stdin-fix', 2022-04-28).

"This" being my e900d494dcf (diff: add an API for deferred freeing,
2021-02-11), and 244c27242f4 (diff.[ch]: have diff_free() call
clear_pathspec(opts.pathspec), 2022-02-16) for the diff-tree case.

Not coincidentally around the same time my ab/plug-leak-in-revisions got
un-marked for "next" from [1] to [2], and I'm looking for a path forward
for this whole thing...

1. https://lore.kernel.org/git/xmqqbkwyz78z.fsf@gitster.g/
2. https://lore.kernel.org/git/xmqqwnfcskw2.fsf@gitster.g/

>> [1] https://lore.kernel.org/git/YCUFNVj7qlt9wzlX@coredump.intra.peff.net/
>
>
> ---
>  diff.c | 3 +++
>  diff.h | 1 +
>  2 files changed, 4 insertions(+)
>
> diff --git a/diff.c b/diff.c
> index ef7159968b..01296829b5 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6458,10 +6458,13 @@ void diff_free(struct diff_options *options)
>  	if (options->no_free)
>  		return;
>
> +	if (options->is_dead)
> +		BUG("double diff_free() on %p", (void *)options);
>  	diff_free_file(options);
>  	diff_free_ignore_regex(options);
>  	clear_pathspec(&options->pathspec);
>  	FREE_AND_NULL(options->parseopts);
> +	options->is_dead = 1;
>  }
>
>  void diff_flush(struct diff_options *options)
> diff --git a/diff.h b/diff.h
> index 8ae18e5ab1..c31d32ba19 100644
> --- a/diff.h
> +++ b/diff.h
> @@ -398,6 +398,7 @@ struct diff_options {
>  	struct strmap *additional_path_headers;
>
>  	int no_free;
> +	int is_dead;
>  };
>
>  unsigned diff_filter_bit(char status);

Yes, that's quite scary. It shows that in general diff_free() isn't
reentrant-safe, but that we do call it repeatedly again.

However if we patch it like this instead we can see that (gulp!) we just
barely putter along, according to our test coverage at least. I.e. we
don't end up calling the parts of it that would be unsafe to call again:
	
	diff --git a/diff.c b/diff.c
	index ef7159968b6..0fe8bc5fade 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,14 +6438,23 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	+		if (options->is_dead)
	+			BUG("double diff_free() on %p", (void *)options);
	 		fclose(options->file);
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	 {
	 	int i;
	 
	+	if (!options->ignore_regex_nr && !options->ignore_regex)
	+		return;
	+
	+	if (options->is_dead)
	+		BUG("double diff_free() on %p", (void *)options);
	+
	 	for (i = 0; i < options->ignore_regex_nr; i++) {
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	@@ -6462,6 +6471,7 @@ void diff_free(struct diff_options *options)
	 	diff_free_ignore_regex(options);
	 	clear_pathspec(&options->pathspec);
	 	FREE_AND_NULL(options->parseopts);
	+	options->is_dead = 1;
	 }
	 
	 void diff_flush(struct diff_options *options)
	@@ -6560,7 +6570,6 @@ void diff_flush(struct diff_options *options)
	 free_queue:
	 	free(q->queue);
	 	DIFF_QUEUE_CLEAR(q);
	-	diff_free(options);
	 
	 	/*
	 	 * Report the content-level differences with HAS_CHANGES;
	diff --git a/diff.h b/diff.h
	index 8ae18e5ab1e..c31d32ba192 100644
	--- a/diff.h
	+++ b/diff.h
	@@ -398,6 +398,7 @@ struct diff_options {
	 	struct strmap *additional_path_headers;
	 
	 	int no_free;
	+	int is_dead;
	 };
	 
	 unsigned diff_filter_bit(char status);

I'd really like to fix this properly, but AFAICT the best way to do that
is to:

 A. Get ab/plug-leak-in-revisions merged down
 B. Fix diff_free() on top of that

Before I knew of these bugs I'd already written patches to get rid of
that whole "no_free" business. In retrospect it was completely the wrong
thing to do, but in hindsight something like it was needed to fix those
leaks as long as we didn't have a revisions_release().

I.e. the tricky cases where I ended up needing to set "no_free" are ones
where all the complexity neatly goes away once we start releasing the
"struct rev_info" properly, as it contains the data we'd like to
diff_free() at the end.

How does that plan sound, and is there anything I've missed?

I could also re-roll ab/plug-leak-in-revisions to include a fix that
makes it safe in the interim, i.e.:

	diff --git a/diff.c b/diff.c
	index ef7159968b6..2bc7ee81e4e 100644
	--- a/diff.c
	+++ b/diff.c
	@@ -6438,8 +6438,12 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
	 
	 static void diff_free_file(struct diff_options *options)
	 {
	-	if (options->close_file)
	+	if (options->close_file) {
	 		fclose(options->file);
	+
	+		options->file = NULL;
	+		options->close_file = 0;
	+	}
	 }
	 
	 static void diff_free_ignore_regex(struct diff_options *options)
	@@ -6450,7 +6454,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
	 		regfree(options->ignore_regex[i]);
	 		free(options->ignore_regex[i]);
	 	}
	-	free(options->ignore_regex);
	+	options->ignore_regex_nr = 0;
	+	FREE_AND_NULL(options->ignore_regex);
	 }
	 
	 void diff_free(struct diff_options *options)

But as long as we're not adding new API users of it until the follow-up
after ab/plug-leak-in-revisions we should also be safe for now, but
perhaps it's prudent to do it anyway.

I *could* potentially produce a shorter series than
ab/plug-leak-in-revisions to narrowly try to remove "no_free" from
diff.c first, but it would basically need to first introduce a
release_revisions(), and without the other revisions API leaks being
fixed testing it would be much tricker. I'd really prefer not to do
that.

How does all that sound?
Junio C Hamano May 20, 2022, 5:23 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> AFAICT the current status in this area is that with 2cc712324d5 (Merge
> branch 'rs/fast-export-pathspec-fix', 2022-05-04) and 5048b20d1c2 (Merge
> branch 'rs/format-patch-pathspec-fix', 2022-05-04) merged the known bugs
> related to this have been fixed, along with 3da993f2e63 (Merge branch
> 'jc/diff-tree-stdin-fix', 2022-04-28).

There is another possible known breakage around "diff -I --cc"
reported, no?

https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/

> Not coincidentally around the same time my ab/plug-leak-in-revisions got
> un-marked for "next" from [1] to [2], and I'm looking for a path forward
> for this whole thing...

I think this was not because it had known bugs but only because we
wanted to avoid distractions and too much churns in the tree.

I think it is a good time to reactivate it; Phillip Wood took a
serious look at the latest round, if I am reading [*] correctly.

* https://lore.kernel.org/git/2cc2d026-1496-1ed9-838b-6fdf8cfba285@gmail.com/

Thanks.
diff mbox series

Patch

diff --git a/builtin/log.c b/builtin/log.c
index c211d66d1d..9acc130594 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1883,6 +1883,7 @@  int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	rev.diff = 1;
 	rev.max_parents = 1;
 	rev.diffopt.flags.recursive = 1;
+	rev.diffopt.no_free = 1;
 	rev.subject_prefix = fmt_patch_subject_prefix;
 	memset(&s_r_opt, 0, sizeof(s_r_opt));
 	s_r_opt.def = "HEAD";
@@ -2008,13 +2009,7 @@  int cmd_format_patch(int argc, const char **argv, const char *prefix)

 	if (use_stdout) {
 		setup_pager();
-	} else if (rev.diffopt.close_file) {
-		/*
-		 * The diff code parsed --output; it has already opened the
-		 * file, but we must instruct it not to close after each diff.
-		 */
-		rev.diffopt.no_free = 1;
-	} else {
+	} else if (!rev.diffopt.close_file) {
 		int saved;

 		if (!output_directory)
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 7dc5a5c736..fbec8ad2ef 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -926,11 +926,40 @@  test_expect_success 'format-patch --numstat should produce a patch' '
 '

 test_expect_success 'format-patch -- <path>' '
-	git format-patch main..side -- file 2>error &&
-	! grep "Use .--" error
+	rm -f *.patch &&
+	git checkout -b pathspec main &&
+
+	echo file_a 1 >file_a &&
+	echo file_b 1 >file_b &&
+	git add file_a file_b &&
+	git commit -m pathspec_initial &&
+
+	echo file_a 2 >>file_a &&
+	git add file_a &&
+	git commit -m pathspec_a &&
+
+	echo file_b 2 >>file_b &&
+	git add file_b &&
+	git commit -m pathspec_b &&
+
+	echo file_a 3 >>file_a &&
+	echo file_b 3 >>file_b &&
+	git add file_a file_b &&
+	git commit -m pathspec_ab &&
+
+	cat >expect <<-\EOF &&
+	0001-pathspec_initial.patch
+	0002-pathspec_a.patch
+	0003-pathspec_ab.patch
+	EOF
+
+	git format-patch main..pathspec -- file_a >output &&
+	test_cmp expect output &&
+	! grep file_b *.patch
 '

 test_expect_success 'format-patch --ignore-if-in-upstream HEAD' '
+	git checkout side &&
 	git format-patch --ignore-if-in-upstream HEAD
 '