diff mbox series

diff: fix a segfault in >2 tree -I<regex> and --output=<file>

Message ID patch-1.1-f7fd645468c-20220523T182954Z-avarab@gmail.com (mailing list archive)
State New
Headers show
Series diff: fix a segfault in >2 tree -I<regex> and --output=<file> | expand

Commit Message

Ævar Arnfjörð Bjarmason May 23, 2022, 6:31 p.m. UTC
Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
where we'd free the regex too soon.

Now we'll ensure that diff_free() can be called repeatedly
instead. We'd ultimately like to do away with the "no_free" confusion
surrounding it, and to attempt to free() things only once, as outlined
in [2]. But in the meantime this will fix the segfault.

Since the rest of diff_free() was already safe re-invoke this only
affected the -I<regex> and --output=<file> options. In both cases our
behavior before wasn't very sensible. When producing a combined diff
we'll go through combined-diff.c, which doesn't handle many of the
options that the corresponding diff.c codepaths do.

Thus we're here testing that -I<regex> is ignored in this case, and
likewise for --output=<file>, but since this is what we were doing
before c45dc9cf30 let's accept it for now.

1. https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/
2. https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/

Reported-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

On Sat, May 14 2022, René Scharfe wrote:

> Hi all,
>
> git diff segfaults when it's asked to produce a combined diff and ignore
> certain lines with --ignore-matching-lines/-I, e.g.:
>
>    $ git diff -I DEF_VER v2.33.3 v2.33.3^@
>    zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@

 diff.c                  |  9 ++++++---
 t/t4013-diff-various.sh | 15 +++++++++++++++
 2 files changed, 21 insertions(+), 3 deletions(-)

Comments

Junio C Hamano May 23, 2022, 8:08 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
> where we'd free the regex too soon.
>
> Now we'll ensure that diff_free() can be called repeatedly
> instead. We'd ultimately like to do away with the "no_free" confusion
> surrounding it, and to attempt to free() things only once, as outlined
> in [2]. But in the meantime this will fix the segfault.

Hmph, repeated calls to diff_free_file() now closes the file upon
the first call.  I would have expected that such a resource would be
released when all the references go away, i.e. upon the last call.
The same thing for the ignore-regex array.

Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
would hide a breakage that could be caused by this change, doesn't
it, because any use-after-release will say "ah, no need to close the
file" and "oh, there is no regex".  The former is not so worrisome,
but the latter may be---we may no longer have regex because the
first call to diff_free_ignore_regex() has cleared it and the code
that wants to use the regex, if exists, would happily say "oh, there
is no regex", instead of exposing the use-after-release breakage to
a segfault.

> Thus we're here testing that -I<regex> is ignored in this case, and
> likewise for --output=<file>, but since this is what we were doing
> before c45dc9cf30 let's accept it for now.

It is true that the result of applying this patch is equivalent to
c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
2021-02-11), but doesn't that merely point at the commit as the
source of behaviour breakage?  With ignore-regex leaking before that
commit, wouldn't we have been using ignore-regex with combined diff
machinery?

Sorry, but I am failing to convince myself that this is not sweep
the issue under the rug.

> 1. https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/
> 2. https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
>
> Reported-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> On Sat, May 14 2022, René Scharfe wrote:
>
>> Hi all,
>>
>> git diff segfaults when it's asked to produce a combined diff and ignore
>> certain lines with --ignore-matching-lines/-I, e.g.:
>>
>>    $ git diff -I DEF_VER v2.33.3 v2.33.3^@
>>    zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@
>
>  diff.c                  |  9 ++++++---
>  t/t4013-diff-various.sh | 15 +++++++++++++++
>  2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index e71cf758861..183c9f21305 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6432,8 +6432,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>  
>  static void diff_free_file(struct diff_options *options)
>  {
> -	if (options->close_file)
> -		fclose(options->file);
> +	if (!options->close_file)
> +		return;
> +	options->close_file = 0;
> +	fclose(options->file);
>  }
>  
>  static void diff_free_ignore_regex(struct diff_options *options)
> @@ -6444,7 +6446,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
>  		regfree(options->ignore_regex[i]);
>  		free(options->ignore_regex[i]);
>  	}
> -	free(options->ignore_regex);
> +	options->ignore_regex_nr = 0;
> +	FREE_AND_NULL(options->ignore_regex);
>  }
>  
>  void diff_free(struct diff_options *options)
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 056e922164d..b556d185f53 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>  	test_i18ngrep "invalid regex given to -I: " error
>  '
>  
> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff $revs >expect &&
> +	git diff -I . $revs >actual &&
> +	test_cmp expect actual

And indeed this casts such a broken behaviour in stone.

> +'
> +
> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff --output=expect.file $revs >expect.out &&
> +	git diff $revs >actual &&
> +	test_cmp expect.out actual &&
> +	test_must_be_empty expect.file

So is this one.

> +'
> +
>  test_done
Ævar Arnfjörð Bjarmason May 24, 2022, 11:38 a.m. UTC | #2
On Mon, May 23 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
>> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
>> where we'd free the regex too soon.
>>
>> Now we'll ensure that diff_free() can be called repeatedly
>> instead. We'd ultimately like to do away with the "no_free" confusion
>> surrounding it, and to attempt to free() things only once, as outlined
>> in [2]. But in the meantime this will fix the segfault.
>
> Hmph, repeated calls to diff_free_file() now closes the file upon
> the first call.  I would have expected that such a resource would be
> released when all the references go away, i.e. upon the last call.
> The same thing for the ignore-regex array.

Yes, that would be much more sensible. But as noted:

    When producing a combined diff we'll go through combined-diff.c,
    which doesn't handle many of the options that the corresponding
    diff.c codepaths do.

I.e. the "right" thing to do in this case would require a much more
involved fix. We've somehow ended up not supporting --output=<file>, -I
and probably many other options in the combined-diff mode, which both in
testing and in this part of the implementation seems to have become an
afterthought.

So before any changes of mine we silently ignore those options, and in
those particular cases the "right" thing to do if we're not growing new
features would probably be to error out early if they were provided in
the combined diff mode.

But as a minimal fix just tailoring diff_free() towards the
not-combined-diff.c case seems to be the smallest & most correct thing
to do for now to address the segfault & the immediate issue.

> Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
> would hide a breakage that could be caused by this change, doesn't
> it, because any use-after-release will say "ah, no need to close the
> file" and "oh, there is no regex".  The former is not so worrisome,
> but the latter may be---we may no longer have regex because the
> first call to diff_free_ignore_regex() has cleared it and the code
> that wants to use the regex, if exists, would happily say "oh, there
> is no regex", instead of exposing the use-after-release breakage to
> a segfault.

Yes, this wouldn't make much sense if we were supporting the file output
and -I in the combined-diff.c case, but AFAICT the two cases are:


 1. The "normal" diff case, where we set those up once, and diff_free()
    them once.

 2. The "combined-diff.c" case, where we might call diff_free() N times,
    but it's all to produce the diff itself, not for those options.


>> Thus we're here testing that -I<regex> is ignored in this case, and
>> likewise for --output=<file>, but since this is what we were doing
>> before c45dc9cf30 let's accept it for now.
>
> It is true that the result of applying this patch is equivalent to
> c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
> 2021-02-11), but doesn't that merely point at the commit as the
> source of behaviour breakage?  With ignore-regex leaking before that
> commit, wouldn't we have been using ignore-regex with combined diff
> machinery?

No, because -I never did anything with the combined diff machinery,
neither did --output.

> Sorry, but I am failing to convince myself that this is not sweep
> the issue under the rug.

I think that's a fair summary, much of it was already under the rug,
we're sweeping some of the remainin parts under it :)

I think that whole combined-diff interaction really needs to be fix, not
just for the diff_free() case, but e.g. we should either error out or
support options that we're silently ignoring now.

But as noted in
https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
I do have patches queued up locally that form a better basis for fixing
these issues. I.e. once we fix this segfault and have
release_revisions() it'll be easy to get rid of that "no_free" case in
diff_free().

>> [...]
>>  void diff_free(struct diff_options *options)
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 056e922164d..b556d185f53 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>>  	test_i18ngrep "invalid regex given to -I: " error
>>  '
>>  
>> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff $revs >expect &&
>> +	git diff -I . $revs >actual &&
>> +	test_cmp expect actual
>
> And indeed this casts such a broken behaviour in stone.
>
>> +'
>> +
>> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff --output=expect.file $revs >expect.out &&
>> +	git diff $revs >actual &&
>> +	test_cmp expect.out actual &&
>> +	test_must_be_empty expect.file
>
> So is this one.

I was on the fence about adding these tests, since I expected you to
comment on this aspect of them. I.e. we could just ignore the output
here and narrowly see if we segfault.

But since we had no tests at all for this before, and intentional or not
this behavior of combined-diff is long-standing behavior (that nobody
seems to have noticed or cared about) I think it's good to have tests
that check the "expected" (as in what we did before my c45dc9cf30)
output.
Junio C Hamano May 24, 2022, 7:38 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I.e. the "right" thing to do in this case would require a much more
> involved fix. We've somehow ended up not supporting --output=<file>, -I
> and probably many other options in the combined-diff mode, which both in
> testing and in this part of the implementation seems to have become an
> afterthought.

OK, a hopefully final question.

How much less involved is it to add a new code (without doing
anything in this patch) to detect and die on the combination of
combined-diff with these two options, so that we can document the
fact that we do not support them?  It would give us much better way
forward than leaving the command silently ignore and give result
that is not in line with what was asked, wouldn't it?  That way, the
much more involved "fix" will turn into a change to add a missing
feature.

Thanks.
Ævar Arnfjörð Bjarmason May 24, 2022, 8:17 p.m. UTC | #4
On Tue, May 24 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I.e. the "right" thing to do in this case would require a much more
>> involved fix. We've somehow ended up not supporting --output=<file>, -I
>> and probably many other options in the combined-diff mode, which both in
>> testing and in this part of the implementation seems to have become an
>> afterthought.
>
> OK, a hopefully final question.
>
> How much less involved is it to add a new code (without doing
> anything in this patch)

...yeah, I think for this one it makes sense to narrowly focus on the
segfault...

> to detect and die on the combination of
> combined-diff with these two options, so that we can document the
> fact that we do not support them?  It would give us much better way
> forward than leaving the command silently ignore and give result
> that is not in line with what was asked, wouldn't it?  That way, the
> much more involved "fix" will turn into a change to add a missing
> feature.

I think not much, it's rather trivial for the case where we invoke "git
diff", I.e. just adding something to the "builtin_diff_combined()"
branch in builtin/diff.c to detect these two cases specifically.

I haven't looked in any depth into how we might reach code in
combine-diff.c through other means, and if any of it can set these two
indirectly somewhere else (i.e. other things that take diff options).

I also wonder if I'm just wrong in my assessment that it's a Bad Thing
that we take some of these without ever doing anything with them in some
modes, e.g.:

	git log --oneline -I foo

This will never do anything with that "-I foo" by definition, but would
as soon as you add -p, should we error without -p (or other diff-showing
options).

The same goes for range-diff, format-patch, --remerge-diff and any
number of other things where we take the full set of options, but only
do something with a limited subset of them.

It is helpful in some cases if we were more anal about it, e.g. when I
was wondering why -I didn't do anything with the combined diff, but also
handy for scripting and one-liners if you can tweak the command-line
back & forth without it being so strict.

So I don't know. Maybe I'm just trying to talk myself out of pulling on
that (bound to be long) thread, but I'm coming more around to this just
being a non-issue beyond the narrow and needed fix for diff_free() in
particular.

I.e. the more general approach of chasing down options that don't do
anything for a given "diff mode". We might still want to error on some
particular ones, such as -I with the combined diff (but not with
--oneline, or whatever).
René Scharfe June 18, 2022, 11:12 a.m. UTC | #5
Am 24.05.22 um 22:17 schrieb Ævar Arnfjörð Bjarmason:
>
> On Tue, May 24 2022, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> I.e. the "right" thing to do in this case would require a much more
>>> involved fix. We've somehow ended up not supporting --output=<file>, -I
>>> and probably many other options in the combined-diff mode, which both in
>>> testing and in this part of the implementation seems to have become an
>>> afterthought.
>>
>> OK, a hopefully final question.
>>
>> How much less involved is it to add a new code (without doing
>> anything in this patch)
>
> ...yeah, I think for this one it makes sense to narrowly focus on the
> segfault...
>
>> to detect and die on the combination of
>> combined-diff with these two options, so that we can document the
>> fact that we do not support them?  It would give us much better way
>> forward than leaving the command silently ignore and give result
>> that is not in line with what was asked, wouldn't it?  That way, the
>> much more involved "fix" will turn into a change to add a missing
>> feature.
>
> I think not much, it's rather trivial for the case where we invoke "git
> diff", I.e. just adding something to the "builtin_diff_combined()"
> branch in builtin/diff.c to detect these two cases specifically.
>
> I haven't looked in any depth into how we might reach code in
> combine-diff.c through other means, and if any of it can set these two
> indirectly somewhere else (i.e. other things that take diff options).

So let's add those checks there.

> I also wonder if I'm just wrong in my assessment that it's a Bad Thing
> that we take some of these without ever doing anything with them in some
> modes, e.g.:
>
> 	git log --oneline -I foo
>
> This will never do anything with that "-I foo" by definition, but would
> as soon as you add -p, should we error without -p (or other diff-showing
> options).

Which definition?  The documentation says:

       -I<regex>, --ignore-matching-lines=<regex>
           Ignore changes whose all lines match <regex>. This option may be specified more than once.

That sounds to me like it would affect history simplification, and thus
git log --oneline.  (Which seems expensive, but that's a different
concern.)  So based on that I'd expect at least a warning if -I is
ignored.

> The same goes for range-diff, format-patch, --remerge-diff and any
> number of other things where we take the full set of options, but only
> do something with a limited subset of them.
>
> It is helpful in some cases if we were more anal about it, e.g. when I
> was wondering why -I didn't do anything with the combined diff, but also
> handy for scripting and one-liners if you can tweak the command-line
> back & forth without it being so strict.
>
> So I don't know. Maybe I'm just trying to talk myself out of pulling on
> that (bound to be long) thread, but I'm coming more around to this just
> being a non-issue beyond the narrow and needed fix for diff_free() in
> particular.
>
> I.e. the more general approach of chasing down options that don't do
> anything for a given "diff mode". We might still want to error on some
> particular ones, such as -I with the combined diff (but not with
> --oneline, or whatever).

Supporting all combinations would be ideal.  Reporting unsupported
combinations would be the next best thing.  I wonder if we passed the
point of having so many options for e.g. git log that assessing all
of their pairings has become impractical, though. :-/

René
diff mbox series

Patch

diff --git a/diff.c b/diff.c
index e71cf758861..183c9f21305 100644
--- a/diff.c
+++ b/diff.c
@@ -6432,8 +6432,10 @@  static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 static void diff_free_file(struct diff_options *options)
 {
-	if (options->close_file)
-		fclose(options->file);
+	if (!options->close_file)
+		return;
+	options->close_file = 0;
+	fclose(options->file);
 }
 
 static void diff_free_ignore_regex(struct diff_options *options)
@@ -6444,7 +6446,8 @@  static void diff_free_ignore_regex(struct diff_options *options)
 		regfree(options->ignore_regex[i]);
 		free(options->ignore_regex[i]);
 	}
-	free(options->ignore_regex);
+	options->ignore_regex_nr = 0;
+	FREE_AND_NULL(options->ignore_regex);
 }
 
 void diff_free(struct diff_options *options)
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 056e922164d..b556d185f53 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -614,4 +614,19 @@  test_expect_success 'diff -I<regex>: detect malformed regex' '
 	test_i18ngrep "invalid regex given to -I: " error
 '
 
+test_expect_success 'diff -I<regex>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff $revs >expect &&
+	git diff -I . $revs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff --output=<file>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff --output=expect.file $revs >expect.out &&
+	git diff $revs >actual &&
+	test_cmp expect.out actual &&
+	test_must_be_empty expect.file
+'
+
 test_done