Message ID | patch-1.1-f7fd645468c-20220523T182954Z-avarab@gmail.com |
---|---|
State | New, archived |
Series | diff: fix a segfault in >2 tree -I<regex> and --output=<file> |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
> where we'd free the regex too soon.
>
> Now we'll ensure that diff_free() can be called repeatedly
> instead. We'd ultimately like to do away with the "no_free" confusion
> surrounding it, and to attempt to free() things only once, as outlined
> in [2]. But in the meantime this will fix the segfault.

Hmph, repeated calls to diff_free_file() now close the file upon
the first call.  I would have expected that such a resource would be
released when all the references go away, i.e. upon the last call.
The same thing for the ignore-regex array.

Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
would hide a breakage that could be caused by this change, wouldn't
it, because any use-after-release will say "ah, no need to close the
file" and "oh, there is no regex".  The former is not so worrisome,
but the latter may be---we may no longer have regex because the
first call to diff_free_ignore_regex() has cleared it and the code
that wants to use the regex, if any exists, would happily say "oh, there
is no regex", instead of exposing the use-after-release breakage to
a segfault.

> Thus we're here testing that -I<regex> is ignored in this case, and
> likewise for --output=<file>, but since this is what we were doing
> before c45dc9cf30 let's accept it for now.

It is true that the result of applying this patch is equivalent to
c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
2021-02-11), but doesn't that merely point at the commit as the
source of behaviour breakage?  With ignore-regex leaking before that
commit, wouldn't we have been using ignore-regex with combined diff
machinery?

Sorry, but I am failing to convince myself that this is not sweeping
the issue under the rug.

> 1. https://lore.kernel.org/git/a6a14213-bc82-d6fb-43dd-5a423c40a4f8@web.de/
> 2. https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
>
> Reported-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> On Sat, May 14 2022, René Scharfe wrote:
>
>> Hi all,
>>
>> git diff segfaults when it's asked to produce a combined diff and ignore
>> certain lines with --ignore-matching-lines/-I, e.g.:
>>
>>   $ git diff -I DEF_VER v2.33.3 v2.33.3^@
>>   zsh: segmentation fault  ./git-diff -I DEF_VER v2.33.3 v2.33.3^@
>
>  diff.c                  |  9 ++++++---
>  t/t4013-diff-various.sh | 15 +++++++++++++++
>  2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index e71cf758861..183c9f21305 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -6432,8 +6432,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
>
>  static void diff_free_file(struct diff_options *options)
>  {
> -	if (options->close_file)
> -		fclose(options->file);
> +	if (!options->close_file)
> +		return;
> +	options->close_file = 0;
> +	fclose(options->file);
>  }
>
>  static void diff_free_ignore_regex(struct diff_options *options)
> @@ -6444,7 +6446,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
>  		regfree(options->ignore_regex[i]);
>  		free(options->ignore_regex[i]);
>  	}
> -	free(options->ignore_regex);
> +	options->ignore_regex_nr = 0;
> +	FREE_AND_NULL(options->ignore_regex);
>  }
>
>  void diff_free(struct diff_options *options)
> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
> index 056e922164d..b556d185f53 100755
> --- a/t/t4013-diff-various.sh
> +++ b/t/t4013-diff-various.sh
> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>  	test_i18ngrep "invalid regex given to -I: " error
>  '
>
> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff $revs >expect &&
> +	git diff -I . $revs >actual &&
> +	test_cmp expect actual

And indeed this casts such a broken behaviour in stone.

> +'
> +
> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
> +	revs="HEAD~2 HEAD~ HEAD" &&
> +	git diff --output=expect.file $revs >expect.out &&
> +	git diff $revs >actual &&
> +	test_cmp expect.out actual &&
> +	test_must_be_empty expect.file

So is this one.

> +'
> +
>  test_done
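Junio's objection can be made concrete with a small standalone C sketch. It assumes a hypothetical struct demo_options that mirrors only the ignore_regex/ignore_regex_nr pair, plus a local DEMO_FREE_AND_NULL() macro standing in for git's FREE_AND_NULL(); the free routine has the same shape as the patched diff_free_ignore_regex(). Because the count is reset and the array pointer is nulled, a consumer that runs after a premature free quietly behaves as if no -I had been given, instead of crashing.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical stand-in for the two diff_options fields discussed above. */
struct demo_options {
	regex_t **ignore_regex;
	size_t ignore_regex_nr;
};

/* Local equivalent of git's FREE_AND_NULL() macro. */
#define DEMO_FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Compile one -I<regex> and append it; returns 0 on success. */
static int demo_add_regex(struct demo_options *o, const char *pattern)
{
	regex_t *re = malloc(sizeof(*re));
	regex_t **grown;

	if (!re)
		return -1;
	if (regcomp(re, pattern, REG_EXTENDED | REG_NOSUB)) {
		free(re);
		return -1;
	}
	grown = realloc(o->ignore_regex,
			(o->ignore_regex_nr + 1) * sizeof(*grown));
	if (!grown) {
		regfree(re);
		free(re);
		return -1;
	}
	o->ignore_regex = grown;
	o->ignore_regex[o->ignore_regex_nr++] = re;
	return 0;
}

/* Same shape as the patched diff_free_ignore_regex(): safe to call twice. */
static void demo_free_ignore_regex(struct demo_options *o)
{
	size_t i;

	for (i = 0; i < o->ignore_regex_nr; i++) {
		regfree(o->ignore_regex[i]);
		free(o->ignore_regex[i]);
	}
	o->ignore_regex_nr = 0;
	DEMO_FREE_AND_NULL(o->ignore_regex);
}

/* A consumer: does any -I regex match this line? */
static int demo_line_ignored(const struct demo_options *o, const char *line)
{
	size_t i;

	for (i = 0; i < o->ignore_regex_nr; i++)
		if (!regexec(o->ignore_regex[i], line, 0, NULL, 0))
			return 1;
	/* Also reached after a premature free: no crash, and no warning. */
	return 0;
}
```

After demo_free_ignore_regex() has run, demo_line_ignored() returns 0 for every line, which is exactly the "oh, there is no regex" outcome Junio describes.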
On Mon, May 23 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Fix a regression in c45dc9cf30 (diff: plug memory leak from regcomp()
>> on {log,diff} -I, 2021-02-11), as noted in [1] there was a logic error
>> where we'd free the regex too soon.
>>
>> Now we'll ensure that diff_free() can be called repeatedly
>> instead. We'd ultimately like to do away with the "no_free" confusion
>> surrounding it, and to attempt to free() things only once, as outlined
>> in [2]. But in the meantime this will fix the segfault.
>
> Hmph, repeated calls to diff_free_file() now close the file upon
> the first call.  I would have expected that such a resource would be
> released when all the references go away, i.e. upon the last call.
> The same thing for the ignore-regex array.

Yes, that would be much more sensible. But as noted: When producing a
combined diff we'll go through combine-diff.c, which doesn't handle many
of the options that the corresponding diff.c codepaths do.

I.e. the "right" thing to do in this case would require a much more
involved fix. We've somehow ended up not supporting --output=<file>, -I
and probably many other options in the combined-diff mode, which both in
testing and in this part of the implementation seems to have become an
afterthought.

So even before any changes of mine we silently ignored those options,
and in those particular cases the "right" thing to do, if we're not
growing new features, would probably be to error out early if they were
provided in the combined diff mode.

But as a minimal fix, just tailoring diff_free() towards the
non-combine-diff.c case seems to be the smallest & most correct thing to
do for now to address the segfault & the immediate issue.

> Clearing the "options->close_file" bit, and using FREE_AND_NULL(),
> would hide a breakage that could be caused by this change, wouldn't
> it, because any use-after-release will say "ah, no need to close the
> file" and "oh, there is no regex".  The former is not so worrisome,
> but the latter may be---we may no longer have regex because the
> first call to diff_free_ignore_regex() has cleared it and the code
> that wants to use the regex, if any exists, would happily say "oh, there
> is no regex", instead of exposing the use-after-release breakage to
> a segfault.

Yes, this wouldn't make much sense if we were supporting the file output
and -I in the combine-diff.c case, but AFAICT the two cases are:

 1. The "normal" diff case, where we set those up once, and diff_free()
    them once.

 2. The "combine-diff.c" case, where we might call diff_free() N times,
    but it's all to produce the diff itself, not for those options.

>> Thus we're here testing that -I<regex> is ignored in this case, and
>> likewise for --output=<file>, but since this is what we were doing
>> before c45dc9cf30 let's accept it for now.
>
> It is true that the result of applying this patch is equivalent to
> c45dc9cf (diff: plug memory leak from regcomp() on {log,diff} -I,
> 2021-02-11), but doesn't that merely point at the commit as the
> source of behaviour breakage?  With ignore-regex leaking before that
> commit, wouldn't we have been using ignore-regex with combined diff
> machinery?

No, because -I never did anything with the combined diff machinery, and
neither did --output.

> Sorry, but I am failing to convince myself that this is not sweeping
> the issue under the rug.

I think that's a fair summary: much of it was already under the rug, and
we're sweeping some of the remaining parts under it :)

I think that whole combined-diff interaction really needs to be fixed,
not just for the diff_free() case; e.g. we should either error out on or
support the options that we're silently ignoring now.

But as noted in
https://lore.kernel.org/git/220520.86pmk81a9z.gmgdl@evledraar.gmail.com/
I do have patches queued up locally that form a better basis for fixing
these issues. I.e. once we fix this segfault and have release_revisions()
it'll be easy to get rid of that "no_free" case in diff_free().

>> [...]
>>  void diff_free(struct diff_options *options)
>> diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
>> index 056e922164d..b556d185f53 100755
>> --- a/t/t4013-diff-various.sh
>> +++ b/t/t4013-diff-various.sh
>> @@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
>>  	test_i18ngrep "invalid regex given to -I: " error
>>  '
>>
>> +test_expect_success 'diff -I<regex>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff $revs >expect &&
>> +	git diff -I . $revs >actual &&
>> +	test_cmp expect actual
>
> And indeed this casts such a broken behaviour in stone.
>
>> +'
>> +
>> +test_expect_success 'diff --output=<file>: combined diff does not segfault' '
>> +	revs="HEAD~2 HEAD~ HEAD" &&
>> +	git diff --output=expect.file $revs >expect.out &&
>> +	git diff $revs >actual &&
>> +	test_cmp expect.out actual &&
>> +	test_must_be_empty expect.file
>
> So is this one.

I was on the fence about adding these tests, since I expected you to
comment on this aspect of them. I.e. we could just ignore the output here
and narrowly check whether we segfault.

But since we had no tests at all for this before, and, intentional or
not, this behavior of combined-diff is long-standing (and nobody seems to
have noticed or cared about it), I think it's good to have tests that
check the "expected" output (as in what we did before my c45dc9cf30).
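Junio's alternative, releasing the resource on the last call rather than the first, is ordinary reference counting. The sketch below assumes a hypothetical struct shared_output wrapping the FILE * and close_file flag; git's diff_options carries no such refcount, so this only illustrates the idea and is not a proposed patch.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical: an output stream shared by several users, closed only
 * when the last user releases it.
 */
struct shared_output {
	FILE *file;
	int close_file;     /* did we open it, so we must fclose() it? */
	unsigned refcount;
};

static struct shared_output *shared_output_open(FILE *fp, int close_file)
{
	struct shared_output *out = malloc(sizeof(*out));

	if (!out)
		return NULL;
	out->file = fp;
	out->close_file = close_file;
	out->refcount = 1;
	return out;
}

/* Take another reference before handing the stream to another user. */
static struct shared_output *shared_output_ref(struct shared_output *out)
{
	out->refcount++;
	return out;
}

/* Drop one reference; returns 1 if this call released the resource. */
static int shared_output_unref(struct shared_output *out)
{
	if (--out->refcount)
		return 0;
	if (out->close_file)
		fclose(out->file);
	free(out);
	return 1;
}
```

With this design an early release by one user cannot pull the stream out from under another, whereas the patched code relies on the combined-diff path simply never using the file.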
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I.e. the "right" thing to do in this case would require a much more
> involved fix. We've somehow ended up not supporting --output=<file>, -I
> and probably many other options in the combined-diff mode, which both in
> testing and in this part of the implementation seems to have become an
> afterthought.

OK, a hopefully final question.

How much less involved is it to add new code (without doing
anything in this patch) to detect and die on the combination of
combined-diff with these two options, so that we can document the
fact that we do not support them?  It would give us a much better way
forward than leaving the command to silently ignore them and give a
result that is not in line with what was asked, wouldn't it?  That way,
the much more involved "fix" will turn into a change to add a missing
feature.

Thanks.
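A minimal sketch of the check Junio asks about, assuming a hypothetical struct demo_diff_opts that records only the two options in question (the real diff_options keeps --output as an opened FILE * plus the close_file flag, not as a path). The helper returns a message that a caller on the combined-diff path could pass to git's die(); where exactly that caller lives is left open here.

```c
#include <stddef.h>

/* Hypothetical subset of diff_options; the real struct differs. */
struct demo_diff_opts {
	size_t ignore_regex_nr;   /* number of -I<regex> options given */
	const char *output_path;  /* --output=<file>, or NULL if absent */
};

/*
 * Returns NULL if the options are usable for a combined diff, or an
 * error message describing the first unsupported option otherwise.
 */
static const char *combined_diff_unsupported(const struct demo_diff_opts *o)
{
	if (o->ignore_regex_nr)
		return "-I<regex> is not supported with combined diffs";
	if (o->output_path)
		return "--output=<file> is not supported with combined diffs";
	return NULL;
}
```

A caller would then do something like: if ((msg = combined_diff_unsupported(&opts))) die("%s", msg); which turns today's silent ignoring into a documented limitation.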
On Tue, May 24 2022, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I.e. the "right" thing to do in this case would require a much more
>> involved fix. We've somehow ended up not supporting --output=<file>, -I
>> and probably many other options in the combined-diff mode, which both in
>> testing and in this part of the implementation seems to have become an
>> afterthought.
>
> OK, a hopefully final question.
>
> How much less involved is it to add new code (without doing
> anything in this patch)

...yeah, I think for this one it makes sense to narrowly focus on the
segfault...

> to detect and die on the combination of
> combined-diff with these two options, so that we can document the
> fact that we do not support them?  It would give us a much better way
> forward than leaving the command to silently ignore them and give a
> result that is not in line with what was asked, wouldn't it?  That way,
> the much more involved "fix" will turn into a change to add a missing
> feature.

I think not much; it's rather trivial for the case where we invoke "git
diff", i.e. just adding something to the "builtin_diff_combined()"
branch in builtin/diff.c to detect these two cases specifically.

I haven't looked in any depth into how we might reach code in
combine-diff.c through other means, and whether any of it can set these
two indirectly somewhere else (i.e. other things that take diff options).

I also wonder if I'm just wrong in my assessment that it's a Bad Thing
that we take some of these without ever doing anything with them in some
modes, e.g.:

    git log --oneline -I foo

This will never do anything with that "-I foo" by definition, but would
as soon as you add -p. Should we error out without -p (or other
diff-showing options)?

The same goes for range-diff, format-patch, --remerge-diff and any
number of other things where we take the full set of options, but only
do something with a limited subset of them.

It would be helpful in some cases if we were more anal about it, e.g.
when I was wondering why -I didn't do anything with the combined diff,
but it's also handy for scripting and one-liners if you can tweak the
command-line back & forth without it being so strict.

So I don't know. Maybe I'm just trying to talk myself out of pulling on
that (bound to be long) thread, but I'm coming more around to this just
being a non-issue beyond the narrow and needed fix for diff_free() in
particular.

I.e. the more general approach of chasing down options that don't do
anything for a given "diff mode" seems less pressing. We might still
want to error on some particular ones, such as -I with the combined diff
(but not with --oneline, or whatever).
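The more general idea, flagging options that a given diff mode accepts but never acts on, could be sketched as a per-mode bitmask. Everything here (the option bits, the two modes, and which options each mode honors) is hypothetical and illustrative; git keeps no such table.

```c
/* Hypothetical option bits; git has no such enumeration. */
enum demo_opt {
	OPT_IGNORE_REGEX = 1 << 0, /* -I<regex> */
	OPT_OUTPUT_FILE  = 1 << 1, /* --output=<file> */
	OPT_PATCH        = 1 << 2, /* -p */
};

enum demo_mode { MODE_PAIRWISE, MODE_COMBINED, MODE_NR };

/* Which options each mode actually honors (illustrative values only). */
static const unsigned supported[MODE_NR] = {
	[MODE_PAIRWISE] = OPT_IGNORE_REGEX | OPT_OUTPUT_FILE | OPT_PATCH,
	[MODE_COMBINED] = OPT_PATCH,
};

/* Options that were requested but would be silently ignored in this mode. */
static unsigned demo_ignored_options(enum demo_mode mode, unsigned requested)
{
	return requested & ~supported[mode];
}
```

A non-zero result could drive a warning or an error, depending on how strict a given command wants to be; the hard part René points out is filling in such a table for every mode and option, not the lookup itself.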
On 24.05.22 at 22:17, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, May 24 2022, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> I.e. the "right" thing to do in this case would require a much more
>>> involved fix. We've somehow ended up not supporting --output=<file>, -I
>>> and probably many other options in the combined-diff mode, which both in
>>> testing and in this part of the implementation seems to have become an
>>> afterthought.
>>
>> OK, a hopefully final question.
>>
>> How much less involved is it to add new code (without doing
>> anything in this patch)
>
> ...yeah, I think for this one it makes sense to narrowly focus on the
> segfault...
>
>> to detect and die on the combination of
>> combined-diff with these two options, so that we can document the
>> fact that we do not support them?  It would give us a much better way
>> forward than leaving the command to silently ignore them and give a
>> result that is not in line with what was asked, wouldn't it?  That way,
>> the much more involved "fix" will turn into a change to add a missing
>> feature.
>
> I think not much; it's rather trivial for the case where we invoke "git
> diff", i.e. just adding something to the "builtin_diff_combined()"
> branch in builtin/diff.c to detect these two cases specifically.
>
> I haven't looked in any depth into how we might reach code in
> combine-diff.c through other means, and whether any of it can set these
> two indirectly somewhere else (i.e. other things that take diff options).

So let's add those checks there.

> I also wonder if I'm just wrong in my assessment that it's a Bad Thing
> that we take some of these without ever doing anything with them in some
> modes, e.g.:
>
>     git log --oneline -I foo
>
> This will never do anything with that "-I foo" by definition, but would
> as soon as you add -p. Should we error out without -p (or other
> diff-showing options)?

Which definition?  The documentation says:

   -I<regex>, --ignore-matching-lines=<regex>
       Ignore changes whose all lines match <regex>. This option may be
       specified more than once.

That sounds to me like it would affect history simplification, and thus
git log --oneline.  (Which seems expensive, but that's a different
concern.)  So based on that I'd expect at least a warning if -I is
ignored.

> The same goes for range-diff, format-patch, --remerge-diff and any
> number of other things where we take the full set of options, but only
> do something with a limited subset of them.
>
> It would be helpful in some cases if we were more anal about it, e.g.
> when I was wondering why -I didn't do anything with the combined diff,
> but it's also handy for scripting and one-liners if you can tweak the
> command-line back & forth without it being so strict.
>
> So I don't know. Maybe I'm just trying to talk myself out of pulling on
> that (bound to be long) thread, but I'm coming more around to this just
> being a non-issue beyond the narrow and needed fix for diff_free() in
> particular.
>
> I.e. the more general approach of chasing down options that don't do
> anything for a given "diff mode" seems less pressing. We might still
> want to error on some particular ones, such as -I with the combined diff
> (but not with --oneline, or whatever).

Supporting all combinations would be ideal.  Reporting unsupported
combinations would be the next best thing.  I wonder if we have passed
the point of having so many options for e.g. git log that assessing all
of their pairings has become impractical, though. :-/

René
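René's reading of the documented -I semantics ("Ignore changes whose all lines match <regex>") can be written down as a predicate: a change is ignorable only when every one of its lines matches at least one of the given regexes. This is a sketch of the documented behaviour using POSIX regexec(), with hypothetical function names; it is not a copy of git's xdiff code.

```c
#include <regex.h>
#include <stddef.h>

/* Does this line match at least one of the nr compiled regexes? */
static int line_matches_any(regex_t *const *res, size_t nr, const char *line)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!regexec(res[i], line, 0, NULL, 0))
			return 1;
	return 0;
}

/*
 * A change (its removed and added lines together) may be ignored only
 * if every one of its lines matches some -I regex.
 */
static int change_is_ignorable(regex_t *const *res, size_t nr,
			       const char *const *lines, size_t nlines)
{
	size_t i;

	if (!nr)
		return 0; /* no -I given: nothing is ignorable */
	for (i = 0; i < nlines; i++)
		if (!line_matches_any(res, nr, lines[i]))
			return 0;
	return 1;
}
```

For René's DEF_VER example, a change that only bumps the DEF_VER line would be ignorable, while a change that also touches any other line would still be shown.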
diff --git a/diff.c b/diff.c
index e71cf758861..183c9f21305 100644
--- a/diff.c
+++ b/diff.c
@@ -6432,8 +6432,10 @@ static void diff_flush_patch_all_file_pairs(struct diff_options *o)
 
 static void diff_free_file(struct diff_options *options)
 {
-	if (options->close_file)
-		fclose(options->file);
+	if (!options->close_file)
+		return;
+	options->close_file = 0;
+	fclose(options->file);
 }
 
 static void diff_free_ignore_regex(struct diff_options *options)
@@ -6444,7 +6446,8 @@ static void diff_free_ignore_regex(struct diff_options *options)
 		regfree(options->ignore_regex[i]);
 		free(options->ignore_regex[i]);
 	}
-	free(options->ignore_regex);
+	options->ignore_regex_nr = 0;
+	FREE_AND_NULL(options->ignore_regex);
 }
 
 void diff_free(struct diff_options *options)
diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh
index 056e922164d..b556d185f53 100755
--- a/t/t4013-diff-various.sh
+++ b/t/t4013-diff-various.sh
@@ -614,4 +614,19 @@ test_expect_success 'diff -I<regex>: detect malformed regex' '
 	test_i18ngrep "invalid regex given to -I: " error
 '
 
+test_expect_success 'diff -I<regex>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff $revs >expect &&
+	git diff -I . $revs >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'diff --output=<file>: combined diff does not segfault' '
+	revs="HEAD~2 HEAD~ HEAD" &&
+	git diff --output=expect.file $revs >expect.out &&
+	git diff $revs >actual &&
+	test_cmp expect.out actual &&
+	test_must_be_empty expect.file
+'
+
 test_done
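The reworked diff_free_file() can be exercised on its own. This sketch copies its body into a function operating on a hypothetical two-field struct and adds a counter (also hypothetical) so the effect is observable: however many times the function is called, fclose() runs only once.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two diff_options fields involved. */
struct demo_file_options {
	FILE *file;
	int close_file;
};

/* Counts real fclose() calls so the idempotence can be observed. */
static int demo_fclose_calls;

/*
 * Same logic as the patched diff_free_file(): the flag is cleared
 * before fclose(), so any later call returns early.
 */
static void demo_free_file(struct demo_file_options *options)
{
	if (!options->close_file)
		return;
	options->close_file = 0;
	demo_fclose_calls++;
	fclose(options->file);
}
```

Calling demo_free_file() several times in a row, as the combined-diff path may do with diff_free(), leaves demo_fclose_calls at 1, where the pre-patch version would have closed the same FILE * repeatedly.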