Message ID | 20221211015340.2181837-1-lars@oddbit.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] line-range: Fix infinite loop bug with degenerate '$' regex | expand |
Lars Kellogg-Stedman <lars@oddbit.com> writes: > When the -L argument to "git log" is passed the degenerate regular > expression "$" (as in "-L :$:line-range.c"), this results in an > infinite loop in find_funcname_matching_regexp(). Is "matching an empty line" the only way a regular expression can be a "degenerate" one? If not, perhaps being a bit more explicit would help the readers, e.g. ... a regular expression that matches any line, even an empty one, such as "-L :$:line-range.c", this results in ... > The primary change is that we pre-decrement the beginning-of-line > marker ('bol') before comparing it to '\n'. In the case of '$', where > we start with bol == eol, this ensures that bol will find the > beginning of the line on which the match occurred. This clear explanation probably deserves to be in the commit log message proper. > @@ -148,8 +148,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char > /* determine extent of line matched */ > bol = start+match[0].rm_so; > eol = start+match[0].rm_eo; > - while (bol > start && *bol != '\n') > - bol--; > + while (bol > start && *--bol != '\n'); Please write it on two lines, and highlight an empty body of the loop, like so while (condition) ; /* nothing */ I think the intention of the above loop is to decrement bol sufficiently and make it point at the terminating LF at the end of the previous line, and then the next increment here > if (*bol == '\n') > bol++; compensates to bring bol back to point at the beginning of line. In the iteration of the loop when bol == start + 1, we inspect *--bol (i.e. *start). If it is LF, we exit the loop, so bol == start and *bol == '\n'. If it is not LF, we iterate one more time, notice bol == start and exit the loop, so bol == start and *bol != '\n'. bol will never go below start, so dereference *bol to see it value where should be safe without OOB read. > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char > return bol; > start = eol; > } > + return NULL; > } What is this change about? Isn't the above an endless loop without break, from which the only way for the control to leave it is by returning? > diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh > index ac9e4d0928..19db07a8df 100755 > --- a/t/t4211-line-log.sh > +++ b/t/t4211-line-log.sh > @@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' ' > test_cmp expect actual > ' > > +test_expect_success 'setup tests for zero-width regular expressions' ' > + cat > expect <<-EOF Style: lose the SP after "cat >". > + Modify func1() in file.c > + Add func1() and func2() in file.c > + EOF > +' > + > +test_expect_success 'zero-width regex $ matches any function name' ' > + git log --format="%s" --no-patch "-L:$:file.c" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'zero-width regex ^ matches any function name' ' > + git log --format="%s" --no-patch "-L:^:file.c" >actual && > + test_cmp expect actual > +' > + > +test_expect_success 'zero-width regex . matches any function name' ' "." is one character wide, not zero-width. Did you mean ".*"? > + git log --format="%s" --no-patch "-L:.:file.c" >actual && > + test_cmp expect actual > +' > + > test_done
Junio C Hamano <gitster@pobox.com> writes: > Lars Kellogg-Stedman <lars@oddbit.com> writes: > >> When the -L argument to "git log" is passed the degenerate regular >> expression "$" (as in "-L :$:line-range.c"), this results in an >> infinite loop in find_funcname_matching_regexp(). > > Is "matching an empty line" the only way a regular expression can be > a "degenerate" one? If not, perhaps being a bit more explicit would > help the readers, e.g. > > ... a regular expression that matches any line, even an empty > one, such as "-L :$:line-range.c", this results in ... I forgot to point out that the above comment on "degenerate" applies equally to the commit title (which matters more). Also, please do not upcase the first word after "<area>:" on the title line. Thanks.
On Sun, Dec 11, 2022 at 12:32:38PM +0900, Junio C Hamano wrote: > Lars Kellogg-Stedman <lars@oddbit.com> writes: > > > When the -L argument to "git log" is passed the degenerate regular > > expression "$" (as in "-L :$:line-range.c"), this results in an > > infinite loop in find_funcname_matching_regexp(). > > Is "matching an empty line" the only way a regular expression can be > a "degenerate" one? If not, perhaps being a bit more explicit would > help the readers, e.g. "Matching an empty line" isn't the issue (and if I implied that it was, that's my bad). The issue only crops up when matching the zero-width regular expression at the *end* of a line. I'll update the subject and description to be more clear... > This clear explanation probably deserves to be in the commit log > message proper. ...including adding the extended description to the commit message. Some projects object if commit messages get too wordy. > Please write it on two lines, and highlight an empty body of the > loop, like so Will do. > > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char > > return bol; > > start = eol; > > } > > + return NULL; > > } > > What is this change about? Isn't the above an endless loop without > break, from which the only way for the control to leave it is by > returning? It's not an endless loop without break; this change modified the loop condition: > - while (1) { > + while (*start) { That would have prevented the infinite loop in the first place. I'm happy to drop this change if you prefer, but if this condition had been in place originally we wouldn't have had the infinite loop bug (we would still have had incorrect behavior, but it would have been perhaps easier for an end user to diagnose). > Style: lose the SP after "cat >". Will do.
Lars Kellogg-Stedman <lars@oddbit.com> writes: >> > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char >> > return bol; >> > start = eol; >> > } >> > + return NULL; >> > } >> >> What is this change about? Isn't the above an endless loop without >> break, from which the only way for the control to leave it is by >> returning? > > It's not an endless loop without break; this change modified the loop > condition: Ah, thanks, that's clear, and sorry for the noise, and thanks.
diff --git a/line-range.c b/line-range.c index 955a8a9535..b0e26e7f9d 100644 --- a/line-range.c +++ b/line-range.c @@ -135,7 +135,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char { int reg_error; regmatch_t match[1]; - while (1) { + while (*start) { const char *bol, *eol; reg_error = regexec(regexp, start, 1, match, 0); if (reg_error == REG_NOMATCH) @@ -148,8 +148,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char /* determine extent of line matched */ bol = start+match[0].rm_so; eol = start+match[0].rm_eo; - while (bol > start && *bol != '\n') - bol--; + while (bol > start && *--bol != '\n'); if (*bol == '\n') bol++; while (*eol && *eol != '\n') @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char return bol; start = eol; } + return NULL; } static const char *parse_range_funcname( diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh index ac9e4d0928..19db07a8df 100755 --- a/t/t4211-line-log.sh +++ b/t/t4211-line-log.sh @@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' ' test_cmp expect actual ' +test_expect_success 'setup tests for zero-width regular expressions' ' + cat > expect <<-EOF + Modify func1() in file.c + Add func1() and func2() in file.c + EOF +' + +test_expect_success 'zero-width regex $ matches any function name' ' + git log --format="%s" --no-patch "-L:$:file.c" >actual && + test_cmp expect actual +' + +test_expect_success 'zero-width regex ^ matches any function name' ' + git log --format="%s" --no-patch "-L:^:file.c" >actual && + test_cmp expect actual +' + +test_expect_success 'zero-width regex . matches any function name' ' + git log --format="%s" --no-patch "-L:.:file.c" >actual && + test_cmp expect actual +' + test_done
When the -L argument to "git log" is passed the degenerate regular expression "$" (as in "-L :$:line-range.c"), this results in an infinite loop in find_funcname_matching_regexp(). Modify find_funcname_matching_regexp to correctly match the entire line instead of the zero-width match at eol and update the loop condition to prevent an infinite loop in the event of other undiscovered corner cases. Originally reported in <https://stackoverflow.com/q/74690545/147356>. Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com> --- This modifies my earlier patch so that instead of failing, the regular expression '$' correctly matches any function name (and so always returns the first function in the named file). This makes '$' behave the same as '.' and '^'. The primary change is that we pre-decrement the beginning-of-line marker ('bol') before comparing it to '\n'. In the case of '$', where we start with bol == eol, this ensures that bol will find the beginning of the line on which the match occurred. line-range.c | 6 +++--- t/t4211-line-log.sh | 22 ++++++++++++++++++++++ 2 files changed, 25 insertions(+), 3 deletions(-)