diff mbox series

[v3] line-range: Fix infinite loop bug with degenerate '$' regex

Message ID 20221211015340.2181837-1-lars@oddbit.com (mailing list archive)
State New, archived
Headers show
Series [v3] line-range: Fix infinite loop bug with degenerate '$' regex | expand

Commit Message

Lars Kellogg-Stedman Dec. 11, 2022, 1:53 a.m. UTC
When the -L argument to "git log" is passed the degenerate regular
expression "$" (as in "-L :$:line-range.c"), this results in an
infinite loop in find_funcname_matching_regexp().

Modify find_funcname_matching_regexp to correctly match the entire line
instead of the zero-width match at eol and update the loop condition to
prevent an infinite loop in the event of other undiscovered corner cases.

Originally reported in <https://stackoverflow.com/q/74690545/147356>.

Signed-off-by: Lars Kellogg-Stedman <lars@oddbit.com>
---
This modifies my earlier patch so that instead of failing, the
regular expression '$' correctly matches any function name (and so
always returns the first function in the named file). This makes '$'
behave the same as '.' and '^'.

The primary change is that we pre-decrement the beginning-of-line
marker ('bol') before comparing it to '\n'. In the case of '$', where
we start with bol == eol, this ensures that bol will find the
beginning of the line on which the match occurred.

 line-range.c        |  6 +++---
 t/t4211-line-log.sh | 22 ++++++++++++++++++++++
 2 files changed, 25 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Dec. 11, 2022, 3:32 a.m. UTC | #1
Lars Kellogg-Stedman <lars@oddbit.com> writes:

> When the -L argument to "git log" is passed the degenerate regular
> expression "$" (as in "-L :$:line-range.c"), this results in an
> infinite loop in find_funcname_matching_regexp().

Is "matching an empty line" the only way a regular expression can be
a "degenerate" one?  If not, perhaps being a bit more explicit would
help the readers, e.g.

    ... a regular expression that matches any line, even an empty
    one, such as "-L :$:line-range.c", this results in ...

> The primary change is that we pre-decrement the beginning-of-line
> marker ('bol') before comparing it to '\n'. In the case of '$', where
> we start with bol == eol, this ensures that bol will find the
> beginning of the line on which the match occurred.

This clear explanation probably deserves to be in the commit log
message proper.

> @@ -148,8 +148,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  		/* determine extent of line matched */
>  		bol = start+match[0].rm_so;
>  		eol = start+match[0].rm_eo;
> -		while (bol > start && *bol != '\n')
> -			bol--;
> +		while (bol > start && *--bol != '\n');

Please write it on two lines, and highlight an empty body of the
loop, like so

		while (condition)
			; /* nothing */

I think the intention of the above loop is to decrement bol
sufficiently and make it point at the terminating LF at the end
of the previous line, and then the next increment here

>  		if (*bol == '\n')
>  			bol++;

compensates to bring bol back to point at the beginning of line.

In the iteration of the loop when bol == start + 1, we inspect
*--bol (i.e.  *start).  If it is LF, we exit the loop, so bol ==
start and *bol == '\n'.  If it is not LF, we iterate one more time,
notice bol == start and exit the loop, so bol == start and *bol !=
'\n'.  bol will never go below start, so dereference *bol to see it
value where should be safe without OOB read.

> @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>  			return bol;
>  		start = eol;
>  	}
> +	return NULL;
>  }

What is this change about?  Isn't the above an endless loop without
break, from which the only way for the control to leave it is by
returning?

> diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
> index ac9e4d0928..19db07a8df 100755
> --- a/t/t4211-line-log.sh
> +++ b/t/t4211-line-log.sh
> @@ -315,4 +315,26 @@ test_expect_success 'line-log with --before' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success 'setup tests for zero-width regular expressions' '
> +	cat > expect <<-EOF

Style: lose the SP after "cat >".

> +	Modify func1() in file.c
> +	Add func1() and func2() in file.c
> +	EOF
> +'
> +
> +test_expect_success 'zero-width regex $ matches any function name' '
> +	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex ^ matches any function name' '
> +	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'zero-width regex . matches any function name' '

"." is one character wide, not zero-width.  Did you mean ".*"?

> +	git log --format="%s" --no-patch "-L:.:file.c" >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done
Junio C Hamano Dec. 11, 2022, 3:34 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Lars Kellogg-Stedman <lars@oddbit.com> writes:
>
>> When the -L argument to "git log" is passed the degenerate regular
>> expression "$" (as in "-L :$:line-range.c"), this results in an
>> infinite loop in find_funcname_matching_regexp().
>
> Is "matching an empty line" the only way a regular expression can be
> a "degenerate" one?  If not, perhaps being a bit more explicit would
> help the readers, e.g.
>
>     ... a regular expression that matches any line, even an empty
>     one, such as "-L :$:line-range.c", this results in ...

I forgot to point out that the above comment on "degenerate" applies
equally to the commit title (which matters more).  Also, please do
not upcase the first word after "<area>:" on the title line.

Thanks.
Lars Kellogg-Stedman Dec. 14, 2022, 2:53 p.m. UTC | #3
On Sun, Dec 11, 2022 at 12:32:38PM +0900, Junio C Hamano wrote:
> Lars Kellogg-Stedman <lars@oddbit.com> writes:
> 
> > When the -L argument to "git log" is passed the degenerate regular
> > expression "$" (as in "-L :$:line-range.c"), this results in an
> > infinite loop in find_funcname_matching_regexp().
> 
> Is "matching an empty line" the only way a regular expression can be
> a "degenerate" one?  If not, perhaps being a bit more explicit would
> help the readers, e.g.

"Matching an empty line" isn't the issue (and if I implied that it
was, that's my bad). The issue only crops up when matching the
zero-width regular expression at the *end* of a line.

I'll update the subject and description to be more clear...

> This clear explanation probably deserves to be in the commit log
> message proper.

...including adding the extended description to the commit message. Some projects
object if commit messages get too wordy.

> Please write it on two lines, and highlight an empty body of the
> loop, like so

Will do.

> > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
> >  			return bol;
> >  		start = eol;
> >  	}
> > +	return NULL;
> >  }
> 
> What is this change about?  Isn't the above an endless loop without
> break, from which the only way for the control to leave it is by
> returning?

It's not an endless loop without break; this change modified the loop
condition:

> -       while (1) {
> +       while (*start) {

That would have prevented the infinite loop in the first place. I'm
happy to drop this change if you prefer, but if this condition had
been in place originally we wouldn't have had the infinite loop bug
(we would still have had incorrect behavior, but it would have been
perhaps easier for an end user to diagnose).

> Style: lose the SP after "cat >".

Will do.
Junio C Hamano Dec. 18, 2022, 1:33 a.m. UTC | #4
Lars Kellogg-Stedman <lars@oddbit.com> writes:

>> > @@ -161,6 +160,7 @@ static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
>> >  			return bol;
>> >  		start = eol;
>> >  	}
>> > +	return NULL;
>> >  }
>> 
>> What is this change about?  Isn't the above an endless loop without
>> break, from which the only way for the control to leave it is by
>> returning?
>
> It's not an endless loop without break; this change modified the loop
> condition:

Ah, thanks, that's clear, and sorry for the noise, and thanks.
diff mbox series

Patch

diff --git a/line-range.c b/line-range.c
index 955a8a9535..b0e26e7f9d 100644
--- a/line-range.c
+++ b/line-range.c
@@ -135,7 +135,7 @@  static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 {
 	int reg_error;
 	regmatch_t match[1];
-	while (1) {
+	while (*start) {
 		const char *bol, *eol;
 		reg_error = regexec(regexp, start, 1, match, 0);
 		if (reg_error == REG_NOMATCH)
@@ -148,8 +148,7 @@  static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 		/* determine extent of line matched */
 		bol = start+match[0].rm_so;
 		eol = start+match[0].rm_eo;
-		while (bol > start && *bol != '\n')
-			bol--;
+		while (bol > start && *--bol != '\n');
 		if (*bol == '\n')
 			bol++;
 		while (*eol && *eol != '\n')
@@ -161,6 +160,7 @@  static const char *find_funcname_matching_regexp(xdemitconf_t *xecfg, const char
 			return bol;
 		start = eol;
 	}
+	return NULL;
 }
 
 static const char *parse_range_funcname(
diff --git a/t/t4211-line-log.sh b/t/t4211-line-log.sh
index ac9e4d0928..19db07a8df 100755
--- a/t/t4211-line-log.sh
+++ b/t/t4211-line-log.sh
@@ -315,4 +315,26 @@  test_expect_success 'line-log with --before' '
 	test_cmp expect actual
 '
 
+test_expect_success 'setup tests for zero-width regular expressions' '
+	cat > expect <<-EOF
+	Modify func1() in file.c
+	Add func1() and func2() in file.c
+	EOF
+'
+
+test_expect_success 'zero-width regex $ matches any function name' '
+	git log --format="%s" --no-patch "-L:$:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex ^ matches any function name' '
+	git log --format="%s" --no-patch "-L:^:file.c" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'zero-width regex . matches any function name' '
+	git log --format="%s" --no-patch "-L:.:file.c" >actual &&
+	test_cmp expect actual
+'
+
 test_done