diff mbox series

[1/3] grep: make pcre1_tables version agnostic

Message ID 20190727202759.22310-2-carenas@gmail.com (mailing list archive)
State New, archived
Headers show
Series grep: memory leak in PCRE2 | expand

Commit Message

Carlo Marcelo Arenas Belón July 27, 2019, 8:27 p.m. UTC
6d4b5747f0 ("grep: change internal *pcre* variable & function names
to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
specific to give space to similarly named variables for PCRE2, but
in this case the change wasn't needed as the types were compatible
enough (const unsigned char* vs const uint8_t*) to be shared.

Revert that change, as 94da9193a6 ("grep: add support for PCRE v2",
2017-06-01) failed to create an equivalent PCRE2 version.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 grep.c | 6 +++---
 grep.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Ævar Arnfjörð Bjarmason July 27, 2019, 11:47 p.m. UTC | #1
On Sat, Jul 27 2019, Carlo Marcelo Arenas Belón wrote:

> 6d4b5747f0 ("grep: change internal *pcre* variable & function names
> to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
> specific to give space to similarly named variables for PCRE2, but
> in this case the change wasn't needed as the types were compatible
> enough (const unsigned char* vs const uint8_t*) to be shared.

Both the v1 and v2 functions return const unsigned char *. I don't know
where I got the uint8_t from. This makes more sense.

This series looks good to me. Thanks for the fix. Just one caveat:

The point of 6d4b5747f0 was not to only split out those variables we
couldn't get away with re-using. Then I would have later re-used
e.g. pcre1_jit_on & pcre2_jit_on as just pcre_jit_on. We could also do
that now.

I think doing that & this part of the your changes makes things less
readable. The two code branches we compile with ifdefs are mutually
exclusive, so having the variables be unique helps with eyeballing /
reasoning when changing the code.

> Revert that change, as 94da9193a6 ("grep: add support for PCRE v2",
> 2017-06-01) failed to create an equivalent PCRE2 version.
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> ---
>  grep.c | 6 +++---
>  grep.h | 2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index f7c3a5803e..cc65f7a987 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -389,14 +389,14 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
>
>  	if (opt->ignore_case) {
>  		if (has_non_ascii(p->pattern))
> -			p->pcre1_tables = pcre_maketables();
> +			p->pcre_tables = pcre_maketables();
>  		options |= PCRE_CASELESS;
>  	}
>  	if (is_utf8_locale() && has_non_ascii(p->pattern))
>  		options |= PCRE_UTF8;
>
>  	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
> -				      p->pcre1_tables);
> +				      p->pcre_tables);
>  	if (!p->pcre1_regexp)
>  		compile_regexp_failed(p, error);
>
> @@ -462,7 +462,7 @@ static void free_pcre1_regexp(struct grep_pat *p)
>  	{
>  		pcre_free(p->pcre1_extra_info);
>  	}
> -	pcre_free((void *)p->pcre1_tables);
> +	pcre_free((void *)p->pcre_tables);
>  }
>  #else /* !USE_LIBPCRE1 */
>  static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
> diff --git a/grep.h b/grep.h
> index 1875880f37..d34f66b384 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -89,7 +89,7 @@ struct grep_pat {
>  	pcre *pcre1_regexp;
>  	pcre_extra *pcre1_extra_info;
>  	pcre_jit_stack *pcre1_jit_stack;
> -	const unsigned char *pcre1_tables;
> +	const unsigned char *pcre_tables;
>  	int pcre1_jit_on;
>  	pcre2_code *pcre2_pattern;
>  	pcre2_match_data *pcre2_match_data;
Carlo Marcelo Arenas Belón July 28, 2019, 2:50 a.m. UTC | #2
On Sat, Jul 27, 2019 at 4:47 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Sat, Jul 27 2019, Carlo Marcelo Arenas Belón wrote:
>
> > 6d4b5747f0 ("grep: change internal *pcre* variable & function names
> > to be *pcre1*", 2017-05-25), renamed most variables to be PCRE1
> > specific to give space to similarly named variables for PCRE2, but
> > in this case the change wasn't needed as the types were compatible
> > enough (const unsigned char* vs const uint8_t*) to be shared.

reading this again, had to admit there is a fair amount of guessing on
intent, but reading commits and the email discussion couldn't come
up with a better explanation.

is the root cause for the bug different?, could it be that the pcre2 API
was misunderstood? and you expected this pointer will be free together
with the context? (as it is done when a cloned context with tables is
used?)

> Both the v1 and v2 functions return const unsigned char *. I don't know
> where I got the uint8_t from. This makes more sense.

the type in PCRE2 is uint8_t, the documentation has a bug[1]
almost every platform git cares for would have unsigned char = uint8_t
though.

> The point of 6d4b5747f0 was not to only split out those variables we
> couldn't get away with re-using. Then I would have later re-used
> e.g. pcre1_jit_on & pcre2_jit_on as just pcre_jit_on. We could also do
> that now.

IMHO pcre_jit_on makes more sense as a bit, with local variables being
used for the pcre*_config() call with the right type.(uint32_t != int)

> I think doing that & this part of the your changes makes things less
> readable. The two code branches we compile with ifdefs are mutually
> exclusive, so having the variables be unique helps with eyeballing /
> reasoning when changing the code.

I thought that too until the typo[2] in the pcre?_jit_on variable (now
in next) kind of
proved us wrong.  Maybe the names are too similar?

either way, would you rather drop this patch and make a replica variable?
should I rebase to next with Reviewed-By tags so that all other changes
in flight that would conflict could be corrected?

Carlo

[1] https://bugs.exim.org/show_bug.cgi?id=2420
[2] https://public-inbox.org/git/20190722181923.21572-1-dev+git@drbeat.li/
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index f7c3a5803e..cc65f7a987 100644
--- a/grep.c
+++ b/grep.c
@@ -389,14 +389,14 @@  static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 	if (opt->ignore_case) {
 		if (has_non_ascii(p->pattern))
-			p->pcre1_tables = pcre_maketables();
+			p->pcre_tables = pcre_maketables();
 		options |= PCRE_CASELESS;
 	}
 	if (is_utf8_locale() && has_non_ascii(p->pattern))
 		options |= PCRE_UTF8;
 
 	p->pcre1_regexp = pcre_compile(p->pattern, options, &error, &erroffset,
-				      p->pcre1_tables);
+				      p->pcre_tables);
 	if (!p->pcre1_regexp)
 		compile_regexp_failed(p, error);
 
@@ -462,7 +462,7 @@  static void free_pcre1_regexp(struct grep_pat *p)
 	{
 		pcre_free(p->pcre1_extra_info);
 	}
-	pcre_free((void *)p->pcre1_tables);
+	pcre_free((void *)p->pcre_tables);
 }
 #else /* !USE_LIBPCRE1 */
 static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
diff --git a/grep.h b/grep.h
index 1875880f37..d34f66b384 100644
--- a/grep.h
+++ b/grep.h
@@ -89,7 +89,7 @@  struct grep_pat {
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
 	pcre_jit_stack *pcre1_jit_stack;
-	const unsigned char *pcre1_tables;
+	const unsigned char *pcre_tables;
 	int pcre1_jit_on;
 	pcre2_code *pcre2_pattern;
 	pcre2_match_data *pcre2_match_data;