diff mbox series

[2/2] grep/pcre2: factor out literal variable

Message ID ba503995-866d-fc80-4e38-b4dfd0e5c5bc@web.de (mailing list archive)
State New, archived
Headers show
Series [1/2] grep/pcre2: use PCRE2_UTF even with ASCII patterns | expand

Commit Message

René Scharfe Dec. 18, 2021, 7:53 p.m. UTC
Patterns that contain no wildcards and don't have to be case-folded are
literal.  Give this condition a name to increase the readability of the
boolean expression for enabling the option PCRE2_UTF.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 grep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--
2.34.0

Comments

Ævar Arnfjörð Bjarmason Dec. 19, 2021, 7:37 p.m. UTC | #1
On Sat, Dec 18 2021, René Scharfe wrote:

> Patterns that contain no wildcards and don't have to be case-folded are
> literal.  Give this condition a name to increase the readability of the
> boolean expression for enabling the option PCRE2_UTF.
>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  grep.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 5badb6d851..2b6ac3205d 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -362,6 +362,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  	int jitret;
>  	int patinforet;
>  	size_t jitsizearg;
> +	int literal = !opt->ignore_case && (p->fixed || p->is_fixed);
>
>  	/*
>  	 * Call pcre2_general_context_create() before calling any
> @@ -382,8 +383,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  		}
>  		options |= PCRE2_CASELESS;
>  	}
> -	if (!opt->ignore_locale && is_utf8_locale() &&
> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
> +	if (!opt->ignore_locale && is_utf8_locale() && !literal)
>  		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);
>
>  #ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER

I think for this and 1/2 it would be really nice to pick up a version of
Hamza's CI changes:
https://lore.kernel.org/git/20211118084143.279174-2-someguy@effective-light.com/

Aside: Not needed for this change, but I wonder if we could benefit minutely
from:

    #ifdef PCRE2_LITERAL
    options |= PCRE2_LITERAL;
    #endif

It'll save PCRE2 the small effort of finding that we've got no metacharacters.
Junio C Hamano Dec. 20, 2021, 8:47 p.m. UTC | #2
René Scharfe <l.s.r@web.de> writes:

> Patterns that contain no wildcards and don't have to be case-folded are
> literal.  Give this condition a name to increase the readability of the
> boolean expression for enabling the option PCRE2_UTF.

Makes sense.  Quite a lot.

Thanks.
Junio C Hamano Dec. 20, 2021, 8:52 p.m. UTC | #3
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> I think for this and 1/2 it would be really nice to pick up a version of
> Hamza's CI changes:
> https://lore.kernel.org/git/20211118084143.279174-2-someguy@effective-light.com/

Are there so many incompatible versions of pcre2 library whose usage
are subtly different that we need to protect ourselves with multiple
variants in CI from breakages?

I doubt pcre2 library was _that_ bad.

Adding a special task that builds with the minimum version we
support may not be too bad, but the library should be stable enough
to allow us to declare it sufficient to test the most common version
with the most common build options in our ordinary build job(s).
Junio C Hamano Dec. 20, 2021, 8:53 p.m. UTC | #4
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Aside: Not needed for this change, but I wonder if we could benefit minutely
> from:
>
>     #ifdef PCRE2_LITERAL
>     options |= PCRE2_LITERAL;
>     #endif
>
> It'll save PCRE2 the small effort of finding that we've got no metacharacters.

After the dust settles, it would be a good addition in a separate
patch.

Thanks.
Ævar Arnfjörð Bjarmason Dec. 20, 2021, 10:03 p.m. UTC | #5
On Mon, Dec 20 2021, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I think for this and 1/2 it would be really nice to pick up a version of
>> Hamza's CI changes:
>> https://lore.kernel.org/git/20211118084143.279174-2-someguy@effective-light.com/
>
> Are there so many incompatible versions of pcre2 library whose usage
> are subtly different that we need to protect ourselves with multiple
> variants in CI from breakages?
>
> I doubt pcre2 library was _that_ bad.

It's really not, but:

 * We have an optional >=10.34 feature we use (albeit trivial)
 * We have an optional >=10.36 feature we use (major, and directly related)
 * We might be targeting JIT or not, and the error handling isn't the same (known PCRE caveat)
 * We might be targeting a PCRE that knows about Unicode, or not
 * We use it in a mode where we might feed UTF-8 invalid data into the UTF-8 mode

Any update to the relevant code really needs to test the combination of
those, so it's a perfect target for CI to make that less tedious.

> Adding a special task that builds with the minimum version we
> support may not be too bad, but the library should be stable enough
> to allow us to declare it sufficient to test the most common version
> with the most common build options in our ordinary build job(s).

That's a nice idea, but not the reality of the situation. Unless we're
willing to bump the version requirement & insist on JIT && Unicode
support before using it.

The CI itself should be realtively cheap, and just runs the few tests
that would spot any breakages with the above.
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 5badb6d851..2b6ac3205d 100644
--- a/grep.c
+++ b/grep.c
@@ -362,6 +362,7 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	int jitret;
 	int patinforet;
 	size_t jitsizearg;
+	int literal = !opt->ignore_case && (p->fixed || p->is_fixed);

 	/*
 	 * Call pcre2_general_context_create() before calling any
@@ -382,8 +383,7 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		}
 		options |= PCRE2_CASELESS;
 	}
-	if (!opt->ignore_locale && is_utf8_locale() &&
-	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
+	if (!opt->ignore_locale && is_utf8_locale() && !literal)
 		options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);

 #ifdef GIT_PCRE2_VERSION_10_36_OR_HIGHER