diff mbox series

[3/3] grep: stop using a custom JIT stack with PCRE v1

Message ID 20190724151415.3698-4-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series grep: use custom JIT stack with pcre2 | expand

Commit Message

Ævar Arnfjörð Bjarmason July 24, 2019, 3:14 p.m. UTC
Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code
in the last commit. Unlike with v2 we actually used the custom stack
in v1, but let's use PCRE's built-in 32 KB one instead, since
experience with v2 shows that's enough. Most distros are already using
v2 as a default, and the underlying sljit code is the same.

Unfortunately we can't just pass a NULL to pcre_jit_exec() as with
pcre2_jit_match(). Unlike the v2 function it doesn't support
that. Instead we need to use the fatter pcre_exec() if we'd like the
same behavior.

This will make things slightly slower than on the fast-path function,
but it's OK since we care less about v1 performance these days since
we have and recommend v2. Running a similar performance test as what I
ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API",
2017-05-25) via:

    GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh

Gives us this, just the /perl/ results:

    Test                                            HEAD~             HEAD
    ---------------------------------------------------------------------------------------
    7820.3: perl grep 'how.to'                      0.19(0.67+0.52)   0.19(0.65+0.52) +0.0%
    7820.7: perl grep '^how to'                     0.19(0.78+0.44)   0.19(0.72+0.49) +0.0%
    7820.11: perl grep '[how] to'                   0.39(2.13+0.43)   0.40(2.10+0.46) +2.6%
    7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.44(2.55+0.37)   0.45(2.47+0.41) +2.3%
    7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.23(1.06+0.42)   0.22(1.03+0.43) -4.3%

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 28 +++++-----------------------
 grep.h |  5 -----
 2 files changed, 5 insertions(+), 28 deletions(-)

Comments

Carlo Marcelo Arenas Belón July 26, 2019, 1:15 p.m. UTC | #1
since this moves PCRE1 out of the JIT fast path, introduces the
regression where git grep will abort if there is binary data or non
UTF-8 text in the repository/log and should be IMHO hold out until a
fix for that can be merged.

this also needs additional changes to better support NO_LIBPCRE1_JIT,
patch to follow

Carlo
Ævar Arnfjörð Bjarmason July 26, 2019, 1:50 p.m. UTC | #2
On Fri, Jul 26 2019, Carlo Arenas wrote:

> since this moves PCRE1 out of the JIT fast path,

I think you're mostly replying to the wrong thread. None of the patches
I've sent disable PCRE v1 JIT, as the performance numbers show. The JIT
stack is resized, and for v2 some dead code removed.

> introduces the regression where git grep will abort if there is binary
> data or non UTF-8 text in the repository/log and should be IMHO hold
> out until a fix for that can be merged.

You're talking about the kwset series, not this cleanup series.

> this also needs additional changes to better support NO_LIBPCRE1_JIT,
> patch to follow

Looking forward to it, thanks!
Carlo Marcelo Arenas Belón July 26, 2019, 2:12 p.m. UTC | #3
On Fri, Jul 26, 2019 at 6:50 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jul 26 2019, Carlo Arenas wrote:
>
> > since this moves PCRE1 out of the JIT fast path,
>
> I think you're mostly replying to the wrong thread. None of the patches
> I've sent disable PCRE v1 JIT, as the performance numbers show. The JIT
> stack is resized, and for v2 some dead code removed.

I didn't mean JIT was disabled, but that we are calling now the regular
PCRE1 function which does UTF-8 validation (unlike the one used before)

> > introduces the regression where git grep will abort if there is binary
> > data or non UTF-8 text in the repository/log and should be IMHO hold
> > out until a fix for that can be merged.
>
> You're talking about the kwset series, not this cleanup series.

a combination of both (as seen in pu) and that will also happen in next if
this series get merged there.

before this cleanup series, a git compiled against PCRE1 and not using
NO_LIBPCRE1_JIT will use the jit fast path function and therefore would
have no problems with binary or non UTF-8 content in the repository, but
will regress after.

Carlo
Ævar Arnfjörð Bjarmason July 26, 2019, 2:43 p.m. UTC | #4
On Fri, Jul 26 2019, Carlo Arenas wrote:

> On Fri, Jul 26, 2019 at 6:50 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>>
>> On Fri, Jul 26 2019, Carlo Arenas wrote:
>>
>> > since this moves PCRE1 out of the JIT fast path,
>>
>> I think you're mostly replying to the wrong thread. None of the patches
>> I've sent disable PCRE v1 JIT, as the performance numbers show. The JIT
>> stack is resized, and for v2 some dead code removed.
>
> I didn't mean JIT was disabled, but that we are calling now the regular
> PCRE1 function which does UTF-8 validation (unlike the one used before)
>
>> > introduces the regression where git grep will abort if there is binary
>> > data or non UTF-8 text in the repository/log and should be IMHO hold
>> > out until a fix for that can be merged.
>>
>> You're talking about the kwset series, not this cleanup series.
>
> a combination of both (as seen in pu) and that will also happen in next if
> this series get merged there.
>
> before this cleanup series, a git compiled against PCRE1 and not using
> NO_LIBPCRE1_JIT will use the jit fast path function and therefore would
> have no problems with binary or non UTF-8 content in the repository, but
> will regress after.

I see. Yes you're right, I misread pcrejit(3) about how the "fast path
API" worked (or more accurately, misremembered). Yes, this is now a new
caveat.

I have some patches on top of next I'm about to send that hopefully make
this whole thing less of a mess.
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 20ce95270a..6b52fed53a 100644
--- a/grep.c
+++ b/grep.c
@@ -406,12 +406,6 @@  static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 #ifdef GIT_PCRE1_USE_JIT
 	pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
-	if (p->pcre1_jit_on) {
-		p->pcre1_jit_stack = pcre_jit_stack_alloc(1, 1024 * 1024);
-		if (!p->pcre1_jit_stack)
-			die("Couldn't allocate PCRE JIT stack");
-		pcre_assign_jit_stack(p->pcre1_extra_info, NULL, p->pcre1_jit_stack);
-	}
 #endif
 }
 
@@ -423,18 +417,9 @@  static int pcre1match(struct grep_pat *p, const char *line, const char *eol,
 	if (eflags & REG_NOTBOL)
 		flags |= PCRE_NOTBOL;
 
-#ifdef GIT_PCRE1_USE_JIT
-	if (p->pcre1_jit_on) {
-		ret = pcre_jit_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
-				    eol - line, 0, flags, ovector,
-				    ARRAY_SIZE(ovector), p->pcre1_jit_stack);
-	} else
-#endif
-	{
-		ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
-				eol - line, 0, flags, ovector,
-				ARRAY_SIZE(ovector));
-	}
+	ret = pcre_exec(p->pcre1_regexp, p->pcre1_extra_info, line,
+			eol - line, 0, flags, ovector,
+			ARRAY_SIZE(ovector));
 
 	if (ret < 0 && ret != PCRE_ERROR_NOMATCH)
 		die("pcre_exec failed with error code %d", ret);
@@ -451,14 +436,11 @@  static void free_pcre1_regexp(struct grep_pat *p)
 {
 	pcre_free(p->pcre1_regexp);
 #ifdef GIT_PCRE1_USE_JIT
-	if (p->pcre1_jit_on) {
+	if (p->pcre1_jit_on)
 		pcre_free_study(p->pcre1_extra_info);
-		pcre_jit_stack_free(p->pcre1_jit_stack);
-	} else
+	else
 #endif
-	{
 		pcre_free(p->pcre1_extra_info);
-	}
 	pcre_free((void *)p->pcre1_tables);
 }
 #else /* !USE_LIBPCRE1 */
diff --git a/grep.h b/grep.h
index a65f4a1ae1..a405fc870c 100644
--- a/grep.h
+++ b/grep.h
@@ -14,13 +14,9 @@ 
 #ifndef GIT_PCRE_STUDY_JIT_COMPILE
 #define GIT_PCRE_STUDY_JIT_COMPILE 0
 #endif
-#if PCRE_MAJOR <= 8 && PCRE_MINOR < 20
-typedef int pcre_jit_stack;
-#endif
 #else
 typedef int pcre;
 typedef int pcre_extra;
-typedef int pcre_jit_stack;
 #endif
 #ifdef USE_LIBPCRE2
 #define PCRE2_CODE_UNIT_WIDTH 8
@@ -86,7 +82,6 @@  struct grep_pat {
 	regex_t regexp;
 	pcre *pcre1_regexp;
 	pcre_extra *pcre1_extra_info;
-	pcre_jit_stack *pcre1_jit_stack;
 	const unsigned char *pcre1_tables;
 	int pcre1_jit_on;
 	pcre2_code *pcre2_pattern;