diff mbox series

grep: under --debug, show whether PCRE JIT is enabled

Message ID 20190818201727.31505-1-dev+git@drbeat.li (mailing list archive)
State New, archived
Headers show
Series grep: under --debug, show whether PCRE JIT is enabled | expand

Commit Message

Beat Bolli Aug. 18, 2019, 8:17 p.m. UTC
This information is useful and not visible anywhere else, so show it.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>

---
This applies on top of 'ab/pcre-jit-fixes', currently in pu.

 grep.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Junio C Hamano Aug. 19, 2019, 10:23 p.m. UTC | #1
Beat Bolli <dev+git@drbeat.li> writes:

> This information is useful and not visible anywhere else, so show it.
>
> Signed-off-by: Beat Bolli <dev+git@drbeat.li>
> Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> ---
> This applies on top of 'ab/pcre-jit-fixes', currently in pu.

Thanks.  

We saw a few people exchanging patches on the list and discussing
topics aroud PCRE and especially JIT, but the discussion petered out
during the prerelease freeze.  I'd like to see the topics solidify
early in this cycle.

IIRC, some key points/issues addressed by various patches we saw
during the last cycle were:

 - A binary may be prepared to be JIT capable, but on a particular
   system (e.g. SELinux) JIT may not work.  Should we write off such
   a configuration as "broken"?  Should we just fall back on non-JIT?
   Should we fall back with loud warning?

 - JIT and non-JIT codepath may validate UTF-8 differently without
   care, but we should make sure JIT codepath behave identically to
   non JIT (only faster).

 - We should not be validating strict UTF-8 when we do not even know
   if the payload is UTF-8.  What mechanism, if any, do we have to
   let us say "this must be UTF-8 or otherwise it is an error" with
   confidence?  Should we error out in the middle of "git log"
   session upon seeing a binary haystack while looking for UTF-8
   needle (I think not)?

There may be others I am missing.

Is ab/pcre-jit-fixes a good base to collectively work on to finish
the topics floated around PCRE during the last cycle?

I'll queue this debugging aid on top in the meantime.  Thanks.
Carlo Marcelo Arenas Belón Aug. 24, 2019, 12:57 p.m. UTC | #2
On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> There may be others I am missing.

should we still support PCRE1? I think in this case the problem is
compounded by the fact that unless we do something like [1], the real
fix for those UTF-8 validation issues will require a yet unreleased
version of PCRE2 and will never be available for PCRE1, making the
user experience suboptimal.

and explained in [1] there was a series to cleanup (both for
maintainability and to mitigate regressions) the PCRE1 code that is
yet to be formally reviewed in [2]

there is also the question of if we should provide knobs so users can
"tune" their pcre library to workaround some of the quirks or if we
should do more work ourselves to handle those quirks and improve the
error reporting.

one example of that is as you pointed out JIT, but also applies to
other things like PCRE's stack size, or depending on our solution for
PCRE1, accepting the risk (which already exist anyway) to accept
problems with matching because of corrupted UTF-8 in the haystack

> Is ab/pcre-jit-fixes a good base to collectively work on to finish
> the topics floated around PCRE during the last cycle?

V3 of that (which was never sent) might be better IMHO, I had to also
admit I was surprised to see the whole no-kwset series this depended
on being dropped but would seem it was just partially merged with
pcre-jit-fixes (which is missing the patches that address the UTF-8
issues with PCRE2's unreleased flag and that should be part of that
V3)

it might be worth also rebasing pcre2-chartables-leakfix on top of
this to avoid conflicts, eventhough I had to admit that I was
expanding on integrating [3], to avoid having to squash a fix into
René's patch (as he suggested) and that would be part of a reroll from
that series.

Carlo

[1] https://public-inbox.org/git/CAPUEspgStVxL=0SoAg82vxRMRGLSEKdHrT-xq6nCW1sNq7nLsw@mail.gmail.com/
[2] https://public-inbox.org/git/20190726202642.7986-1-carenas@gmail.com/
[3] https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/
Johannes Schindelin Aug. 26, 2019, 2:28 p.m. UTC | #3
Hi Carlo,

On Sat, 24 Aug 2019, Carlo Arenas wrote:

> On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> >
> > There may be others I am missing.
>
> should we still support PCRE1?

While Git for Windows has no problem to just drop PCRE1 support, I would
like to take a longer road in Git. Like, deprecate it first (I don't
know how to do that effectively, though, as packagers usually ignore
compile warnings, maybe we need to add a Makefile knob
YES_I_WANT_TO_BUILD_WITH_PCRE1_INSTEAD_OF_PCRE2 and
YES_I_WILL_STATE_MY_REASONS_ON_THE_GIT_MAILING_LIST or something like
that).

Ciao,
Dscho
Carlo Marcelo Arenas Belón Aug. 26, 2019, 2:42 p.m. UTC | #4
On Mon, Aug 26, 2019 at 7:29 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> On Sat, 24 Aug 2019, Carlo Arenas wrote:
>
> > On Mon, Aug 19, 2019 at 3:23 PM Junio C Hamano <gitster@pobox.com> wrote:
> > >
> > > There may be others I am missing.
> >
> > should we still support PCRE1?
>
> While Git for Windows has no problem to just drop PCRE1 support,

FWIW I don't want to drop PCRE1 support, I was not advocating for it,
but in the contrary
trying to find a way to keep it working as best as possible until we
really can't.

> I would
> like to take a longer road in Git. Like, deprecate it first (I don't
> know how to do that effectively, though, as packagers usually ignore
> compile warnings, maybe we need to add a Makefile knob
> YES_I_WANT_TO_BUILD_WITH_PCRE1_INSTEAD_OF_PCRE2 and
> YES_I_WILL_STATE_MY_REASONS_ON_THE_GIT_MAILING_LIST or something like
> that).

FWIW e6c531b808 (Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1,
2018-03-11)
put a "deprecattion" warning in the Makefile by Ævar, but last time
this was discussed[1] Junio
made an IMHO sound argument for why that should be removed instead but
ab/pcre-jit-fixes and UTF-8 validation are likely to make that more
difficult (even if it is a mostly self inflicted wound AFAIK)

Carlo

[1] https://public-inbox.org/git/xmqqlg4xkh28.fsf@gitster-ct.c.googlers.com/
Junio C Hamano Aug. 26, 2019, 4:02 p.m. UTC | #5
Carlo Arenas <carenas@gmail.com> writes:

> ... but
> ab/pcre-jit-fixes and UTF-8 validation are likely to make that more
> difficult (even if it is a mostly self inflicted wound AFAIK)

Hmm, in what way?  Do you mean that we'd be invested even more in
pcre1 in an effort to keep supporting, that the sunk cost would
dissuade us from deprecating the support even more, or something?
Carlo Marcelo Arenas Belón Aug. 26, 2019, 4:36 p.m. UTC | #6
On Mon, Aug 26, 2019 at 9:02 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Carlo Arenas <carenas@gmail.com> writes:
>
> > ... but
> > ab/pcre-jit-fixes and UTF-8 validation are likely to make that more
> > difficult (even if it is a mostly self inflicted wound AFAIK)
>
> Hmm, in what way?  Do you mean that we'd be invested even more in
> pcre1 in an effort to keep supporting, that the sunk cost would
> dissuade us from deprecating the support even more, or something?

on the contrary, PCRE1 works fine but our recent changes make it worst
unnecessarily (IMHO)

 for example 685668faaa (grep: stop using a custom JIT stack with PCRE
v1, 2019-07-26) adds 2 regressions as discussed in [1]

* git grep -P will now throw an error if there are non UTF-8 documents
in the haystack (even if JIT is available)
* git grep -P '^([/](?!/)|[^/])*~/.*' will now fail with a cryptic
PCRE error instead of succeeding (but at least will be consistent and
show the same error with PCRE2)

Carlo

[1] https://public-inbox.org/git/CAPUEspgStVxL=0SoAg82vxRMRGLSEKdHrT-xq6nCW1sNq7nLsw@mail.gmail.com/
Junio C Hamano Aug. 27, 2019, 7:43 p.m. UTC | #7
Carlo Arenas <carenas@gmail.com> writes:

>> > ... but
>> > ab/pcre-jit-fixes and UTF-8 validation are likely to make that more
>> > difficult (even if it is a mostly self inflicted wound AFAIK)
>> ...
>  for example 685668faaa (grep: stop using a custom JIT stack with PCRE
> v1, 2019-07-26) adds 2 regressions as discussed in [1]
>
> * git grep -P will now throw an error if there are non UTF-8 documents
> in the haystack (even if JIT is available)
> * git grep -P '^([/](?!/)|[^/])*~/.*' will now fail with a cryptic
> PCRE error instead of succeeding (but at least will be consistent and
> show the same error with PCRE2)

Thanks.  This was exactly the kind of thing I wanted to hear when I
asked if ab/pcre-jit-fixes is a good base to further build on.

As far as I am concerned, ab/pcre-jit-fixes and anything pcre that
are still in flight are fair game for a reboot during this cycle (I
hope I did not merge any of them to 'next' by mistake yet).  If
there are bad apples queued on 'pu', let's make sure we eject them
and rethink the way to address the issue(s) they wanted to address.
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 9bc589720b..96272b3cfc 100644
--- a/grep.c
+++ b/grep.c
@@ -433,6 +433,8 @@  static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt)
 
 #ifdef GIT_PCRE1_USE_JIT
 	pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on);
+	if (opt->debug)
+		fprintf(stderr, "pcre1_jit_on=%d\n", p->pcre1_jit_on);
 #endif
 }
 
@@ -534,6 +536,8 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	}
 
 	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
+	if (opt->debug)
+		fprintf(stderr, "pcre2_jit_on=%d\n", p->pcre2_jit_on);
 	if (p->pcre2_jit_on) {
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
 		if (jitret)
@@ -559,6 +563,9 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 			BUG("pcre2_pattern_info() failed: %d", patinforet);
 		if (jitsizearg == 0) {
 			p->pcre2_jit_on = 0;
+			if (opt->debug)
+				fprintf(stderr, "pcre2_jit_on=%d: (*NO_JIT) in regex\n",
+					p->pcre2_jit_on);
 			return;
 		}
 	}