
[RFC,v3] grep: treat PCRE2 jit compilation memory error as non fatal

Message ID 20190804031409.32764-1-carenas@gmail.com

Commit Message

Carlo Marcelo Arenas Belón Aug. 4, 2019, 3:14 a.m. UTC
94da9193a6 (grep: add support for PCRE v2, 2017-06-01) uses the
JIT fast path unless JIT support was not compiled into the
linked library.

Starting with PCRE2 10.23, pcre2grep ignores any errors from
pcre2_jit_compile as a workaround for their bug 1749[1], and we
should do the same, so that the interpreter can be used as a fallback
in cases where JIT is not available because of a security policy.

To be conservative, this change initially restricts the fallback to
the one error known to be returned in that case (PCRE2_ERROR_NOMEMORY,
which is to be documented as such in a future PCRE2 release) and
prints a warning so that corrective action can be taken.

[1] https://bugs.exim.org/show_bug.cgi?id=1749

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
---
 grep.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Carlo Marcelo Arenas Belón Aug. 4, 2019, 7:43 a.m. UTC | #1
PROs:
* it works (only for PCRE2) and was tested on OpenBSD, NetBSD, macOS and Linux (Debian)
* it applies everywhere (even pu) without conflicts
* it doesn't introduce any regressions in tests (tested on Debian with
SELinux in enforcing mode)
* it is simple

CONs:
* HardenedBSD still segfaults (bugfix proposed[1] to sljit/pcre)
* warning is noisy (at least once per thread) and might even be
ineffective, as it goes to stderr while most of the output goes to
stdout, which is usually a pager
* too conservative (pcre2grep shows that all errors from
pcre2_jit_compile can be ignored)
* no tests
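One way to address the "once per thread" noise, should the warning be kept, is to guard it with a process-wide flag. A minimal sketch using C11 `atomic_flag`; the function `warn_jit_unavailable_once()` is hypothetical and not part of git, and it does nothing about the stderr-versus-pager problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical process-wide flag: set by the first caller that warns. */
static atomic_flag jit_warning_shown = ATOMIC_FLAG_INIT;

/*
 * Print the JIT fallback warning at most once per process, no matter
 * how many grep threads compile the pattern. Returns 1 if this call
 * printed the warning, 0 otherwise.
 */
static int warn_jit_unavailable_once(void)
{
	/* test_and_set returns the previous value: false only for the first caller. */
	if (atomic_flag_test_and_set(&jit_warning_shown))
		return 0;
	fprintf(stderr, "warning: JIT couldn't be used in PCRE2\n");
	return 1;
}
```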

Known Issues:
* code is ugly (the unbraced nested if/else even triggers an ambiguous-else warning if you have the right compiler)
* code is suspiciously similar to one[2] that was rejected, but
hopefully commit message is better
* code is incomplete (PCRE1 has too many conflicting changes in flight
to attempt a similar fix)
* there are obvious blind spots in the tests that need fixing, and a
lot more testing is needed on other platforms/architectures
* git will still sometimes die because the non-fast path has UTF-8 issues

I still think the pcre.jit knob might be useful to work around
some of the issues detailed in CONs, but probably with a different
definition:
unset -> fallback (try JIT, but use the interpreter if that didn't work)
false -> don't even try to use JIT
true -> print a warning and maybe even die (if we really think that is useful)
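The proposed tri-state definition of the pcre.jit knob can be made concrete as a small decision table. A sketch only: the enum and function names are hypothetical, the sketch recognizes only the literal strings "true" and "false" (git's real boolean config parsing accepts more spellings), treats unrecognized values like unset, and takes the stricter reading of "true", failing outright when JIT is unavailable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tri-state for the proposed pcre.jit knob. */
enum jit_mode {
	JIT_FALLBACK, /* unset: try JIT, use the interpreter if that fails */
	JIT_OFF,      /* false: never try JIT */
	JIT_REQUIRED  /* true: failing to JIT is an error */
};

/* What the caller should do after (possibly) calling pcre2_jit_compile(). */
enum jit_action {
	USE_JIT,
	USE_INTERPRETER,
	FAIL
};

/* Map the knob's value (NULL when unset) to a mode. */
static enum jit_mode parse_jit_mode(const char *value)
{
	if (!value)
		return JIT_FALLBACK;
	if (!strcmp(value, "false"))
		return JIT_OFF;
	if (!strcmp(value, "true"))
		return JIT_REQUIRED;
	return JIT_FALLBACK; /* simplification: unknown values act like unset */
}

/* jit_ok: whether JIT compilation succeeded (ignored for JIT_OFF). */
static enum jit_action decide(enum jit_mode mode, int jit_ok)
{
	switch (mode) {
	case JIT_OFF:
		return USE_INTERPRETER;
	case JIT_REQUIRED:
		return jit_ok ? USE_JIT : FAIL;
	case JIT_FALLBACK:
	default:
		return jit_ok ? USE_JIT : USE_INTERPRETER;
	}
}
```

With this shape, the default stays a pure optimization, and a "must have JIT or we will die" policy is just the `JIT_REQUIRED` row.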

Below are some performance numbers for the perl tests.

With JIT enabled (SELinux in non-enforcing mode):

Test                                            this tree
---------------------------------------------------------------
7820.3: perl grep 'how.to'                      0.56(0.29+0.60)
7820.7: perl grep '^how to'                     0.49(0.29+0.54)
7820.11: perl grep '[how] to'                   0.54(0.39+0.51)
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       0.60(0.45+0.58)
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.58(0.30+0.61)

With fallback to the interpreter (SELinux in enforcing mode):

Test                                            this tree
---------------------------------------------------------------
7820.3: perl grep 'how.to'                      0.64(0.59+0.56)
7820.7: perl grep '^how to'                     1.83(2.91+0.56)
7820.11: perl grep '[how] to'                   2.07(3.33+0.61)
7820.15: perl grep '(e.t[^ ]*|v.ry) rare'       2.89(4.91+0.66)
7820.19: perl grep 'm(ú|u)lt.b(æ|y)te'          0.78(0.86+0.55)
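The cost of the fallback can be read off the two tables by dividing each interpreter wall-clock time by its JIT counterpart. We read the first number in each cell as wall-clock seconds and the parenthesized pair as user+system CPU time, which is how git's perf output is laid out as far as we know. A small sketch of that arithmetic, using only the numbers above; `slowdown()` and `print_slowdowns()` are illustrative names, and the ratios should be treated as rough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Wall-clock seconds copied from the two tables above. */
static const char *const tests[] = {
	"7820.3", "7820.7", "7820.11", "7820.15", "7820.19",
};
static const double jit_secs[] = { 0.56, 0.49, 0.54, 0.60, 0.58 };
static const double interp_secs[] = { 0.64, 1.83, 2.07, 2.89, 0.78 };

/* Interpreter-fallback time as a multiple of the JIT time for test i. */
static double slowdown(size_t i)
{
	return interp_secs[i] / jit_secs[i];
}

static void print_slowdowns(void)
{
	for (size_t i = 0; i < sizeof(jit_secs) / sizeof(jit_secs[0]); i++)
		printf("%-8s %.2fx\n", tests[i], slowdown(i));
}
```

`print_slowdowns()` prints 1.14x, 3.73x, 3.83x, 4.82x and 1.34x for the five tests, so the interpreter fallback takes from about 1.1x to about 4.8x the JIT wall time, depending on the pattern.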

[1] https://github.com/zherczeg/sljit/pull/2
[2] https://public-inbox.org/git/20181209230024.43444-3-carenas@gmail.com/

Junio C Hamano Aug. 5, 2019, 8:16 p.m. UTC | #2
Carlo Arenas <carenas@gmail.com> writes:

> * code is suspiciously similar to one[2] that was rejected, but
> hopefully commit message is better
> ...
> [2] https://public-inbox.org/git/20181209230024.43444-3-carenas@gmail.com/

I do not recall ever rejecting that one.

It did not come with a log message good enough to be accepted
as-is, so I do not find it surprising that I did not pick it up; I
was waiting for a new iteration, and then everybody forgot about it.

But that is quite different from getting rejected (with the
connotation that "don't attempt this bad idea again, unless the
world changes drastically").

In any case, this round looks a lot more reasoned.  I personally do
not think the warning() is a good idea.  As I said in the old
discussion, by default we should treat JIT as a mere optimization,
and we should stay out of the way most of the time.

An additional "must have JIT or we will die" [*1*] can be added on
top of this change, if somebody really cares.

Thanks.


[Reference]

*1* https://public-inbox.org/git/87pnu9yekk.fsf@evledraar.gmail.com/

Patch

diff --git a/grep.c b/grep.c
index f7c3a5803e..593a1cb7a0 100644
--- a/grep.c
+++ b/grep.c
@@ -525,7 +525,13 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 	if (p->pcre2_jit_on == 1) {
 		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
 		if (jitret)
-			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
+			if (jitret == PCRE2_ERROR_NOMEMORY) {
+				warning("JIT couldn't be used in PCRE2");
+				p->pcre2_jit_on = 0;
+				return;
+			}
+			else
+				die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
 
 		/*
 		 * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just