Message ID | 20190729105955.44390-1-carenas@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC,v2] grep: allow for run time disabling of JIT in PCRE | expand |
Known Issues: * PCRE1 is broken, but fixing it would make more sense on top of the topic[1] (not in pu) * it depends on the current ab/pcre-jit-fixes that is missing 1 critical commit in pu * no tests yet; would need to extend it on top of the debug from Beat and test-tool changes from Ævar, neither of which are final * need to build on top of pu and will need further changes to be ready for next/master/maint The code has been tested in OpenBSD with PCRE2 (latest from svn, but any version should work if they are JIT enabled), it is expected to also work in NetBSD (even with PAX enabled kernels) and macOS 10.13.6 but haven't yet tested them. HardenedBSD will likely segfault unless pcre.jit=0 as described in the original report[2] Testing with SElinux and PAX enabled for Linux encouraged Carlo [1] https://public-inbox.org/git/20190726202642.7986-1-carenas@gmail.com/ [2] https://public-inbox.org/git/20181209230024.43444-1-carenas@gmail.com/
Am 29.07.19 um 12:59 schrieb Carlo Marcelo Arenas Belón: > PCRE1 allowed for a compile time flag to disable JIT, but PCRE2 never > had one, forcing the use of JIT if -P was requested. > > After ed0479ce3d (Merge branch 'ab/no-kwset' into next, 2019-07-15) > the PCRE2 engine will be used more broadly and therefore adding this > knob will allow users a escape from situations where JIT might be > problematic. > > JIT will be used by default but it can be disabled with the --no-pcre-jit > option in `git grep` or by setting 0/false into the pcre.jit config. > > If a value of -1 is used instead then the following error is prevented by > using the interpreter when a JIT failure consistent with known security > restrictions is found at regex compilation time. > > $ git grep 'foo bar' > fatal: Couldn't JIT the PCRE2 pattern 'foo bar', got '-48' > > Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> > --- > V2: add command line to grep as suggested by Junio > > Documentation/git-grep.txt | 11 +++++++++++ > builtin/grep.c | 4 ++++ > grep.c | 30 ++++++++++++++++++++++++++---- > grep.h | 1 + > 4 files changed, 42 insertions(+), 4 deletions(-) > > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt > index c89fb569e3..895c6b34ec 100644 > --- a/Documentation/git-grep.txt > +++ b/Documentation/git-grep.txt > @@ -13,6 +13,7 @@ SYNOPSIS > [-v | --invert-match] [-h|-H] [--full-name] > [-E | --extended-regexp] [-G | --basic-regexp] > [-P | --perl-regexp] > + [-j | --[no]-pcre-jit] Do users care? Enough so to add a short option for this? > [-F | --fixed-strings] [-n | --line-number] [--column] > [-l | --files-with-matches] [-L | --files-without-match] > [(-O | --open-files-in-pager) [<pager>]] > @@ -69,6 +70,12 @@ grep.fallbackToNoIndex:: > If set to true, fall back to git grep --no-index if git grep > is executed outside of a git repository. Defaults to false. > > +pcre.jit:: > + If set to false, disable JIT when using PCRE. Defaults to > + true. > + if set to -1 will try first to use JIT and fallback to the > + interpreter instead of returning an error. Why not implement only -1, without adding this config setting? > + > > OPTIONS > ------- > @@ -175,6 +182,10 @@ providing this option will cause it to die. > Use fixed strings for patterns (don't interpret pattern > as a regex). > > +-j:: > +--[no-]pcre-jit:: > + Diable JIT in PCRE with --no-pcre-jit. "Disable". René
René Scharfe <l.s.r@web.de> writes: >> +pcre.jit:: >> + If set to false, disable JIT when using PCRE. Defaults to >> + true. >> + if set to -1 will try first to use JIT and fallback to the >> + interpreter instead of returning an error. > > Why not implement only -1, without adding this config setting? ... nor command line option. If we have an auto-fallback, I would think that makes the most sense. IIRC the first iteration with only the configuration was really about working around the (non-working) pcre-jit---if we can self-detect and skip a non-working case, that would allow us to drop end-user facing knobs, which is ideal. Thanks for a doze of sanity.
On Mon, Jul 29, 2019 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote: > René Scharfe <l.s.r@web.de> writes: > >> +pcre.jit:: > >> + If set to false, disable JIT when using PCRE. Defaults to > >> + true. > >> + if set to -1 will try first to use JIT and fallback to the > >> + interpreter instead of returning an error. > > > > Why not implement only -1, without adding this config setting? > > ... nor command line option. If we have an auto-fallback, I would > think that makes the most sense. IIRC the first iteration with only > the configuration was really about working around the (non-working) > pcre-jit---if we can self-detect and skip a non-working case, that > would allow us to drop end-user facing knobs, which is ideal. because that was proposed earlier[1] and wasn't accepted ;) the main pushback though I got was that doing the fallback would degrade performance and so it was suggested[2] that keeping the error should be possible somehow (with the implication it will add yet another macro) since living without grep -P was a reasonable tradeoff at that time got punted, but the need to find a solution for this become more urgent once it was announced[3] PCRE2 would be used also used outside -P > Thanks for a doze of sanity. Obviously I am biased, but I kind of like the knob as it allows the user more flexibility to tweak the internals of grep and because we had made those internals already visible (ex: not handling any library errors ourselves and just aborting with a pcre error), but without any flexibility to fix those problems themselves (unless they open the code and rebuild, in most cases) the comment from the user that reported[4] a regression with GNU grep because of JIT stack size and that I quote below is representative of how that layering violation affect users, and while git users are more likely than grep users to do the code tweaking needed, they could use some help. "As using the JIT can not be turned off at runtime, nor can the stacksize be controlled without patching + recompiling, this breaks previously working expressions for me, so I consider this a new regression, introduced with b06f7a29a58bbdd5866afc1e92dba3fdc9e2ed59 . I tested that increasing the stack-size to 1 M fixes the problem for me. A better fix could maybe consist of a better error message, allowing stack-size control at runtime and / or making JIT optional at runtime." making JIT optional at runtime is therefore the title of this patch and as I mentioned in some other thread it might be even useful to us for our own tests. Carlo [1] https://public-inbox.org/git/20181209230024.43444-3-carenas@gmail.com/ [2] https://public-inbox.org/git/87zhtbn5xb.fsf@evledraar.gmail.com/ [3] https://public-inbox.org/git/CAPUEspjKxQFiRgmfb2SuR_xpVu4=MN66kGEeBK1pHdBgXQbv7Q@mail.gmail.com/ [4] https://www.mail-archive.com/bug-grep@gnu.org/msg05762.html
Am 30.07.19 um 02:49 schrieb Carlo Arenas: > On Mon, Jul 29, 2019 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote: >> René Scharfe <l.s.r@web.de> writes: >>>> +pcre.jit:: >>>> + If set to false, disable JIT when using PCRE. Defaults to >>>> + true. >>>> + if set to -1 will try first to use JIT and fallback to the >>>> + interpreter instead of returning an error. >>> >>> Why not implement only -1, without adding this config setting? >> >> ... nor command line option. If we have an auto-fallback, I would >> think that makes the most sense. IIRC the first iteration with only >> the configuration was really about working around the (non-working) >> pcre-jit---if we can self-detect and skip a non-working case, that >> would allow us to drop end-user facing knobs, which is ideal. > > because that was proposed earlier[1] and wasn't accepted ;) So you add all those knobs for Ævar? Users on OpenBSD would get an error whenever they used -P? And they are expected to discover an option for turning off said error? That could be handled automatically? This sounds like bullying to me. I simply wouldn't use -P anymore if that happened to me. If that error was returned without -P, I'd contemplate switching to the Silver Searcher or a similar tool. > the main pushback though I got was that doing the fallback would degrade > performance and so it was suggested[2] that keeping the error should be > possible somehow (with the implication it will add yet another macro) The slowdown by the fallback should be minimal -- just the extra wasted time to compile the pattern to machine code. Compilation time is probably (hopefully?) dwarfed by search time. IIUC, Ævar's concern was more about being able to discover when JIT is not available for some reason. Printing a warning with --debug or --verbose could help with that. Personally I don't care about JIT at all and wouldn't want to see such a warning and certainly prefer not to get any error message about it, even more so since there is nothing I can do about it. (Disabling security features to get faster search sounds sounds like a no-go.) > since living without grep -P was a reasonable tradeoff at that time got > punted, but the need to find a solution for this become more urgent once > it was announced[3] PCRE2 would be used also used outside -P Right. > Obviously I am biased, but I kind of like the knob as it allows the user > more flexibility to tweak the internals of grep and because we had > made those internals already visible (ex: not handling any library > errors ourselves and just aborting with a pcre error), but without any > flexibility to fix those problems themselves (unless they open the code > and rebuild, in most cases) > > the comment from the user that reported[4] a regression with GNU grep > because of JIT stack size and that I quote below is representative of how > that layering violation affect users, and while git users are more likely > than grep users to do the code tweaking needed, they could use some > help. So JIT can cause other problems, and some of them we don't (or can't) handle in our code? Turning on an unstable feature without a way to disable it sounds like a bad idea indeed. Is it really that bad? Can we grow the stack on demand, for example, as GNU grep does now in response to the bug you mentioned (http://bugs.gnu.org/19833)? Are there more such examples? > making JIT optional at runtime is therefore the title of this patch and as > I mentioned in some other thread it might be even useful to us for our > own tests. Testability is a valid concern, especially if the JIT code is considered, well, unfinished. René
Hi, On Mon, 29 Jul 2019, Carlo Marcelo Arenas Belón wrote: > $ git grep 'foo bar' > fatal: Couldn't JIT the PCRE2 pattern 'foo bar', got '-48' My immediate reaction to this error message was: That's not helpful. What is `-48` supposed to mean? Why do we even think it sensible to throw such an error message at the end user? Can't we do a much better job translating that into something that makes actual sense without knowing implementation details? But then, I realized that -48 must be a well-known constant in PCRE2, and my reaction transformed into something much more hopeful: why don't we detect the situation where the JIT'ed code was not actually executable [*1*], and fall back to the non-JIT'ed code path ourselves, without troubling the end user (maybe warning, but maybe better not lest we annoy the user with something pointless)? Even after finding out that -48 disappointingly means PCRE2_ERROR_NOMEMORY (as opposed to something like PCRE2_ERROR_CANNOT_EXECUTE_JIT_CODE), I like the idea of not bothering end users and doing the sensible fallback under the hood. Ciao, Dscho Footnote *1*: Why anybody would think it sensible to build a PCRE2 with JIT on an OS that does not allow executing code that was written by the same process is beyond me. Or is there a mode in OpenBSD that *does* allow JIT'ed code to be executed?
Hi Carlo, On Mon, 29 Jul 2019, Carlo Arenas wrote: > On Mon, Jul 29, 2019 at 10:47 AM Junio C Hamano <gitster@pobox.com> wrote: > > René Scharfe <l.s.r@web.de> writes: > > >> +pcre.jit:: > > >> + If set to false, disable JIT when using PCRE. Defaults to > > >> + true. > > >> + if set to -1 will try first to use JIT and fallback to the > > >> + interpreter instead of returning an error. > > > > > > Why not implement only -1, without adding this config setting? > > > > ... nor command line option. If we have an auto-fallback, I would > > think that makes the most sense. IIRC the first iteration with only > > the configuration was really about working around the (non-working) > > pcre-jit---if we can self-detect and skip a non-working case, that > > would allow us to drop end-user facing knobs, which is ideal. > > because that was proposed earlier[1] and wasn't accepted ;) > [...] > [1] https://public-inbox.org/git/20181209230024.43444-3-carenas@gmail.com/ For the record, I read https://public-inbox.org/git/xmqqh8flkgs2.fsf@gitster-ct.c.googlers.com/ as encouraging a slightly more powerful argument in favor. Junio seemed to hope that PCRE2's own `pcre2grep` would behave that way, and that would give us plenty reason to just imitate it. I don't know whether `pcre2grep` behaves that way, even if it does not, I think the benefits of the auto fallback to the end user are considerable. Ciao, Dscho
On Wed, Jul 31 2019, Johannes Schindelin wrote: > Hi, > > On Mon, 29 Jul 2019, Carlo Marcelo Arenas Belón wrote: > >> $ git grep 'foo bar' >> fatal: Couldn't JIT the PCRE2 pattern 'foo bar', got '-48' > > My immediate reaction to this error message was: That's not helpful. > What is `-48` supposed to mean? Why do we even think it sensible to > throw such an error message at the end user? Can't we do a much better > job translating that into something that makes actual sense without > knowing implementation details? > > But then, I realized that -48 must be a well-known constant in PCRE2, > and my reaction transformed into something much more hopeful: why don't > we detect the situation where the JIT'ed code was not actually > executable [*1*], and fall back to the non-JIT'ed code path ourselves, > without troubling the end user (maybe warning, but maybe better not lest > we annoy the user with something pointless)? > > Even after finding out that -48 disappointingly means > PCRE2_ERROR_NOMEMORY (as opposed to something like > PCRE2_ERROR_CANNOT_EXECUTE_JIT_CODE), I like the idea of not bothering > end users and doing the sensible fallback under the hood. > > Ciao, > Dscho > > Footnote *1*: Why anybody would think it sensible to build a PCRE2 with > JIT on an OS that does not allow executing code that was written by the > same process is beyond me. Or is there a mode in OpenBSD that *does* > allow JIT'ed code to be executed? We do detect if JIT isn't supported and fall back. That's what the pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on) code in grep.c does. This and is the subsequent pcre2_pattern_info() call is how PCRE documents that you should do this. What hasn't been supported is all of that saying "yes, I support JIT" and the feature then fail whaling. I had not encountered that before. So far that seems like because Carlo just built a completely broken PCRE v2 package, so I don't know if that's worth supporting on our side. I.e. this isn't something I think could plausibly happen in the wild. That should *not* be confused with me thinking other stuff Carlo's raised is a non-issue, e.g. running into the JIT stack limit etc. Some of that's clearly bugs in our/my grep.c code that need fixing.
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > For the record, I read > https://public-inbox.org/git/xmqqh8flkgs2.fsf@gitster-ct.c.googlers.com/ > as encouraging a slightly more powerful argument in favor. Junio seemed > to hope that PCRE2's own `pcre2grep` would behave that way, and that > would give us plenty reason to just imitate it. > > I don't know whether `pcre2grep` behaves that way, even if it does not, > I think the benefits of the auto fallback to the end user are > considerable. Thanks for digging that up ;-) I do agree with what was said there. JIT is merely an optimization, and we should be able to work without it and should not even bother the users with warning messages when we have to choose non-JIT codepath. Those who care about debugging can use a "--debug" option or something to figure out if their build on a particular pattern is or is not using JIT. I think I read somebody (Carlo?) made that argument in the thread earlier, and I agree with that sentiment.
On Wed, Jul 31, 2019 at 7:57 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > What hasn't been supported is all of that saying "yes, I support JIT" > and the feature then fail whaling. I had not encountered that before. > > So far that seems like because Carlo just built a completely broken PCRE > v2 package, so I don't know if that's worth supporting on our > side. I.e. this isn't something I think could plausibly happen in the > wild. since you are in Debian please follow the instructions here: https://wiki.debian.org/SELinux/Setup no need to rebuild git or pcre (but to enable selinux will need to reboot twice), then type as root the following: # set enforce 1 Carlo
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index c89fb569e3..895c6b34ec 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,6 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] + [-j | --[no]-pcre-jit] [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) [<pager>]] @@ -69,6 +70,12 @@ grep.fallbackToNoIndex:: If set to true, fall back to git grep --no-index if git grep is executed outside of a git repository. Defaults to false. +pcre.jit:: + If set to false, disable JIT when using PCRE. Defaults to + true. + if set to -1 will try first to use JIT and fallback to the + interpreter instead of returning an error. + OPTIONS ------- @@ -175,6 +182,10 @@ providing this option will cause it to die. Use fixed strings for patterns (don't interpret pattern as a regex). +-j:: +--[no-]pcre-jit:: + Diable JIT in PCRE with --no-pcre-jit. + -n:: --line-number:: Prefix the line number to matching lines. diff --git a/builtin/grep.c b/builtin/grep.c index 580fd38f41..b0e94875b2 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -923,6 +923,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix) OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored, N_("allow calling of grep(1) (ignored by this build)"), PARSE_OPT_NOCOMPLETE), + OPT_GROUP("PCRE"), + OPT_SET_INT('j', "pcre-jit", &opt.pcre_jit, + N_("when to use JIT with PCRE"), + 1), OPT_END() }; diff --git a/grep.c b/grep.c index c7c06ae08d..d58cad0257 100644 --- a/grep.c +++ b/grep.c @@ -56,6 +56,7 @@ void init_grep_defaults(struct repository *repo) opt->repo = repo; opt->relative = 1; opt->pathname = 1; + opt->pcre_jit = 1; opt->max_depth = -1; opt->pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED; color_set(opt->colors[GREP_COLOR_CONTEXT], ""); @@ -125,6 +126,12 @@ int grep_config(const char *var, const char *value, void *cb) return 0; } + if (!strcmp(var, "pcre.jit")) { + int is_bool; + opt->pcre_jit = git_config_bool_or_int(var, value, &is_bool); + return 0; + } + if (!strcmp(var, "color.grep")) opt->color = git_config_colorbool(var, value); if (!strcmp(var, "color.grep.match")) { @@ -163,6 +170,7 @@ void grep_init(struct grep_opt *opt, struct repository *repo, const char *prefix opt->pattern_tail = &opt->pattern_list; opt->header_tail = &opt->header_list; + opt->pcre_jit = def->pcre_jit; opt->only_matching = def->only_matching; opt->color = def->color; opt->extended_regexp_option = def->extended_regexp_option; @@ -393,7 +401,8 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) die("%s", error); #ifdef GIT_PCRE1_USE_JIT - pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on); + if (opt->pcre_jit) + pcre_config(PCRE_CONFIG_JIT, &p->pcre1_jit_on); #endif } @@ -489,11 +498,24 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt compile_regexp_failed(p, (const char *)&errbuf); } - pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); + if (opt->pcre_jit) + pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on); + if (p->pcre2_jit_on) { jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE); - if (jitret) - die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + if (jitret) { + if ((opt->pcre_jit < 0) && + jitret == PCRE2_ERROR_NOMEMORY) { + /* + * JIT compiler isn't available but we can + * still fallback to the interpreter + */ + p->pcre2_jit_on = 0; + return; + } + else + die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret); + } /* * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just diff --git a/grep.h b/grep.h index c0c71eb4a9..fff152e606 100644 --- a/grep.h +++ b/grep.h @@ -151,6 +151,7 @@ struct grep_opt { int allow_textconv; int extended; int use_reflog_filter; + int pcre_jit; int pcre1; int pcre2; int relative;
PCRE1 allowed for a compile time flag to disable JIT, but PCRE2 never had one, forcing the use of JIT if -P was requested. After ed0479ce3d (Merge branch 'ab/no-kwset' into next, 2019-07-15) the PCRE2 engine will be used more broadly and therefore adding this knob will allow users a escape from situations where JIT might be problematic. JIT will be used by default but it can be disabled with the --no-pcre-jit option in `git grep` or by setting 0/false into the pcre.jit config. If a value of -1 is used instead then the following error is prevented by using the interpreter when a JIT failure consistent with known security restrictions is found at regex compilation time. $ git grep 'foo bar' fatal: Couldn't JIT the PCRE2 pattern 'foo bar', got '-48' Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- V2: add command line to grep as suggested by Junio Documentation/git-grep.txt | 11 +++++++++++ builtin/grep.c | 4 ++++ grep.c | 30 ++++++++++++++++++++++++++---- grep.h | 1 + 4 files changed, 42 insertions(+), 4 deletions(-) base-commit: 870eea81669bfff4333b37b11fedd870cd05fd90