Message ID | 20190828145444.31588-1-carenas@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] grep: skip UTF8 checks explicitly | expand |
FWIW, the changes in grep.h are IMHO optional and hadn't been really tested as I couldn't find a version of PCRE1 old enough to trigger it but I am hoping are simple enough to work. As in my original proposal, there is no test because this is going to (as documented) trigger undefined behaviour and so I don't see how we can reliably test that. Goes without saying tthough, that the same is triggered when JIT is enabled and the JIT fast path is being used, so this is not a regression and will be more visible once ab/pcre-jit-fixes gets merged because we are moving out of the JIT fast path with a patch[0] in that series While Ævar made a point[1] that this wasn't a solution to the problem, it was because PCRE2 could have a better one (still missing but based on ab/pcre-jit-fixes), and it was expected that PCRE2 will be used a lot more and so it wasn't a good idea for it (something that doesn't apply to PCRE1) the patch was based on maint (like all bugfixes) but applies cleanly to master and next, it will conflict with pu but for easy of merge I'd applied it on top of cb/pcre1-cleanup and make it available in GitHub[2]; that branch could be used as well as a reroll for that topic (if that is preferred) the error message hasn't been changed on this patch, as it might make sense to improve it as well for PCRE2, but at least shouldn't be triggered anymore (ex, from running a freshly built git without the patch and linked against a non JIT enabled PCRE1): $ ./git-grep -P 'Nguyễn Thái.Ngọc' .mailmap:Nguyễn Thái Ngọc Duy <pclouds@gmail.com> fatal: pcre_exec failed with error code -10 Carlo [0] 685668faaa (grep: stop using a custom JIT stack with PCRE v1, 2019-07-26) [1] https://public-inbox.org/git/87lfwn70nb.fsf@evledraar.gmail.com/ [2] https://github.com/carenas/git/tree/pcre1-cleanup
ping any feedback on code/approach highly appreciated Carlo On Wed, Aug 28, 2019 at 9:57 AM Carlo Arenas <carenas@gmail.com> wrote: > > FWIW, the changes in grep.h are IMHO optional and hadn't been really > tested as I couldn't find a version of PCRE1 old enough to trigger it > but I am hoping are simple enough to work. > > As in my original proposal, there is no test because this is going to > (as documented) trigger undefined behaviour and so I don't see how we > can reliably test that. Goes without saying tthough, that the same is > triggered when JIT is enabled and the JIT fast path is being used, so > this is not a regression and will be more visible once > ab/pcre-jit-fixes gets merged because we are moving out of the JIT > fast path with a patch[0] in that series > > While Ævar made a point[1] that this wasn't a solution to the problem, > it was because PCRE2 could have a better one (still missing but based > on ab/pcre-jit-fixes), and it was expected that PCRE2 will be used a > lot more and so it wasn't a good idea for it (something that doesn't > apply to PCRE1) > > the patch was based on maint (like all bugfixes) but applies cleanly > to master and next, it will conflict with pu but for easy of merge I'd > applied it on top of cb/pcre1-cleanup and make it available in > GitHub[2]; that branch could be used as well as a reroll for that > topic (if that is preferred) > > the error message hasn't been changed on this patch, as it might make > sense to improve it as well for PCRE2, but at least shouldn't be > triggered anymore (ex, from running a freshly built git without the > patch and linked against a non JIT enabled PCRE1): > > $ ./git-grep -P 'Nguyễn Thái.Ngọc' > .mailmap:Nguyễn Thái Ngọc Duy <pclouds@gmail.com> > fatal: pcre_exec failed with error code -10 > > Carlo > > [0] 685668faaa (grep: stop using a custom JIT stack with PCRE v1, 2019-07-26) > [1] https://public-inbox.org/git/87lfwn70nb.fsf@evledraar.gmail.com/ > [2] https://github.com/carenas/git/tree/pcre1-cleanup
Carlo Arenas <carenas@gmail.com> writes: > ping > > any feedback on code/approach highly appreciated I'd prefer to see others weigh in before starting to merge the pcre stuff to 'next'. I do not mind taking this updated one that limits the scope to pcre1 as a pure regressino fix, if others agree. Thanks for keeping a tab on these pcre issues. >> While Ævar made a point[1] that this wasn't a solution to the problem, >> it was because PCRE2 could have a better one (still missing but based >> on ab/pcre-jit-fixes), and it was expected that PCRE2 will be used a >> lot more and so it wasn't a good idea for it (something that doesn't >> apply to PCRE1) >> >> the patch was based on maint (like all bugfixes) but applies cleanly >> to master and next, it will conflict with pu but for easy of merge I'd >> applied it on top of cb/pcre1-cleanup and make it available in >> GitHub[2]; that branch could be used as well as a reroll for that >> topic (if that is preferred) >> >> the error message hasn't been changed on this patch, as it might make >> sense to improve it as well for PCRE2, but at least shouldn't be >> triggered anymore (ex, from running a freshly built git without the >> patch and linked against a non JIT enabled PCRE1): >> >> $ ./git-grep -P 'Nguyễn Thái.Ngọc' >> .mailmap:Nguyễn Thái Ngọc Duy <pclouds@gmail.com> >> fatal: pcre_exec failed with error code -10 >> >> Carlo >> >> [0] 685668faaa (grep: stop using a custom JIT stack with PCRE v1, 2019-07-26) >> [1] https://public-inbox.org/git/87lfwn70nb.fsf@evledraar.gmail.com/ >> [2] https://github.com/carenas/git/tree/pcre1-cleanup
diff --git a/grep.c b/grep.c index f7c3a5803e..69ef69516e 100644 --- a/grep.c +++ b/grep.c @@ -421,7 +421,7 @@ static void compile_pcre1_regexp(struct grep_pat *p, const struct grep_opt *opt) static int pcre1match(struct grep_pat *p, const char *line, const char *eol, regmatch_t *match, int eflags) { - int ovector[30], ret, flags = 0; + int ovector[30], ret, flags = PCRE_NO_UTF8_CHECK; if (eflags & REG_NOTBOL) flags |= PCRE_NOTBOL; diff --git a/grep.h b/grep.h index 1875880f37..9c8797a017 100644 --- a/grep.h +++ b/grep.h @@ -3,6 +3,9 @@ #include "color.h" #ifdef USE_LIBPCRE1 #include <pcre.h> +#ifndef PCRE_NO_UTF8_CHECK +#define PCRE_NO_UTF8_CHECK 0 +#endif #ifdef PCRE_CONFIG_JIT #if PCRE_MAJOR >= 8 && PCRE_MINOR >= 32 #ifndef NO_LIBPCRE1_JIT
18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was released with git 2.10 added the PCRE_UTF8 flag to PCRE1 matching including a call to has_non_ascii() to try to avoid breakage if there was non-utf8 encoded content in the haystack. Usually PCRE is compiled with JIT support (even if is not the default), and therefore the codepath used includes calling pcre_jit_exec, which skips UTF-8 validation by design (which might result in crashes or hangs) but when JIT support wasn't compiled we use pcre_exec instead with the posibility that grep might be aborted if invalid UTF-8 is found in the haystack. PCRE1 provides a flag since Mar 5, 2007 that could be used to skip the checks explicitly so use that to make both codepaths equivalent (the flag is ignored by pcre1_jit_exec) this fix is only implemented for PCRE1 because PCRE2 is likely to have a better solution (without the risks) instead in the future Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Helped-by: Eric Sunshine <sunshine@sunshineco.com> Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Suggested-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> --- V2: * drop PCRE2 support * add backward compatibility define grep.c | 4 ++-- grep.h | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-)