grep: fall back to interpreter mode if JIT fails

Message ID	20221216121557.30714-1-minipli@grsecurity.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <git-owner@kernel.org> From: Mathias Krause <minipli@grsecurity.net> To: git@vger.kernel.org Cc: Mathias Krause <minipli@grsecurity.net>, =?utf-8?q?Carlo_Marcelo_Arenas_?= =?utf-8?q?Bel=C3=B3n?= <carenas@gmail.com> Subject: [PATCH] grep: fall back to interpreter mode if JIT fails Date: Fri, 16 Dec 2022 13:15:57 +0100 Message-Id: <20221216121557.30714-1-minipli@grsecurity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	grep: fall back to interpreter mode if JIT fails \| expand grep: fall back to interpreter mode if JIT fails

Mathias Krause Dec. 16, 2022, 12:15 p.m. UTC

From: Carlo Marcelo Arenas Belón <carenas@gmail.com>

Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):

  [user@fedora git]$ git grep -c PCRE2_JIT
  grep.c:1

  [user@fedora git]$ # Enable SELinux's W^X policy
  [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem

  [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
  [user@fedora git]$ git grep -c PCRE2_JIT
  fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'

Instead of failing hard in this case and making 'git grep' unusable on
such systems, simply fall back to interpreter mode, leading to a much
better user experience.

Such a change was already proposed 4 years ago [1] but wasn't merged for
unknown reasons.

1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com

Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Mathias Krause <minipli@grsecurity.net>	# tweaked changelog, added comment
---
 grep.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Ævar Arnfjörð Bjarmason Dec. 16, 2022, 4:12 p.m. UTC | #1

On Fri, Dec 16 2022, Mathias Krause wrote:

> From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>
> Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
> enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
> making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):
>
>   [user@fedora git]$ git grep -c PCRE2_JIT
>   grep.c:1
>
>   [user@fedora git]$ # Enable SELinux's W^X policy
>   [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem
>
>   [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
>   [user@fedora git]$ git grep -c PCRE2_JIT
>   fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'
>
> Instead of failing hard in this case and making 'git grep' unusable on
> such systems, simply fall back to interpreter mode, leading to a much
> better user experience.
>
> Such a change was already proposed 4 years ago [1] but wasn't merged for
> unknown reasons.

Yeah, it's unfortunate that it fell between the cracks, and it's good to
have such a fallback mechanism.

> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com
>
> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>	# tweaked changelog, added comment
> ---
>  grep.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 06eed694936c..f2ada528b21d 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
>  	if (p->pcre2_jit_on) {
>  		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
> -		if (jitret)
> -			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
> +		if (jitret) {
> +			/*
> +			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
> +			 * indicated JIT support, the library might still
> +			 * fail to generate JIT code for various reasons,
> +			 * e.g. when SELinux's 'deny_execmem' or PaX's
> +			 * MPROTECT prevent creating W|X memory mappings.
> +			 *
> +			 * Instead of faling hard, fall back to interpreter
> +			 * mode, just as if the pattern was prefixed with
> +			 * '(*NO_JIT)'.
> +			 */
> +			p->pcre2_jit_on = 0;
> +			return;

From my reading of the docs it returns two different codes:
PCRE2_ERROR_JIT_BADOPTION or PCRE2_ERROR_NOMEMORY.

This change will start treating both the same, but we only want to allow
the latter, surely?

So shouldn't this be e.g.:

	jitret = pcre2_jit_compile(...);
	if (jitret == PCRE2_ERROR_NOMEMORY) {
		/* code you added here */
	} else if (jitret) {
		BUG(...);
	}

I put a BUG() there, we could keep the die(), but
PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if
it starts returning any other codes our use of the API is also in some
unknown state, so we should also BUG() out.

Mathias Krause Dec. 16, 2022, 7:26 p.m. UTC | #2

Am 16.12.22 um 17:12 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Fri, Dec 16 2022, Mathias Krause wrote:
> 
>> From: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>>
>> Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT
>> enabled, the allocation of PCRE2's JIT rwx memory may be prohibited,
>> making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):
>>
>>   [user@fedora git]$ git grep -c PCRE2_JIT
>>   grep.c:1
>>
>>   [user@fedora git]$ # Enable SELinux's W^X policy
>>   [user@fedora git]$ sudo semanage boolean -m -1 deny_execmem
>>
>>   [user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
>>   [user@fedora git]$ git grep -c PCRE2_JIT
>>   fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'
>>
>> Instead of failing hard in this case and making 'git grep' unusable on
>> such systems, simply fall back to interpreter mode, leading to a much
>> better user experience.
>>
>> Such a change was already proposed 4 years ago [1] but wasn't merged for
>> unknown reasons.
> 
> Yeah, it's unfortunate that it fell between the cracks, and it's good to
> have such a fallback mechanism.

I fully agree, obviously!

>> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com
>>
>> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>> Signed-off-by: Mathias Krause <minipli@grsecurity.net>	# tweaked changelog, added comment
>> ---
>>  grep.c | 17 +++++++++++++++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/grep.c b/grep.c
>> index 06eed694936c..f2ada528b21d 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>>  	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
>>  	if (p->pcre2_jit_on) {
>>  		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>> -		if (jitret)
>> -			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
>> +		if (jitret) {
>> +			/*
>> +			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
>> +			 * indicated JIT support, the library might still
>> +			 * fail to generate JIT code for various reasons,
>> +			 * e.g. when SELinux's 'deny_execmem' or PaX's
>> +			 * MPROTECT prevent creating W|X memory mappings.
>> +			 *
>> +			 * Instead of faling hard, fall back to interpreter
>> +			 * mode, just as if the pattern was prefixed with
>> +			 * '(*NO_JIT)'.
>> +			 */
>> +			p->pcre2_jit_on = 0;
>> +			return;
> 
> From my reading of the docs it returns two different codes:
> PCRE2_ERROR_JIT_BADOPTION or PCRE2_ERROR_NOMEMORY.
> 
> This change will start treating both the same, but we only want to allow
> the latter, surely?
> 
> So shouldn't this be e.g.:
> 
> 	jitret = pcre2_jit_compile(...);
> 	if (jitret == PCRE2_ERROR_NOMEMORY) {
> 		/* code you added here */
> 	} else if (jitret) {
> 		BUG(...);
> 	}

I'm fine with it either way. In fact, just testing for the "JIT memory
cannot be allocated" error might be more on point and could detect wrong
API usage by the means of an error. However, from a user's point of view
a fall back to interpreter mode might still be desired in this case, as
a failing 'git grep' is simply not acceptable, IMHO.

> I put a BUG() there, we could keep the die(), but
> PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if
> it starts returning any other codes our use of the API is also in some
> unknown state, so we should also BUG() out.

Valid points. I'll give others some more time to churn in and if there
are no strong objections, I'll change it like you suggest.

Thanks,
Mathias

Junio C Hamano Dec. 16, 2022, 10:52 p.m. UTC | #3

Mathias Krause <minipli@grsecurity.net> writes:

> Such a change was already proposed 4 years ago [1] but wasn't merged for
> unknown reasons.
>
> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com

This part does not belong to the log message but should go below
three-dash line.  If you have a more concrete "it was rejected
because ...", to help judging why this version improves upon the
previous attempt and it is clear the reason for past rejection no
longer applies, that is a very different story, though.

The way I read the original thread (assuming that the lore archive
is showing all relevant messages there) is that RFC was reviewed
well and everybody was happy about the direction it took.  It even
received fairly concrete suggestions for the real, non-RFC version,
but that never materialized.

It is very clear that the posted patch was not yet in a mergeable
state as-is; "for unknown reasons" is less than helpful.

Now, we learned a more concrete reason, i.e. "it got tons of help
improving the draft into the final version, but the help was
discarded and the final version did not materialize", does the
attempt this time around improve on it sufficiently to make it
mergeable, or is it sufficiently different that it needs a fresh
review from scratch?  If the latter, then "for unknown reasons"
becomes even less relevant.

The rest of the proposed log message, and the change itself, both
look very reasonable and well explained, at least to me.

Thanks.

> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
> Signed-off-by: Mathias Krause <minipli@grsecurity.net>	# tweaked changelog, added comment
> ---
>  grep.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/grep.c b/grep.c
> index 06eed694936c..f2ada528b21d 100644
> --- a/grep.c
> +++ b/grep.c
> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>  	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
>  	if (p->pcre2_jit_on) {
>  		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
> -		if (jitret)
> -			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
> +		if (jitret) {
> +			/*
> +			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
> +			 * indicated JIT support, the library might still
> +			 * fail to generate JIT code for various reasons,
> +			 * e.g. when SELinux's 'deny_execmem' or PaX's
> +			 * MPROTECT prevent creating W|X memory mappings.
> +			 *
> +			 * Instead of faling hard, fall back to interpreter
> +			 * mode, just as if the pattern was prefixed with
> +			 * '(*NO_JIT)'.
> +			 */
> +			p->pcre2_jit_on = 0;
> +			return;
> +		}
>  
>  		/*
>  		 * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just

Junio C Hamano Dec. 16, 2022, 11:09 p.m. UTC | #4

Mathias Krause <minipli@grsecurity.net> writes:

> ... However, from a user's point of view a fall back to
> interpreter mode might still be desired in this case, as a failing
> 'git grep' is simply not acceptable, IMHO.

"git grep" that silently produces a wrong result (by falling back
after a problem is detected) would not be acceptable, either.
Receiving BADOPTION could be a sign that there is something wrong in
the input, not from the end-user but from the code, in which case
stopping with BUG() may be a more appropriate?

>> I put a BUG() there, we could keep the die(), but
>> PCRE2_ERROR_JIT_BADOPTION is really more appropriate as a BUG(), and if
>> it starts returning any other codes our use of the API is also in some
>> unknown state, so we should also BUG() out.
>
> Valid points. I'll give others some more time to churn in and if there
> are no strong objections, I'll change it like you suggest.

Yup, sounds quite sensible.

Carlo Marcelo Arenas Belón Dec. 17, 2022, 2:50 a.m. UTC | #5

On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Mathias Krause <minipli@grsecurity.net> writes:
>
> > ... However, from a user's point of view a fall back to
> > interpreter mode might still be desired in this case, as a failing
> > 'git grep' is simply not acceptable, IMHO.
>
> "git grep" that silently produces a wrong result (by falling back
> after a problem is detected) would not be acceptable, either.

except that an error at this point only invalidates the use of JIT,
so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.

the later is setup to be used later by the code that is added,

> Receiving BADOPTION could be a sign that there is something wrong in
> the input, not from the end-user but from the code, in which case
> stopping with BUG() may be a more appropriate?

The way PCRE handles this kind of errors internally is to instruct pcre2_match()
to use the interpreter.

While a BUG() might be a way to ensure the code is using the right set
of options
I would expect that the failure will be reported by pcre2_compile
instead, with the
only cases left, only being interna to PCRE (ex: JIT can't yet support
a feature the
interpreter has)

Carlo

Ævar Arnfjörð Bjarmason Dec. 19, 2022, 9 a.m. UTC | #6

On Fri, Dec 16 2022, Carlo Arenas wrote:

[CC-ing pcre-dev@ for this "future error API" discussion]

> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Mathias Krause <minipli@grsecurity.net> writes:
>>
>> > ... However, from a user's point of view a fall back to
>> > interpreter mode might still be desired in this case, as a failing
>> > 'git grep' is simply not acceptable, IMHO.
>>
>> "git grep" that silently produces a wrong result (by falling back
>> after a problem is detected) would not be acceptable, either.
>
> except that an error at this point only invalidates the use of JIT,
> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.
>
> the later is setup to be used later by the code that is added,

I think we could stumble ahead, but if this were to happen our
assumptions about how the API works have been invalidated.

The pcre2_jit_compile() doesn't promise to return a finite set of error
codes, but:

	[...]0 for success, or a negative error code otherwise[...]

But if new codes were added it's anyone's guess what state we'd be in,
so I think the safe thing is to BUG() out if we get as far as
pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or
PCRE2_ERROR_NOMEMORY.

>> Receiving BADOPTION could be a sign that there is something wrong in
>> the input, not from the end-user but from the code, in which case
>> stopping with BUG() may be a more appropriate?
>
> The way PCRE handles this kind of errors internally is to instruct pcre2_match()
> to use the interpreter.
>
> While a BUG() might be a way to ensure the code is using the right set
> of options
> I would expect that the failure will be reported by pcre2_compile
> instead, with the
> only cases left, only being interna to PCRE (ex: JIT can't yet support
> a feature the
> interpreter has)

I agree that it's possible in general that an external library might
start returning a "benign" error code that we could recover from, so
BUG(...) would be overdoing it.

I just don't see that libpcre would do that in this case. In general the
JIT supports all patterns that the normal engine does, so if we're past
the "is it available?" hump it should work.

If it starts not doing that I'd think they'd do a major version upgrade,
or opt-in with a new flag etc.

Note that it already has a way of checking for "we tried to do the jit
thing, but it wasn't on in the end". See the code I added in
a25b9085043[1] (grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT),
2017-11-23), which comes right after the pcre2_jit_compile().

So not only would a BUG() biting us here require them to create a new
code for the state of "we have the JIT, but can't use it here" (for some
reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already
"overloaded" to mean that).

It would also require them to invent a new "soft" failure mode for the
JIT, i.e. not the facility added in a25b9085043, where we can use the
JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern
itself.

But I may be wrong, so I've CC'd pcre-dev@ to see if they have any
commentary on this proposed API paranoia. For them: The greater context
of this thread on the git ML is at [2].

1. https://github.com/git/git/commit/a25b9085043b8029169b4d5b172b78ca3f38fb37
2. https://lore.kernel.org/git/20221216121557.30714-1-minipli@grsecurity.net/

Mathias Krause Dec. 20, 2022, 7:29 p.m. UTC | #7

Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Fri, Dec 16 2022, Carlo Arenas wrote:
> 
> [CC-ing pcre-dev@ for this "future error API" discussion]
> 
>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>
>>> Mathias Krause <minipli@grsecurity.net> writes:
>>>
>>>> ... However, from a user's point of view a fall back to
>>>> interpreter mode might still be desired in this case, as a failing
>>>> 'git grep' is simply not acceptable, IMHO.
>>>
>>> "git grep" that silently produces a wrong result (by falling back
>>> after a problem is detected) would not be acceptable, either.
>>
>> except that an error at this point only invalidates the use of JIT,
>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.
>>
>> the later is setup to be used later by the code that is added,
> 
> I think we could stumble ahead, but if this were to happen our
> assumptions about how the API works have been invalidated.

Well, pcre2_jit_compile() might fail for internal reasons, e.g.
pcre2jit(3) states: "[...] If a pattern is too big, a call to
pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY."

For example, the following fails for me:
$ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c
fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)…

But explicitly disabling JIT makes it "work":
$ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c
$

It's a made up example and might even be intended behavior by git, but
it also proves a point Carlo already mentioned, a failing call to
pcre2_jit_compile() only invalidates the use of the JIT engine. We can
still use and fall back to the interpreter.

It would be used anyway if PCRE2 was compiled without JIT support, so I
don't see any issues with falling back to interpreter mode if the JIT
compilation fails -- for whatever reason.

> The pcre2_jit_compile() doesn't promise to return a finite set of error
> codes, but:
> 
> 	[...]0 for success, or a negative error code otherwise[...]
> 
> But if new codes were added it's anyone's guess what state we'd be in,
> so I think the safe thing is to BUG() out if we get as far as
> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or
> PCRE2_ERROR_NOMEMORY.

But why BUG()? JIT is an optimization that might fail for PCRE2 internal
reasons. Why should we make 'git grep' fail too in this case when we can
handle it just fine by attempting to use the interpreter?

If the pattern is really bogus, the interpreter will complain as well
and we'll error out. But failing just because the JIT engine can't
handle the pattern? Doesn't sound right to me.

>>> Receiving BADOPTION could be a sign that there is something wrong in
>>> the input, not from the end-user but from the code, in which case
>>> stopping with BUG() may be a more appropriate?
>>
>> The way PCRE handles this kind of errors internally is to instruct pcre2_match()
>> to use the interpreter.
>>
>> While a BUG() might be a way to ensure the code is using the right set
>> of options
>> I would expect that the failure will be reported by pcre2_compile
>> instead, with the
>> only cases left, only being interna to PCRE (ex: JIT can't yet support
>> a feature the
>> interpreter has)
> 
> I agree that it's possible in general that an external library might
> start returning a "benign" error code that we could recover from, so
> BUG(...) would be overdoing it.

And I think that's the case here: JIT is an optimization that might not
be available under all circumstances, as, for example, under SELinux's
'deny_execmem' setting. So we need to have a backup plan for such
systems anyway. Why not always try to use the interpreter if JIT
compilation fails?

> I just don't see that libpcre would do that in this case. In general the
> JIT supports all patterns that the normal engine does, so if we're past
> the "is it available?" hump it should work.

Leaving my above example aside, showing that JIT cannot handle very long
patterns the interpreter mode does, it is, in fact, unfortunate that
libpcre2 announces JIT support when it could know better that it does not.

In fact, I also try to fix this misbehavior in PCRE2/SLJIT[1,2], but the
maintainer thinks it's a bug in git which should be fixed by handling
errors from pcre2_jit_compile() and falling back to interpreter mode[3].

While I still think this can pretty much be fixed in PCRE2 alone,
handling possible PCRE2 JIT errors in git and falling back to
interpreter mode is still a sensible thing to do.

[1] https://github.com/PCRE2Project/pcre2/pull/157
[2] https://github.com/zherczeg/sljit/pull/136
[3] https://github.com/zherczeg/sljit/pull/136#issuecomment-1307254018

> If it starts not doing that I'd think they'd do a major version upgrade,
> or opt-in with a new flag etc.
> 
> Note that it already has a way of checking for "we tried to do the jit
> thing, but it wasn't on in the end". See the code I added in
> a25b9085043[1] (grep: fix segfault under -P + PCRE2 <=10.30 + (*NO_JIT),
> 2017-11-23), which comes right after the pcre2_jit_compile().

Yeah, that's required as well, unfortunately. :/

> So not only would a BUG() biting us here require them to create a new
> code for the state of "we have the JIT, but can't use it here" (for some
> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already
> "overloaded" to mean that).
> 
> It would also require them to invent a new "soft" failure mode for the
> JIT, i.e. not the facility added in a25b9085043, where we can use the
> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern
> itself.

We should really treat PCRE2 JIT as an *optional* optimization that
might not be available for certain cases. For these we should, IMHO,
simply use the interpreter mode, instead of bugging users with a BUG() /
die().

> But I may be wrong, so I've CC'd pcre-dev@ to see if they have any
> commentary on this proposed API paranoia. For them: The greater context
> of this thread on the git ML is at [2].
> 
> 1. https://github.com/git/git/commit/a25b9085043b8029169b4d5b172b78ca3f38fb37
> 2. https://lore.kernel.org/git/20221216121557.30714-1-minipli@grsecurity.net/

Thanks,
Mathias

Mathias Krause Dec. 20, 2022, 8:40 p.m. UTC | #8

Am 16.12.22 um 23:52 schrieb Junio C Hamano:
> Mathias Krause <minipli@grsecurity.net> writes:
> 
>> Such a change was already proposed 4 years ago [1] but wasn't merged for
>> unknown reasons.
>>
>> 1. https://lore.kernel.org/r/20181209230024.43444-3-carenas@gmail.com
> 
> This part does not belong to the log message but should go below
> three-dash line.  If you have a more concrete "it was rejected
> because ...", to help judging why this version improves upon the
> previous attempt and it is clear the reason for past rejection no
> longer applies, that is a very different story, though.
> 
> The way I read the original thread (assuming that the lore archive
> is showing all relevant messages there) is that RFC was reviewed
> well and everybody was happy about the direction it took.  It even
> received fairly concrete suggestions for the real, non-RFC version,
> but that never materialized.

It had a follow-up RFC[1] half a year later, adding a config option and,
after some more discussion, even command line switches[2]. But not just
IMHO is that far too much, but even you suggested to simply revert back
to the initial RFC and implement the automatic fallback[3], basically
merging it with a proper changelog[4]. As that never happened, I took up
the work and tried to do just that.

1. https://lore.kernel.org/git/20190728235427.41425-1-carenas@gmail.com/
2. https://lore.kernel.org/git/20190729105955.44390-1-carenas@gmail.com/
3. https://lore.kernel.org/git/xmqqh874vikk.fsf@gitster-ct.c.googlers.com/
4. https://lore.kernel.org/git/xmqqef1zmkp5.fsf@gitster-ct.c.googlers.com/

> It is very clear that the posted patch was not yet in a mergeable
> state as-is; "for unknown reasons" is less than helpful.
> 
> Now, we learned a more concrete reason, i.e. "it got tons of help
> improving the draft into the final version, but the help was
> discarded and the final version did not materialize", does the
> attempt this time around improve on it sufficiently to make it
> mergeable, or is it sufficiently different that it needs a fresh
> review from scratch?  If the latter, then "for unknown reasons"
> becomes even less relevant.

Sorry for not digging up that information for the initial patch
submission. I'll add it to v2.

> The rest of the proposed log message, and the change itself, both
> look very reasonable and well explained, at least to me.

Thanks a lot for the detailed review. I'll update the commit log
accordingly but will wait for more feedback to see on which solution we
want to settle on.

> Thanks.
> 
> 
>> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
>> Signed-off-by: Mathias Krause <minipli@grsecurity.net>	# tweaked changelog, added comment
>> ---
>>  grep.c | 17 +++++++++++++++--
>>  1 file changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/grep.c b/grep.c
>> index 06eed694936c..f2ada528b21d 100644
>> --- a/grep.c
>> +++ b/grep.c
>> @@ -317,8 +317,21 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
>>  	pcre2_config(PCRE2_CONFIG_JIT, &p->pcre2_jit_on);
>>  	if (p->pcre2_jit_on) {
>>  		jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>> -		if (jitret)
>> -			die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n", p->pattern, jitret);
>> +		if (jitret) {
>> +			/*
>> +			 * Even though pcre2_config(PCRE2_CONFIG_JIT, ...)
>> +			 * indicated JIT support, the library might still
>> +			 * fail to generate JIT code for various reasons,
>> +			 * e.g. when SELinux's 'deny_execmem' or PaX's
>> +			 * MPROTECT prevent creating W|X memory mappings.
>> +			 *
>> +			 * Instead of faling hard, fall back to interpreter
>> +			 * mode, just as if the pattern was prefixed with
>> +			 * '(*NO_JIT)'.
>> +			 */
>> +			p->pcre2_jit_on = 0;
>> +			return;
>> +		}
>>  
>>  		/*
>>  		 * The pcre2_config(PCRE2_CONFIG_JIT, ...) call just

Ævar Arnfjörð Bjarmason Dec. 20, 2022, 9:11 p.m. UTC | #9

On Tue, Dec 20 2022, Mathias Krause wrote:

[De-CC-ing pcre-dev@, since this part is all git-specific]

> Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason:
>> 
>> On Fri, Dec 16 2022, Carlo Arenas wrote:
>> 
>> [CC-ing pcre-dev@ for this "future error API" discussion]
>> 
>>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>>
>>>> Mathias Krause <minipli@grsecurity.net> writes:
>>>>
>>>>> ... However, from a user's point of view a fall back to
>>>>> interpreter mode might still be desired in this case, as a failing
>>>>> 'git grep' is simply not acceptable, IMHO.
>>>>
>>>> "git grep" that silently produces a wrong result (by falling back
>>>> after a problem is detected) would not be acceptable, either.
>>>
>>> except that an error at this point only invalidates the use of JIT,
>>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.
>>>
>>> the later is setup to be used later by the code that is added,
>> 
>> I think we could stumble ahead, but if this were to happen our
>> assumptions about how the API works have been invalidated.
>
> Well, pcre2_jit_compile() might fail for internal reasons, e.g.
> pcre2jit(3) states: "[...] If a pattern is too big, a call to
> pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY."
>
> For example, the following fails for me:
> $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c
> fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)…
>
> But explicitly disabling JIT makes it "work":
> $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c
> $
>
> It's a made up example and might even be intended behavior by git, but
> it also proves a point Carlo already mentioned, a failing call to
> pcre2_jit_compile() only invalidates the use of the JIT engine. We can
> still use and fall back to the interpreter.

We should arguably do this, I hadn't bothered because I haven't been
able to find anything except pathological patterns where it matters, and
silently falling back in those cases will suck a lot more IMO.

If you are using such a pathological pattern it's almost always a better
idea to adjust your crazy pattern.

So I think in the *general* case we really should just keep this, and
*maybe* suggest the user try with (*NO_JIT) in the pattern.

But silently falling back kind of sucks, but unfortunately pcre2 doesn't
provide a way to say "failed because of SELinux" v.s. "failed because
the pattern is crazy", except that we could try to compile a known-good
pattern with the JIT, to disambiguate the two.

Anyway, if this is your goal you should really lead with that, not with
fixing a relatively obscure SELinux edge case...

> It would be used anyway if PCRE2 was compiled without JIT support, so I
> don't see any issues with falling back to interpreter mode if the JIT
> compilation fails -- for whatever reason.

It's the "for whatever reason" that I take issue with. We'd be in an
unknown state with the API behaving differently than we expect, and
returning unknown codes. That's different than the *known* error codes
(e.g. "no memory", oven though it's meaning is apparently overloaded to
the point of near-uselessness).

>> The pcre2_jit_compile() doesn't promise to return a finite set of error
>> codes, but:
>> 
>> 	[...]0 for success, or a negative error code otherwise[...]
>> 
>> But if new codes were added it's anyone's guess what state we'd be in,
>> so I think the safe thing is to BUG() out if we get as far as
>> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or
>> PCRE2_ERROR_NOMEMORY.
>
> But why BUG()? JIT is an optimization that might fail for PCRE2 internal
> reasons. Why should we make 'git grep' fail too in this case when we can
> handle it just fine by attempting to use the interpreter?
>
> If the pattern is really bogus, the interpreter will complain as well
> and we'll error out. But failing just because the JIT engine can't
> handle the pattern? Doesn't sound right to me.

See above, we're failing because our assumptions about how to use the
API have broken down at that point. We usually bug out in those cases.

>>>> Receiving BADOPTION could be a sign that there is something wrong in
>>>> the input, not from the end-user but from the code, in which case
>>>> stopping with BUG() may be a more appropriate?
>>>
>>> The way PCRE handles this kind of errors internally is to instruct pcre2_match()
>>> to use the interpreter.
>>>
>>> While a BUG() might be a way to ensure the code is using the right set
>>> of options
>>> I would expect that the failure will be reported by pcre2_compile
>>> instead, with the
>>> only cases left, only being interna to PCRE (ex: JIT can't yet support
>>> a feature the
>>> interpreter has)
>> 
>> I agree that it's possible in general that an external library might
>> start returning a "benign" error code that we could recover from, so
>> BUG(...) would be overdoing it.
>
> And I think that's the case here: JIT is an optimization that might not
> be available under all circumstances, as, for example, under SELinux's
> 'deny_execmem' setting. So we need to have a backup plan for such
> systems anyway. Why not always try to use the interpreter if JIT
> compilation fails?

See above, but maybe it's the least sucky thing (and definitely
simpler). I'm mainly checking that we're doing that we want here, and
that we're going into it with eyes open.

That we're now discussing a topic entirely different from SELinux on a
thread where we're (according to the commit message) fixing pcre2 where
the JIT is "unusable on such systems" is my main concern here. 

>> So not only would a BUG() biting us here require them to create a new
>> code for the state of "we have the JIT, but can't use it here" (for some
>> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already
>> "overloaded" to mean that).
>> 
>> It would also require them to invent a new "soft" failure mode for the
>> JIT, i.e. not the facility added in a25b9085043, where we can use the
>> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern
>> itself.
>
> We should really treat PCRE2 JIT as an *optional* optimization that
> might not be available for certain cases. For these we should, IMHO,
> simply use the interpreter mode, instead of bugging users with a BUG() /
> die().

To summarize some of the above, I think performance also matters, we
have cases where:

 A. We could use the non-JIT
 B. We could use the JIT, and it's a *lot* faster
 C. We can't use the JIT at all
 D. We can't use the JIT because we run into its limits

I think it's fair to die on "D" as in practice you only (I think!) run
into it on pathological patterns, but yes, another option would be to
fall back to "A".

But thinking you're doing "B" and not wanting to implicitly fall back to
"A" is also a valid use-case.

So I'm inclined to suggest that we should be less helpful with automatic
fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or
something.

But as noted above needing to always disable an apparently "available"
JIT on some systems (SELinux) does throw a monkey wrench into that
particular suggestion :(

So I'm not sure, I'm mainly trying to encourage you to think through the
edge cases, and to summarize the full impact of the change in a re-roll.

Mathias Krause Jan. 18, 2023, 2:22 p.m. UTC | #10

On 20.12.22 22:11, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Dec 20 2022, Mathias Krause wrote:
> 
> [De-CC-ing pcre-dev@, since this part is all git-specific]
> 
>> Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Fri, Dec 16 2022, Carlo Arenas wrote:
>>>
>>> [CC-ing pcre-dev@ for this "future error API" discussion]
>>>
>>>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>>>
>>>>> Mathias Krause <minipli@grsecurity.net> writes:
>>>>>
>>>>>> ... However, from a user's point of view a fall back to
>>>>>> interpreter mode might still be desired in this case, as a failing
>>>>>> 'git grep' is simply not acceptable, IMHO.
>>>>>
>>>>> "git grep" that silently produces a wrong result (by falling back
>>>>> after a problem is detected) would not be acceptable, either.
>>>>
>>>> except that an error at this point only invalidates the use of JIT,
>>>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.
>>>>
>>>> the later is setup to be used later by the code that is added,
>>>
>>> I think we could stumble ahead, but if this were to happen our
>>> assumptions about how the API works have been invalidated.
>>
>> Well, pcre2_jit_compile() might fail for internal reasons, e.g.
>> pcre2jit(3) states: "[...] If a pattern is too big, a call to
>> pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY."
>>
>> For example, the following fails for me:
>> $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c
>> fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)…
>>
>> But explicitly disabling JIT makes it "work":
>> $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c
>> $
>>
>> It's a made up example and might even be intended behavior by git, but
>> it also proves a point Carlo already mentioned, a failing call to
>> pcre2_jit_compile() only invalidates the use of the JIT engine. We can
>> still use and fall back to the interpreter.
> 
> We should arguably do this, I hadn't bothered because I haven't been
> able to find anything except pathological patterns where it matters, and
> silently falling back in those cases will suck a lot more IMO.
> 
> If you are using such a pathological pattern it's almost always a better
> idea to adjust your crazy pattern.
> 
> So I think in the *general* case we really should just keep this, and
> *maybe* suggest the user try with (*NO_JIT) in the pattern.

Except for the case I'm trying to address, where we simply cannot detect
*why* the JIT is failing from the error code alone. It'll error out with
PCRE2_ERROR_NOMEMORY for the case where the JIT fails to allocate W|X
memory as well as for the case where the pattern is crazy.

> But silently falling back kind of sucks, but unfortunately pcre2 doesn't
> provide a way to say "failed because of SELinux" v.s. "failed because
> the pattern is crazy", except that we could try to compile a known-good
> pattern with the JIT, to disambiguate the two.

Exactly, so what about something like this:

If JIT is generally available, try to JIT the user supplied pattern:
1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good
   pattern, e.g. ".":
   1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to
       interpreter, as JIT is non-functional because of SELinux / PaX.
   1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'.
2/ If it succeeds or fails with a different error, continue as of now,
   i.e. use JIT on success or die() on error, optionally suggesting
   '(*NO_JIT)'.

That should handle the case you're concerned about and only fall back to
interpreter mode if JIT won't be functional anyway. Admitted, this would
also allow crazy patterns, but there's simply no way to detect these
under such constraints.

> Anyway, if this is your goal you should really lead with that, not with
> fixing a relatively obscure SELinux edge case...

It's just that I'm suffering from that "obscure SELinux edge case", just
not under SELinux but PaX MPROTECT. I only mentioned SELinux to state,
it's not only an issue for PaX but regular systems as well. So it's not
a hypothetical case I like to get handled, but my work environment to
get usable again.

>> It would be used anyway if PCRE2 was compiled without JIT support, so I
>> don't see any issues with falling back to interpreter mode if the JIT
>> compilation fails -- for whatever reason.
> 
> It's the "for whatever reason" that I take issue with. We'd be in an
> unknown state with the API behaving differently than we expect, and
> returning unknown codes. That's different than the *known* error codes
> (e.g. "no memory", oven though it's meaning is apparently overloaded to
> the point of near-uselessness).
> 
>>> The pcre2_jit_compile() doesn't promise to return a finite set of error
>>> codes, but:
>>>
>>> 	[...]0 for success, or a negative error code otherwise[...]
>>>
>>> But if new codes were added it's anyone's guess what state we'd be in,
>>> so I think the safe thing is to BUG() out if we get as far as
>>> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or
>>> PCRE2_ERROR_NOMEMORY.
>>
>> But why BUG()? JIT is an optimization that might fail for PCRE2 internal
>> reasons. Why should we make 'git grep' fail too in this case when we can
>> handle it just fine by attempting to use the interpreter?
>>
>> If the pattern is really bogus, the interpreter will complain as well
>> and we'll error out. But failing just because the JIT engine can't
>> handle the pattern? Doesn't sound right to me.
> 
> See above, we're failing because our assumptions about how to use the
> API have broken down at that point. We usually bug out in those cases.
> 
>>>>> Receiving BADOPTION could be a sign that there is something wrong in
>>>>> the input, not from the end-user but from the code, in which case
>>>>> stopping with BUG() may be a more appropriate?
>>>>
>>>> The way PCRE handles this kind of errors internally is to instruct pcre2_match()
>>>> to use the interpreter.
>>>>
>>>> While a BUG() might be a way to ensure the code is using the right set
>>>> of options
>>>> I would expect that the failure will be reported by pcre2_compile
>>>> instead, with the
>>>> only cases left, only being interna to PCRE (ex: JIT can't yet support
>>>> a feature the
>>>> interpreter has)
>>>
>>> I agree that it's possible in general that an external library might
>>> start returning a "benign" error code that we could recover from, so
>>> BUG(...) would be overdoing it.
>>
>> And I think that's the case here: JIT is an optimization that might not
>> be available under all circumstances, as, for example, under SELinux's
>> 'deny_execmem' setting. So we need to have a backup plan for such
>> systems anyway. Why not always try to use the interpreter if JIT
>> compilation fails?
> 
> See above, but maybe it's the least sucky thing (and definitely
> simpler). I'm mainly checking that we're doing that we want here, and
> that we're going into it with eyes open.
> 
> That we're now discussing a topic entirely different from SELinux on a
> thread where we're (according to the commit message) fixing pcre2 where
> the JIT is "unusable on such systems" is my main concern here. 

Yeah, I overlooked that angle initially, but it's a valid concern.
However, limiting the functional change of falling back to interpreter
mode on "JIT's broken anyway" systems only should address these and get
me a functional 'git grep' again.

>>> So not only would a BUG() biting us here require them to create a new
>>> code for the state of "we have the JIT, but can't use it here" (for some
>>> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already
>>> "overloaded" to mean that).
>>>
>>> It would also require them to invent a new "soft" failure mode for the
>>> JIT, i.e. not the facility added in a25b9085043, where we can use the
>>> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern
>>> itself.
>>
>> We should really treat PCRE2 JIT as an *optional* optimization that
>> might not be available for certain cases. For these we should, IMHO,
>> simply use the interpreter mode, instead of bugging users with a BUG() /
>> die().
> 
> To summarize some of the above, I think performance also matters, we
> have cases where:
> 
>  A. We could use the non-JIT
>  B. We could use the JIT, and it's a *lot* faster
>  C. We can't use the JIT at all
>  D. We can't use the JIT because we run into its limits
> 
> I think it's fair to die on "D" as in practice you only (I think!) run
> into it on pathological patterns, but yes, another option would be to
> fall back to "A".
> 
> But thinking you're doing "B" and not wanting to implicitly fall back to
> "A" is also a valid use-case.

Agreed. My above sketched handling should do that, as in not falling
back to interpreter mode when the JIT would be functional per se, but
just failed on this particular pattern.

> So I'm inclined to suggest that we should be less helpful with automatic
> fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or
> something.

Well, that's a real pain from a user's (my) point of view. Trust me, I'm
suffering from this, otherwise I wouldn't have brought up the issue ;)

> But as noted above needing to always disable an apparently "available"
> JIT on some systems (SELinux) does throw a monkey wrench into that
> particular suggestion :(

Yep.

> So I'm not sure, I'm mainly trying to encourage you to think through the
> edge cases, and to summarize the full impact of the change in a re-roll.

Yeah, I agree. The implied fallback to the interpreter, even for the
more obscure cases should have been mentioned in the changelog, but I
overlooked / ignored that case initially.

My take from the discussion is to do a re-roll with something like this:

  if (p->pcre2_jit_on) {
      jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
      if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) {
          /* JIT's non-functional because of SELinux / PaX */
          p->pcre2_jit_on = 0;
          return;
      } else if (jitret) {
          die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n"
              "Try prefixing the pattern with '(*NO_JIT)'\n",
              p->pattern, jitret);
      }
      ...
  }

...with pcre2_jit_functional() being something like this:

  static int pcre2_jit_functional(void)
  {
      pcre2_code *code;
      size_t off;
      int err;

      code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL);
      if (!code)
          return 0;

      err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE);
      pcre2_code_free(code);

      return err == 0;
  }


Thanks,
Mathias

Ævar Arnfjörð Bjarmason Jan. 18, 2023, 3:44 p.m. UTC | #11

On Wed, Jan 18 2023, Mathias Krause wrote:

> On 20.12.22 22:11, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Tue, Dec 20 2022, Mathias Krause wrote:
>> 
>> [De-CC-ing pcre-dev@, since this part is all git-specific]
>> 
>>> Am 19.12.22 um 10:00 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> On Fri, Dec 16 2022, Carlo Arenas wrote:
>>>>
>>>> [CC-ing pcre-dev@ for this "future error API" discussion]
>>>>
>>>>> On Fri, Dec 16, 2022 at 3:09 PM Junio C Hamano <gitster@pobox.com> wrote:
>>>>>>
>>>>>> Mathias Krause <minipli@grsecurity.net> writes:
>>>>>>
>>>>>>> ... However, from a user's point of view a fall back to
>>>>>>> interpreter mode might still be desired in this case, as a failing
>>>>>>> 'git grep' is simply not acceptable, IMHO.
>>>>>>
>>>>>> "git grep" that silently produces a wrong result (by falling back
>>>>>> after a problem is detected) would not be acceptable, either.
>>>>>
>>>>> except that an error at this point only invalidates the use of JIT,
>>>>> so calling pcre2_jit_match() is invalid but calling pcre2_match() is not.
>>>>>
>>>>> the later is setup to be used later by the code that is added,
>>>>
>>>> I think we could stumble ahead, but if this were to happen our
>>>> assumptions about how the API works have been invalidated.
>>>
>>> Well, pcre2_jit_compile() might fail for internal reasons, e.g.
>>> pcre2jit(3) states: "[...] If a pattern is too big, a call to
>>> pcre2_jit_compile() returns PCRE2_ERROR_NOMEMORY."
>>>
>>> For example, the following fails for me:
>>> $ git grep -P "$(perl -e 'print "(.)" x 4000')" -- grep.c
>>> fatal: Couldn't JIT the PCRE2 pattern '(.)(.)(.)(.)…
>>>
>>> But explicitly disabling JIT makes it "work":
>>> $ git grep -P "(*NO_JIT)$(perl -e 'print "(.)" x 4000')" -- grep.c
>>> $
>>>
>>> It's a made up example and might even be intended behavior by git, but
>>> it also proves a point Carlo already mentioned, a failing call to
>>> pcre2_jit_compile() only invalidates the use of the JIT engine. We can
>>> still use and fall back to the interpreter.
>> 
>> We should arguably do this, I hadn't bothered because I haven't been
>> able to find anything except pathological patterns where it matters, and
>> silently falling back in those cases will suck a lot more IMO.
>> 
>> If you are using such a pathological pattern it's almost always a better
>> idea to adjust your crazy pattern.
>> 
>> So I think in the *general* case we really should just keep this, and
>> *maybe* suggest the user try with (*NO_JIT) in the pattern.
>
> Except for the case I'm trying to address, where we simply cannot detect
> *why* the JIT is failing from the error code alone. It'll error out with
> PCRE2_ERROR_NOMEMORY for the case where the JIT fails to allocate W|X
> memory as well as for the case where the pattern is crazy.

*nod*, though I see you saw that I addressed that below.

>> But silently falling back kind of sucks, but unfortunately pcre2 doesn't
>> provide a way to say "failed because of SELinux" v.s. "failed because
>> the pattern is crazy", except that we could try to compile a known-good
>> pattern with the JIT, to disambiguate the two.
>
> Exactly, so what about something like this:
>
> If JIT is generally available, try to JIT the user supplied pattern:
> 1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good
>    pattern, e.g. ".":
>    1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to
>        interpreter, as JIT is non-functional because of SELinux / PaX.
>    1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'.
> 2/ If it succeeds or fails with a different error, continue as of now,
>    i.e. use JIT on success or die() on error, optionally suggesting
>    '(*NO_JIT)'.
>
> That should handle the case you're concerned about and only fall back to
> interpreter mode if JIT won't be functional anyway. Admitted, this would
> also allow crazy patterns, but there's simply no way to detect these
> under such constraints.

That sounds good, i.e. we could narrow the JIT falling back case to
these SELinux cases and the like, distinct from generic internal errors.

Maybe it's too much paranoia, but it should work & get rid of the
ambiguity.

>> Anyway, if this is your goal you should really lead with that, not with
>> fixing a relatively obscure SELinux edge case...
>
> It's just that I'm suffering from that "obscure SELinux edge case", just
> not under SELinux but PaX MPROTECT. I only mentioned SELinux to state,
> it's not only an issue for PaX but regular systems as well. So it's not
> a hypothetical case I like to get handled, but my work environment to
> get usable again.

Right, I didn't mean to imply it was, just that for the rest of us the
more general effect of this change outside of SELinux is of more general
interest.

>>> It would be used anyway if PCRE2 was compiled without JIT support, so I
>>> don't see any issues with falling back to interpreter mode if the JIT
>>> compilation fails -- for whatever reason.
>> 
>> It's the "for whatever reason" that I take issue with. We'd be in an
>> unknown state with the API behaving differently than we expect, and
>> returning unknown codes. That's different than the *known* error codes
>> (e.g. "no memory", oven though it's meaning is apparently overloaded to
>> the point of near-uselessness).
>> 
>>>> The pcre2_jit_compile() doesn't promise to return a finite set of error
>>>> codes, but:
>>>>
>>>> 	[...]0 for success, or a negative error code otherwise[...]
>>>>
>>>> But if new codes were added it's anyone's guess what state we'd be in,
>>>> so I think the safe thing is to BUG() out if we get as far as
>>>> pcre2_jit_compile() and don't get either PCRE2_ERROR_JIT_BADOPTION or
>>>> PCRE2_ERROR_NOMEMORY.
>>>
>>> But why BUG()? JIT is an optimization that might fail for PCRE2 internal
>>> reasons. Why should we make 'git grep' fail too in this case when we can
>>> handle it just fine by attempting to use the interpreter?
>>>
>>> If the pattern is really bogus, the interpreter will complain as well
>>> and we'll error out. But failing just because the JIT engine can't
>>> handle the pattern? Doesn't sound right to me.
>> 
>> See above, we're failing because our assumptions about how to use the
>> API have broken down at that point. We usually bug out in those cases.
>> 
>>>>>> Receiving BADOPTION could be a sign that there is something wrong in
>>>>>> the input, not from the end-user but from the code, in which case
>>>>>> stopping with BUG() may be a more appropriate?
>>>>>
>>>>> The way PCRE handles this kind of errors internally is to instruct pcre2_match()
>>>>> to use the interpreter.
>>>>>
>>>>> While a BUG() might be a way to ensure the code is using the right set
>>>>> of options
>>>>> I would expect that the failure will be reported by pcre2_compile
>>>>> instead, with the
>>>>> only cases left, only being interna to PCRE (ex: JIT can't yet support
>>>>> a feature the
>>>>> interpreter has)
>>>>
>>>> I agree that it's possible in general that an external library might
>>>> start returning a "benign" error code that we could recover from, so
>>>> BUG(...) would be overdoing it.
>>>
>>> And I think that's the case here: JIT is an optimization that might not
>>> be available under all circumstances, as, for example, under SELinux's
>>> 'deny_execmem' setting. So we need to have a backup plan for such
>>> systems anyway. Why not always try to use the interpreter if JIT
>>> compilation fails?
>> 
>> See above, but maybe it's the least sucky thing (and definitely
>> simpler). I'm mainly checking that we're doing that we want here, and
>> that we're going into it with eyes open.
>> 
>> That we're now discussing a topic entirely different from SELinux on a
>> thread where we're (according to the commit message) fixing pcre2 where
>> the JIT is "unusable on such systems" is my main concern here. 
>
> Yeah, I overlooked that angle initially, but it's a valid concern.
> However, limiting the functional change of falling back to interpreter
> mode on "JIT's broken anyway" systems only should address these and get
> me a functional 'git grep' again.

*nod*

>>>> So not only would a BUG() biting us here require them to create a new
>>>> code for the state of "we have the JIT, but can't use it here" (for some
>>>> reason I can't imagine, as "PCRE2_ERROR_NOMEMORY" is already
>>>> "overloaded" to mean that).
>>>>
>>>> It would also require them to invent a new "soft" failure mode for the
>>>> JIT, i.e. not the facility added in a25b9085043, where we can use the
>>>> JIT, but it's not on after all due to a "(*NO_JIT)" in the pattern
>>>> itself.
>>>
>>> We should really treat PCRE2 JIT as an *optional* optimization that
>>> might not be available for certain cases. For these we should, IMHO,
>>> simply use the interpreter mode, instead of bugging users with a BUG() /
>>> die().
>> 
>> To summarize some of the above, I think performance also matters, we
>> have cases where:
>> 
>>  A. We could use the non-JIT
>>  B. We could use the JIT, and it's a *lot* faster
>>  C. We can't use the JIT at all
>>  D. We can't use the JIT because we run into its limits
>> 
>> I think it's fair to die on "D" as in practice you only (I think!) run
>> into it on pathological patterns, but yes, another option would be to
>> fall back to "A".
>> 
>> But thinking you're doing "B" and not wanting to implicitly fall back to
>> "A" is also a valid use-case.
>
> Agreed. My above sketched handling should do that, as in not falling
> back to interpreter mode when the JIT would be functional per se, but
> just failed on this particular pattern.
>
>> So I'm inclined to suggest that we should be less helpful with automatic
>> fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or
>> something.
>
> Well, that's a real pain from a user's (my) point of view. Trust me, I'm
> suffering from this, otherwise I wouldn't have brought up the issue ;)

That's fair enough, falling back in the "D" case sounds good.

>> But as noted above needing to always disable an apparently "available"
>> JIT on some systems (SELinux) does throw a monkey wrench into that
>> particular suggestion :(
>
> Yep.
>
>> So I'm not sure, I'm mainly trying to encourage you to think through the
>> edge cases, and to summarize the full impact of the change in a re-roll.
>
> Yeah, I agree. The implied fallback to the interpreter, even for the
> more obscure cases should have been mentioned in the changelog, but I
> overlooked / ignored that case initially.
>
> My take from the discussion is to do a re-roll with something like this:
>
>   if (p->pcre2_jit_on) {
>       jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>       if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) {
>           /* JIT's non-functional because of SELinux / PaX */
>           p->pcre2_jit_on = 0;
>           return;
>       } else if (jitret) {
>           die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n"
>               "Try prefixing the pattern with '(*NO_JIT)'\n",
>               p->pattern, jitret);
>       }
>       ...
>   }
>
> ...with pcre2_jit_functional() being something like this:
>
>   static int pcre2_jit_functional(void)
>   {
>       pcre2_code *code;
>       size_t off;
>       int err;
>
>       code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL);
>       if (!code)
>           return 0;
>
>       err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE);
>       pcre2_code_free(code);
>
>       return err == 0;
>   }

This seems sensible as pseudocode, although please don't add another
pcre2_compile() entry point (as e.g. passing NULL here will bypass the
context we carefully set up, and if we have a custom allocator...).

Instead you could just re-enter the API itself via
compile_grep_patterns(), or perhaps lower via
compile_pcre2_pattern(). Then add some flag to "struct grep_opt" to
indicate that we shouldn't die() or BUG() there, but just run a "jit
test".

This code also treats all failures of pcre2_jit_compile() the same, but
here we only want PCRE2_ERROR_NOMEMORY.

Mathias Krause Jan. 19, 2023, 9:19 a.m. UTC | #12

On 18.01.23 16:44, Ævar Arnfjörð Bjarmason wrote:
>>> [snip]
>>> But silently falling back kind of sucks, but unfortunately pcre2 doesn't
>>> provide a way to say "failed because of SELinux" v.s. "failed because
>>> the pattern is crazy", except that we could try to compile a known-good
>>> pattern with the JIT, to disambiguate the two.
>>
>> Exactly, so what about something like this:
>>
>> If JIT is generally available, try to JIT the user supplied pattern:
>> 1/ If it fails with PCRE2_ERROR_NOMEMORY, try to compile a know good
>>    pattern, e.g. ".":
>>    1a/ If this fails with PCRE2_ERROR_NOMEMORY as well, fall back to
>>        interpreter, as JIT is non-functional because of SELinux / PaX.
>>    1b/ If not, it's a "crazy" pattern, suggest '(*NO_JIT)'.
>> 2/ If it succeeds or fails with a different error, continue as of now,
>>    i.e. use JIT on success or die() on error, optionally suggesting
>>    '(*NO_JIT)'.
>>
>> That should handle the case you're concerned about and only fall back to
>> interpreter mode if JIT won't be functional anyway. Admitted, this would
>> also allow crazy patterns, but there's simply no way to detect these
>> under such constraints.
> 
> That sounds good, i.e. we could narrow the JIT falling back case to
> these SELinux cases and the like, distinct from generic internal errors.
> 
> Maybe it's too much paranoia, but it should work & get rid of the
> ambiguity.

Nah, it's fine.

>>> [snip]
>>> See above, but maybe it's the least sucky thing (and definitely
>>> simpler). I'm mainly checking that we're doing that we want here, and
>>> that we're going into it with eyes open.
>>>
>>> That we're now discussing a topic entirely different from SELinux on a
>>> thread where we're (according to the commit message) fixing pcre2 where
>>> the JIT is "unusable on such systems" is my main concern here. 
>>
>> Yeah, I overlooked that angle initially, but it's a valid concern.
>> However, limiting the functional change of falling back to interpreter
>> mode on "JIT's broken anyway" systems only should address these and get
>> me a functional 'git grep' again.
> 
> *nod*
> 

>>>>> [snip]
>>>
>>> To summarize some of the above, I think performance also matters, we
>>> have cases where:
>>>
>>>  A. We could use the non-JIT
>>>  B. We could use the JIT, and it's a *lot* faster
>>>  C. We can't use the JIT at all
>>>  D. We can't use the JIT because we run into its limits
>>>
>>> I think it's fair to die on "D" as in practice you only (I think!) run
>>> into it on pathological patterns, but yes, another option would be to
>>> fall back to "A".
>>>
>>> But thinking you're doing "B" and not wanting to implicitly fall back to
>>> "A" is also a valid use-case.
>>
>> Agreed. My above sketched handling should do that, as in not falling
>> back to interpreter mode when the JIT would be functional per se, but
>> just failed on this particular pattern.
>>
>>> So I'm inclined to suggest that we should be less helpful with automatic
>>> fallbacks, and just suggest a "try it with '(*NO_JIT)'" advice() or
>>> something.
>>
>> Well, that's a real pain from a user's (my) point of view. Trust me, I'm
>> suffering from this, otherwise I wouldn't have brought up the issue ;)
> 
> That's fair enough, falling back in the "D" case sounds good.
> 
>>> But as noted above needing to always disable an apparently "available"
>>> JIT on some systems (SELinux) does throw a monkey wrench into that
>>> particular suggestion :(
>>
>> Yep.
>>
>>> So I'm not sure, I'm mainly trying to encourage you to think through the
>>> edge cases, and to summarize the full impact of the change in a re-roll.
>>
>> Yeah, I agree. The implied fallback to the interpreter, even for the
>> more obscure cases should have been mentioned in the changelog, but I
>> overlooked / ignored that case initially.
>>
>> My take from the discussion is to do a re-roll with something like this:
>>
>>   if (p->pcre2_jit_on) {
>>       jitret = pcre2_jit_compile(p->pcre2_pattern, PCRE2_JIT_COMPLETE);
>>       if (jitret == PCRE2_ERROR_NOMEMORY && !pcre2_jit_functional()) {
>>           /* JIT's non-functional because of SELinux / PaX */
>>           p->pcre2_jit_on = 0;
>>           return;
>>       } else if (jitret) {
>>           die("Couldn't JIT the PCRE2 pattern '%s', got '%d'\n"
>>               "Try prefixing the pattern with '(*NO_JIT)'\n",
>>               p->pattern, jitret);
>>       }
>>       ...
>>   }
>>
>> ...with pcre2_jit_functional() being something like this:
>>
>>   static int pcre2_jit_functional(void)
>>   {
>>       pcre2_code *code;
>>       size_t off;
>>       int err;
>>
>>       code = pcre2_compile((PCRE2_SPTR)".", 1, 0, &err, &off, NULL);
>>       if (!code)
>>           return 0;
>>
>>       err = pcre2_jit_compile(code, PCRE2_JIT_COMPLETE);
>>       pcre2_code_free(code);
>>
>>       return err == 0;
>>   }
> 
> This seems sensible as pseudocode, although please don't add another
> pcre2_compile() entry point (as e.g. passing NULL here will bypass the
> context we carefully set up, and if we have a custom allocator...).

That was intentional, actually. I specifically want to use the most
basic API to test for "general JIT availability." That test should tell
if PCRE2 JIT is working /in general/, not specifically how we make use
of it in git, which might not. However, that would be a different error,
i.e. not because PCRE2 failed to allocate JIT memory but us using the
API wrong, hitting limitations, etc.

Making use of the PCRE2 context we set up in compile_pcre2_pattern()
(and thereby possibly generating errors because of it) could mask API
usage errors resulting from these and may lead to a false fallback to
interpreter mode when we want to die() instead?

> Instead you could just re-enter the API itself via
> compile_grep_patterns(), or perhaps lower via
> compile_pcre2_pattern(). Then add some flag to "struct grep_opt" to
> indicate that we shouldn't die() or BUG() there, but just run a "jit
> test".

This feels kinda hacky and overkill, actually. A dedicated simplistic
test looks much cleaner, IMHO.

> This code also treats all failures of pcre2_jit_compile() the same, but
> here we only want PCRE2_ERROR_NOMEMORY.

Does it really matter? I mean, we could change the final test to 'err !=
PCRE2_ERROR_NOMEMORY' but we also return early when we're unable to
generate PCRE2 code for the "." pattern -- not strictly a JIT error as well.

I'd say instead of splitting hairs, pcre2_jit_functional() should stay
as above, as a failure to compile "." as a pattern can only happen (a)
under very tight memory constraints or (b) for JIT internal reasons,
which would boil down to not being able to allocate the required
resources to generate JIT code --- it's such a simple pattern it's not
expected to fail in any other way. Case (a) can be ignored, IMHO, as
that would lead to follow up errors soon anyway.


Thanks,
Mathias

grep: fall back to interpreter mode if JIT fails

Commit Message

Comments

Patch