[v3,2/4] grep/pcre2: simplify boolean spaghetti

Message ID 20210124021229.25987-3-avarab@gmail.com (mailing list archive)
State Superseded
Series grep: better support invalid UTF-8 haystacks

Commit Message

Ævar Arnfjörð Bjarmason Jan. 24, 2021, 2:12 a.m. UTC
Simplify an expression I added in 870eea8166 (grep: do not enter
PCRE2_UTF mode on fixed matching, 2019-07-26) by using a simple
application of De Morgan's laws[1]. I.e.:

    NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))

1. https://en.wikipedia.org/wiki/De_Morgan%27s_laws

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
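
The equivalence claimed above can be checked by brute force. The following is only a sketch: the function names are hypothetical and not part of grep.c, and the three struct fields are modeled as plain booleans. It compares the tail of the condition before and after the patch for all eight input combinations.

```c
#include <assert.h>
#include <stdbool.h>

/* Tail of the condition as written before the patch. */
bool tail_before(bool ignore_case, bool fixed, bool is_fixed)
{
	return !(!ignore_case && (fixed || is_fixed));
}

/* Tail of the condition as written after the patch. */
bool tail_after(bool ignore_case, bool fixed, bool is_fixed)
{
	return ignore_case || !(fixed || is_fixed);
}

/*
 * Walk all 2^3 input combinations and assert that both forms agree.
 * Returns the number of combinations checked.
 */
int check_equivalence(void)
{
	int checked = 0;

	for (int bits = 0; bits < 8; bits++) {
		bool ignore_case = (bits & 1) != 0;
		bool fixed = (bits & 2) != 0;
		bool is_fixed = (bits & 4) != 0;

		assert(tail_before(ignore_case, fixed, is_fixed) ==
		       tail_after(ignore_case, fixed, is_fixed));
		checked++;
	}
	return checked;
}
```

Calling check_equivalence() from a test harness or a small main() exercises every combination; a failing assert would indicate that the rewrite changed behavior.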

Comments

Junio C Hamano Jan. 24, 2021, 5:33 a.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

>     NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))

At this level, however, the left one looks much simpler than the
right one ;-)


>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
> +	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
>  		options |= PCRE2_UTF;

In the context of this expression, well, I guess the rewritten one
is probably simpler but can we explain the whole condition in fewer
than three lines?  With or without the rewrite, it still looks too
complicated to me.
Johannes Sixt Jan. 24, 2021, 10:45 a.m. UTC | #2
On 24.01.21 at 06:33, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
>>     NOT(A && B) is Equivalent to (NOT(A) OR NOT(B))
> 
> At this level, however, the left one looks much simpler than the
> right one ;-)
> 
> 
>>  	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
>> -	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
>> +	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
>>  		options |= PCRE2_UTF;
> 
> In the context of this expression, well, I guess the rewritten one
> is probably simpler but can we explain the whole condition in fewer
> than three lines?  With or without the rewrite, it still looks too
> complicated to me.

Make the condition

	if (!opt->ignore_locale &&
	    is_utf8_locale() &&
	    has_non_ascii(p->pattern) &&
	    (opt->ignore_case ||
	     (!p->fixed &&
	      !p->is_fixed)))
	{
		options |= PCRE2_UTF;
	}

With the knowledge of the equivalence

    (A => B)  <=>  (NOT(A) OR B)

(A => B means "if A then B"), the condition makes a lot of sense when
read aloud:

    if
       NOT ignore locale
       AND
       is UTF8
       AND
       has non-ASCII
       AND
         if
            NOT ignore case
         then if also
            NOT fixed
            AND
            NOT is fixed
    then
        ...


The condition amounts to extending a series of conjunctions with more
conjunctions IF a condition is satisfied. That's quite sensible.

You have to swap the polarity of the first condition of || in your head,
though, to achieve that meaning. That works with every OR condition, BTW.
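
The implication reading can be sketched in C as follows. This is an illustration under assumptions, not code from grep.c: implies(), want_utf_as_implication() and want_utf_as_or() are hypothetical names, and the results of is_utf8_locale() and has_non_ascii() are passed in as plain flags. The check confirms that spelling the last clause as "if NOT ignore case then NOT fixed AND NOT is fixed" gives the same result as the || form for all 64 input combinations.

```c
#include <assert.h>
#include <stdbool.h>

/* "if a then b": false only when a holds and b does not. */
bool implies(bool a, bool b)
{
	return !a || b;
}

/* The condition with its last clause read as an implication. */
bool want_utf_as_implication(bool ignore_locale, bool utf8_locale,
			     bool non_ascii, bool ignore_case,
			     bool fixed, bool is_fixed)
{
	return !ignore_locale && utf8_locale && non_ascii &&
	       implies(!ignore_case, !fixed && !is_fixed);
}

/* The same condition spelled with ||, as in the patch. */
bool want_utf_as_or(bool ignore_locale, bool utf8_locale,
		    bool non_ascii, bool ignore_case,
		    bool fixed, bool is_fixed)
{
	return !ignore_locale && utf8_locale && non_ascii &&
	       (ignore_case || !(fixed || is_fixed));
}

/* Compare both spellings over all 2^6 inputs; returns the count checked. */
int check_all_inputs(void)
{
	int checked = 0;

	for (int bits = 0; bits < 64; bits++) {
		bool il = (bits & 1) != 0, u8 = (bits & 2) != 0;
		bool na = (bits & 4) != 0, ic = (bits & 8) != 0;
		bool fx = (bits & 16) != 0, isfx = (bits & 32) != 0;

		assert(want_utf_as_implication(il, u8, na, ic, fx, isfx) ==
		       want_utf_as_or(il, u8, na, ic, fx, isfx));
		checked++;
	}
	return checked;
}
```

Note how the first operand of || appears negated inside implies(): that is the polarity swap described above.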

-- Hannes
Patch

diff --git a/grep.c b/grep.c
index efeb6dc58d..0bb772f727 100644
--- a/grep.c
+++ b/grep.c
@@ -491,7 +491,7 @@  static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
 		options |= PCRE2_CASELESS;
 	}
 	if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
-	    !(!opt->ignore_case && (p->fixed || p->is_fixed)))
+	    (opt->ignore_case || !(p->fixed || p->is_fixed)))
 		options |= PCRE2_UTF;
 
 	p->pcre2_pattern = pcre2_compile((PCRE2_SPTR)p->pattern,