diff mbox series

[RFC/PATCH,5/7] grep: drop support for \0 in --fixed-strings <pattern>

Message ID 20190626000329.32475-6-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series grep: move from kwset to optional PCRE v2 | expand

Commit Message

Ævar Arnfjörð Bjarmason June 26, 2019, 12:03 a.m. UTC
Change "-f <file>" to not support patterns with "\0" in them under
--fixed-strings, we'll now only support these under --perl-regexp with
PCRE v2.

A previous change to Documentation/git-grep.txt changed the
description of "-f <file>" to be vague enough as to not promise that
this would work, and by dropping support for this we make it a whole
lot easier to move away from the kwset backend, which a subsequent
change will try to do.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 grep.c                         |  6 +--
 t/t7816-grep-binary-pattern.sh | 82 +++++++++++++++++-----------------
 2 files changed, 44 insertions(+), 44 deletions(-)

Comments

Junio C Hamano June 26, 2019, 4:14 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change "-f <file>" to not support patterns with "\0" in them under
> --fixed-strings, we'll now only support these under --perl-regexp with
> PCRE v2.
>
> A previous change to Documentation/git-grep.txt changed the
> description of "-f <file>" to be vague enough as to not promise that
> this would work, and by dropping support for this we make it a whole
> lot easier to move away from the kwset backend, which a subsequent
> change will try to do.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---

This step, together with all others, looks sensibly justified to me
only when I wear tinted glasses that make me pretend that the final
goal is to promote pcre backend, which is much more important than
serving the current users.  When I remove the glasses, it smells
more like making excuses.

But as we saw discussed in the previous thread, I too think it is OK
to make 'Nobody would notice the updated behaviour of NUL in the
patterns' our working assumption and see if anybody screams---after
all we have to start somewhere to make progress.

A very good thing about this series is that it does *not* add a new
feature that people would miss, even if it went straight to 'master'
and to the next release.  All it does is to optimize differently and
changing the behaviour we assume nobody depends on.  

It is easy enough to revert the whole thing if it turns out to be
problematic even in the worst case, and nobody would notice ;-)
diff mbox series

Patch

diff --git a/grep.c b/grep.c
index 261bd3a342..14570c7ac1 100644
--- a/grep.c
+++ b/grep.c
@@ -644,6 +644,9 @@  static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 	p->word_regexp = opt->word_regexp;
 	p->ignore_case = opt->ignore_case;
 
+	if (memchr(p->pattern, 0, p->patternlen) && !opt->pcre2)
+		die(_("given pattern contains NULL byte (via -f <file>). This is only supported with -P under PCRE v2"));
+
 	/*
 	 * Even when -F (fixed) asks us to do a non-regexp search, we
 	 * may not be able to correctly case-fold when -i
@@ -666,9 +669,6 @@  static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 		return;
 	}
 
-	if (memchr(p->pattern, 0, p->patternlen) && !opt->pcre2)
-		die(_("given pattern contains NULL byte (via -f <file>). This is only supported with -P under PCRE v2"));
-
 	if (opt->fixed) {
 		/*
 		 * We come here when the pattern has the non-ascii
diff --git a/t/t7816-grep-binary-pattern.sh b/t/t7816-grep-binary-pattern.sh
index 9e09bd5d6a..60bab291e4 100755
--- a/t/t7816-grep-binary-pattern.sh
+++ b/t/t7816-grep-binary-pattern.sh
@@ -60,23 +60,23 @@  test_expect_success 'setup' "
 "
 
 # Simple fixed-string matching that can use kwset (no -i && non-ASCII)
-nul_match 1 1 1 '-F' 'yQf'
-nul_match 0 0 0 '-F' 'yQx'
-nul_match 1 1 1 '-Fi' 'YQf'
-nul_match 0 0 0 '-Fi' 'YQx'
-nul_match 1 1 1 '' 'yQf'
-nul_match 0 0 0 '' 'yQx'
-nul_match 1 1 1 '' 'æQð'
-nul_match 1 1 1 '-F' 'eQm[*]c'
-nul_match 1 1 1 '-Fi' 'EQM[*]C'
+nul_match P P P '-F' 'yQf'
+nul_match P P P '-F' 'yQx'
+nul_match P P P '-Fi' 'YQf'
+nul_match P P P '-Fi' 'YQx'
+nul_match P P 1 '' 'yQf'
+nul_match P P 0 '' 'yQx'
+nul_match P P 1 '' 'æQð'
+nul_match P P P '-F' 'eQm[*]c'
+nul_match P P P '-Fi' 'EQM[*]C'
 
 # Regex patterns that would match but shouldn't with -F
-nul_match 0 0 0 '-F' 'yQ[f]'
-nul_match 0 0 0 '-F' '[y]Qf'
-nul_match 0 0 0 '-Fi' 'YQ[F]'
-nul_match 0 0 0 '-Fi' '[Y]QF'
-nul_match 0 0 0 '-F' 'æQ[ð]'
-nul_match 0 0 0 '-F' '[æ]Qð'
+nul_match P P P '-F' 'yQ[f]'
+nul_match P P P '-F' '[y]Qf'
+nul_match P P P '-Fi' 'YQ[F]'
+nul_match P P P '-Fi' '[Y]QF'
+nul_match P P P '-F' 'æQ[ð]'
+nul_match P P P '-F' '[æ]Qð'
 
 # The -F kwset codepath can't handle -i && non-ASCII...
 nul_match P 1 1 '-i' '[æ]Qð'
@@ -90,38 +90,38 @@  nul_match P 0 1 '-i' '[Æ]Qð'
 nul_match P 0 1 '-i' 'ÆQÐ'
 
 # \0 in regexes can only work with -P & PCRE v2
-nul_match P 1 1 '' 'yQ[f]'
-nul_match P 1 1 '' '[y]Qf'
-nul_match P 1 1 '-i' 'YQ[F]'
-nul_match P 1 1 '-i' '[Y]Qf'
-nul_match P 1 1 '' 'æQ[ð]'
-nul_match P 1 1 '' '[æ]Qð'
-nul_match P 0 1 '-i' 'ÆQ[Ð]'
-nul_match P 1 1 '' 'eQm.*cQ'
-nul_match P 1 1 '-i' 'EQM.*cQ'
-nul_match P 0 0 '' 'eQm[*]c'
-nul_match P 0 0 '-i' 'EQM[*]C'
+nul_match P P 1 '' 'yQ[f]'
+nul_match P P 1 '' '[y]Qf'
+nul_match P P 1 '-i' 'YQ[F]'
+nul_match P P 1 '-i' '[Y]Qf'
+nul_match P P 1 '' 'æQ[ð]'
+nul_match P P 1 '' '[æ]Qð'
+nul_match P P 1 '-i' 'ÆQ[Ð]'
+nul_match P P 1 '' 'eQm.*cQ'
+nul_match P P 1 '-i' 'EQM.*cQ'
+nul_match P P 0 '' 'eQm[*]c'
+nul_match P P 0 '-i' 'EQM[*]C'
 
 # Assert that we're using REG_STARTEND and the pattern doesn't match
 # just because it's cut off at the first \0.
-nul_match 0 0 0 '-i' 'NOMATCHQð'
-nul_match P 0 0 '-i' '[Æ]QNOMATCH'
-nul_match P 0 0 '-i' '[æ]QNOMATCH'
+nul_match P P 0 '-i' 'NOMATCHQð'
+nul_match P P 0 '-i' '[Æ]QNOMATCH'
+nul_match P P 0 '-i' '[æ]QNOMATCH'
 
 # Ensure that the matcher doesn't regress to something that stops at
 # \0
-nul_match 0 0 0 '-F' 'yQ[f]'
-nul_match 0 0 0 '-Fi' 'YQ[F]'
-nul_match 0 0 0 '' 'yQNOMATCH'
-nul_match 0 0 0 '' 'QNOMATCH'
-nul_match 0 0 0 '-i' 'YQNOMATCH'
-nul_match 0 0 0 '-i' 'QNOMATCH'
-nul_match 0 0 0 '-F' 'æQ[ð]'
+nul_match P P P '-F' 'yQ[f]'
+nul_match P P P '-Fi' 'YQ[F]'
+nul_match P P 0 '' 'yQNOMATCH'
+nul_match P P 0 '' 'QNOMATCH'
+nul_match P P 0 '-i' 'YQNOMATCH'
+nul_match P P 0 '-i' 'QNOMATCH'
+nul_match P P P '-F' 'æQ[ð]'
 nul_match P P P '-Fi' 'ÆQ[Ð]'
-nul_match P 0 1 '-i' 'ÆQ[Ð]'
-nul_match 0 0 0 '' 'yQNÓMATCH'
-nul_match 0 0 0 '' 'QNÓMATCH'
-nul_match 0 0 0 '-i' 'YQNÓMATCH'
-nul_match 0 0 0 '-i' 'QNÓMATCH'
+nul_match P P 1 '-i' 'ÆQ[Ð]'
+nul_match P P 0 '' 'yQNÓMATCH'
+nul_match P P 0 '' 'QNÓMATCH'
+nul_match P P 0 '-i' 'YQNÓMATCH'
+nul_match P P 0 '-i' 'QNÓMATCH'
 
 test_done