diff mbox series

[v3,3/7] grep tests: add missing "grep.patternType" config test

Message ID patch-v3-3.7-fcad1b1664b-20211129T143956Z-avarab@gmail.com (mailing list archive)
State Superseded
Headers show
Series grep: simplify & delete "init" & "config" code | expand

Commit Message

Ævar Arnfjörð Bjarmason Nov. 29, 2021, 2:50 p.m. UTC
Extend the grep tests to assert that setting
"grep.patternType=extended" followed by "grep.patternType=default"
will behave as if "--extended-regexp" was provided, and not as
"--basic-regexp". In a subsequent commit we'll need to treat
"grep.patternType=default" as a special-case, but let's make sure we
don't ignore it if "grep.patternType" was set to a non-"default" value
before.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t7810-grep.sh | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Junio C Hamano Nov. 29, 2021, 9:52 p.m. UTC | #1
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Extend the grep tests to assert that setting
> "grep.patternType=extended" followed by "grep.patternType=default"
> will behave as if "--extended-regexp" was provided, and not as
> "--basic-regexp".

grep.patternType is the usual "last-one wins".  If the last value
set to patternType is the default, the setting to grep.extendedRegexp
should take effect (so if it is set to true, we'd see ERE behavour).

Back in the days when the "return to the default matching behavior"
part was written in 84befcd0 (grep: add a grep.patternType
configuration setting, 2012-08-03), grep.extendedRegexp was the only
way to configure the behaviour since b22520a3 (grep: allow -E and -n
to be turned on by default via configuration, 2011-03-30).  It was
understandable that we referred to the behaviour that honors the
older configuration variable as "the default matching" behaviour.
It is fairly clear in its log message:

    When grep.patternType is set to a value other than "default", the
    grep.extendedRegexp setting is ignored. The value of "default" restores
    the current default behavior, including the grep.extendedRegexp
    behavior.

So, unless your description is a typo, I am somewhat surprised by
your findings that =default that comes later does not defeat an
earlier =extended.

It should just clear that earlier extended set by grep.patternType
and only pay attention to grep.extendedRegexp variable.  Doing
anything else is a bug, I think.

Thanks.

diff --git i/Documentation/config/grep.txt w/Documentation/config/grep.txt
index 44abe45a7c..95fcb3ca29 100644
--- i/Documentation/config/grep.txt
+++ w/Documentation/config/grep.txt
@@ -8,7 +8,8 @@ grep.patternType::
 	Set the default matching behavior. Using a value of 'basic', 'extended',
 	'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`,
 	`--fixed-strings`, or `--perl-regexp` option accordingly, while the
-	value 'default' will return to the default matching behavior.
+	value 'default' will return to the default matching behavior, which is,
+	to honor `grep.extendedRegexp` option and choose either basic or extended.
 
 grep.extendedRegexp::
 	If set to true, enable `--extended-regexp` option by default. This
Junio C Hamano Dec. 3, 2021, 12:48 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Extend the grep tests to assert that setting
>> "grep.patternType=extended" followed by "grep.patternType=default"
>> will behave as if "--extended-regexp" was provided, and not as
>> "--basic-regexp".
>
> grep.patternType is the usual "last-one wins".  If the last value
> set to patternType is the default, the setting to grep.extendedRegexp
> should take effect (so if it is set to true, we'd see ERE behavour).
>
> Back in the days when the "return to the default matching behavior"
> part was written in 84befcd0 (grep: add a grep.patternType
> configuration setting, 2012-08-03), grep.extendedRegexp was the only
> way to configure the behaviour since b22520a3 (grep: allow -E and -n
> to be turned on by default via configuration, 2011-03-30).  It was
> understandable that we referred to the behaviour that honors the
> older configuration variable as "the default matching" behaviour.
> It is fairly clear in its log message:
>
>     When grep.patternType is set to a value other than "default", the
>     grep.extendedRegexp setting is ignored. The value of "default" restores
>     the current default behavior, including the grep.extendedRegexp
>     behavior.
>
> So, unless your description is a typo, I am somewhat surprised by
> your findings that =default that comes later does not defeat an
> earlier =extended.
>
> It should just clear that earlier extended set by grep.patternType
> and only pay attention to grep.extendedRegexp variable.  Doing
> anything else is a bug, I think.

So, let's see how 

  $ git -c grep.patternType=extended \
	-c grep.patternType=default \
	grep foo

works today.

We start from builtin/grep.c::cmd_grep(), which calls
git_config(grep_cmd_config).  grep_cmd_config() farms out most of
the work to grep.c::grep_config(), which populates the grep_defaults
structure. grep_defaults.pattern_type_option first becomes
GREP_PATTERN_TYPE_ERE and then it gets overwritten to
GREP_PATTERN_TYPE_UNSPECIFIED.

Then grep.c::grep_init() copies that grep_defaults to the
per-invocation "struct grep_opt opt" that is on-stack in
builtin/grep.c::cmd_grep().  

opt.patternType becomes GREP_PATTERN_TYPE_UNSPECIFIED;
opt.extendedRegexp in the same structure is 0, because nobody has
touched the corresponding member in grep_defaults in
grep_cmd_config().

Then parse_options() gets its turn to futz with members in "opt".
-E/-G/-F/-P would be parsed into a separate variable "pattern_type";
in this case, there is no command line option, so the pattern_type
variable has GREP_PATTERN_TYPE_UNSPECIFIED.

And finally grep.c::grep_commit_pattern_type() is called to combine
what is in "pattern_type" and "opt".

It calls grep_set_pattern_type_option() to futz with members in opt
that is what determines the final choice.

 - If pattern_type is not UNSPECIFIED, use that;

 - Otherwise, if opt->pattern_type_option is not UNSPECIFIED, use that;

 - Otherwise, i.e. if pattern_type from the command line and
   opt->pattern_type_option from the configuration are both
   UNSPECIFIED, then check if opt->extended_regexp_option (which is
   set from the config via grep.extendedRegexp) is set.  If so, call
   grep_set_pattern_type_option() to use ERE.

Now, what does grep_set_pattern_type_option() do?

The first thing it does is when pattern_type given is not ERE, drop
the opt->extended_regexp_option bit (which may have been set by
having grep.extendedRegexp configuration set to true).  This is
because, just like 'fixed' and 'pcre2', the runtime after the opt
structure is set up, the code does not look at a single "type"
member that enumerates BRE, ERE, FIXED, PCRE to determine the type
of the pattern, and the 'extended_regexp_option' member, after
grep_commit_pattern_type() finishes its processing, is used to
signal that ERE is in effect.  But as we've seen in the design goal
of the earlier change 84befcd0 (grep: add a grep.patternType
configuration setting, 2012-08-03), the bit obtained from the
grep.extendedRegexp configuration variable is only valid when
grep.patternType is set to UNSPECIFIED (aka default), so there needs
some dropping of this bit happen.

But with two grep.patternType configuration, I do not think the bug
will trigger.  As we traced above, we just get UNSPECIFIED in
grep_defaults.pattern_type_option, that is copied to cmd_grep()::opt,
and it gets combined with UNSPECIFIED in cmd_grep()::pattern_type in
grep_commit_pattern_type().  But the three-step logic in the "commit"
will not do anything in this case.  So, I do not see any code that
makes this behave as if "git grep -E foo" was given.

I suspect that if you do

  $ git -c grep.extendedRegexp=true \
	-c grep.patternType=default \
	grep foo

it should set the .extended_regexp_option member to true and the
.pattern_type_option member to UNSPECIFIED in grep_defaults, copy it
to cmd_grep()::opt, and grep_commit_pattern_type() will try to
combine that "opt" with pattern_type==UNSPECIFIED.  The third "both
pattern_type and opt.pattern_type_option are UNSPECIFIED" case
triggers, and grep_set_pattern_type_option() would be called, with
its pattern_type parameter explicitly set to ERE.

The logic to combine these two are convoluted and I sense that it
could be simplified without breaking the established semantics, but
so far I am not seeing how the code can break in such a way that

>> Extend the grep tests to assert that setting
>> "grep.patternType=extended" followed by "grep.patternType=default"
>> will behave as if "--extended-regexp" was provided, and not as
>> "--basic-regexp".

this claim holds.

So,... after spending too much time following the code, I went back
to the actual test added by the code and see this:

+	test_expect_success "grep $L with grep.patternType=extended and grep.patternType=default" '
+		echo "${HC}ab:a+bc" >expected &&
+		git \
+			-c grep.patternType=extended \
+			-c grep.patternType=default \
+			grep "a+b*c" $H ab >actual &&
+		test_cmp expected actual
+	'

Here, file "ac" has three lines

        a+b*c
        a+bc
        abc

and "a+b*c" pattern is designed to hit the first line with -F, the
second one with -G (because + is literal in BRE so it must exist
literally in the haystack, b* matches single b but not literal b* in
the haystack), the last one with -E (because neither + or * is
literal, so the first two lines do not match, but the last one
matches).  The expectation in the code, unlike what is in the log
message, is that this should match as if -G was given.

So, I guess there is no bug (other than the alarming false report in
the log message).
diff mbox series

Patch

diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 6b6423a07c3..724b1bbbc1c 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -451,6 +451,15 @@  do
 		test_cmp expected actual
 	'
 
+	test_expect_success "grep $L with grep.patternType=extended and grep.patternType=default" '
+		echo "${HC}ab:a+bc" >expected &&
+		git \
+			-c grep.patternType=extended \
+			-c grep.patternType=default \
+			grep "a+b*c" $H ab >actual &&
+		test_cmp expected actual
+	'
+
 	test_expect_success "grep $L with grep.patternType=extended and grep.extendedRegexp=false" '
 		echo "${HC}ab:abc" >expected &&
 		git \