mbox series

[v10,0/9] grep: simplify & delete "init" & "config" code

Message ID cover-v10-0.9-00000000000-20220204T211534Z-avarab@gmail.com (mailing list archive)
Headers show
Series grep: simplify & delete "init" & "config" code | expand

Message

Ævar Arnfjörð Bjarmason Feb. 4, 2022, 9:20 p.m. UTC
This series makes using the grep.[ch] API easier, by having it follow
the usual pattern of being initialized with:

    defaults -> config -> command-line

This is to make some follow-up work easier, this is a net code
deletion if we exclude newly added tests.

Changes since v9:

 * Integrated the proposed fixup from Junio's branch, and updated both
   the tests and commit messages accordingly. I changed "BRE" and
   "ERE" in the test descriptions to indicate what we're expecting to
   match as, and added the relevant test's from Junio's comments on an
   earlier round.

Junio: Sorry about the back & forth on this series. Each time I came
back to it it had been a while, and between a large mail queue and
fixing some isolated issues manged to lose the larger problem you were
pointing out. Thanks again!

Ævar Arnfjörð Bjarmason (9):
  grep.h: remove unused "regex_t regexp" from grep_opt
  log tests: check if grep_config() is called by "log"-like cmds
  grep tests: add missing "grep.patternType" config tests
  built-ins: trust the "prefix" from run_builtin()
  grep.c: don't pass along NULL callback value
  grep API: call grep_config() after grep_init()
  grep.h: make "grep_opt.pattern_type_option" use its enum
  grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
  grep: simplify config parsing and option parsing

 builtin/grep.c    |  27 +++++------
 builtin/log.c     |  13 +++++-
 builtin/ls-tree.c |   2 +-
 git.c             |   1 +
 grep.c            | 113 ++++++----------------------------------------
 grep.h            |  31 +++++++++----
 revision.c        |   4 +-
 t/t4202-log.sh    |  24 ++++++++++
 t/t7810-grep.sh   |  96 +++++++++++++++++++++++++++++++++++++++
 9 files changed, 185 insertions(+), 126 deletions(-)

Range-diff against v9:
 1:  b9372cde017 =  1:  184f7e0c5bd grep.h: remove unused "regex_t regexp" from grep_opt
 2:  30219a0ae9d =  2:  ac397cc6a18 log tests: check if grep_config() is called by "log"-like cmds
 3:  a75b288340b !  3:  3464c76cfd7 grep tests: add missing "grep.patternType" config tests
    @@ Commit message
         followed by a "grep.extendedRegexp=false" behaves as though
         "grep.extendedRegexp" wasn't provided.
     
    +    See [1] for the source of some of these tests, and their
    +    initial (pseudocode) implementation, and [2] for a later discussion
    +    about a breakage due to missing testing (which had been noted in [1]
    +    all along).
    +
    +    1. https://lore.kernel.org/git/xmqqv8zf6j86.fsf@gitster.g/
    +    2. https://lore.kernel.org/git/xmqqpmoczwtu.fsf@gitster.g/
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
         Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
    @@ t/t7810-grep.sh: do
     +		test_cmp expected actual
     +	'
     +
    -+	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (BRE)" '
    ++	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (ERE)" '
     +		echo "${HC}ab:abc" >expected &&
     +		git \
     +			-c grep.patternType=fixed \
    @@ t/t7810-grep.sh: do
     +	'
     +
     +	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (ERE)" '
    ++		echo "${HC}ab:abc" >expected &&
    ++		git \
    ++			-c grep.extendedRegexp=false \
    ++			-c grep.patternType=default \
    ++			-c grep.extendedRegexp=true \
    ++			grep "a+b*c" $H ab >actual &&
    ++		test_cmp expected actual
    ++	'
    ++
    ++	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (BRE)" '
    ++		echo "${HC}ab:a+bc" >expected &&
    ++		git \
    ++			-c grep.extendedRegexp=true \
    ++			-c grep.extendedRegexp=false \
    ++			grep "a+b*c" $H ab >actual &&
    ++		test_cmp expected actual
    ++	'
    ++
    ++	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (BRE)" '
    ++		echo "${HC}ab:abc" >expected &&
    ++		git \
    ++			-c grep.extendedRegexp=false \
    ++			-c grep.extendedRegexp=true \
    ++			-c grep.patternType=default \
    ++			grep "a+b*c" $H ab >actual &&
    ++		test_cmp expected actual
    ++	'
    ++
    ++	test_expect_success "grep $L with grep.extendedRegexp and grep.patternType are both last-one-wins independently (BRE)" '
     +		echo "${HC}ab:a+bc" >expected &&
     +		git \
     +			-c grep.patternType=default \
    @@ t/t7810-grep.sh: do
     +			grep "a+b*c" $H ab >actual &&
     +		test_cmp expected actual
     +	'
    -+
     +	test_expect_success "grep $L with grep.patternType=extended and grep.patternType=default" '
     +		echo "${HC}ab:a+bc" >expected &&
     +		git \
 4:  6e7f9730a7d =  4:  c6ada96298a built-ins: trust the "prefix" from run_builtin()
 5:  fbcfea84696 =  5:  1f09de53e07 grep.c: don't pass along NULL callback value
 6:  96c8cc7806e =  6:  ce646154538 grep API: call grep_config() after grep_init()
 7:  e09616056b4 =  7:  6446b4f0f33 grep.h: make "grep_opt.pattern_type_option" use its enum
 8:  61fc6a4dac8 =  8:  df8ba5aba68 grep.c: do "if (bool && memchr())" not "if (memchr() && bool)"
 9:  445467e87f7 !  9:  ccbdfa48315 grep: simplify config parsing and option parsing
    @@ Commit message
         last-one-wins variable, with "grep.extendedRegexp" yielding to
         "grep.patternType", except when "grep.patternType=default".
     
    -    Note that this applies as we parse the config, i.e. a sequence of:
    +    Note that as the previously added tests indicate this cannot be done
    +    on-the-fly as we see the config variables, without introducing more
    +    state keeping. I.e. if we see:
     
    -        -c grep.patternType=perl
    -        -c grep.extendedRegexp=true \
    +        -c grep.extendedRegexp=false
             -c grep.patternType=default
    +        -c extendedRegexp=true
    +
    +    We need to select ERE, since grep.patternType=default unselects that
    +    variable, which normally has higher precedence, but we also need to
    +    select BRE in cases of:
     
    -    should select ERE due to "grep.extendedRegexp=true and
    -    grep.patternType=default". We can determine this as we parse the
    -    config, because:
    +        -c grep.extendedRegexp=true \
    +        -c grep.extendedRegexp=false
     
    -     * If we see "grep.extendedRegexp" we set "extended_regexp_option" to
    -       its boolean value.
    +    Which would not be the case for this, which select ERE:
     
    -     * If we see "grep.extendedRegexp" but
    -       "grep.patternType=[default|<unset>]" is in effect we *don't* set
    -       the internal "pattern_type_option" to update the pattern type.
    +        -c grep.patternType=extended \
    +        -c grep.extendedRegexp=false
     
    -     * If we see "grep.patternType!=default" we can set our internal
    -       "pattern_type_option" directly, it doesn't matter what the state of
    -       "extended_regexp_option" is, but we don't forget what it was, in
    -       case we see a "grep.patternType=default" again.
    +    Therefore we cannot do this on-the-fly in grep_config without also
    +    introducing tracking variables for not only the pattern type, but what
    +    the source of that pattern type was.
     
    -     * If we see a "grep.patternType=default" we can set the pattern to
    -       ERE or BRE depending on whether we last saw a
    -       "grep.extendedRegexp=true" or
    -       "grep.extendedRegexp=[false|<unset>]".
    +    So we need to decide on the pattern after our config was fully
    +    parsed. Let's do that by deferring the decision on the pattern type
    +    until it's time to compile it in compile_regexp().
     
    -    With this change the "extended_regexp_option" member is only used
    -    within grep_config(), and in the current codebase we could equally
    -    track it as a "static" variable within that function, see [1] for a
    -    version for this patch that did that. We're keeping it a struct member
    -    to make that function reentrant, in case it ends up mattering in the
    -    future.
    +    By that time we've not only parsed the config, but also handled the
    +    command-line options. Those will set "opt.pattern_type_option" (*not*
    +    "opt.extended_regexp_option"!).
     
    -    The command-line parsing in cmd_grep() can then completely ignore
    -    "grep.extendedRegexp". Whatever effect it had before that step won't
    -    matter if we see -G, -E, -P etc.
    +    At that point all we need to do is see if "grep.patternType" was
    +    UNSPECIFIED in the end (including an explicit "=default"), if so we'll
    +    use the "grep.extendedRegexp" configuration, if any.
     
         See my 07a3d411739 (grep: remove regflags from the public grep_opt
         API, 2017-06-29) for addition of the two comments being removed here,
    @@ builtin/grep.c: int cmd_grep(int argc, const char **argv, const char *prefix)
      		int fallback = 0;
     
      ## grep.c ##
    -@@ grep.c: static int parse_pattern_type_arg(const char *opt, const char *arg)
    - 
    - define_list_config_array_extra(color_grep_slots, {"match"});
    - 
    -+static void adjust_pattern_type(enum grep_pattern_type *pto, const int ero)
    -+{
    -+	if (*pto == GREP_PATTERN_TYPE_UNSPECIFIED)
    -+		*pto = ero ? GREP_PATTERN_TYPE_ERE : GREP_PATTERN_TYPE_BRE;
    -+}
    -+
    - /*
    -  * Read the configuration file once and store it in
    -  * the grep_defaults template.
    -@@ grep.c: int grep_config(const char *var, const char *value, void *cb)
    - 
    - 	if (!strcmp(var, "grep.extendedregexp")) {
    - 		opt->extended_regexp_option = git_config_bool(var, value);
    -+		adjust_pattern_type(&opt->pattern_type_option,
    -+				    opt->extended_regexp_option);
    - 		return 0;
    - 	}
    - 
    - 	if (!strcmp(var, "grep.patterntype")) {
    - 		opt->pattern_type_option = parse_pattern_type_arg(var, value);
    -+		if (opt->extended_regexp_option == -1)
    -+			return 0;
    -+		adjust_pattern_type(&opt->pattern_type_option,
    -+				    opt->extended_regexp_option);
    - 		return 0;
    - 	}
    - 
     @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo)
      	opt->header_tail = &opt->header_list;
      }
    @@ grep.c: void grep_init(struct grep_opt *opt, struct repository *repo)
      					const char *origin, int no,
      					enum grep_pat_token t,
     @@ grep.c: static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
    + 	int err;
    + 	int regflags = REG_NEWLINE;
      
    ++	if (opt->pattern_type_option == GREP_PATTERN_TYPE_UNSPECIFIED)
    ++		opt->pattern_type_option = (opt->extended_regexp_option
    ++					    ? GREP_PATTERN_TYPE_ERE
    ++					    : GREP_PATTERN_TYPE_BRE);
    ++
      	p->word_regexp = opt->word_regexp;
      	p->ignore_case = opt->ignore_case;
     -	p->fixed = opt->fixed;
    @@ grep.h: struct grep_opt {
      	int pathname;
      	int null_following_name;
     @@ grep.h: struct grep_opt {
    - 	.relative = 1, \
    - 	.pathname = 1, \
    - 	.max_depth = -1, \
    -+	.extended_regexp_option = -1, \
    - 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
    - 	.colors = { \
    - 		[GREP_COLOR_CONTEXT] = "", \
    -@@ grep.h: struct grep_opt {
      
      int grep_config(const char *var, const char *value, void *);
      void grep_init(struct grep_opt *, struct repository *repo);