Message ID | patch-v2-7.8-140a7416223-20211110T013632Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | grep: simplify & delete code by changing obscure cfg variable behavior | expand |
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > Change the interaction between "grep.patternType=default" and > "grep.extendedRegexp=true" to make setting "grep.extendedRegexp=true" > synonymous with setting "grep.patternType=extended". This description alone is not quite understandable. It is not saying much more than the single line title, and presense of it does not seem to improve the understanding by the readers. > When "grep.patternType" was introduced in 84befcd0a4a (grep: add a > grep.patternType configuration setting, 2012-08-03) we made two > seemingly contradictory promises: > > 1. You can set "grep.patternType", and "[setting it to] 'default' > will return to the default matching behavior". > > 2. Support the existing "grep.extendedRegexp" option, but ignore it > when the new "grep.patternType" is set, *except* "when the > `grep.patternType` option is set. to a value other than 'default'". OK, so setting grep.patternType=default makes grep.extendedRegexp to be taken into account. By grep.patternType to something else, the other one is ignored. 2. is a very explicit way to say so. Where did you get 1. from? If you have this paragraph in the log message in mind, I agree that it is less than ideally phrased, but ... Rather than adding an additional setting for grep.fooRegexp for current and future pattern matching options, add a grep.patternType setting that can accept appropriate values for modifying the default grep pattern matching behavior. The current values are "basic", "extended", "fixed", "perl" and "default" for setting -G, -E, -F, -P and the default behavior respectively. ... with the understanding of 2. (which is in what the commit adds to Documentation/config.txt), it is reasonable to understand that "the default behaviour" is "use BRE or ERE, depending on the setting of grep.extendedRegexp". Doesn't the code behave that way? I think the above is exactly how the commit wanted to make the code behave. > I think that 84befcd0a4a probably didn't intend this behavior, but > instead ended up conflating our internal "unspecified" state with a > user's explicit desire to set the configuration back to the > default. I am not sure where that comes from, but if I imagine somebody confuses between "default" and "basic" and considers "default" a synonym for "basic", I can sort-of understand it. Is it what is happening here? But it is not what the original .patternType patch wanted to do back then, and it is not what we want to see now. > I.e. a user would correctly expect this to keep working: > > # ERE grep > git -c grep.extendedRegexp=true grep <pattern> This makes sense. > And likewise for "grep.patternType=default" to take precedence over > the disfavored "grep.extendedRegexp" option, i.e. the usual "last set > wins" semantics. > > # BRE grep > git -c grep.extendedRegexp=true -c grep.patternType=basic grep <pattern> This makes sense, too. Do either of the above two not work as you expect (i.e. the first use ERE and the second use BRE)? What I have trouble with is that it is unclear if you are describing what should happen (in the above, I said "makes sense", to show my agreement, assuming that it is the case), or if you are describing what does happen that you disagree with. Another thing I have trouble with is your mention of "keep working". Are you proposing to deliberately break what is working as users correctly expect? Why? > But probably not for this to ignore the favored "grep.patternType" > option entirely, say if /etc/gitconfig was still setting > "grep.extendedRegexp", but "~/.gitconfig" used the new > "grep.patternType" (and wanted to use the "default" value): > > # Was ERE, now BRE > git -c grep.extendedRegexp=true grep.patternType=default grep <pattern> I do not quite get your "Was X, now Y" label. What did you want to say with that? Also I am not sure what you exactly mean when you say "and wanted to use the 'default' value". There is no single "THE" default value. If patternType=default is the last patterntype (it may be set in many places, but the last one should win), the user is telling that the last-one-wins setting of extendedRegexp is to be honored. So, if grep.extendedRegexp in /etc/gitconfig is the last one defined, we would choose between BRE and ERE depending on that setting. Isn't that what is happening in the current code? Or are all the above explanation result of simple misunderstanding that setting to "default" means setting to "basic"?
On Fri, Nov 12 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > [...] I'll try to reply to all the rest of the feedback, just really quick on this, because I think it might represent a bit of a gordian knot. > Another thing I have trouble with is your mention of "keep working". > Are you proposing to deliberately break what is working as users > correctly expect? Why? Yes, I'd like to change the behavior, because it makes the grep API much easier to deal with, and beacuse I think it impacts nobody in practice. The real goal for this series is that I've got pending patches to speedup diffcore-pickaxe massively by moving it over to PCRE & drop the kwset.c code. An old perf test I dug up for that is in [1]. To do that I needed to re-use the bits of grep.c machinery that deal with setting up patterns, dealing with BRE,ERE,PCRE etc. elsewhere. I *can* do that in a different way, but it's going to be much easier if we can gradually evolve the already working grep API to become an internal textual pattern matching API. Eventually I'd like to move all of regcomp()/regexec() over to such a thing, because we can for any other ranodm thing we use regexes for get speedups by using PCRE (and optionally use its interface to understand BRE/ERE syntax). The alternative is to split that part off from grep.c, which is a bit more painful, or to have the init bits etc. take some "no config doesn't go first", "no it goes first" flags just to support this one API user. So it would be generally useful to know if you're at all open to that. Reading between the lines in some other comments I fear that it may be a "no" except if we mark it as deprecated, wait some years, maybe remove/change it then etc. 1. GIT_TEST_LONG= GIT_PERF_REPEAT_COUNT=10 GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE=1 CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run origin/next HEAD -- p4209-pickaxe.sh Test origin/next HEAD ------------------------------------------------------------------------------------------------------------------ 4209.1: git log -S'int main' <limit-rev>.. 0.38(0.36+0.01) 0.37(0.33+0.04) -2.6% 4209.2: git log -S'æ' <limit-rev>.. 0.51(0.47+0.04) 0.32(0.27+0.05) -37.3% 4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.72(0.68+0.03) 0.57(0.54+0.03) -20.8% 4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.60(0.55+0.02) 0.39(0.34+0.05) -35.0% 4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.43(0.40+0.03) 0.50(0.44+0.06) +16.3% 4209.6: git log -G'(int|void|null)' <limit-rev>.. 0.64(0.55+0.09) 0.63(0.56+0.05) -1.6% 4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>.. 0.64(0.59+0.05) 0.63(0.56+0.06) -1.6% 4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.63(0.54+0.08) 0.62(0.55+0.06) -1.6% 4209.9: git log -i -S'int main' <limit-rev>.. 0.39(0.35+0.03) 0.38(0.35+0.02) -2.6% 4209.10: git log -i -S'æ' <limit-rev>.. 0.39(0.33+0.06) 0.32(0.28+0.04) -17.9% 4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>.. 0.90(0.84+0.05) 0.58(0.53+0.04) -35.6% 4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>.. 0.71(0.64+0.06) 0.40(0.37+0.03) -43.7% 4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.43(0.40+0.03) 0.50(0.46+0.04) +16.3% 4209.14: git log -i -G'(int|void|null)' <limit-rev>.. 0.64(0.57+0.06) 0.62(0.56+0.05) -3.1% 4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>.. 0.65(0.59+0.06) 0.63(0.54+0.08) -3.1% 4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>.. 0.63(0.55+0.08) 0.62(0.56+0.05) -1.6%
diff --git a/Documentation/config/grep.txt b/Documentation/config/grep.txt index ae51f2d91c8..f4b7d3041fb 100644 --- a/Documentation/config/grep.txt +++ b/Documentation/config/grep.txt @@ -12,8 +12,7 @@ grep.patternType:: grep.extendedRegexp:: If set to true, enable `--extended-regexp` option by default. This - option is ignored when the `grep.patternType` option is set to a value - other than 'default'. + option is ignored when the `grep.patternType` option is set. grep.threads:: Number of grep worker threads to use. If unset (or set to 0), Git will diff --git a/builtin/grep.c b/builtin/grep.c index 0ea124321b6..942c4b25077 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -845,7 +845,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) int i; int dummy; int use_index = 1; - int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED; int allow_revs; struct option options[] = { @@ -879,16 +878,16 @@ int cmd_grep(int argc, const char **argv, const char *prefix) N_("descend at most <depth> levels"), PARSE_OPT_NONEG, NULL, 1 }, OPT_GROUP(""), - OPT_SET_INT('E', "extended-regexp", &pattern_type_arg, + OPT_SET_INT('E', "extended-regexp", &opt.pattern_type_option, N_("use extended POSIX regular expressions"), GREP_PATTERN_TYPE_ERE), - OPT_SET_INT('G', "basic-regexp", &pattern_type_arg, + OPT_SET_INT('G', "basic-regexp", &opt.pattern_type_option, N_("use basic POSIX regular expressions (default)"), GREP_PATTERN_TYPE_BRE), - OPT_SET_INT('F', "fixed-strings", &pattern_type_arg, + OPT_SET_INT('F', "fixed-strings", &opt.pattern_type_option, N_("interpret patterns as fixed strings"), GREP_PATTERN_TYPE_FIXED), - OPT_SET_INT('P', "perl-regexp", &pattern_type_arg, + OPT_SET_INT('P', "perl-regexp", &opt.pattern_type_option, N_("use Perl-compatible regular expressions"), GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), @@ -982,7 +981,6 @@ int cmd_grep(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, options, grep_usage, PARSE_OPT_KEEP_DASHDASH | PARSE_OPT_STOP_AT_NON_OPTION); - grep_commit_pattern_type(pattern_type_arg, &opt); if (use_index && !startup_info->have_repository) { int fallback = 0; diff --git a/grep.c b/grep.c index fb3f63c63ef..dda8e536fe3 100644 --- a/grep.c +++ b/grep.c @@ -60,8 +60,10 @@ int grep_config(const char *var, const char *value, void *cb) if (userdiff_config(var, value) < 0) return -1; - if (!strcmp(var, "grep.extendedregexp")) { - opt->extended_regexp_option = git_config_bool(var, value); + if (opt->pattern_type_option == GREP_PATTERN_TYPE_UNSPECIFIED && + !strcmp(var, "grep.extendedregexp") && + git_config_bool(var, value)) { + opt->pattern_type_option = GREP_PATTERN_TYPE_ERE; return 0; } @@ -115,62 +117,6 @@ void grep_init(struct grep_opt *opt, struct repository *repo) opt->header_tail = &opt->header_list; } -static void grep_set_pattern_type_option(enum grep_pattern_type pattern_type, struct grep_opt *opt) -{ - /* - * When committing to the pattern type by setting the relevant - * fields in grep_opt it's generally not necessary to zero out - * the fields we're not choosing, since they won't have been - * set by anything. The extended_regexp_option field is the - * only exception to this. - * - * This is because in the process of parsing grep.patternType - * & grep.extendedRegexp we set opt->pattern_type_option and - * opt->extended_regexp_option, respectively. We then - * internally use opt->extended_regexp_option to see if we're - * compiling an ERE. It must be unset if that's not actually - * the case. - */ - if (pattern_type != GREP_PATTERN_TYPE_ERE && - opt->extended_regexp_option) - opt->extended_regexp_option = 0; - - switch (pattern_type) { - case GREP_PATTERN_TYPE_UNSPECIFIED: - /* fall through */ - - case GREP_PATTERN_TYPE_BRE: - break; - - case GREP_PATTERN_TYPE_ERE: - opt->extended_regexp_option = 1; - break; - - case GREP_PATTERN_TYPE_FIXED: - opt->fixed = 1; - break; - - case GREP_PATTERN_TYPE_PCRE: - opt->pcre2 = 1; - break; - } -} - -void grep_commit_pattern_type(enum grep_pattern_type pattern_type, struct grep_opt *opt) -{ - if (pattern_type != GREP_PATTERN_TYPE_UNSPECIFIED) - grep_set_pattern_type_option(pattern_type, opt); - else if (opt->pattern_type_option != GREP_PATTERN_TYPE_UNSPECIFIED) - grep_set_pattern_type_option(opt->pattern_type_option, opt); - else if (opt->extended_regexp_option) - /* - * This branch *must* happen after setting from the - * opt->pattern_type_option above, we don't want - * grep.extendedRegexp to override grep.patternType! - */ - grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, opt); -} - static struct grep_pat *create_grep_pat(const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t, @@ -492,9 +438,10 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) p->word_regexp = opt->word_regexp; p->ignore_case = opt->ignore_case; - p->fixed = opt->fixed; + p->fixed = opt->pattern_type_option == GREP_PATTERN_TYPE_FIXED; - if (memchr(p->pattern, 0, p->patternlen) && !opt->pcre2) + if (opt->pattern_type_option != GREP_PATTERN_TYPE_PCRE && + memchr(p->pattern, 0, p->patternlen)) die(_("given pattern contains NULL byte (via -f <file>). This is only supported with -P under PCRE v2")); p->is_fixed = is_fixed(p->pattern, p->patternlen); @@ -545,14 +492,14 @@ static void compile_regexp(struct grep_pat *p, struct grep_opt *opt) return; } - if (opt->pcre2) { + if (opt->pattern_type_option == GREP_PATTERN_TYPE_PCRE) { compile_pcre2_pattern(p, opt); return; } if (p->ignore_case) regflags |= REG_ICASE; - if (opt->extended_regexp_option) + if (opt->pattern_type_option == GREP_PATTERN_TYPE_ERE) regflags |= REG_EXTENDED; err = regcomp(&p->regexp, p->pattern, regflags); if (err) { diff --git a/grep.h b/grep.h index 30a7dfd3294..e4e548aed90 100644 --- a/grep.h +++ b/grep.h @@ -143,7 +143,6 @@ struct grep_opt { int unmatch_name_only; int count; int word_regexp; - int fixed; int all_match; #define GREP_BINARY_DEFAULT 0 #define GREP_BINARY_NOMATCH 1 @@ -152,7 +151,6 @@ struct grep_opt { int allow_textconv; int extended; int use_reflog_filter; - int pcre2; int relative; int pathname; int null_following_name; @@ -161,8 +159,7 @@ struct grep_opt { int max_depth; int funcname; int funcbody; - int extended_regexp_option; - int pattern_type_option; + enum grep_pattern_type pattern_type_option; int ignore_locale; char colors[NR_GREP_COLORS][COLOR_MAXLEN]; unsigned pre_context; @@ -201,7 +198,6 @@ struct grep_opt { int grep_config(const char *var, const char *value, void *); void grep_init(struct grep_opt *, struct repository *repo); -void grep_commit_pattern_type(enum grep_pattern_type, struct grep_opt *opt); void append_grep_pat(struct grep_opt *opt, const char *pat, size_t patlen, const char *origin, int no, enum grep_pat_token t); void append_grep_pattern(struct grep_opt *opt, const char *pat, const char *origin, int no, enum grep_pat_token t); diff --git a/revision.c b/revision.c index 9f9b0d2429e..ed29d245c89 100644 --- a/revision.c +++ b/revision.c @@ -2864,8 +2864,6 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s diff_setup_done(&revs->diffopt); - grep_commit_pattern_type(GREP_PATTERN_TYPE_UNSPECIFIED, - &revs->grep_filter); if (!is_encoding_utf8(get_log_output_encoding())) revs->grep_filter.ignore_locale = 1; compile_grep_patterns(&revs->grep_filter); diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 6b6423a07c3..a59a9726357 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -443,7 +443,7 @@ do ' test_expect_success "grep $L with grep.extendedRegexp=true and grep.patternType=default" ' - echo "${HC}ab:abc" >expected && + echo "${HC}ab:a+bc" >expected && git \ -c grep.extendedRegexp=true \ -c grep.patternType=default \
Change the interaction between "grep.patternType=default" and "grep.extendedRegexp=true" to make setting "grep.extendedRegexp=true" synonymous with setting "grep.patternType=extended". This changes our existing config parsing behavior as detailed below, but in a way that's consistent with how we parse other configuration. We are breaking past promises here, but I doubt that this will impact anyone in practice. The reduction in complexity and resulting consistency with other default config behavior is worth it. When "grep.patternType" was introduced in 84befcd0a4a (grep: add a grep.patternType configuration setting, 2012-08-03) we made two seemingly contradictory promises: 1. You can set "grep.patternType", and "[setting it to] 'default' will return to the default matching behavior". 2. Support the existing "grep.extendedRegexp" option, but ignore it when the new "grep.patternType" is set, *except* "when the `grep.patternType` option is set. to a value other than 'default'". I think that 84befcd0a4a probably didn't intend this behavior, but instead ended up conflating our internal "unspecified" state with a user's explicit desire to set the configuration back to the default. I.e. a user would correctly expect this to keep working: # ERE grep git -c grep.extendedRegexp=true grep <pattern> And likewise for "grep.patternType=default" to take precedence over the disfavored "grep.extendedRegexp" option, i.e. the usual "last set wins" semantics. # BRE grep git -c grep.extendedRegexp=true -c grep.patternType=basic grep <pattern> But probably not for this to ignore the favored "grep.patternType" option entirely, say if /etc/gitconfig was still setting "grep.extendedRegexp", but "~/.gitconfig" used the new "grep.patternType" (and wanted to use the "default" value): # Was ERE, now BRE git -c grep.extendedRegexp=true grep.patternType=default grep <pattern> I think that in practice nobody or almost nobody is going to be relying on this obscure interaction, and as shown here it makes the config parsing much simpler. We no longer have to carry a complex state machine in "grep_commit_pattern_type()" and "grep_set_pattern_type_option()". We can also do away with the "int fixed" and "int pcre2" members in favor of using "pattern_type_option" directly in "grep.c", as well as dropping the "pattern_type_arg" variable in "builtin/grep.c" in favor of using the "pattern_type_option" member directly. See my 07a3d411739 (grep: remove regflags from the public grep_opt API, 2017-06-29) for addition of the two comments being removed here, i.e. the complexity noted in that commit is now going away. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Documentation/config/grep.txt | 3 +- builtin/grep.c | 10 ++--- grep.c | 71 +++++------------------------------ grep.h | 6 +-- revision.c | 2 - t/t7810-grep.sh | 2 +- 6 files changed, 16 insertions(+), 78 deletions(-)