Message ID | pull.1278.git.git.1655740174420.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Series | grep: add --max-count command line option |
On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
> + unsigned max_count;
Why not make this intmax_t? That way, you don't have to worry about
casting -1 to unsigned. Also on typical 64-bit machines you no longer
have to worry about mishandling counts greater than 2**32 (the limit
becomes 2**63 - 1 which is plenty).
These days it's typically better to avoid unsigned types in C when you
can, as standard tools like 'gcc -fsanitize=undefined' can catch signed
int overflow whereas unsigned int overflow always wraps around which is
typically bad news.
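
To make the trade-off concrete, here is a minimal sketch in C. The helper names are hypothetical and are not part of the patch; the first mirrors the patch's `(unsigned)-1` sentinel, the second shows the signed alternative, where any negative value means "no limit".

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the patch's approach, with (unsigned)-1 as "no limit". */
static bool limit_reached_unsigned(unsigned count, unsigned max_count)
{
	return max_count != (unsigned)-1 && count == max_count;
}

/* Hypothetical helper: signed limit, any negative value means "no limit". */
static bool limit_reached_signed(int count, int max_count)
{
	return max_count >= 0 && count >= max_count;
}

/*
 * Unsigned arithmetic is defined to wrap modulo 2**N, so incrementing
 * UINT_MAX silently yields 0 and no sanitizer reports it as an overflow.
 */
static unsigned next_unsigned(unsigned n)
{
	return n + 1;
}
```

With the signed variant the sentinel needs no cast, and an accidental overflow of `count` would be signed overflow, which 'gcc -fsanitize=undefined' can report.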
Hi,

On Monday, June 20th, 2022 at 17:57, Paul Eggert <eggert@cs.ucla.edu> wrote:
> On 6/20/22 10:49, Carlos L. via GitGitGadget wrote:
>
> > + unsigned max_count;
>
> Why not make this intmax_t? That way, you don't have to worry about
> casting -1 to unsigned. Also on typical 64-bit machines you no longer
> have to worry about mishandling counts greater than 2**32 (the limit
> becomes 2**63 - 1 which is plenty).

This does not work well with OPTION_INTEGER, since it assumes the value
to be int-sized:

parse-options.c:
219 *(int *)opt->value = strtol(arg, (char **)&s, 10);

I also wanted to avoid using signed int so both sides of the comparison
with `count` in grep_source_1() have the same sign.
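
The same-sign concern can be illustrated with a small sketch. The function names below are hypothetical; the first spells out the conversion that C applies implicitly when an `int` is compared with an `unsigned`, and the second checks the sign before comparing.

```c
#include <stdbool.h>

/*
 * What "a < b" computes when a is int and b is unsigned: a is first
 * converted to unsigned, so a negative a becomes a very large value.
 */
static bool less_after_conversion(int a, unsigned b)
{
	return (unsigned)a < b;
}

/* Sign-aware comparison: a negative a is less than any unsigned b. */
static bool less_sign_aware(int a, unsigned b)
{
	return a < 0 || (unsigned)a < b;
}
```

For nonnegative `a` the two functions agree; they differ only when `a` is negative, for example `-1` against `1u`.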
On 6/20/22 11:25, Carlos L. wrote:
> This does not work well with OPTION_INTEGER, since it assumes the value to be int-sized:
>
> parse-options.c:
> 219 *(int *)opt->value = strtol(arg, (char **)&s, 10);

OK, so parse-options messes up if the user specifies a count that does
not fit in 'int'? Although that's a separate bug, let's not make things
worse here; let's make the new count an 'int'.

In the long run parse-options should be changed to use strtoimax instead
of strtol, and the corresponding integers should be changed to intmax_t,
and the proper thing should be done if the string value does not fit
into intmax_t. But this longer-run fix affects all integer-valued
options, not just this one.

> I also wanted to avoid using signed int so both sides of the comparison with `count` in grep_source_1() have the same sign.

Such comparisons cannot misfire if both values are nonnegative, and that
can easily be arranged here.
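
As a rough illustration of the longer-run direction, the sketch below parses with strtoimax and rejects anything outside a caller-supplied range. `parse_bounded` is a hypothetical helper written for this example; it is not git's parse-options code and says nothing about how that code would actually be changed.

```c
#include <errno.h>
#include <inttypes.h>
#include <limits.h>

/*
 * Hypothetical helper: parse a decimal integer and accept it only if
 * the whole string was consumed and lo <= value <= hi.
 * Returns 0 on success, -1 on failure (*out is left untouched).
 */
static int parse_bounded(const char *arg, intmax_t lo, intmax_t hi,
			 intmax_t *out)
{
	char *end;
	intmax_t v;

	errno = 0;
	v = strtoimax(arg, &end, 10);
	if (end == arg || *end != '\0')
		return -1;	/* empty string or trailing junk */
	if (errno == ERANGE || v < lo || v > hi)
		return -1;	/* does not fit the requested range */
	*out = v;
	return 0;
}
```

Calling it with `lo = -1` and `hi = INT_MAX` would accept exactly the values that fit an `int` count where -1 means unlimited, so a later comparison against a nonnegative `count` involves only nonnegative limits or the explicit -1 sentinel.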
diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index 3d393fbac1b..19b817d5e58 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -23,6 +23,7 @@ SYNOPSIS
 	   [--break] [--heading] [-p | --show-function]
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
+	   [(-m | --max-count) <num>]
 	   [--threads <num>]
 	   [-f <file>] [-e] <pattern>
 	   [--and|--or|--not|(|)|-e <pattern>...]
@@ -238,6 +239,13 @@ providing this option will cause it to die.
 	`git diff` works out patch hunk headers (see 'Defining a
 	custom hunk-header' in linkgit:gitattributes[5]).
 
+-m <num>::
+--max-count <num>::
+	Limit the amount of matches per file. When using the `-v` or
+	`--invert-match` option, the search stops after the specified
+	number of non-matches. A value of -1 will return unlimited
+	results (the default).
+
 --threads <num>::
 	Number of grep worker threads to use.
 	See `grep.threads` in 'CONFIGURATION' for more information.
diff --git a/builtin/grep.c b/builtin/grep.c
index bcb07ea7f75..4ab28995da0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -961,6 +961,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL_F(0, "ext-grep", &external_grep_allowed__ignored,
 			   N_("allow calling of grep(1) (ignored by this build)"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_INTEGER('m', "max-count", &opt.max_count,
+			N_("maximum number of results per file")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
@@ -1101,6 +1103,13 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	if (recurse_submodules && untracked)
 		die(_("--untracked not supported with --recurse-submodules"));
 
+	/*
+	 * Optimize out the case where the amount of matches is limited to zero.
+	 * We do this to keep results consistent with GNU grep(1).
+	 */
+	if (opt.max_count == 0)
+		exit(EXIT_FAILURE);
+
 	if (show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/grep.c b/grep.c
index 82eb7da1022..a010f9f4132 100644
--- a/grep.c
+++ b/grep.c
@@ -1686,6 +1686,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 		bol = eol + 1;
 		if (!left)
 			break;
+		if (opt->max_count != (unsigned)-1 && count == opt->max_count)
+			break;
 		left--;
 		lno++;
 	}
diff --git a/grep.h b/grep.h
index c722d25ed9d..218585a8679 100644
--- a/grep.h
+++ b/grep.h
@@ -171,6 +171,7 @@ struct grep_opt {
 	int show_hunk_mark;
 	int file_break;
 	int heading;
+	unsigned max_count;
 	void *priv;
 
 	void (*output)(struct grep_opt *opt, const void *data, size_t size);
@@ -181,6 +182,7 @@ struct grep_opt {
 	.relative = 1, \
 	.pathname = 1, \
 	.max_depth = -1, \
+	.max_count = (unsigned)-1, \
 	.pattern_type_option = GREP_PATTERN_TYPE_UNSPECIFIED, \
 	.colors = { \
 		[GREP_COLOR_CONTEXT] = "", \
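
For readers who want to see the per-file limit in isolation, the following is a simplified, self-contained sketch of the idea behind the grep_source_1() change, using a signed limit as suggested in the review. Both functions are hypothetical stand-ins: the matcher is a plain substring search and none of git's structures are used.

```c
#include <string.h>

/* Hypothetical matcher: does line[0..len) contain the string pat? */
static int line_contains(const char *line, size_t len, const char *pat)
{
	size_t plen = strlen(pat);
	size_t i;

	if (plen > len)
		return 0;
	for (i = 0; i + plen <= len; i++)
		if (!memcmp(line + i, pat, plen))
			return 1;
	return 0;
}

/*
 * Count the lines of buf that contain pat, stopping once max_count
 * hits have been seen. A negative max_count means "unlimited"; zero
 * returns immediately, like the early exit added in builtin/grep.c.
 */
static int count_matching_lines(const char *buf, const char *pat,
				int max_count)
{
	const char *bol = buf;
	int count = 0;

	if (max_count == 0)
		return 0;
	while (*bol) {
		const char *eol = strchr(bol, '\n');
		size_t len = eol ? (size_t)(eol - bol) : strlen(bol);

		if (line_contains(bol, len, pat))
			count++;
		if (max_count >= 0 && count >= max_count)
			break;	/* limit reached, skip the rest of the file */
		if (!eol)
			break;	/* last line had no trailing newline */
		bol = eol + 1;
	}
	return count;
}
```

Because `count` and a nonnegative `max_count` are both signed and nonnegative when compared, the comparison needs no cast and no sentinel other than the sign check.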