diff mbox series

[v2] grep: support the --pathspec-from-file option

Message ID 20191204203911.237056-1-emilyshaffer@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] grep: support the --pathspec-from-file option | expand

Commit Message

Emily Shaffer Dec. 4, 2019, 8:39 p.m. UTC
Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the
documentation accordingly.

This changes enables 'git grep' to receive the pathspec from a file by
specifying the path, or from stdin by specifying '-' as the path. This
matches the previous functionality of '-f', so the documentation of '-f'
has been expanded to describe this functionality. To let '-f' match the
new '--pathspec-from-file' option, also teach a '--patterns-from-file'
long name to '-f'.

Since there are now two arguments which can attempt to read from stdin,
add a safeguard to check whether the user specified '-' for both of
them. It is still possible for a user to pass '/dev/stdin' to one or
both arguments at present; we do not explicitly check.

Signed-off-by: Emily Shaffer <emilyshaffer@google.com>
---
Refactored to use am/pathspec-from-file. This changes the implementation
significantly since v1, but the testcases mostly remain the same.

This change builds on top of am/pathspec-from-file and is dependent on
at least "parse-options.h: add new options `--pathspec-from-file`,
`--pathspec-file-nul`". I followed the example of "commit: support the
--pathspec-from-file" option and retained the tests from v1 of this
topic.

 Documentation/git-grep.txt | 22 ++++++++++++++++--
 builtin/grep.c             | 35 ++++++++++++++++++++++++-----
 t/t7810-grep.sh            | 46 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 94 insertions(+), 9 deletions(-)

Comments

Denton Liu Dec. 4, 2019, 9:05 p.m. UTC | #1
Hi Emily,

On Wed, Dec 04, 2019 at 12:39:11PM -0800, Emily Shaffer wrote:
> @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for
>  more search backends, until then we'll die when the pattern type in
>  question doesn't support them.
>  
> +--pathspec-from-file <file>::
> +	Read pathspec from <file> instead of the command line. If `<file>` is
> +	exactly `-` then standard input is used; standard input cannot be used
> +	for both --patterns-from-file and --pathspec-from-file. Pathspec elements
> +	are separated by LF or CR/LF. Pathspec elements can be quoted as
> +	explained for the configuration variable `core.quotePath` (see
> +	linkgit:git-config[1]). See also `--pathspec-file-nul` and global
> +	`--literal-pathspecs`.
> +
> +--pathspec-file-nul::
> +	Only meaningful with `--pathspec-from-file`. Pathspec elements are
> +	separated with NUL character and all other characters are taken
> +	literally (including newlines and quotes).

Does it make sense to have a corresponding --patterns-file-nul option?
As in, is it possible for patterns to contain inline newlines? If it's
not possible, then that option probably isn't necessary.

> +
>  -e::
>  	The next parameter is the pattern. This option has to be
>  	used for patterns starting with `-` and should be used in

> @@ -1125,6 +1129,44 @@ test_expect_success 'grep --no-index descends into repos, but not .git' '
>  	)
>  '
>  
> +test_expect_success 'setup pathspecs-file tests' '
> +cat >excluded-file <<EOF &&
> +bar
> +EOF
> +cat >pathspec-file <<EOF &&
> +foo
> +bar
> +baz
> +EOF
> +cat >unrelated-file <<EOF &&
> +xyz
> +EOF
> +git add excluded-file pathspec-file unrelated-file
> +'

Could you please change these here-docs to be <<-\EOF and then indent
the test case?

> +
> +cat >pathspecs <<EOF
> +pathspec-file
> +unrelated-file
> +EOF
> +
> +cat >expected <<EOF
> +pathspec-file:bar
> +EOF

In an earlier email, I was wondering aloud why these two blocks were
outside of the test case above. I think the answer to that is that
you're following the existing style of the test case.

In that case, could I pester you to do some test clean up while you're
at it? I think it'd be nice to move the cats into their respective test
cases (with <<-\EOF) and also rename the files from 'expected' to
'expect'. But otherwise, I think it's fine as is as well.

Thanks,

Denton

> +
> +test_expect_success 'grep --pathspec-from-file with file' '
> +	git grep --pathspec-from-file pathspecs "bar" >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'grep --pathspec-file with stdin' '
> +	git grep --pathspec-from-file - "bar" <pathspecs >actual &&
> +	test_cmp expected actual
> +'
> +
> +test_expect_success 'grep with two stdin inputs fails' '
> +	test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs
> +'
> +
>  test_expect_success 'setup double-dash tests' '
>  cat >double-dash <<EOF &&
>  --
> -- 
> 2.24.0.393.g34dc348eaf-goog
>
Junio C Hamano Dec. 4, 2019, 9:24 p.m. UTC | #2
Denton Liu <liu.denton@gmail.com> writes:

>> +--pathspec-file-nul::
>> +	Only meaningful with `--pathspec-from-file`. Pathspec elements are
>> +	separated with NUL character and all other characters are taken
>> +	literally (including newlines and quotes).
>
> Does it make sense to have a corresponding --patterns-file-nul option?

I do not think so.  "grep" is always record oriented and the record
separter is the LF, so patterns file can safely be delimited with
LF.

>> +test_expect_success 'setup pathspecs-file tests' '
>> +cat >excluded-file <<EOF &&
>> +bar
>> +EOF
>> ...
>> +git add excluded-file pathspec-file unrelated-file
>> +'
>
> Could you please change these here-docs to be <<-\EOF and then indent
> the test case?

Good suggestion.

	test_expect_success 'setup ...' '
		cat >excluded-file <<-\EOF &&
		bar
		EOF
		...
		git add ...
	'

If each line in these files consists of a short single token (which
seems to be the case), perhaps consider using test_write_lines?

	test_write_lines >excluded-file bar &&
	test_write_lines >pathspec-file foo bar baz &&
	test_write_lines >unrelated-file xyz &&
	test_write_lines >pathspecs pathspec-file unrelated-file &&
Junio C Hamano Dec. 4, 2019, 10:24 p.m. UTC | #3
Emily Shaffer <emilyshaffer@google.com> writes:

> Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the
> documentation accordingly.
>
> This changes enables 'git grep' to receive the pathspec from a file by
> specifying the path, or from stdin by specifying '-' as the path. This
> matches the previous functionality of '-f', so the documentation of '-f'
> has been expanded to describe this functionality. To let '-f' match the
> new '--pathspec-from-file' option, also teach a '--patterns-from-file'
> long name to '-f'.
>
> Since there are now two arguments which can attempt to read from stdin,
> add a safeguard to check whether the user specified '-' for both of
> them. It is still possible for a user to pass '/dev/stdin' to one or
> both arguments at present; we do not explicitly check.

OK.  I guess this is good enough at least for now and possibly
forever as that falls into the "doctor, it hurts when I do this"
category.

> Refactored to use am/pathspec-from-file. This changes the implementation
> significantly since v1, but the testcases mostly remain the same.

I am hoping that this topic, and Alexandr's "add" and "checkout"
stuff, would help establishing a good pattern for other commands to
follow.

> diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> index c89fb569e3..56b1c5a302 100644
> --- a/Documentation/git-grep.txt
> +++ b/Documentation/git-grep.txt
> @@ -24,7 +24,8 @@ SYNOPSIS
>  	   [-A <post-context>] [-B <pre-context>] [-C <context>]
>  	   [-W | --function-context]
>  	   [--threads <num>]
> -	   [-f <file>] [-e] <pattern>
> +	   [-f | --patterns-from-file <file>] [-e] <pattern>

OK.

> +	   [--pathspec-from-file=<file> [--pathspec-file-nul]]

OK.

> @@ -270,7 +271,10 @@ providing this option will cause it to die.
>  	See `grep.threads` in 'CONFIGURATION' for more information.
>  
>  -f <file>::
> -	Read patterns from <file>, one per line.
> +--patterns-from-file <file>::
> +	Read patterns from <file>, one per line. If `<file>` is exactly `-` then
> +	standard input is used; standard input cannot be used for both
> +	--patterns-from-file and --pathspec-from-file.

Makes sense---otherwise they would compete and we cannot know which
kind each line in the standard input is ;-)

> @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for
>  more search backends, until then we'll die when the pattern type in
>  question doesn't support them.
>  
> +--pathspec-from-file <file>::
> +	Read pathspec from <file> instead of the command line. If `<file>` is
> +	exactly `-` then standard input is used; standard input cannot be used
> +	for both --patterns-from-file and --pathspec-from-file. Pathspec elements
> +	are separated by LF or CR/LF. Pathspec elements can be quoted as
> +	explained for the configuration variable `core.quotePath` (see
> +	linkgit:git-config[1]). See also `--pathspec-file-nul` and global
> +	`--literal-pathspecs`.
> +
> +--pathspec-file-nul::
> +	Only meaningful with `--pathspec-from-file`. Pathspec elements are
> +	separated with NUL character and all other characters are taken
> +	literally (including newlines and quotes).
> +

I think these matches the ones in git-reset.txt from the earlier
work by Alexandr's, which is good.

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 50ce8d9461..54ba991c42 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -31,6 +31,7 @@ static char const * const grep_usage[] = {
>  };
>  
>  static int recurse_submodules;
> +static int patterns_from_stdin, pathspec_from_stdin;
>  
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
> @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg,
>  static int file_callback(const struct option *opt, const char *arg, int unset)
>  {

Shouldn't this be renamed?  Earlier, the only thing that can be
taken from a file was the patterns, but now a file can be given
to specify a set of pathspec elements.  "patterns_file_callback()"
or something like taht?

>  	struct grep_opt *grep_opt = opt->value;
> -	int from_stdin;
>  	FILE *patterns;
>  	int lno = 0;
>  	struct strbuf sb = STRBUF_INIT;
>  
>  	BUG_ON_OPT_NEG(unset);
>  
> -	from_stdin = !strcmp(arg, "-");
> -	patterns = from_stdin ? stdin : fopen(arg, "r");
> +	patterns_from_stdin = !strcmp(arg, "-");

Totally outside the scope of this patch, but we may want to
introduce

	int is_stdin_cmdline_marker(const char *arg)
	{
		return !strcmp(arg, "-");
	}

which later can be taught also about "/dev/stdin".  Even outside the
pathspec-from-file option, I think "diff --no-index" also has some
special-case provision for treating "-" as "input coming from the
standard input string", which would benefit from such a helper.

> +	if (patterns_from_stdin && pathspec_from_stdin)
> +		die(_("cannot specify both patterns and pathspec via stdin"));

It's kind of sad that we need to check this in this callback and
also inside cmd_grep() top-level we will see further down...

> +	if (pathspec_from_file) {
> +		if (pathspec.nr)
> +			die(_("--pathspec-from-file is incompatible with pathspec arguments"));
> +
> +		pathspec_from_stdin = !strcmp(pathspec_from_file, "-");
> +
> +		if (patterns_from_stdin && pathspec_from_stdin)
> +			die(_("cannot specify both patterns and pathspec via stdin"));

... here.

Thanks.
Alexandr Miloslavskiy Dec. 5, 2019, 11:58 a.m. UTC | #4
I'm excited to see someone else join my effort, thanks for continuing my 
effort! Also, less work for me :)

On 04.12.2019 21:39, Emily Shaffer wrote:

>   static int file_callback(const struct option *opt, const char *arg, int unset)
>   {
>   	struct grep_opt *grep_opt = opt->value;
> -	int from_stdin;
>   	FILE *patterns;
>   	int lno = 0;
>   	struct strbuf sb = STRBUF_INIT;
>   
>   	BUG_ON_OPT_NEG(unset);
>   
> -	from_stdin = !strcmp(arg, "-");
> -	patterns = from_stdin ? stdin : fopen(arg, "r");
> +	patterns_from_stdin = !strcmp(arg, "-");
> +
> +	if (patterns_from_stdin && pathspec_from_stdin)

To my understanding, this check will not work as expected. 
`file_callback` will be called at the moment of parsing args. 
`pathspec_from_stdin` is only initialized later.

Maybe it would be better to convert `file_callback` into a regular 
function and call it after the options were parsed, similar to how 
`pathspec_from_file` is parsed later?

This will also allow to move global variables into local scope and 
resolve other small issues raised by other reviewers.

> +test_expect_success 'grep with two stdin inputs fails' '
> +	test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs
> +'
> +

It is usually a good idea to test for specific error, like this:

   test_must_fail git grep --pathspec-from-file - --patterns-from-file - 
<pathspecs 2>err &&
   test_i18ngrep "cannot specify both patterns and pathspec via stdin" 
err &&
Johannes Schindelin Dec. 6, 2019, 11:22 a.m. UTC | #5
Hi Emily,

On Wed, 4 Dec 2019, Emily Shaffer wrote:

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 50ce8d9461..54ba991c42 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -809,6 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	int use_index = 1;
>  	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
>  	int allow_revs;
> +	char *pathspec_from_file;
> +	int pathspec_file_nul;
>
>  	struct option options[] = {

I need this on top to make it work without problems (I have not yet
triggered the CI build, but I will in a moment):

-- snipsnap --
Subject: [PATCH] fixup??? grep: support the --pathspec-from-file option

We must not use those variables uninitialized, this causes CI build
failures left and right (see e.g.
https://dev.azure.com/gitgitgadget/git/_build/results?buildId=22821)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/grep.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 54ba991c425..c2aa1aeebd6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -813,8 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	int use_index = 1;
 	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
 	int allow_revs;
-	char *pathspec_from_file;
-	int pathspec_file_nul;
+	char *pathspec_from_file = NULL;
+	int pathspec_file_nul = 0;

 	struct option options[] = {
 		OPT_BOOL(0, "cached", &cached,
--
2.24.0.windows.2.611.ge9aced84530
SZEDER Gábor Dec. 6, 2019, 11:34 a.m. UTC | #6
On Wed, Dec 04, 2019 at 12:39:11PM -0800, Emily Shaffer wrote:
> Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the
> documentation accordingly.

> diff --git a/builtin/grep.c b/builtin/grep.c
> index 50ce8d9461..54ba991c42 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c

> @@ -809,6 +813,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	int use_index = 1;
>  	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
>  	int allow_revs;
> +	char *pathspec_from_file;
> +	int pathspec_file_nul;

Uninitialized variables ...
>  
>  	struct option options[] = {
>  		OPT_BOOL(0, "cached", &cached,
> @@ -896,8 +902,10 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL('W', "function-context", &opt.funcbody,
>  			N_("show the surrounding function")),
>  		OPT_GROUP(""),
> -		OPT_CALLBACK('f', NULL, &opt, N_("file"),
> +		OPT_CALLBACK('f', "patterns-from-file", &opt, N_("file"),
>  			N_("read patterns from file"), file_callback),
> +		OPT_PATHSPEC_FROM_FILE(&pathspec_from_file),
> +		OPT_PATHSPEC_FILE_NUL(&pathspec_file_nul),
>  		{ OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"),
>  			N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback },
>  		{ OPTION_CALLBACK, 0, "and", &opt, NULL,
> @@ -1062,6 +1070,23 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  	pathspec.recursive = 1;
>  	pathspec.recurse_submodules = !!recurse_submodules;
>  
> +	if (pathspec_from_file) {

... one of which is checked here ...

> +		if (pathspec.nr)
> +			die(_("--pathspec-from-file is incompatible with pathspec arguments"));

... causing a lot of tests to fail in our OSX CI builds (but,
strangely, in none of the Linux builds), either with the above error
message, e.g.:

  expecting success of 4014.150 'format-patch --pretty=mboxrd':
  [...]
  ++git format-patch --pretty=mboxrd --stdout -1 01b96447182ff76f30268fb82d3f9e9e09b14e6c~1..01b96447182ff76f30268fb82d3f9e9e09b14e6c
  ++git grep -h --no-index -A11 '^>From could trip up a loose mbox parser' patch
  fatal: --pathspec-from-file is incompatible with pathspec arguments
  error: last command exited with $?=128

or with:

  expecting success of 7814.2 'grep correctly finds patterns in a submodule': 
  [...]
  ++git grep -e '(3|4)' --recurse-submodules
  fatal: could not open '����?' for reading: No such file or directory
  error: last command exited with $?=128

> +		pathspec_from_stdin = !strcmp(pathspec_from_file, "-");
> +
> +		if (patterns_from_stdin && pathspec_from_stdin)
> +			die(_("cannot specify both patterns and pathspec via stdin"));
> +
> +		parse_pathspec_file(&pathspec, 0, PATHSPEC_PREFER_CWD |
> +				    (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
> +				    prefix, pathspec_from_file,
> +				    pathspec_file_nul);

The other uninitialized variable is passed as parameter above and
checked below, but I haven't seen any test failures caused by it.

> +	} else if (pathspec_file_nul) {
> +		die(_("--pathspec-file-nul requires --pathspec-from-file"));
> +	}
> +
>  	if (list.nr || cached || show_in_pager) {
>  		if (num_threads > 1)
>  			warning(_("invalid option combination, ignoring --threads"));
Emily Shaffer Dec. 13, 2019, 3:07 a.m. UTC | #7
On Wed, Dec 04, 2019 at 02:24:07PM -0800, Junio C Hamano wrote:
> Emily Shaffer <emilyshaffer@google.com> writes:
> 
> > Teach 'git grep' to use OPT_PATHSPEC_FROM_FILE and update the
> > documentation accordingly.
> >
> > This changes enables 'git grep' to receive the pathspec from a file by
> > specifying the path, or from stdin by specifying '-' as the path. This
> > matches the previous functionality of '-f', so the documentation of '-f'
> > has been expanded to describe this functionality. To let '-f' match the
> > new '--pathspec-from-file' option, also teach a '--patterns-from-file'
> > long name to '-f'.
> >
> > Since there are now two arguments which can attempt to read from stdin,
> > add a safeguard to check whether the user specified '-' for both of
> > them. It is still possible for a user to pass '/dev/stdin' to one or
> > both arguments at present; we do not explicitly check.
> 
> OK.  I guess this is good enough at least for now and possibly
> forever as that falls into the "doctor, it hurts when I do this"
> category.
> 
> > Refactored to use am/pathspec-from-file. This changes the implementation
> > significantly since v1, but the testcases mostly remain the same.
> 
> I am hoping that this topic, and Alexandr's "add" and "checkout"
> stuff, would help establishing a good pattern for other commands to
> follow.
> 
> > diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
> > index c89fb569e3..56b1c5a302 100644
> > --- a/Documentation/git-grep.txt
> > +++ b/Documentation/git-grep.txt
> > @@ -24,7 +24,8 @@ SYNOPSIS
> >  	   [-A <post-context>] [-B <pre-context>] [-C <context>]
> >  	   [-W | --function-context]
> >  	   [--threads <num>]
> > -	   [-f <file>] [-e] <pattern>
> > +	   [-f | --patterns-from-file <file>] [-e] <pattern>
> 
> OK.
> 
> > +	   [--pathspec-from-file=<file> [--pathspec-file-nul]]
> 
> OK.
> 
> > @@ -270,7 +271,10 @@ providing this option will cause it to die.
> >  	See `grep.threads` in 'CONFIGURATION' for more information.
> >  
> >  -f <file>::
> > -	Read patterns from <file>, one per line.
> > +--patterns-from-file <file>::
> > +	Read patterns from <file>, one per line. If `<file>` is exactly `-` then
> > +	standard input is used; standard input cannot be used for both
> > +	--patterns-from-file and --pathspec-from-file.
> 
> Makes sense---otherwise they would compete and we cannot know which
> kind each line in the standard input is ;-)

At one point Jonathan Nieder had suggested to me that it might be nice
to watch for "--" in the standard input. But I think I wasn't confident
in the idea that all commands which take arbitrary number of arguments,
and pathspec expect their arg list to end in <arg...> -- <pathspec>.

> 
> > @@ -289,6 +293,20 @@ In future versions we may learn to support patterns containing \0 for
> >  more search backends, until then we'll die when the pattern type in
> >  question doesn't support them.
> >  
> > +--pathspec-from-file <file>::
> > +	Read pathspec from <file> instead of the command line. If `<file>` is
> > +	exactly `-` then standard input is used; standard input cannot be used
> > +	for both --patterns-from-file and --pathspec-from-file. Pathspec elements
> > +	are separated by LF or CR/LF. Pathspec elements can be quoted as
> > +	explained for the configuration variable `core.quotePath` (see
> > +	linkgit:git-config[1]). See also `--pathspec-file-nul` and global
> > +	`--literal-pathspecs`.
> > +
> > +--pathspec-file-nul::
> > +	Only meaningful with `--pathspec-from-file`. Pathspec elements are
> > +	separated with NUL character and all other characters are taken
> > +	literally (including newlines and quotes).
> > +
> 
> I think these matches the ones in git-reset.txt from the earlier
> work by Alexandr's, which is good.

Yes, I took them verbatim.

> 
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 50ce8d9461..54ba991c42 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -31,6 +31,7 @@ static char const * const grep_usage[] = {
> >  };
> >  
> >  static int recurse_submodules;
> > +static int patterns_from_stdin, pathspec_from_stdin;
> >  
> >  #define GREP_NUM_THREADS_DEFAULT 8
> >  static int num_threads;
> > @@ -723,15 +724,18 @@ static int context_callback(const struct option *opt, const char *arg,
> >  static int file_callback(const struct option *opt, const char *arg, int unset)
> >  {
> 
> Shouldn't this be renamed?  Earlier, the only thing that can be
> taken from a file was the patterns, but now a file can be given
> to specify a set of pathspec elements.  "patterns_file_callback()"
> or something like taht?

Done. I was hesitant to touch the other code; I guess it is fine.

> 
> >  	struct grep_opt *grep_opt = opt->value;
> > -	int from_stdin;
> >  	FILE *patterns;
> >  	int lno = 0;
> >  	struct strbuf sb = STRBUF_INIT;
> >  
> >  	BUG_ON_OPT_NEG(unset);
> >  
> > -	from_stdin = !strcmp(arg, "-");
> > -	patterns = from_stdin ? stdin : fopen(arg, "r");
> > +	patterns_from_stdin = !strcmp(arg, "-");
> 
> Totally outside the scope of this patch, but we may want to
> introduce
> 
> 	int is_stdin_cmdline_marker(const char *arg)
> 	{
> 		return !strcmp(arg, "-");
> 	}
> 
> which later can be taught also about "/dev/stdin".  Even outside the
> pathspec-from-file option, I think "diff --no-index" also has some
> special-case provision for treating "-" as "input coming from the
> standard input string", which would benefit from such a helper.

Agreed; that's probably a handy thing to put into parse-options.h. I
suggest we think of this next time we are looking for Outreachy
microprojects or similar. I'll write it down somewhere.

> 
> > +	if (patterns_from_stdin && pathspec_from_stdin)
> > +		die(_("cannot specify both patterns and pathspec via stdin"));
> 
> It's kind of sad that we need to check this in this callback and
> also inside cmd_grep() top-level we will see further down...

Another reviewer suggested not using a callback and instead waiting
until after the entire argparse to do file reads; this would let us
avoid the weird double check here. I think I will do it.

> 
> > +	if (pathspec_from_file) {
> > +		if (pathspec.nr)
> > +			die(_("--pathspec-from-file is incompatible with pathspec arguments"));
> > +
> > +		pathspec_from_stdin = !strcmp(pathspec_from_file, "-");
> > +
> > +		if (patterns_from_stdin && pathspec_from_stdin)
> > +			die(_("cannot specify both patterns and pathspec via stdin"));
> 
> ... here.
> 
> Thanks.
Emily Shaffer Dec. 13, 2019, 4 a.m. UTC | #8
On Thu, Dec 05, 2019 at 12:58:19PM +0100, Alexandr Miloslavskiy wrote:
> I'm excited to see someone else join my effort, thanks for continuing my
> effort! Also, less work for me :)
> 
> On 04.12.2019 21:39, Emily Shaffer wrote:
> 
> >   static int file_callback(const struct option *opt, const char *arg, int unset)
> >   {
> >   	struct grep_opt *grep_opt = opt->value;
> > -	int from_stdin;
> >   	FILE *patterns;
> >   	int lno = 0;
> >   	struct strbuf sb = STRBUF_INIT;
> >   	BUG_ON_OPT_NEG(unset);
> > -	from_stdin = !strcmp(arg, "-");
> > -	patterns = from_stdin ? stdin : fopen(arg, "r");
> > +	patterns_from_stdin = !strcmp(arg, "-");
> > +
> > +	if (patterns_from_stdin && pathspec_from_stdin)
> 
> To my understanding, this check will not work as expected. `file_callback`
> will be called at the moment of parsing args. `pathspec_from_stdin` is only
> initialized later.
> 
> Maybe it would be better to convert `file_callback` into a regular function
> and call it after the options were parsed, similar to how
> `pathspec_from_file` is parsed later?
> 
> This will also allow to move global variables into local scope and resolve
> other small issues raised by other reviewers.

Yeah, I think this is a good idea for the reasons you stated, so I'll do
so.

> 
> > +test_expect_success 'grep with two stdin inputs fails' '
> > +	test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs
> > +'
> > +
> 
> It is usually a good idea to test for specific error, like this:
> 
>   test_must_fail git grep --pathspec-from-file - --patterns-from-file -
> <pathspecs 2>err &&
>   test_i18ngrep "cannot specify both patterns and pathspec via stdin" err &&

Sure. Thanks.
diff mbox series

Patch

diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt
index c89fb569e3..56b1c5a302 100644
--- a/Documentation/git-grep.txt
+++ b/Documentation/git-grep.txt
@@ -24,7 +24,8 @@  SYNOPSIS
 	   [-A <post-context>] [-B <pre-context>] [-C <context>]
 	   [-W | --function-context]
 	   [--threads <num>]
-	   [-f <file>] [-e] <pattern>
+	   [-f | --patterns-from-file <file>] [-e] <pattern>
+	   [--pathspec-from-file=<file> [--pathspec-file-nul]]
 	   [--and|--or|--not|(|)|-e <pattern>...]
 	   [--recurse-submodules] [--parent-basename <basename>]
 	   [ [--[no-]exclude-standard] [--cached | --no-index | --untracked] | <tree>...]
@@ -270,7 +271,10 @@  providing this option will cause it to die.
 	See `grep.threads` in 'CONFIGURATION' for more information.
 
 -f <file>::
-	Read patterns from <file>, one per line.
+--patterns-from-file <file>::
+	Read patterns from <file>, one per line. If `<file>` is exactly `-` then
+	standard input is used; standard input cannot be used for both
+	--patterns-from-file and --pathspec-from-file.
 +
 Passing the pattern via <file> allows for providing a search pattern
 containing a \0.
@@ -289,6 +293,20 @@  In future versions we may learn to support patterns containing \0 for
 more search backends, until then we'll die when the pattern type in
 question doesn't support them.
 
+--pathspec-from-file <file>::
+	Read pathspec from <file> instead of the command line. If `<file>` is
+	exactly `-` then standard input is used; standard input cannot be used
+	for both --patterns-from-file and --pathspec-from-file. Pathspec elements
+	are separated by LF or CR/LF. Pathspec elements can be quoted as
+	explained for the configuration variable `core.quotePath` (see
+	linkgit:git-config[1]). See also `--pathspec-file-nul` and global
+	`--literal-pathspecs`.
+
+--pathspec-file-nul::
+	Only meaningful with `--pathspec-from-file`. Pathspec elements are
+	separated with NUL character and all other characters are taken
+	literally (including newlines and quotes).
+
 -e::
 	The next parameter is the pattern. This option has to be
 	used for patterns starting with `-` and should be used in
diff --git a/builtin/grep.c b/builtin/grep.c
index 50ce8d9461..54ba991c42 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -31,6 +31,7 @@  static char const * const grep_usage[] = {
 };
 
 static int recurse_submodules;
+static int patterns_from_stdin, pathspec_from_stdin;
 
 #define GREP_NUM_THREADS_DEFAULT 8
 static int num_threads;
@@ -723,15 +724,18 @@  static int context_callback(const struct option *opt, const char *arg,
 static int file_callback(const struct option *opt, const char *arg, int unset)
 {
 	struct grep_opt *grep_opt = opt->value;
-	int from_stdin;
 	FILE *patterns;
 	int lno = 0;
 	struct strbuf sb = STRBUF_INIT;
 
 	BUG_ON_OPT_NEG(unset);
 
-	from_stdin = !strcmp(arg, "-");
-	patterns = from_stdin ? stdin : fopen(arg, "r");
+	patterns_from_stdin = !strcmp(arg, "-");
+
+	if (patterns_from_stdin && pathspec_from_stdin)
+		die(_("cannot specify both patterns and pathspec via stdin"));
+
+	patterns = patterns_from_stdin ? stdin : fopen(arg, "r");
 	if (!patterns)
 		die_errno(_("cannot open '%s'"), arg);
 	while (strbuf_getline(&sb, patterns) == 0) {
@@ -742,7 +746,7 @@  static int file_callback(const struct option *opt, const char *arg, int unset)
 		append_grep_pat(grep_opt, sb.buf, sb.len, arg, ++lno,
 				GREP_PATTERN);
 	}
-	if (!from_stdin)
+	if (!patterns_from_stdin)
 		fclose(patterns);
 	strbuf_release(&sb);
 	return 0;
@@ -809,6 +813,8 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 	int use_index = 1;
 	int pattern_type_arg = GREP_PATTERN_TYPE_UNSPECIFIED;
 	int allow_revs;
+	char *pathspec_from_file;
+	int pathspec_file_nul;
 
 	struct option options[] = {
 		OPT_BOOL(0, "cached", &cached,
@@ -896,8 +902,10 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 		OPT_BOOL('W', "function-context", &opt.funcbody,
 			N_("show the surrounding function")),
 		OPT_GROUP(""),
-		OPT_CALLBACK('f', NULL, &opt, N_("file"),
+		OPT_CALLBACK('f', "patterns-from-file", &opt, N_("file"),
 			N_("read patterns from file"), file_callback),
+		OPT_PATHSPEC_FROM_FILE(&pathspec_from_file),
+		OPT_PATHSPEC_FILE_NUL(&pathspec_file_nul),
 		{ OPTION_CALLBACK, 'e', NULL, &opt, N_("pattern"),
 			N_("match <pattern>"), PARSE_OPT_NONEG, pattern_callback },
 		{ OPTION_CALLBACK, 0, "and", &opt, NULL,
@@ -1062,6 +1070,23 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 	pathspec.recursive = 1;
 	pathspec.recurse_submodules = !!recurse_submodules;
 
+	if (pathspec_from_file) {
+		if (pathspec.nr)
+			die(_("--pathspec-from-file is incompatible with pathspec arguments"));
+
+		pathspec_from_stdin = !strcmp(pathspec_from_file, "-");
+
+		if (patterns_from_stdin && pathspec_from_stdin)
+			die(_("cannot specify both patterns and pathspec via stdin"));
+
+		parse_pathspec_file(&pathspec, 0, PATHSPEC_PREFER_CWD |
+				    (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
+				    prefix, pathspec_from_file,
+				    pathspec_file_nul);
+	} else if (pathspec_file_nul) {
+		die(_("--pathspec-file-nul requires --pathspec-from-file"));
+	}
+
 	if (list.nr || cached || show_in_pager) {
 		if (num_threads > 1)
 			warning(_("invalid option combination, ignoring --threads"));
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 7d7b396c23..355890a72a 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -549,6 +549,10 @@  test_expect_success 'grep -f, non-existent file' '
 	test_must_fail git grep -f patterns
 '
 
+text_expect_success 'grep --pathspec-from-file, non-existent file' '
+	test_must_fail git grep --pathspec-from-file pathspecs
+'
+
 cat >expected <<EOF
 file:foo mmap bar
 file:foo_mmap bar
@@ -582,8 +586,8 @@  mmap
 vvv
 EOF
 
-test_expect_success 'grep -f, multiple patterns' '
-	git grep -f patterns >actual &&
+test_expect_success 'grep --patterns-from-file, multiple patterns' '
+	git grep --patterns-from-file patterns >actual &&
 	test_cmp expected actual
 '
 
@@ -1125,6 +1129,44 @@  test_expect_success 'grep --no-index descends into repos, but not .git' '
 	)
 '
 
+test_expect_success 'setup pathspecs-file tests' '
+cat >excluded-file <<EOF &&
+bar
+EOF
+cat >pathspec-file <<EOF &&
+foo
+bar
+baz
+EOF
+cat >unrelated-file <<EOF &&
+xyz
+EOF
+git add excluded-file pathspec-file unrelated-file
+'
+
+cat >pathspecs <<EOF
+pathspec-file
+unrelated-file
+EOF
+
+cat >expected <<EOF
+pathspec-file:bar
+EOF
+
+test_expect_success 'grep --pathspec-from-file with file' '
+	git grep --pathspec-from-file pathspecs "bar" >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep --pathspec-file with stdin' '
+	git grep --pathspec-from-file - "bar" <pathspecs >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'grep with two stdin inputs fails' '
+	test_must_fail git grep --pathspec-from-file - --patterns-from-file - <pathspecs
+'
+
 test_expect_success 'setup double-dash tests' '
 cat >double-dash <<EOF &&
 --