diff mbox series

grep: die gracefully when outside repository

Message ID 087c92e3904dd774f672373727c300bf7f5f6369.1697317276.git.code@khaugsbakk.name (mailing list archive)
State Superseded
Headers show
Series grep: die gracefully when outside repository | expand

Commit Message

Kristoffer Haugsbakk Oct. 14, 2023, 9:02 p.m. UTC
Die gracefully when `git grep --no-index` is run outside of a Git
repository and the path is outside the directory tree.

If you are not in a Git repository and say:

    git grep --no-index search ..

You trigger a `BUG`:

    BUG: environment.c:213: git environment hasn't been setup
    Aborted (core dumped)

Because `..` is a valid path which is treated as a pathspec. Then
`pathspec` figures out that it is not in the current directory tree. The
`BUG` is triggered when `pathspec` tries to advice the user about the path
to the (non-existing) repository.

Reported-by: ks1322 ks1322 <ks1322@gmail.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---
 pathspec.c      |  3 +++
 t/t7810-grep.sh | 13 +++++++++++++
 2 files changed, 16 insertions(+)


base-commit: 43c8a30d150ecede9709c1f2527c8fba92c65f40

Comments

Jeff King Oct. 15, 2023, 3:26 a.m. UTC | #1
On Sat, Oct 14, 2023 at 11:02:38PM +0200, Kristoffer Haugsbakk wrote:

> Die gracefully when `git grep --no-index` is run outside of a Git
> repository and the path is outside the directory tree.
> 
> If you are not in a Git repository and say:
> 
>     git grep --no-index search ..
> 
> You trigger a `BUG`:
> 
>     BUG: environment.c:213: git environment hasn't been setup
>     Aborted (core dumped)
> 
> Because `..` is a valid path which is treated as a pathspec. Then
> `pathspec` figures out that it is not in the current directory tree. The
> `BUG` is triggered when `pathspec` tries to advice the user about the path
> to the (non-existing) repository.

Is it even reasonable for "grep --no-index" to care about leaving the
tree in the first place? That is, is there a reason we should not allow:

  git grep --no-index foo ../bar

? And if we do want to care, there is a weirdness here that even with
your patch, we check to see if the file exists:

  $ git grep --no-index foo ../does-exist
  fatal: '../does-exist' is outside the directory tree

  $ git grep --no-index foo ../does-not-exist
  fatal: ../does-not-exist: no such path in the working tree.

If we want to avoid leaving the current directory, then I think we need
to be checking much sooner (but again, I would argue that it is not
worth caring about in no-index mode).

I do think your patch does not make anything worse (and indeed makes the
error output much better). So I do not mind it in the meantime. But I
have a feeling that we'd end up reverting it as part of the fix for the
larger issue.

-Peff
Kristoffer Haugsbakk Oct. 15, 2023, 8 a.m. UTC | #2
On Sun, Oct 15, 2023, at 05:26, Jeff King wrote:
> Is it even reasonable for "grep --no-index" to care about leaving the
> tree in the first place? That is, is there a reason we should not allow:
>
>   git grep --no-index foo ../bar
>
> ?

On second thought yeah, it doesn't make sense. We are outside of any
repository already so what does it matter where the file is relative to
the current working directory?

It seems that `pathspec.c:init_pathspec_item` should let you through in
this case.
Junio C Hamano Oct. 15, 2023, 5:57 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> Is it even reasonable for "grep --no-index" to care about leaving the
> tree in the first place? That is, is there a reason we should not allow:
>
>   git grep --no-index foo ../bar

A huge difference between the bare "grep" and "git grep" is that we
know the scope of the project tree, so it goes recursive by default.
Should the above command line recursively go below ../bar?  Would we
allow "/" to be given?

I actually do not think these "we are allowing Git tools to be used
on random garbage" is a good idea to begin with X-<.  If we invented
something nice for our variant in "git grep" and wish we can use it
outside the repository, contributing the feature to implementations
of "grep" would have been the right way to move forward, instead of
contaminating the codebase with things that are not related to Git.
Whoever did 59332d13 (Resurrect "git grep --no-index", 2010-02-06)
should be punished X-<.

Anyway.

2e48fcdb (grep docs: document --no-index option, 2010-02-25) seems
to have wanted to explicitly limit the search within the "current
directory", and I am fine to keep the search space limited by the
cwd.  On the other hand, of course, the users can shoot themselves
in the foot with "grep -r foo /", so letting them use "git grep" the
same way is perhaps OK.  Especially if it simplifies the code if we
lift the limitation, that is a very tempting thing to do.

> If we want to avoid leaving the current directory, then I think we need
> to be checking much sooner (but again, I would argue that it is not
> worth caring about in no-index mode).
Junio C Hamano Oct. 17, 2023, 4:42 p.m. UTC | #4
Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> diff --git a/pathspec.c b/pathspec.c
> index 3a3a5724c44..e115832f17a 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -468,6 +468,9 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
>  					   &prefixlen, copyfrom);
>  		if (!match) {
>  			const char *hint_path = get_git_work_tree();
> +			if (!have_git_dir())
> +				die(_("'%s' is outside the directory tree"),
> +				    copyfrom);
>  			if (!hint_path)
>  				hint_path = get_git_dir();
>  			die(_("%s: '%s' is outside repository at '%s'"), elt,

It is curious that the original has two sources of hint_path (i.e.,
get_git_dir() is used as a fallback for get_git_work_tree()).  Are
we certain that the check is at the right place?  If we do not have
a repository, then both would fail by returning NULL, so it should
not matter if we add the new check before we check either or both,
or even after we checked both before dying.

I wonder if

	const char *hint_path = get_git_work_tree();

	if (!hint_path)
	        hint_path = get_git_dir();
	if (hint_path)
		die(_("%s: '%s' is outside repository at '%s'"),
		    elt, copyfrom, absolute_path(hint_path));
	else
		die(_("%s: '%s' is outside the directory tree"),
		    elt, copyfrom);

makes the intent of the code clearer.  We want to hint the location
of the repository by computing hint_path, and if we can compute it,
we use it in the error message, but otherwise we don't add hint.  And
we apply that conditional whether we have repository or not---what we
care about is the NULL-ness of the hint string we computed.

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index 39d6d713ecb..b976f81a166 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -1234,6 +1234,19 @@ test_expect_success 'outside of git repository with fallbackToNoIndex' '
>  	)
>  '
>  
> +test_expect_success 'outside of git repository with pathspec outside the directory tree' '
> +	test_when_finished rm -fr non &&
> +	rm -fr non &&
> +	mkdir -p non/git/sub &&
> +	(
> +		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
> +		export GIT_CEILING_DIRECTORIES &&
> +		cd non/git &&
> +		test_expect_code 128 git grep --no-index search .. 2>error &&
> +		grep "is outside the directory tree" error

Excellent.  This is a very good use of the GIT_CEILING_DIRECTORIES
facility.

> +	)
> +'
> +
>  test_expect_success 'inside git repository but with --no-index' '
>  	rm -fr is &&
>  	mkdir -p is/git/sub &&
>
> base-commit: 43c8a30d150ecede9709c1f2527c8fba92c65f40
Kristoffer Haugsbakk Oct. 17, 2023, 7:51 p.m. UTC | #5
On Tue, Oct 17, 2023, at 18:42, Junio C Hamano wrote:
> It is curious that the original has two sources of hint_path (i.e.,
> get_git_dir() is used as a fallback for get_git_work_tree()).  Are
> we certain that the check is at the right place?  If we do not have
> a repository, then both would fail by returning NULL, so it should
> not matter if we add the new check before we check either or both,
> or even after we checked both before dying.
>
> I wonder if
>
> 	const char *hint_path = get_git_work_tree();
>
> 	if (!hint_path)
> 	        hint_path = get_git_dir();
> 	if (hint_path)
> 		die(_("%s: '%s' is outside repository at '%s'"),
> 		    elt, copyfrom, absolute_path(hint_path));
> 	else
> 		die(_("%s: '%s' is outside the directory tree"),
> 		    elt, copyfrom);
>
> makes the intent of the code clearer.

That doesn't work since `get_git_dir()` triggers `BUG` instead of
returning `NULL`.

The `hint_path` declaration has to be at the start because of style
rules. But we can initialize it after.

I can also have a second look at the test since I am using `grep` to
test the failure output and not the translation string variant.

-- >8 --
Subject: [PATCH] fixup! grep: die gracefully when outside repository

---
 pathspec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index e115832f17a..0c1061fad11 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -467,10 +467,11 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 		match = prefix_path_gently(prefix, prefixlen,
 					   &prefixlen, copyfrom);
 		if (!match) {
-			const char *hint_path = get_git_work_tree();
+			const char *hint_path;
 			if (!have_git_dir())
 				die(_("'%s' is outside the directory tree"),
 				    copyfrom);
+			hint_path = get_git_work_tree();
 			if (!hint_path)
 				hint_path = get_git_dir();
 			die(_("%s: '%s' is outside repository at '%s'"), elt,
Junio C Hamano Oct. 17, 2023, 8:25 p.m. UTC | #6
Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> On Tue, Oct 17, 2023, at 18:42, Junio C Hamano wrote:
>> It is curious that the original has two sources of hint_path (i.e.,
>> get_git_dir() is used as a fallback for get_git_work_tree()).  Are
>> we certain that the check is at the right place?  If we do not have
>> a repository, then both would fail by returning NULL, so it should
>> not matter if we add the new check before we check either or both,
>> or even after we checked both before dying.
>>
>> I wonder if
>>
>> 	const char *hint_path = get_git_work_tree();
>>
>> 	if (!hint_path)
>> 	        hint_path = get_git_dir();
>> 	if (hint_path)
>> 		die(_("%s: '%s' is outside repository at '%s'"),
>> 		    elt, copyfrom, absolute_path(hint_path));
>> 	else
>> 		die(_("%s: '%s' is outside the directory tree"),
>> 		    elt, copyfrom);
>>
>> makes the intent of the code clearer.
>
> That doesn't work since `get_git_dir()` triggers `BUG` instead of
> returning `NULL`.

Ah, interesting.

> The `hint_path` declaration has to be at the start because of style
> rules. But we can initialize it after.

Yes, what you have below (but please leave a blank line between the
last line of decl and the first line of statement for readablility)
looks very readable and sensible.

> I can also have a second look at the test since I am using `grep` to
> test the failure output and not the translation string variant.

That is not necessary, as we no longer run under phoney i18n that
required us to use test_i18ngrep.  It is OK to assume that the tests
are run under "C" locale.

Thanks.

> -- >8 --
> Subject: [PATCH] fixup! grep: die gracefully when outside repository
>
> ---
>  pathspec.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/pathspec.c b/pathspec.c
> index e115832f17a..0c1061fad11 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -467,10 +467,11 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
>  		match = prefix_path_gently(prefix, prefixlen,
>  					   &prefixlen, copyfrom);
>  		if (!match) {
> -			const char *hint_path = get_git_work_tree();
> +			const char *hint_path;
>  			if (!have_git_dir())
>  				die(_("'%s' is outside the directory tree"),
>  				    copyfrom);
> +			hint_path = get_git_work_tree();
>  			if (!hint_path)
>  				hint_path = get_git_dir();
>  			die(_("%s: '%s' is outside repository at '%s'"), elt,
Junio C Hamano Oct. 17, 2023, 8:39 p.m. UTC | #7
Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

> diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
> index 39d6d713ecb..b976f81a166 100755
> --- a/t/t7810-grep.sh
> +++ b/t/t7810-grep.sh
> @@ -1234,6 +1234,19 @@ test_expect_success 'outside of git repository with fallbackToNoIndex' '
>  	)
>  '
>  
> +test_expect_success 'outside of git repository with pathspec outside the directory tree' '
> +	test_when_finished rm -fr non &&
> +	rm -fr non &&
> +	mkdir -p non/git/sub &&
> +	(
> +		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
> +		export GIT_CEILING_DIRECTORIES &&
> +		cd non/git &&
> +		test_expect_code 128 git grep --no-index search .. 2>error &&
> +		grep "is outside the directory tree" error
> +	)
> +'
> +

So you create non/git/sub, go to non/git (so there is sub/ directory),
and try running "..".

If you had a directory non/tig next to non/git and used ../tig
instead of .. as the path given to "git grep", it would also
correctly fail.  Searching in a non-existing path, ../non, dies in a
different error, with an error message that is not technically
wrong, but it probably can be improved.  It has been a while since I
looked at the pathspec matching code, but if we are lucky, it might
be just the matter of swapping the order of checking (in other
words, check "is it outside" first and then "does it exist" next, or
something like that)?

 t/t7810-grep.sh | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git c/t/t7810-grep.sh w/t/t7810-grep.sh
index b976f81a16..84838c0fe1 100755
--- c/t/t7810-grep.sh
+++ w/t/t7810-grep.sh
@@ -1234,16 +1234,30 @@ test_expect_success 'outside of git repository with fallbackToNoIndex' '
 	)
 '
 
-test_expect_success 'outside of git repository with pathspec outside the directory tree' '
+test_expect_success 'no repository with path outside $cwd' '
 	test_when_finished rm -fr non &&
 	rm -fr non &&
-	mkdir -p non/git/sub &&
+	mkdir -p non/git/sub non/tig &&
 	(
 		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
 		export GIT_CEILING_DIRECTORIES &&
 		cd non/git &&
 		test_expect_code 128 git grep --no-index search .. 2>error &&
 		grep "is outside the directory tree" error
+	) &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search ../tig 2>error &&
+		grep "is outside the directory tree" error
+	) &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search ../non 2>error &&
+		grep "no such path in the working tree" error
 	)
 '
diff mbox series

Patch

diff --git a/pathspec.c b/pathspec.c
index 3a3a5724c44..e115832f17a 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -468,6 +468,9 @@  static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 					   &prefixlen, copyfrom);
 		if (!match) {
 			const char *hint_path = get_git_work_tree();
+			if (!have_git_dir())
+				die(_("'%s' is outside the directory tree"),
+				    copyfrom);
 			if (!hint_path)
 				hint_path = get_git_dir();
 			die(_("%s: '%s' is outside repository at '%s'"), elt,
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 39d6d713ecb..b976f81a166 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1234,6 +1234,19 @@  test_expect_success 'outside of git repository with fallbackToNoIndex' '
 	)
 '
 
+test_expect_success 'outside of git repository with pathspec outside the directory tree' '
+	test_when_finished rm -fr non &&
+	rm -fr non &&
+	mkdir -p non/git/sub &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search .. 2>error &&
+		grep "is outside the directory tree" error
+	)
+'
+
 test_expect_success 'inside git repository but with --no-index' '
 	rm -fr is &&
 	mkdir -p is/git/sub &&