diff mbox series

[v1,1/2] builtin/grep.c: add --sparse option

Message ID 20220817075633.217934-2-shaoxuan.yuan02@gmail.com (mailing list archive)
State Superseded
Headers show
Series grep: integrate with sparse index | expand

Commit Message

Shaoxuan Yuan Aug. 17, 2022, 7:56 a.m. UTC
Add a --sparse option to `git-grep`. This option is mainly used to:

If searching in the index (using --cached):

With --sparse, proceed the action when the current cache_entry is
marked with SKIP_WORKTREE bit (the default is to skip this kind of
entry). Before this patch, --cached itself can realize this action.
Adding --sparse here grants the user finer control over sparse
entries. If the user only wants to peak into the index without
caring about sparse entries, --cached should suffice; if the user
wants to peak into the index _and_ cares about sparse entries,
combining --sparse with --cached can address this need.

Suggested-by: Victoria Dye <vdye@github.com>
Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
---
 builtin/grep.c                  | 10 +++++++++-
 t/t7817-grep-sparse-checkout.sh | 12 ++++++------
 2 files changed, 15 insertions(+), 7 deletions(-)

Comments

Derrick Stolee Aug. 17, 2022, 2:12 p.m. UTC | #1
On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
> Add a --sparse option to `git-grep`. This option is mainly used to:
> 
> If searching in the index (using --cached):
> 
> With --sparse, proceed the action when the current cache_entry is

This phrasing is awkward. It might be better to reframe to describe the
_why_ before the _what_

  When the '--cached' option is used with the 'git grep' command, the
  search is limited to the blobs found in the index, not in the worktree.
  If the user has enabled sparse-checkout, this might present more results
  than they would like, since the files outside of the sparse-checkout are
  unlikely to be important to them.

  Change the default behavior of 'git grep' to focus on the files within
  the sparse-checkout definition. To enable the previous behavior, add a
  '--sparse' option to 'git grep' that triggers the old behavior that
  inspects paths outside of the sparse-checkout definition when paired
  with the '--cached' option.

Or something like that. The documentation updates will also help clarify
what happens when '--cached' is not included. I assume '--sparse' is
ignored, but perhaps it _could_ allow looking at the cached files outside
the sparse-checkout definition, this could make the simpler invocation of
'git grep --sparse <pattern>' be the way that users can search after their
attempt to search the worktree failed.

> marked with SKIP_WORKTREE bit (the default is to skip this kind of
> entry). Before this patch, --cached itself can realize this action.
> Adding --sparse here grants the user finer control over sparse
> entries. If the user only wants to peak into the index without

s/peak/peek/

> caring about sparse entries, --cached should suffice; if the user
> wants to peak into the index _and_ cares about sparse entries,
> combining --sparse with --cached can address this need.
> 
> Suggested-by: Victoria Dye <vdye@github.com>
> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
> ---
>  builtin/grep.c                  | 10 +++++++++-
>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
>  2 files changed, 15 insertions(+), 7 deletions(-)

You mentioned in Slack that you missed the documentation of the --sparse
option. Just pointing it out here so we don't forget.

> 
> diff --git a/builtin/grep.c b/builtin/grep.c
> index e6bcdf860c..61402e8084 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
>  
>  static int skip_first_line;
>  
> +static int grep_sparse = 0;
> +

I initially thought it might be good to not define an additional global,
but there are many defined in this file outside of the context and they
are spread out with extra whitespace like this.

>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
>  {
>  	if (opt->binary != GREP_BINARY_TEXT)
> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
>  	for (nr = 0; nr < repo->index->cache_nr; nr++) {
>  		const struct cache_entry *ce = repo->index->cache[nr];
>  
> -		if (!cached && ce_skip_worktree(ce))

This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
was provided.

> +		/*
> +		 * If ce is a SKIP_WORKTREE entry, look into it when both
> +		 * --sparse and --cached are given.
> +		 */
> +		if (!(grep_sparse && cached) && ce_skip_worktree(ce))
>  			continue;

The logic of this if statement is backwards from the comment because a
true statement means "skip the entry" _not_ "look into it".

	/*
	 * Skip entries with SKIP_WORKTREE unless both --sparse and
	 * --cached are given.
	 */

But again, we might want to consider this alternative:

	/*
	 * Skip entries with SKIP_WORKTREE unless --sparse is given.
	 */
	if (!grep_sparse && ce_skip_worktree(ce))
		continue;

This will require further changes below, specifically this bit:

			/*
			 * If CE_VALID is on, we assume worktree file and its
			 * cache entry are identical, even if worktree file has
			 * been modified, so use cache version instead
			 */
			if (cached || (ce->ce_flags & CE_VALID)) {
				if (ce_stage(ce) || ce_intent_to_add(ce))
					continue;
				hit |= grep_oid(opt, &ce->oid, name.buf,
						 0, name.buf);
			} else {

We need to activate this grep_oid() call also when ce_skip_worktree(c) is
true. That is, if we want 'git grep --sparse' to extend the search beyond
the worktree and into the sparse entries.

>  
>  		strbuf_setlen(&name, name_base_len);
> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
>  			   PARSE_OPT_NOCOMPLETE),
>  		OPT_INTEGER('m', "max-count", &opt.max_count,
>  			N_("maximum number of results per file")),
> +		OPT_BOOL(0, "sparse", &grep_sparse,
> +			 N_("search sparse contents and expand sparse index")),

This "and expand sparse index" is an internal implementation detail, not a
heplful item for the help text. Instead, perhaps:

	"search the contents of files outside the sparse-checkout definition"

(Also, while the sparse index is being expanded right now, I would expect
to not expand the sparse index by the end of the series.)

> -test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' '
> +test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
>  	cat >expect <<-EOF &&
>  	a:text
>  	b:text
>  	dir/c:text
>  	EOF
> -	git grep --cached "text" >actual &&
> +	git grep --cached --sparse "text" >actual &&
>  	test_cmp expect actual
>  '

Please add a test that demonstrates the change of behavior when only --cached
is provided, not --sparse.

(If you take my suggestion to allow 'git grep --sparse' to do something
different, then also add a test for that case.)

>  
> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu
>  	test_cmp expect actual
>  '
>  
> -test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' '
> +test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
>  	cat >expect <<-EOF &&
>  	a:text
>  	b:text
> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules --cached searches entries with th
>  	sub/B/b:text
>  	sub2/a:text
>  	EOF
> -	git grep --recurse-submodules --cached "text" >actual &&
> +	git grep --recurse-submodules --cached --sparse "text" >actual &&
>  	test_cmp expect actual
>  '
> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not search the index with CE_VALID a
>  	test_cmp expect actual
>  '
>  
> -test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' '
> +test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' '
>  	cat >expect <<-EOF &&
>  	a:text
>  	b:text
> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches index entries with both CE_VALID and
>  	EOF
>  	test_when_finished "git update-index --no-assume-unchanged b" &&
>  	git update-index --assume-unchanged b &&
> -	git grep --cached text >actual &&
> +	git grep --cached --sparse text >actual &&
>  	test_cmp expect actual
>  '

Same with these two tests. Add additional commands that show the change of
behavior when only using '--cached'.

Thanks,
-Stolee
Junio C Hamano Aug. 17, 2022, 5:13 p.m. UTC | #2
Derrick Stolee <derrickstolee@github.com> writes:

> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>> Add a --sparse option to `git-grep`. This option is mainly used to:
>> 
>> If searching in the index (using --cached):
>> 
>> With --sparse, proceed the action when the current cache_entry is
>
> This phrasing is awkward. It might be better to reframe to describe the
> _why_ before the _what_

Thanks for an excellent suggestion.  As a project participant, I
could guess the motivation, but couldn't link the parts of the
proposed log message to what I thought was being said X-<.  The
below is much clearer.

>   When the '--cached' option is used with the 'git grep' command, the
>   search is limited to the blobs found in the index, not in the worktree.
>   If the user has enabled sparse-checkout, this might present more results
>   than they would like, since the files outside of the sparse-checkout are
>   unlikely to be important to them.

Great.  As an explanation of the reasoning behind the design
decision, I do not think it is bad to go even stronger than "might
... would like" and assume or declare that those users who use
sparse-checkout are the ones who do NOT want to see the parts of the
tree that are sparsed out.  And based on that assumption, "grep" and
"grep --cached" should not bother reporting hit from the part that
the user is not interested in.

By stating the design and the reasoning behind that decision clearly
like so, we allow future developers to reconsider the earlier design
decision more easily.  In 7 years, they may find that the Git users
in their era use sparse-checkout even when they still care about the
contents in the sparsed out area, in which case the basic assumption
behind this change is no longer valid and would allow them to make
"grep" and "grep --cached" behave differently.

>   Change the default behavior of 'git grep' to focus on the files within
>   the sparse-checkout definition. To enable the previous behavior, add a
>   '--sparse' option to 'git grep' that triggers the old behavior that
>   inspects paths outside of the sparse-checkout definition when paired
>   with the '--cached' option.

Yup.  Is that "--sparse" or "--unsparse"?  We are busting the sparse
boundary and looking for everything, and calling the option to do so
"--sparse" somehow feels counter-intuitive, at least to me.

Thanks.
Victoria Dye Aug. 17, 2022, 5:34 p.m. UTC | #3
Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>>> Add a --sparse option to `git-grep`. This option is mainly used to:
>>>
>>> If searching in the index (using --cached):
>>>
>>> With --sparse, proceed the action when the current cache_entry is
>>
>> This phrasing is awkward. It might be better to reframe to describe the
>> _why_ before the _what_
> 
> Thanks for an excellent suggestion.  As a project participant, I
> could guess the motivation, but couldn't link the parts of the
> proposed log message to what I thought was being said X-<.  The
> below is much clearer.
> 
>>   When the '--cached' option is used with the 'git grep' command, the
>>   search is limited to the blobs found in the index, not in the worktree.
>>   If the user has enabled sparse-checkout, this might present more results
>>   than they would like, since the files outside of the sparse-checkout are
>>   unlikely to be important to them.
> 
> Great.  As an explanation of the reasoning behind the design
> decision, I do not think it is bad to go even stronger than "might
> ... would like" and assume or declare that those users who use
> sparse-checkout are the ones who do NOT want to see the parts of the
> tree that are sparsed out.  And based on that assumption, "grep" and
> "grep --cached" should not bother reporting hit from the part that
> the user is not interested in.
> 
> By stating the design and the reasoning behind that decision clearly
> like so, we allow future developers to reconsider the earlier design
> decision more easily.  In 7 years, they may find that the Git users
> in their era use sparse-checkout even when they still care about the
> contents in the sparsed out area, in which case the basic assumption
> behind this change is no longer valid and would allow them to make
> "grep" and "grep --cached" behave differently.
> 
>>   Change the default behavior of 'git grep' to focus on the files within
>>   the sparse-checkout definition. To enable the previous behavior, add a
>>   '--sparse' option to 'git grep' that triggers the old behavior that
>>   inspects paths outside of the sparse-checkout definition when paired
>>   with the '--cached' option.
> 
> Yup.  Is that "--sparse" or "--unsparse"?  We are busting the sparse
> boundary and looking for everything, and calling the option to do so
> "--sparse" somehow feels counter-intuitive, at least to me.

It is a bit unintuitive, but '--sparse' is already used to mean "operate on
SKIP_WORKTREE entries (i.e., pretend the repo isn't a sparse-checkout)" in
both 'add' (0299a69694 (add: implement the --sparse option, 2021-09-24)) and
'rm' (f9786f9b85 (rm: add --sparse option, 2021-09-24)). The
'checkout-index' option '--ignore-skip-worktree-bits' indicates similar
behavior (and is, IMO, similarly confusing with its use of "ignore").

I'm not sure '--unsparse' would fit as an alternative, though, since 'git
grep' isn't really "unsparsifying" the repo (to me, that would imply
updating the index to remove the 'SKIP_WORKTREE' flag). Rather, it's looking
at files that are sparse when, by default, it does not. 

I still like the consistency of '--sparse' with existing similar options in
other commands but, if we want to try something clearer here, maybe
something like '--search-sparse' is more descriptive?

> 
> Thanks.
Elijah Newren Aug. 17, 2022, 5:37 p.m. UTC | #4
On Wed, Aug 17, 2022 at 7:25 AM Derrick Stolee <derrickstolee@github.com> wrote:
>
> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
> > Add a --sparse option to `git-grep`. This option is mainly used to:
> >
> > If searching in the index (using --cached):
> >
> > With --sparse, proceed the action when the current cache_entry is
>
> This phrasing is awkward. It might be better to reframe to describe the
> _why_ before the _what_
>
>   When the '--cached' option is used with the 'git grep' command, the
>   search is limited to the blobs found in the index, not in the worktree.
>   If the user has enabled sparse-checkout, this might present more results
>   than they would like, since the files outside of the sparse-checkout are
>   unlikely to be important to them.
>
>   Change the default behavior of 'git grep' to focus on the files within
>   the sparse-checkout definition. To enable the previous behavior, add a
>   '--sparse' option to 'git grep' that triggers the old behavior that
>   inspects paths outside of the sparse-checkout definition when paired
>   with the '--cached' option.
>
> Or something like that. The documentation updates will also help clarify
> what happens when '--cached' is not included. I assume '--sparse' is
> ignored, but perhaps it _could_ allow looking at the cached files outside
> the sparse-checkout definition, this could make the simpler invocation of
> 'git grep --sparse <pattern>' be the way that users can search after their
> attempt to search the worktree failed.

In addition to Stolee's comments, isn't this command line confusing?

  $ git grep --cached --sparse   # Do a *dense* search
  $ git grep --cached            # Do a *sparse* search

?
Derrick Stolee Aug. 17, 2022, 5:43 p.m. UTC | #5
On 8/17/2022 1:34 PM, Victoria Dye wrote:
> Junio C Hamano wrote:
>> Yup.  Is that "--sparse" or "--unsparse"?  We are busting the sparse
>> boundary and looking for everything, and calling the option to do so
>> "--sparse" somehow feels counter-intuitive, at least to me.
> 
> It is a bit unintuitive, but '--sparse' is already used to mean "operate on
> SKIP_WORKTREE entries (i.e., pretend the repo isn't a sparse-checkout)" in
> both 'add' (0299a69694 (add: implement the --sparse option, 2021-09-24)) and
> 'rm' (f9786f9b85 (rm: add --sparse option, 2021-09-24)). The
> 'checkout-index' option '--ignore-skip-worktree-bits' indicates similar
> behavior (and is, IMO, similarly confusing with its use of "ignore").
> 
> I'm not sure '--unsparse' would fit as an alternative, though, since 'git
> grep' isn't really "unsparsifying" the repo (to me, that would imply
> updating the index to remove the 'SKIP_WORKTREE' flag). Rather, it's looking
> at files that are sparse when, by default, it does not. 
> 
> I still like the consistency of '--sparse' with existing similar options in
> other commands but, if we want to try something clearer here, maybe
> something like '--search-sparse' is more descriptive?

My interpretation of '--sparse' is "include skip-worktree paths"
thinking of those paths being "sparse paths".

A too-long version could be '--ignore-sparse-checkout', but I can
understand the confusion where '--sparse' is interpreted as
'--respect-sparse-checkout'.

The existing pattern here means that it isn't Shaoxuan's responsibility
to pick a better name, but if we are interested in changing the name,
then we have some work to replace the previous '--sparse' options with
that name. I could do that replacement, assuming we land on a better name
and are willing to have that change of behavior.

Thanks,
-Stolee
Junio C Hamano Aug. 17, 2022, 6:47 p.m. UTC | #6
Derrick Stolee <derrickstolee@github.com> writes:

>> It is a bit unintuitive, but '--sparse' is already used to mean "operate on
>> SKIP_WORKTREE entries (i.e., pretend the repo isn't a sparse-checkout)" in
>> both 'add' (0299a69694 (add: implement the --sparse option, 2021-09-24)) and
>> 'rm' (f9786f9b85 (rm: add --sparse option, 2021-09-24)). The
>> 'checkout-index' option '--ignore-skip-worktree-bits' indicates similar
>> behavior (and is, IMO, similarly confusing with its use of "ignore").

OK, I forgot about these precedents.  "ignore skip worktree bits" is
quite a mouthful, but expresses what is going on quite clearly.
Instead of honoring the skip-worktree bit, behave as if they are not
set, so we bust the "sparse" boundary.

> The existing pattern here means that it isn't Shaoxuan's responsibility
> to pick a better name, but if we are interested in changing the name,
> then we have some work to replace the previous '--sparse' options with
> that name. I could do that replacement, assuming we land on a better name
> and are willing to have that change of behavior.

It all depends on how deeply the existing "--sparse" are anchored in
users' minds.  If we have been with them for nearly a year and three
major releases, it is too late to casually "fix" without a proper
transition strategy, I am afraid.  And I am not even sure if it is
worth the trouble.

In any case, let's leave it totally outside the scope of the topic.
As long as we are consistently unintuitive with "--sparse", then I
think we are OK, because users are malleable and can easily get used
to anything as long as it is consistent ;-)

Thanks.
Shaoxuan Yuan Aug. 24, 2022, 6:20 p.m. UTC | #7
Hi reviewrs,

I came back from busying with relocation :)

On 8/17/2022 10:12 PM, Derrick Stolee wrote:
 > On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
 >> Add a --sparse option to `git-grep`. This option is mainly used to:
 >>
 >> If searching in the index (using --cached):
 >>
 >> With --sparse, proceed the action when the current cache_entry is
 >
 > This phrasing is awkward. It might be better to reframe to describe the
 > _why_ before the _what_
 >
 >   When the '--cached' option is used with the 'git grep' command, the
 >   search is limited to the blobs found in the index, not in the worktree.
 >   If the user has enabled sparse-checkout, this might present more 
results
 >   than they would like, since the files outside of the 
sparse-checkout are
 >   unlikely to be important to them.
 >
 >   Change the default behavior of 'git grep' to focus on the files within
 >   the sparse-checkout definition. To enable the previous behavior, add a
 >   '--sparse' option to 'git grep' that triggers the old behavior that
 >   inspects paths outside of the sparse-checkout definition when paired
 >   with the '--cached' option.

Good suggestion!

 > Or something like that. The documentation updates will also help clarify
 > what happens when '--cached' is not included. I assume '--sparse' is
 > ignored, but perhaps it _could_ allow looking at the cached files outside
 > the sparse-checkout definition, this could make the simpler invocation of
 > 'git grep --sparse <pattern>' be the way that users can search after 
their
 > attempt to search the worktree failed.

This simpler version was in my earlier local branch, but later I
decided not to go with it. I found the difference between these two
approaches, is that "--cached --sparse" is more correct in terms of
how Git actually works (because sparsity is a concept in the index);
and "--sparse" is more comfortable for the end user.

I found the former one better here, because it is more self-explanatory,
and thus more info for the user, i.e. "you are now looking at the
index, and Git will also consider files outside of sparse definition."

To be honest, I don't know which one is "better", but I think I'll
keep the current implementation unless something more convincing shows
up later.

 >> marked with SKIP_WORKTREE bit (the default is to skip this kind of
 >> entry). Before this patch, --cached itself can realize this action.
 >> Adding --sparse here grants the user finer control over sparse
 >> entries. If the user only wants to peak into the index without
 >
 > s/peak/peek/
 >
 >> caring about sparse entries, --cached should suffice; if the user
 >> wants to peak into the index _and_ cares about sparse entries,
 >> combining --sparse with --cached can address this need.
 >>
 >> Suggested-by: Victoria Dye <vdye@github.com>
 >> Signed-off-by: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
 >> ---
 >>  builtin/grep.c                  | 10 +++++++++-
 >>  t/t7817-grep-sparse-checkout.sh | 12 ++++++------
 >>  2 files changed, 15 insertions(+), 7 deletions(-)
 >
 > You mentioned in Slack that you missed the documentation of the --sparse
 > option. Just pointing it out here so we don't forget.

Will do.

 >>
 >> diff --git a/builtin/grep.c b/builtin/grep.c
 >> index e6bcdf860c..61402e8084 100644
 >> --- a/builtin/grep.c
 >> +++ b/builtin/grep.c
 >> @@ -96,6 +96,8 @@ static pthread_cond_t cond_result;
 >>
 >>  static int skip_first_line;
 >>
 >> +static int grep_sparse = 0;
 >> +
 >
 > I initially thought it might be good to not define an additional global,
 > but there are many defined in this file outside of the context and they
 > are spread out with extra whitespace like this.
 >
 >>  static void add_work(struct grep_opt *opt, struct grep_source *gs)
 >>  {
 >>      if (opt->binary != GREP_BINARY_TEXT)
 >> @@ -525,7 +527,11 @@ static int grep_cache(struct grep_opt *opt,
 >>      for (nr = 0; nr < repo->index->cache_nr; nr++) {
 >>          const struct cache_entry *ce = repo->index->cache[nr];
 >>
 >> -        if (!cached && ce_skip_worktree(ce))
 >
 > This logic would skip files marked with SKIP_WORKTREE _unless_ --cached
 > was provided.
 >
 >> +        /*
 >> +         * If ce is a SKIP_WORKTREE entry, look into it when both
 >> +         * --sparse and --cached are given.
 >> +         */
 >> +        if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 >>              continue;
 >
 > The logic of this if statement is backwards from the comment because a
 > true statement means "skip the entry" _not_ "look into it".
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless both --sparse and
 >      * --cached are given.
 >      */

Got it.

 > But again, we might want to consider this alternative:
 >
 >     /*
 >      * Skip entries with SKIP_WORKTREE unless --sparse is given.
 >      */
 >     if (!grep_sparse && ce_skip_worktree(ce))
 >         continue;
 >
 > This will require further changes below, specifically this bit:
 >
 >             /*
 >              * If CE_VALID is on, we assume worktree file and its
 >              * cache entry are identical, even if worktree file has
 >              * been modified, so use cache version instead
 >              */
 >             if (cached || (ce->ce_flags & CE_VALID)) {
 >                 if (ce_stage(ce) || ce_intent_to_add(ce))
 >                     continue;
 >                 hit |= grep_oid(opt, &ce->oid, name.buf,
 >                          0, name.buf);
 >             } else {
 >
 > We need to activate this grep_oid() call also when ce_skip_worktree(c) is
 > true. That is, if we want 'git grep --sparse' to extend the search beyond
 > the worktree and into the sparse entries.
 >
 >>
 >>          strbuf_setlen(&name, name_base_len);
 >> @@ -963,6 +969,8 @@ int cmd_grep(int argc, const char **argv, const 
char *prefix)
 >>                 PARSE_OPT_NOCOMPLETE),
 >>          OPT_INTEGER('m', "max-count", &opt.max_count,
 >>              N_("maximum number of results per file")),
 >> +        OPT_BOOL(0, "sparse", &grep_sparse,
 >> +             N_("search sparse contents and expand sparse index")),
 >
 > This "and expand sparse index" is an internal implementation detail, 
not a
 > heplful item for the help text. Instead, perhaps:
 >
 >     "search the contents of files outside the sparse-checkout definition"

Sounds good!

 > (Also, while the sparse index is being expanded right now, I would expect
 > to not expand the sparse index by the end of the series.)
 >
 >> -test_expect_success 'grep --cached searches entries with the 
SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --cached and --sparse searches entries 
with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >>      dir/c:text
 >>      EOF
 >> -    git grep --cached "text" >actual &&
 >> +    git grep --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Please add a test that demonstrates the change of behavior when only 
--cached
 > is provided, not --sparse.

Sure!

 > (If you take my suggestion to allow 'git grep --sparse' to do something
 > different, then also add a test for that case.)
 >
 >>
 >> @@ -143,7 +143,7 @@ test_expect_success 'grep --recurse-submodules 
honors sparse checkout in submodu
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --recurse-submodules --cached searches 
entries with the SKIP_WORKTREE bit' '
 >> +test_expect_success 'grep --recurse-submodules --cached and 
--sparse searches entries with the SKIP_WORKTREE bit' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -152,7 +152,7 @@ test_expect_success 'grep --recurse-submodules 
--cached searches entries with th
 >>      sub/B/b:text
 >>      sub2/a:text
 >>      EOF
 >> -    git grep --recurse-submodules --cached "text" >actual &&
 >> +    git grep --recurse-submodules --cached --sparse "text" >actual &&
 >>      test_cmp expect actual
 >>  '
 >> @@ -166,7 +166,7 @@ test_expect_success 'working tree grep does not 
search the index with CE_VALID a
 >>      test_cmp expect actual
 >>  '
 >>
 >> -test_expect_success 'grep --cached searches index entries with both 
CE_VALID and SKIP_WORKTREE' '
 >> +test_expect_success 'grep --cached and --sparse searches index 
entries with both CE_VALID and SKIP_WORKTREE' '
 >>      cat >expect <<-EOF &&
 >>      a:text
 >>      b:text
 >> @@ -174,7 +174,7 @@ test_expect_success 'grep --cached searches 
index entries with both CE_VALID and
 >>      EOF
 >>      test_when_finished "git update-index --no-assume-unchanged b" &&
 >>      git update-index --assume-unchanged b &&
 >> -    git grep --cached text >actual &&
 >> +    git grep --cached --sparse text >actual &&
 >>      test_cmp expect actual
 >>  '
 >
 > Same with these two tests. Add additional commands that show the 
change of
 > behavior when only using '--cached'.

--
Thanks,
Shaoxuan
Derrick Stolee Aug. 24, 2022, 7:08 p.m. UTC | #8
On 8/24/2022 2:20 PM, Shaoxuan Yuan wrote:
> Hi reviewrs,
> 
> I came back from busying with relocation :)

Welcome back! I'm looking forward to overlapping our timezones a bit more.
 
> On 8/17/2022 10:12 PM, Derrick Stolee wrote:
>> On 8/17/2022 3:56 AM, Shaoxuan Yuan wrote:
>>> Add a --sparse option to `git-grep`. This option is mainly used to:

>> Or something like that. The documentation updates will also help clarify
>> what happens when '--cached' is not included. I assume '--sparse' is
>> ignored, but perhaps it _could_ allow looking at the cached files outside
>> the sparse-checkout definition, this could make the simpler invocation of
>> 'git grep --sparse <pattern>' be the way that users can search after their
>> attempt to search the worktree failed.
> 
> This simpler version was in my earlier local branch, but later I
> decided not to go with it. I found the difference between these two
> approaches, is that "--cached --sparse" is more correct in terms of
> how Git actually works (because sparsity is a concept in the index);
> and "--sparse" is more comfortable for the end user.
> 
> I found the former one better here, because it is more self-explanatory,
> and thus more info for the user, i.e. "you are now looking at the
> index, and Git will also consider files outside of sparse definition."
> 
> To be honest, I don't know which one is "better", but I think I'll
> keep the current implementation unless something more convincing shows
> up later.

I think it is fine for you to keep the "--sparse requires --cached"
approach that you have now, since we can always choose to extend
the options to allow --sparse without --cached later.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/builtin/grep.c b/builtin/grep.c
index e6bcdf860c..61402e8084 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -96,6 +96,8 @@  static pthread_cond_t cond_result;
 
 static int skip_first_line;
 
+static int grep_sparse = 0;
+
 static void add_work(struct grep_opt *opt, struct grep_source *gs)
 {
 	if (opt->binary != GREP_BINARY_TEXT)
@@ -525,7 +527,11 @@  static int grep_cache(struct grep_opt *opt,
 	for (nr = 0; nr < repo->index->cache_nr; nr++) {
 		const struct cache_entry *ce = repo->index->cache[nr];
 
-		if (!cached && ce_skip_worktree(ce))
+		/*
+		 * If ce is a SKIP_WORKTREE entry, look into it when both
+		 * --sparse and --cached are given.
+		 */
+		if (!(grep_sparse && cached) && ce_skip_worktree(ce))
 			continue;
 
 		strbuf_setlen(&name, name_base_len);
@@ -963,6 +969,8 @@  int cmd_grep(int argc, const char **argv, const char *prefix)
 			   PARSE_OPT_NOCOMPLETE),
 		OPT_INTEGER('m', "max-count", &opt.max_count,
 			N_("maximum number of results per file")),
+		OPT_BOOL(0, "sparse", &grep_sparse,
+			 N_("search sparse contents and expand sparse index")),
 		OPT_END()
 	};
 	grep_prefix = prefix;
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index eb59564565..ca71f526eb 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -118,13 +118,13 @@  test_expect_success 'grep searches unmerged file despite not matching sparsity p
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
 	cat >expect <<-EOF &&
 	a:text
 	b:text
 	dir/c:text
 	EOF
-	git grep --cached "text" >actual &&
+	git grep --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -143,7 +143,7 @@  test_expect_success 'grep --recurse-submodules honors sparse checkout in submodu
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --recurse-submodules --cached searches entries with the SKIP_WORKTREE bit' '
+test_expect_success 'grep --recurse-submodules --cached and --sparse searches entries with the SKIP_WORKTREE bit' '
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -152,7 +152,7 @@  test_expect_success 'grep --recurse-submodules --cached searches entries with th
 	sub/B/b:text
 	sub2/a:text
 	EOF
-	git grep --recurse-submodules --cached "text" >actual &&
+	git grep --recurse-submodules --cached --sparse "text" >actual &&
 	test_cmp expect actual
 '
 
@@ -166,7 +166,7 @@  test_expect_success 'working tree grep does not search the index with CE_VALID a
 	test_cmp expect actual
 '
 
-test_expect_success 'grep --cached searches index entries with both CE_VALID and SKIP_WORKTREE' '
+test_expect_success 'grep --cached and --sparse searches index entries with both CE_VALID and SKIP_WORKTREE' '
 	cat >expect <<-EOF &&
 	a:text
 	b:text
@@ -174,7 +174,7 @@  test_expect_success 'grep --cached searches index entries with both CE_VALID and
 	EOF
 	test_when_finished "git update-index --no-assume-unchanged b" &&
 	git update-index --assume-unchanged b &&
-	git grep --cached text >actual &&
+	git grep --cached --sparse text >actual &&
 	test_cmp expect actual
 '