[GSoC] grep: fix worktree case in submodules
diff mbox series

Message ID ba3d8a953a2cc5b4ff03fefa434ffd7bd6a78f15.1564505605.git.matheus.bernardino@usp.br
State New
Headers show
Series
  • [GSoC] grep: fix worktree case in submodules
Related show

Commit Message

Matheus Tavares Bernardino July 30, 2019, 4:53 p.m. UTC
Running git-grep with --recurse-submodules results in a cached grep for
the submodules even when --cached is not used. This makes all
modifications in submodules' tracked files be always ignored when
grepping. Solve that making git-grep respect the cached option when
invoking grep_cache() inside grep_submodule(). Also, add tests to
ensure that the desired behavior is performed.

Reported-by: Daniel Zaoui <jackdanielz@eyomi.org>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 builtin/grep.c                     | 10 ++++++----
 t/t7814-grep-recurse-submodules.sh | 21 +++++++++++++++++++++
 2 files changed, 27 insertions(+), 4 deletions(-)

Comments

Junio C Hamano July 30, 2019, 8:04 p.m. UTC | #1
Matheus Tavares <matheus.bernardino@usp.br> writes:

> @@ -475,7 +475,7 @@ static int grep_submodule(struct grep_opt *opt,
>  		strbuf_release(&base);
>  		free(data);
>  	} else {
> -		hit = grep_cache(&subopt, pathspec, 1);
> +		hit = grep_cache(&subopt, pathspec, cached);
>  	}

Interesting.  It appears to me that this has always searched in
submodule index and never working tree.  I am not sure if there was
any specific reason to avoid looking into the working tree files.

Passing the 'cached' bit down from grep_cache() does look like a
sensible way to go.

> @@ -523,7 +523,8 @@ static int grep_cache(struct grep_opt *opt,
>  			}
>  		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
>  			   submodule_path_match(repo->index, pathspec, name.buf, NULL)) {
> -			hit |= grep_submodule(opt, pathspec, NULL, ce->name, ce->name);
> +			hit |= grep_submodule(opt, pathspec, NULL, ce->name,
> +					      ce->name, cached);
>  		} else {
>  			continue;
>  		}
> @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>  			free(data);
>  		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
>  			hit |= grep_submodule(opt, pathspec, &entry.oid,
> -					      base->buf, base->buf + tn_len);
> +					      base->buf, base->buf + tn_len,
> +					      1); /* ignored */

The trailing comment is misleading.  In the context of reviewing
this patch, we can probably tell it applies only to that "1", but
if you read only the postimage, the "ignored" comment looks as if
the call itself is somehow ignored by somebody unspecified.  It is
not clear at all that it is only about the final parameter.

If you must...

		hit |= grep_submodule(opt, pathspec, &entry.oid,
				      base->buf, base->buf + tn_len,
				      1 /* ignored */);

... is a reasonable way to write it.

> diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> index a11366b4ce..946f91fa57 100755
> --- a/t/t7814-grep-recurse-submodules.sh
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -408,4 +408,25 @@ test_expect_success 'grep --recurse-submodules with submodules without .gitmodul
>  	test_cmp expect actual
>  '
>  
> +reset_and_clean () {
> +	git reset --hard &&
> +	git clean -fd &&
> +	git submodule foreach --recursive 'git reset --hard' &&
> +	git submodule foreach --recursive 'git clean -fd'

Do we need two separate foreach, instread of a single one that does
both reset and clean (i.e. "git reset --hard && git clean -f -d")?
Christian Couder July 30, 2019, 10:02 p.m. UTC | #2
On Tue, Jul 30, 2019 at 10:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:

> > @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> >                       free(data);
> >               } else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> >                       hit |= grep_submodule(opt, pathspec, &entry.oid,
> > -                                           base->buf, base->buf + tn_len);
> > +                                           base->buf, base->buf + tn_len,
> > +                                           1); /* ignored */
>
> The trailing comment is misleading.  In the context of reviewing
> this patch, we can probably tell it applies only to that "1", but
> if you read only the postimage, the "ignored" comment looks as if
> the call itself is somehow ignored by somebody unspecified.  It is
> not clear at all that it is only about the final parameter.
>
> If you must...
>
>                 hit |= grep_submodule(opt, pathspec, &entry.oid,
>                                       base->buf, base->buf + tn_len,
>                                       1 /* ignored */);

Yeah, I suggested adding an "/* ignored */" comment, but I was indeed
thinking about something like this.

> ... is a reasonable way to write it.
Matheus Tavares Bernardino July 30, 2019, 11:40 p.m. UTC | #3
On Tue, Jul 30, 2019 at 5:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
> > @@ -475,7 +475,7 @@ static int grep_submodule(struct grep_opt *opt,
> >               strbuf_release(&base);
> >               free(data);
> >       } else {
> > -             hit = grep_cache(&subopt, pathspec, 1);
> > +             hit = grep_cache(&subopt, pathspec, cached);
> >       }
>
> Interesting.  It appears to me that this has always searched in
> submodule index and never working tree.  I am not sure if there was
> any specific reason to avoid looking into the working tree files.

It seems that git-grep was taking the worktree into account before
f9ee2fc ("grep: recurse in-process using 'struct repository'",
02-08-2017). So maybe it was just a mistake during the in-process
conversion.

> Passing the 'cached' bit down from grep_cache() does look like a
> sensible way to go.
>
> > @@ -523,7 +523,8 @@ static int grep_cache(struct grep_opt *opt,
> >                       }
> >               } else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
> >                          submodule_path_match(repo->index, pathspec, name.buf, NULL)) {
> > -                     hit |= grep_submodule(opt, pathspec, NULL, ce->name, ce->name);
> > +                     hit |= grep_submodule(opt, pathspec, NULL, ce->name,
> > +                                           ce->name, cached);
> >               } else {
> >                       continue;
> >               }
> > @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> >                       free(data);
> >               } else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> >                       hit |= grep_submodule(opt, pathspec, &entry.oid,
> > -                                           base->buf, base->buf + tn_len);
> > +                                           base->buf, base->buf + tn_len,
> > +                                           1); /* ignored */
>
> The trailing comment is misleading.  In the context of reviewing
> this patch, we can probably tell it applies only to that "1", but
> if you read only the postimage, the "ignored" comment looks as if
> the call itself is somehow ignored by somebody unspecified.  It is
> not clear at all that it is only about the final parameter.
>
> If you must...
>
>                 hit |= grep_submodule(opt, pathspec, &entry.oid,
>                                       base->buf, base->buf + tn_len,
>                                       1 /* ignored */);
>
> ... is a reasonable way to write it.

Right, thanks.

> > diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
> > index a11366b4ce..946f91fa57 100755
> > --- a/t/t7814-grep-recurse-submodules.sh
> > +++ b/t/t7814-grep-recurse-submodules.sh
> > @@ -408,4 +408,25 @@ test_expect_success 'grep --recurse-submodules with submodules without .gitmodul
> >       test_cmp expect actual
> >  '
> >
> > +reset_and_clean () {
> > +     git reset --hard &&
> > +     git clean -fd &&
> > +     git submodule foreach --recursive 'git reset --hard' &&
> > +     git submodule foreach --recursive 'git clean -fd'
>
> Do we need two separate foreach, instread of a single one that does
> both reset and clean (i.e. "git reset --hard && git clean -f -d")?

Indeed! Thanks. I'll send a v2 addressing both comments.
Junio C Hamano July 31, 2019, 3:57 p.m. UTC | #4
Christian Couder <christian.couder@gmail.com> writes:

> On Tue, Jul 30, 2019 at 10:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Matheus Tavares <matheus.bernardino@usp.br> writes:
>
>> > @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
>> >                       free(data);
>> >               } else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
>> >                       hit |= grep_submodule(opt, pathspec, &entry.oid,
>> > -                                           base->buf, base->buf + tn_len);
>> > +                                           base->buf, base->buf + tn_len,
>> > +                                           1); /* ignored */
>>
>> The trailing comment is misleading.  In the context of reviewing
>> this patch, we can probably tell it applies only to that "1", but
>> if you read only the postimage, the "ignored" comment looks as if
>> the call itself is somehow ignored by somebody unspecified.  It is
>> not clear at all that it is only about the final parameter.
>>
>> If you must...
>>
>>                 hit |= grep_submodule(opt, pathspec, &entry.oid,
>>                                       base->buf, base->buf + tn_len,
>>                                       1 /* ignored */);
>
> Yeah, I suggested adding an "/* ignored */" comment, but I was indeed
> thinking about something like this.
>
>> ... is a reasonable way to write it.

Thanks.  In this case, I am not sure if the comment here really
helps.  If anything, shouldn't there be a comment near the top of
grep_submodule() that says 'cached bit is meaningful only when you
feed an empty oid, aka "not grepping inside a tree object"'?
Matheus Tavares Bernardino Aug. 1, 2019, 3:08 a.m. UTC | #5
On Wed, Jul 31, 2019 at 12:57 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Christian Couder <christian.couder@gmail.com> writes:
>
> > On Tue, Jul 30, 2019 at 10:04 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>
> >> Matheus Tavares <matheus.bernardino@usp.br> writes:
> >
> >> > @@ -598,7 +599,8 @@ static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
> >> >                       free(data);
> >> >               } else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
> >> >                       hit |= grep_submodule(opt, pathspec, &entry.oid,
> >> > -                                           base->buf, base->buf + tn_len);
> >> > +                                           base->buf, base->buf + tn_len,
> >> > +                                           1); /* ignored */
> >>
> >> The trailing comment is misleading.  In the context of reviewing
> >> this patch, we can probably tell it applies only to that "1", but
> >> if you read only the postimage, the "ignored" comment looks as if
> >> the call itself is somehow ignored by somebody unspecified.  It is
> >> not clear at all that it is only about the final parameter.
> >>
> >> If you must...
> >>
> >>                 hit |= grep_submodule(opt, pathspec, &entry.oid,
> >>                                       base->buf, base->buf + tn_len,
> >>                                       1 /* ignored */);
> >
> > Yeah, I suggested adding an "/* ignored */" comment, but I was indeed
> > thinking about something like this.
> >
> >> ... is a reasonable way to write it.
>
> Thanks.  In this case, I am not sure if the comment here really
> helps.  If anything, shouldn't there be a comment near the top of
> grep_submodule() that says 'cached bit is meaningful only when you
> feed an empty oid, aka "not grepping inside a tree object"'?

Right, it makes sense. I'll add that.

Patch
diff mbox series

diff --git a/builtin/grep.c b/builtin/grep.c
index 560051784e..2699001fbd 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -403,7 +403,7 @@  static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 static int grep_submodule(struct grep_opt *opt,
 			  const struct pathspec *pathspec,
 			  const struct object_id *oid,
-			  const char *filename, const char *path)
+			  const char *filename, const char *path, int cached)
 {
 	struct repository subrepo;
 	struct repository *superproject = opt->repo;
@@ -475,7 +475,7 @@  static int grep_submodule(struct grep_opt *opt,
 		strbuf_release(&base);
 		free(data);
 	} else {
-		hit = grep_cache(&subopt, pathspec, 1);
+		hit = grep_cache(&subopt, pathspec, cached);
 	}
 
 	repo_clear(&subrepo);
@@ -523,7 +523,8 @@  static int grep_cache(struct grep_opt *opt,
 			}
 		} else if (recurse_submodules && S_ISGITLINK(ce->ce_mode) &&
 			   submodule_path_match(repo->index, pathspec, name.buf, NULL)) {
-			hit |= grep_submodule(opt, pathspec, NULL, ce->name, ce->name);
+			hit |= grep_submodule(opt, pathspec, NULL, ce->name,
+					      ce->name, cached);
 		} else {
 			continue;
 		}
@@ -598,7 +599,8 @@  static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
 			free(data);
 		} else if (recurse_submodules && S_ISGITLINK(entry.mode)) {
 			hit |= grep_submodule(opt, pathspec, &entry.oid,
-					      base->buf, base->buf + tn_len);
+					      base->buf, base->buf + tn_len,
+					      1); /* ignored */
 		}
 
 		strbuf_setlen(base, old_baselen);
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index a11366b4ce..946f91fa57 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -408,4 +408,25 @@  test_expect_success 'grep --recurse-submodules with submodules without .gitmodul
 	test_cmp expect actual
 '
 
+reset_and_clean () {
+	git reset --hard &&
+	git clean -fd &&
+	git submodule foreach --recursive 'git reset --hard' &&
+	git submodule foreach --recursive 'git clean -fd'
+}
+
+test_expect_success 'grep --recurse-submodules without --cached considers worktree modifications' '
+	reset_and_clean &&
+	echo "A modified line in submodule" >>submodule/a &&
+	echo "submodule/a:A modified line in submodule" >expect &&
+	git grep --recurse-submodules "A modified line in submodule" >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'grep --recurse-submodules with --cached ignores worktree modifications' '
+	reset_and_clean &&
+	echo "A modified line in submodule" >>submodule/a &&
+	test_must_fail git grep --recurse-submodules --cached "A modified line in submodule" >actual 2>&1 &&
+	test_must_be_empty actual
+'
 test_done