diff mbox series

[6/7] grep: add repository to OID grep sources

Message ID b2df2455871dc5c94ec804e9c0ce935c19da335f.1628618950.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series In grep, no adding submodule ODB as alternates | expand

Commit Message

Jonathan Tan Aug. 10, 2021, 6:28 p.m. UTC
Record the repository whenever an OID grep source is created, and teach
the worker threads to explicitly provide the repository when accessing
objects.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 builtin/grep.c | 2 +-
 grep.c         | 7 +++++--
 grep.h         | 4 +++-
 3 files changed, 9 insertions(+), 4 deletions(-)

Comments

Emily Shaffer Aug. 11, 2021, 9:52 p.m. UTC | #1
On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote:
> 
> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.

Since this extra information is not used by this series at all, does it
make sense to only include this patch with the series covering the rest
of your partial-clone-with-submodules work?

 - Emily
Matheus Tavares Aug. 11, 2021, 11:28 p.m. UTC | #2
On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> Record the repository whenever an OID grep source is created, and teach
> the worker threads to explicitly provide the repository when accessing
> objects.

This patch fixes the second NEEDSWORK comment at grep_submodule(),
right? Maybe this comment could be replaced with a mention that
add_submodule_odb_by_path() is called for testing purposes, and it
should no longer produce a real addition to the alternates list?

> diff --git a/grep.h b/grep.h
> index f4a3090f1c..c5234f9b38 100644
> --- a/grep.h
> +++ b/grep.h
> @@ -187,6 +187,7 @@ struct grep_source {
>                 GREP_SOURCE_BUF,
>         } type;
>         void *identifier;
> +       struct repository *repo; /* if GREP_SOURCE_OID */

Hmm, the grep threads now have two `struct repository` references, one
in `struct grep_source` and one in `struct grep_opt` (see
builtin/grep.c:run()). The one from `struct grep_opt` will always be
`the_repository` (in the worker threads). I wonder if, in the long
run, we could instruct the worker threads to use the new reference
from `struct grep_source` throughout grep.c, and perhaps even remove
the one from `struct grep_opt`.

This would solve the issue I mentioned in [1], about git grep
currently ignoring the textconv attributes from submodules, and
instead applying the ones from the superproject in the submodules'
files. (Here there is an example of this bug: [2] .)

It would also allow grep to use the textconv cache from each
submodule, instead of saving/reading everything from the
superproject's textconv cache.

While this doesn't happen, though, could we perhaps add a comment
somewhere to avoid any confusion regarding the two different
repository pointers that the worker threads hold?

[1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
[2]: https://gitlab.com/-/snippets/1896951

>         char *buf;
>         unsigned long size;
> @@ -198,7 +199,8 @@ struct grep_source {
>  void grep_source_init_file(struct grep_source *gs, const char *name,
>                            const char *path);
>  void grep_source_init_oid(struct grep_source *gs, const char *name,
> -                         const char *path, const struct object_id *oid);
> +                         const char *path, const struct object_id *oid,
> +                         struct repository *repo);
>  void grep_source_init_buf(struct grep_source *gs);
>  void grep_source_clear_data(struct grep_source *gs);
>  void grep_source_clear(struct grep_source *gs);
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
>
Jonathan Tan Aug. 13, 2021, 4:44 p.m. UTC | #3
> On Tue, Aug 10, 2021 at 11:28:44AM -0700, Jonathan Tan wrote:
> > 
> > Record the repository whenever an OID grep source is created, and teach
> > the worker threads to explicitly provide the repository when accessing
> > objects.
> 
> Since this extra information is not used by this series at all, does it
> make sense to only include this patch with the series covering the rest
> of your partial-clone-with-submodules work?

It is used by this series - it prevents an instance in which the
submodule ODBs would be lazily registered as alternates. I'll add a note
in the commit message to this effect.
Jonathan Tan Aug. 13, 2021, 4:47 p.m. UTC | #4
> On Tue, Aug 10, 2021 at 3:29 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > Record the repository whenever an OID grep source is created, and teach
> > the worker threads to explicitly provide the repository when accessing
> > objects.
> 
> This patch fixes the second NEEDSWORK comment at grep_submodule(),
> right? Maybe this comment could be replaced with a mention that
> add_submodule_odb_by_path() is called for testing purposes, and it
> should no longer produce a real addition to the alternates list?

Good catch. I'll update the comment.

> > diff --git a/grep.h b/grep.h
> > index f4a3090f1c..c5234f9b38 100644
> > --- a/grep.h
> > +++ b/grep.h
> > @@ -187,6 +187,7 @@ struct grep_source {
> >                 GREP_SOURCE_BUF,
> >         } type;
> >         void *identifier;
> > +       struct repository *repo; /* if GREP_SOURCE_OID */
> 
> Hmm, the grep threads now have two `struct repository` references, one
> in `struct grep_source` and one in `struct grep_opt` (see
> builtin/grep.c:run()). The one from `struct grep_opt` will always be
> `the_repository` (in the worker threads). I wonder if, in the long
> run, we could instruct the worker threads to use the new reference
> from `struct grep_source` throughout grep.c, and perhaps even remove
> the one from `struct grep_opt`.
> 
> This would solve the issue I mentioned in [1], about git grep
> currently ignoring the textconv attributes from submodules, and
> instead applying the ones from the superproject in the submodules'
> files. (Here there is an example of this bug: [2] .)
> 
> It would also allow grep to use the textconv cache from each
> submodule, instead of saving/reading everything from the
> superproject's textconv cache.
> 
> While this doesn't happen, though, could we perhaps add a comment
> somewhere to avoid any confusion regarding the two different
> repository pointers that the worker threads hold?
> 
> [1]: https://lore.kernel.org/git/CAHd-oW5iEQarYVxEXoTG-ua2zdoybTrSjCBKtO0YT292fm0NQQ@mail.gmail.com/
> [2]: https://gitlab.com/-/snippets/1896951

Ah...another good catch. Thanks also for a link to your email - I'll
mention it when I add the comment you suggested.
diff mbox series

Patch

diff --git a/builtin/grep.c b/builtin/grep.c
index 5a40e18e47..ea6df6dca4 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -347,7 +347,7 @@  static int grep_oid(struct grep_opt *opt, const struct object_id *oid,
 	struct grep_source gs;
 
 	grep_source_name(opt, filename, tree_name_len, &pathbuf);
-	grep_source_init_oid(&gs, pathbuf.buf, path, oid);
+	grep_source_init_oid(&gs, pathbuf.buf, path, oid, opt->repo);
 	strbuf_release(&pathbuf);
 
 	if (num_threads > 1) {
diff --git a/grep.c b/grep.c
index ba3711dc56..14c677b4ae 100644
--- a/grep.c
+++ b/grep.c
@@ -1853,7 +1853,8 @@  void grep_source_init_file(struct grep_source *gs, const char *name,
 }
 
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid)
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo)
 {
 	gs->type = GREP_SOURCE_OID;
 	gs->name = xstrdup_or_null(name);
@@ -1862,6 +1863,7 @@  void grep_source_init_oid(struct grep_source *gs, const char *name,
 	gs->size = 0;
 	gs->driver = NULL;
 	gs->identifier = oiddup(oid);
+	gs->repo = repo;
 }
 
 void grep_source_init_buf(struct grep_source *gs)
@@ -1901,7 +1903,8 @@  static int grep_source_load_oid(struct grep_source *gs)
 {
 	enum object_type type;
 
-	gs->buf = read_object_file(gs->identifier, &type, &gs->size);
+	gs->buf = repo_read_object_file(gs->repo, gs->identifier, &type,
+					&gs->size);
 	if (!gs->buf)
 		return error(_("'%s': unable to read %s"),
 			     gs->name,
diff --git a/grep.h b/grep.h
index f4a3090f1c..c5234f9b38 100644
--- a/grep.h
+++ b/grep.h
@@ -187,6 +187,7 @@  struct grep_source {
 		GREP_SOURCE_BUF,
 	} type;
 	void *identifier;
+	struct repository *repo; /* if GREP_SOURCE_OID */
 
 	char *buf;
 	unsigned long size;
@@ -198,7 +199,8 @@  struct grep_source {
 void grep_source_init_file(struct grep_source *gs, const char *name,
 			   const char *path);
 void grep_source_init_oid(struct grep_source *gs, const char *name,
-			  const char *path, const struct object_id *oid);
+			  const char *path, const struct object_id *oid,
+			  struct repository *repo);
 void grep_source_init_buf(struct grep_source *gs);
 void grep_source_clear_data(struct grep_source *gs);
 void grep_source_clear(struct grep_source *gs);