diff mbox series

[v2,7/8] submodule-config: pass repo upon blob config read

Message ID 94db10a4e5943d689113693c64633ddffa5508cc.1628888668.git.jonathantanmy@google.com (mailing list archive)
State Superseded
Headers show
Series In grep, no adding submodule ODB as alternates | expand

Commit Message

Jonathan Tan Aug. 13, 2021, 9:05 p.m. UTC
When reading the config of a submodule, if reading from a blob, read
using an explicitly specified repository instead of by adding the
submodule's ODB as an alternate and then reading an object from
the_repository.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 config.c           | 18 ++++++++++++------
 config.h           |  3 +++
 submodule-config.c |  5 +++--
 3 files changed, 18 insertions(+), 8 deletions(-)

Comments

Matheus Tavares Aug. 16, 2021, 2:32 p.m. UTC | #1
On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> When reading the config of a submodule, if reading from a blob, read
> using an explicitly specified repository instead of by adding the
> submodule's ODB as an alternate and then reading an object from
> the_repository.

Great!

At first, I thought this would also allow us to remove another
NEEDSWORK comment in grep_submodule(), together with a lock
protection:

/*
 * NEEDSWORK: repo_read_gitmodules() might call
 * add_to_alternates_memory() via config_from_gitmodules(). This
 * operation causes a race condition with concurrent object readings
 * performed by the worker threads. That's why we need obj_read_lock()
 * here. It should be removed once it's no longer necessary to add the
 * subrepo's odbs to the in-memory alternates list.
 */
obj_read_lock();
repo_read_gitmodules(subrepo, 0);

Back when I wrote this comment, my conclusion was that the alternates
mechanics were the only thread-unsafe object-reading operations in
repo_read_gitmodules()'s call chains. So once the add-to-alternates
mechanics were gone, we could also remove the lock.

But with further inspection now, I see that this is not really the
case. For example, we have a few global variables in packfile.c
collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
which are updated on obj readings from both the_repository *and*
submodules. So I no longer think its safe to remove the
obj_read_lock() protection here, as the NEEDSWORK comment suggests,
even if we are not using the alternates list anymore.

Do you want to remove this comment in your patchset? I can also send a
follow-up patch explaining this situation and removing the comment
(but not the locking), if you prefer.

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
[...]
> diff --git a/submodule-config.c b/submodule-config.c
> index 2026120fb3..f95344028b 100644
> --- a/submodule-config.c
> +++ b/submodule-config.c
> @@ -649,9 +649,10 @@ static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
>                         config_source.file = file;
>                 } else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
>                            repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
> +                       config_source.repo = repo;
>                         config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
>                         if (repo != the_repository)
> -                               add_to_alternates_memory(repo->objects->odb->path);
> +                               add_submodule_odb_by_path(repo->objects->odb->path);

Ok. Like in grep_submodule(), this should no longer add the submodule
ODB to the alternates list, so this call is now mostly used as a
fallback and also for testing.

To see if we are indeed testing this add-to-alternates case, I
reverted the change that made the code read from the submodule instead
of the_repository:

diff --git a/config.c b/config.c
index a85c12e6cc..cd37a9dcd9 100644
--- a/config.c
+++ b/config.c
@@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
        unsigned long size;
        int ret;

-       buf = repo_read_object_file(repo, oid, &type, &size);
+       buf = read_object_file(oid, &type, &size);

Then, I ran t7814-grep-recurse-submodules.sh , where you've added the
GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly
produced the following error:

BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called
[...]
not ok 23 - grep --recurse-submodules with submodules without
.gitmodules in the working tree

Nice! So the change made by this patch is covered by test 23. I think
it would be nice to mention that in this patch's message.
Matheus Tavares Aug. 16, 2021, 3:48 p.m. UTC | #2
On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>

Oops, I accidentally deleted this part in my previous reply:

> diff --git a/config.c b/config.c
> index f33abeab85..a85c12e6cc 100644
> --- a/config.c
> +++ b/config.c
> @@ -1820,6 +1821,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>  }
>
>  static int git_config_from_blob_ref(config_fn_t fn,
> +                                   struct repository *repo,
>                                     const char *name,
>                                     void *data)
>  {
> @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
>
>         if (get_oid(name, &oid) < 0)

This should be `repo_get_oid(repo, ...)` now.

>                 return error(_("unable to resolve config blob '%s'"), name);
> -       return git_config_from_blob_oid(fn, name, &oid, data);
> +       return git_config_from_blob_oid(fn, name, repo, &oid, data);
>  }
Matheus Tavares Aug. 16, 2021, 7:57 p.m. UTC | #3
On Mon, Aug 16, 2021 at 11:32 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Fri, Aug 13, 2021 at 6:05 PM Jonathan Tan <jonathantanmy@google.com> wrote:
> >
> > When reading the config of a submodule, if reading from a blob, read
> > using an explicitly specified repository instead of by adding the
> > submodule's ODB as an alternate and then reading an object from
> > the_repository.
>
> Great!
>
> At first, I thought this would also allow us to remove another
> NEEDSWORK comment in grep_submodule(), together with a lock
> protection:
>
> /*
>  * NEEDSWORK: repo_read_gitmodules() might call
>  * add_to_alternates_memory() via config_from_gitmodules(). This
>  * operation causes a race condition with concurrent object readings
>  * performed by the worker threads. That's why we need obj_read_lock()
>  * here. It should be removed once it's no longer necessary to add the
>  * subrepo's odbs to the in-memory alternates list.
>  */
> obj_read_lock();
> repo_read_gitmodules(subrepo, 0);
>
> Back when I wrote this comment, my conclusion was that the alternates
> mechanics were the only thread-unsafe object-reading operations in
> repo_read_gitmodules()'s call chains. So once the add-to-alternates
> mechanics were gone, we could also remove the lock.
>
> But with further inspection now, I see that this is not really the
> case. For example, we have a few global variables in packfile.c
> collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
> which are updated on obj readings from both the_repository *and*
> submodules.

Sorry, this is incorrect. I forgot that repo_read_object_file() (which
is part of repo_read_gitmodules()'s call chain) also acquires the
obj_read_mutex before accessing those global variables. So the
NEEDSWORK might be right.

Nevertheless, I think it might be better to look into
repo_read_gitmodules() more carefully before removing this lock. And
this is something for another series. Sorry about the noise.
Jonathan Tan Aug. 16, 2021, 8:02 p.m. UTC | #4
> /*
>  * NEEDSWORK: repo_read_gitmodules() might call
>  * add_to_alternates_memory() via config_from_gitmodules(). This
>  * operation causes a race condition with concurrent object readings
>  * performed by the worker threads. That's why we need obj_read_lock()
>  * here. It should be removed once it's no longer necessary to add the
>  * subrepo's odbs to the in-memory alternates list.
>  */
> obj_read_lock();
> repo_read_gitmodules(subrepo, 0);
> 
> Back when I wrote this comment, my conclusion was that the alternates
> mechanics were the only thread-unsafe object-reading operations in
> repo_read_gitmodules()'s call chains. So once the add-to-alternates
> mechanics were gone, we could also remove the lock.
> 
> But with further inspection now, I see that this is not really the
> case. For example, we have a few global variables in packfile.c
> collecting some statistics (pack_mmap_calls, pack_open_windows, etc.)
> which are updated on obj readings from both the_repository *and*
> submodules. So I no longer think its safe to remove the
> obj_read_lock() protection here, as the NEEDSWORK comment suggests,
> even if we are not using the alternates list anymore.
> 
> Do you want to remove this comment in your patchset? I can also send a
> follow-up patch explaining this situation and removing the comment
> (but not the locking), if you prefer.

I think you can make the patch yourself - the comment change you
describe seems unrelated to this patch set.

> Ok. Like in grep_submodule(), this should no longer add the submodule
> ODB to the alternates list, so this call is now mostly used as a
> fallback and also for testing.
> 
> To see if we are indeed testing this add-to-alternates case, I
> reverted the change that made the code read from the submodule instead
> of the_repository:
> 
> diff --git a/config.c b/config.c
> index a85c12e6cc..cd37a9dcd9 100644
> --- a/config.c
> +++ b/config.c
> @@ -1805,7 +1805,7 @@ int git_config_from_blob_oid(config_fn_t fn,
>         unsigned long size;
>         int ret;
> 
> -       buf = repo_read_object_file(repo, oid, &type, &size);
> +       buf = read_object_file(oid, &type, &size);
> 
> Then, I ran t7814-grep-recurse-submodules.sh , where you've added the
> GIT_TEST_FATAL_REGISTER_SUBMODULE_ODB=1 envvar. This correctly
> produced the following error:
> 
> BUG: submodule.c:205: register_all_submodule_odb_as_alternates() called
> [...]
> not ok 23 - grep --recurse-submodules with submodules without
> .gitmodules in the working tree
> 
> Nice! So the change made by this patch is covered by test 23. I think
> it would be nice to mention that in this patch's message.

Thanks for checking this. I'll mention this in the commit message.
Jonathan Tan Aug. 16, 2021, 8:09 p.m. UTC | #5
> > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
> >
> >         if (get_oid(name, &oid) < 0)
> 
> This should be `repo_get_oid(repo, ...)` now.

Ah, good catch! This wasn't caught by the tests because the submodule
config mechanism always passes a full-length hexadecimal string hash as
"name" - and probably would never be caught because
git_config_from_blob_ref() is only called from config_with_options(),
which is called with non-NULL source from only 2 files:
submodule-config.c (this one) and builtin/config.c (which most likely
will never operate on any repo other than the_repository). I'll refactor
the API to avoid this situation in the first place.
Jonathan Tan Aug. 16, 2021, 8:57 p.m. UTC | #6
> > > @@ -1827,7 +1829,7 @@ static int git_config_from_blob_ref(config_fn_t fn,
> > >
> > >         if (get_oid(name, &oid) < 0)
> > 
> > This should be `repo_get_oid(repo, ...)` now.
> 
> Ah, good catch! This wasn't caught by the tests because the submodule
> config mechanism always passes a full-length hexadecimal string hash as
> "name" - and probably would never be caught because
> git_config_from_blob_ref() is only called from config_with_options(),
> which is called with non-NULL source from only 2 files:
> submodule-config.c (this one) and builtin/config.c (which most likely
> will never operate on any repo other than the_repository). I'll refactor
> the API to avoid this situation in the first place.

The refactoring I was thinking of was parsing the OID in
builtin/config.c and then passing only the OID (instead of a
user-provided string that could contain anything), but that doesn't
really work because the user-provided string is used in certain outputs.
I'll just change it to repo_get_oid() in this patch.
diff mbox series

Patch

diff --git a/config.c b/config.c
index f33abeab85..a85c12e6cc 100644
--- a/config.c
+++ b/config.c
@@ -1796,6 +1796,7 @@  int git_config_from_mem(config_fn_t fn,
 
 int git_config_from_blob_oid(config_fn_t fn,
 			      const char *name,
+			      struct repository *repo,
 			      const struct object_id *oid,
 			      void *data)
 {
@@ -1804,7 +1805,7 @@  int git_config_from_blob_oid(config_fn_t fn,
 	unsigned long size;
 	int ret;
 
-	buf = read_object_file(oid, &type, &size);
+	buf = repo_read_object_file(repo, oid, &type, &size);
 	if (!buf)
 		return error(_("unable to load config blob object '%s'"), name);
 	if (type != OBJ_BLOB) {
@@ -1820,6 +1821,7 @@  int git_config_from_blob_oid(config_fn_t fn,
 }
 
 static int git_config_from_blob_ref(config_fn_t fn,
+				    struct repository *repo,
 				    const char *name,
 				    void *data)
 {
@@ -1827,7 +1829,7 @@  static int git_config_from_blob_ref(config_fn_t fn,
 
 	if (get_oid(name, &oid) < 0)
 		return error(_("unable to resolve config blob '%s'"), name);
-	return git_config_from_blob_oid(fn, name, &oid, data);
+	return git_config_from_blob_oid(fn, name, repo, &oid, data);
 }
 
 char *git_system_config(void)
@@ -1958,12 +1960,16 @@  int config_with_options(config_fn_t fn, void *data,
 	 * If we have a specific filename, use it. Otherwise, follow the
 	 * regular lookup sequence.
 	 */
-	if (config_source && config_source->use_stdin)
+	if (config_source && config_source->use_stdin) {
 		return git_config_from_stdin(fn, data);
-	else if (config_source && config_source->file)
+	} else if (config_source && config_source->file) {
 		return git_config_from_file(fn, config_source->file, data);
-	else if (config_source && config_source->blob)
-		return git_config_from_blob_ref(fn, config_source->blob, data);
+	} else if (config_source && config_source->blob) {
+		struct repository *repo = config_source->repo ?
+			config_source->repo : the_repository;
+		return git_config_from_blob_ref(fn, repo, config_source->blob,
+						data);
+	}
 
 	return do_git_config_sequence(opts, fn, data);
 }
diff --git a/config.h b/config.h
index a2200f3111..147f5e0490 100644
--- a/config.h
+++ b/config.h
@@ -49,6 +49,8 @@  const char *config_scope_name(enum config_scope scope);
 struct git_config_source {
 	unsigned int use_stdin:1;
 	const char *file;
+	/* The repository if blob is not NULL; leave blank for the_repository */
+	struct repository *repo;
 	const char *blob;
 	enum config_scope scope;
 };
@@ -136,6 +138,7 @@  int git_config_from_mem(config_fn_t fn,
 			const char *buf, size_t len,
 			void *data, const struct config_options *opts);
 int git_config_from_blob_oid(config_fn_t fn, const char *name,
+			     struct repository *repo,
 			     const struct object_id *oid, void *data);
 void git_config_push_parameter(const char *text);
 void git_config_push_env(const char *spec);
diff --git a/submodule-config.c b/submodule-config.c
index 2026120fb3..f95344028b 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -649,9 +649,10 @@  static void config_from_gitmodules(config_fn_t fn, struct repository *repo, void
 			config_source.file = file;
 		} else if (repo_get_oid(repo, GITMODULES_INDEX, &oid) >= 0 ||
 			   repo_get_oid(repo, GITMODULES_HEAD, &oid) >= 0) {
+			config_source.repo = repo;
 			config_source.blob = oidstr = xstrdup(oid_to_hex(&oid));
 			if (repo != the_repository)
-				add_to_alternates_memory(repo->objects->odb->path);
+				add_submodule_odb_by_path(repo->objects->odb->path);
 		} else {
 			goto out;
 		}
@@ -702,7 +703,7 @@  void gitmodules_config_oid(const struct object_id *commit_oid)
 
 	if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) {
 		git_config_from_blob_oid(gitmodules_cb, rev.buf,
-					 &oid, the_repository);
+					 the_repository, &oid, the_repository);
 	}
 	strbuf_release(&rev);