[1/2] config: allow config_with_options() to handle any repo
diff mbox series

Message ID 4920d3c474375abb39ed163c5ed6138a5e5dccc6.1566863604.git.matheus.bernardino@usp.br
State New
Headers show
Series
  • config: make config_with_options() handle any repo
Related show

Commit Message

Matheus Tavares Bernardino Aug. 26, 2019, 11:57 p.m. UTC
Currently, config_with_options() relies on the global the_repository
when it has to configure from a blob. A possible way to bypass this
limitation, when working with arbitrary repos, is to add their object
directories into the in-memory alternates list. That's what
submodule-config.c::config_from_gitmodules() does, for example. But this
approach can negatively affect performance and memory. So introduce
repo_config_with_options() which takes a struct repository as an
additional parameter and pass it down in the call stack. Also, leave the
original config_with_options() as a macro behind
NO_THE_REPOSITORY_COMPATIBILITY_MACROS.

Finally, adjust documentation and add a rule to coccinelle to reflect
the change.

Note: the following patch will take care of actually using the added
function in place of config_with_options() at config_from_gitmodules().

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 Documentation/technical/api-config.txt        |  5 +++-
 config.c                                      | 26 +++++++++----------
 config.h                                      | 16 ++++++++----
 .../coccinelle/the_repository.pending.cocci   | 11 ++++++++
 submodule-config.c                            |  2 +-
 5 files changed, 39 insertions(+), 21 deletions(-)

Comments

Duy Nguyen Aug. 27, 2019, 9:26 a.m. UTC | #1
On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares
<matheus.bernardino@usp.br> wrote:
>
> Currently, config_with_options() relies on the global the_repository
> when it has to configure from a blob.

Not really reading the patch, but my last experience with moving
config.c away from the_repo [1] shows that there are more hidden
dependencies, in git_path() and particularly the git_config_clear()
call in git_config_set_multivar_... Not really sure if those deps
really affect your goals or not. Have a look at that branch, filtering
on config.c for more info (and if you want to pick up some patches
from that, you have my sign-off).

[1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees

--
Duy
Matheus Tavares Bernardino Aug. 27, 2019, 11:46 p.m. UTC | #2
Hi, Duy

On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
> >
> > Currently, config_with_options() relies on the global the_repository
> > when it has to configure from a blob.
>
> Not really reading the patch, but my last experience with moving
> config.c away from the_repo [1] shows that there are more hidden
> dependencies, in git_path() and particularly the git_config_clear()
> call in git_config_set_multivar_... Not really sure if those deps
> really affect your goals or not. Have a look at that branch, filtering
> on config.c for more info (and if you want to pick up some patches
> from that, you have my sign-off).

Thanks for the advice. Indeed, I see now that do_git_config_sequence()
may call git_pathdup(), which relies on the_repo. For my use in patch
2/2, repo_config_with_options() won't ever get to call
do_git_config_sequence(), so that's fine. But in other use cases it
may have to, so I'll need to check that.

> [1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees
>
> --
> Duy
Matheus Tavares Bernardino Aug. 29, 2019, 4:24 a.m. UTC | #3
On Tue, Aug 27, 2019 at 8:46 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> Hi, Duy
>
> On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote:
> >
> > On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares
> > <matheus.bernardino@usp.br> wrote:
> > >
> > > Currently, config_with_options() relies on the global the_repository
> > > when it has to configure from a blob.
> >
> > Not really reading the patch, but my last experience with moving
> > config.c away from the_repo [1] shows that there are more hidden
> > dependencies, in git_path() and particularly the git_config_clear()
> > call in git_config_set_multivar_... Not really sure if those deps
> > really affect your goals or not. Have a look at that branch, filtering
> > on config.c for more info (and if you want to pick up some patches
> > from that, you have my sign-off).
>
> Thanks for the advice. Indeed, I see now that do_git_config_sequence()
> may call git_pathdup(), which relies on the_repo. For my use in patch
> 2/2, repo_config_with_options() won't ever get to call
> do_git_config_sequence(), so that's fine. But in other use cases it
> may have to, so I'll need to check that.

While working on this, I think I may have found a bug: The
repo_read_config() function takes a repository R as parameter and
calls this chain of functions:

repo_read_config(struct repository *R) > config_with_options() >
do_git_config_sequence() > git_pathdup("config.worktree")

Shouldn't, however, the last call consider R instead of using
the_repository? i.e., use repo_git_path(R, "config.worktree"),
instead?

If so, how could we get R there? I mean, we could pass it through this
chain, but the chain already passes a "struct config_options", which
carries the "commondir" and "git_dir" fields. So it would probably be
confusing to have them and an extra repository parameter (which also
has "commondir" and "git_dir"), right? Any ideas on how to better
approach this?

> > [1] https://gitlab.com/pclouds/git/commits/submodules-in-worktrees
> >
> > --
> > Duy
Duy Nguyen Aug. 29, 2019, 9:31 a.m. UTC | #4
On Thu, Aug 29, 2019 at 11:24 AM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
>
> On Tue, Aug 27, 2019 at 8:46 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > Hi, Duy
> >
> > On Tue, Aug 27, 2019 at 6:26 AM Duy Nguyen <pclouds@gmail.com> wrote:
> > >
> > > On Tue, Aug 27, 2019 at 6:57 AM Matheus Tavares
> > > <matheus.bernardino@usp.br> wrote:
> > > >
> > > > Currently, config_with_options() relies on the global the_repository
> > > > when it has to configure from a blob.
> > >
> > > Not really reading the patch, but my last experience with moving
> > > config.c away from the_repo [1] shows that there are more hidden
> > > dependencies, in git_path() and particularly the git_config_clear()
> > > call in git_config_set_multivar_... Not really sure if those deps
> > > really affect your goals or not. Have a look at that branch, filtering
> > > on config.c for more info (and if you want to pick up some patches
> > > from that, you have my sign-off).
> >
> > Thanks for the advice. Indeed, I see now that do_git_config_sequence()
> > may call git_pathdup(), which relies on the_repo. For my use in patch
> > 2/2, repo_config_with_options() won't ever get to call
> > do_git_config_sequence(), so that's fine. But in other use cases it
> > may have to, so I'll need to check that.
>
> While working on this, I think I may have found a bug: The
> repo_read_config() function takes a repository R as parameter and
> calls this chain of functions:
>
> repo_read_config(struct repository *R) > config_with_options() >
> do_git_config_sequence() > git_pathdup("config.worktree")
>
> Shouldn't, however, the last call consider R instead of using
> the_repository? i.e., use repo_git_path(R, "config.worktree"),
> instead?

Yes. You just found one of the plenty traps because the_repository is
still hidden in many core functions.

> If so, how could we get R there? I mean, we could pass it through this
> chain, but the chain already passes a "struct config_options", which
> carries the "commondir" and "git_dir" fields. So it would probably be
> confusing to have them and an extra repository parameter (which also
> has "commondir" and "git_dir"), right? Any ideas on how to better
> approach this?

I would change 'struct config_options' to carry 'struct repository'
which also contains git_dir and other info inside. Though I have no
idea how big that change would be (didn't check the code). Config code
relies on plenty callbacks without "void *cb_data" so relying on
global state is the only way in some cases.
Jeff King Aug. 29, 2019, 2 p.m. UTC | #5
On Thu, Aug 29, 2019 at 04:31:34PM +0700, Duy Nguyen wrote:

> > If so, how could we get R there? I mean, we could pass it through this
> > chain, but the chain already passes a "struct config_options", which
> > carries the "commondir" and "git_dir" fields. So it would probably be
> > confusing to have them and an extra repository parameter (which also
> > has "commondir" and "git_dir"), right? Any ideas on how to better
> > approach this?
> 
> I would change 'struct config_options' to carry 'struct repository'
> which also contains git_dir and other info inside. Though I have no
> idea how big that change would be (didn't check the code). Config code
> relies on plenty callbacks without "void *cb_data" so relying on
> global state is the only way in some cases.

I'm not sure about that, at least for this particular git_pathdup(). We
pass along the git_dir because we might not have a repository struct yet
(i.e., when reading config before repo discovery has happened).

So it might be that this case should actually be making a path out of
$git_dir/config.worktree (but I'm not 100% sure, as I don't know the ins
and outs of worktree config files).

I'm sure there are other gotchas in the config code, though, related to
things for which we _do_ need a repository. E.g., include_by_branch()
looks at the_repository, and should use a repository struct matching the
git_dir we're looking at (though it may be acceptable to bail during
early pre-repo-initialization config and just disallow branch includes,
which is what happens now).

-Peff
Matheus Tavares Bernardino Aug. 29, 2019, 4:44 p.m. UTC | #6
On Thu, Aug 29, 2019 at 11:00 AM Jeff King <peff@peff.net> wrote:
>
> On Thu, Aug 29, 2019 at 04:31:34PM +0700, Duy Nguyen wrote:
>
> > > If so, how could we get R there? I mean, we could pass it through this
> > > chain, but the chain already passes a "struct config_options", which
> > > carries the "commondir" and "git_dir" fields. So it would probably be
> > > confusing to have them and an extra repository parameter (which also
> > > has "commondir" and "git_dir"), right? Any ideas on how to better
> > > approach this?
> >
> > I would change 'struct config_options' to carry 'struct repository'
> > which also contains git_dir and other info inside. Though I have no
> > idea how big that change would be (didn't check the code). Config code
> > relies on plenty callbacks without "void *cb_data" so relying on
> > global state is the only way in some cases.
>
> I'm not sure about that, at least for this particular git_pathdup(). We
> pass along the git_dir because we might not have a repository struct yet
> (i.e., when reading config before repo discovery has happened).

Yes, I think read_early_config(), for example, may call
config_with_options() before the_repo is initialized.

> So it might be that this case should actually be making a path out of
> $git_dir/config.worktree (but I'm not 100% sure, as I don't know the ins
> and outs of worktree config files).

Makes sense, config.worktree files are per-worktree, which have
different git_dir's, right?

> I'm sure there are other gotchas in the config code, though, related to
> things for which we _do_ need a repository. E.g., include_by_branch()
> looks at the_repository, and should use a repository struct matching the
> git_dir we're looking at (though it may be acceptable to bail during
> early pre-repo-initialization config and just disallow branch includes,
> which is what happens now).

I think config_with_options() is another example of a place where we
should have a reference to a repo (but we currently don't). When
configuring from a given blob, it will call
git_config_from_blob_ref(), which calls get_oid() and
read_object_file(). Both of these functions will use the_repo by
default. But the git_dir and commondir fields passed to
config_with_options() through 'struct config_options' may not refer to
the_repo, right?

I'm not sure what is the best solution to this, though. I mean, we
could add a 'struct repository' in 'struct config_options', but as you
already pointed out, some callers might not have a repository struct
yet...

> -Peff
Duy Nguyen Aug. 30, 2019, 9:09 a.m. UTC | #7
On Thu, Aug 29, 2019 at 11:44 PM Matheus Tavares Bernardino
<matheus.bernardino@usp.br> wrote:
> > I'm sure there are other gotchas in the config code, though, related to
> > things for which we _do_ need a repository. E.g., include_by_branch()
> > looks at the_repository, and should use a repository struct matching the
> > git_dir we're looking at (though it may be acceptable to bail during
> > early pre-repo-initialization config and just disallow branch includes,
> > which is what happens now).
>
> I think config_with_options() is another example of a place where we
> should have a reference to a repo (but we currently don't). When
> configuring from a given blob, it will call
> git_config_from_blob_ref(), which calls get_oid() and
> read_object_file(). Both of these functions will use the_repo by
> default. But the git_dir and commondir fields passed to
> config_with_options() through 'struct config_options' may not refer to
> the_repo, right?
>
> I'm not sure what is the best solution to this, though. I mean, we
> could add a 'struct repository' in 'struct config_options', but as you
> already pointed out, some callers might not have a repository struct
> yet...

Early setup code has always been special (there's a lot of stuff you
don't have access too). Ideally we could have a lower level API that
takes git_dir and git_commondir only, no access to 'struct
repository'. This is used for early access. And we have a higher level
API that only takes struct repo and pass repo->gitdir down to the that
lowlevel one. But I guess that's not the reality we're in.

Since early setup code is special, perhaps you could make 'struct
config_options' take both git_dir and struct repo, but not both at the
same time. Early setup code sets repo to NULL and git_dir something
else. Other code always leave git_dir and git_common_dir to NULL,
documented to say those are for early setup only.

PS. Again still not looking at the code so I may just be talking rubbish here.

Patch
diff mbox series

diff --git a/Documentation/technical/api-config.txt b/Documentation/technical/api-config.txt
index 7d20716c32..ad99353df8 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -47,7 +47,7 @@  will first feed the user-wide one to the callback, and then the
 repo-specific one; by overwriting, the higher-priority repo-specific
 value is left at the end).
 
-The `config_with_options` function lets the caller examine config
+The `repo_config_with_options` function lets the caller examine config
 while adjusting some of the default behavior of `git_config`. It should
 almost never be used by "regular" Git code that is looking up
 configuration variables. It is intended for advanced callers like
@@ -65,6 +65,9 @@  Specify options to adjust the behavior of parsing config files. See `struct
 config_options` in `config.h` for details. As an example: regular `git_config`
 sets `opts.respect_includes` to `1` by default.
 
+The `config_with_options` macro is provided as an alias for
+`repo_config_with_options`, using 'the_repository' by default.
+
 Reading Specific Files
 ----------------------
 
diff --git a/config.c b/config.c
index 3900e4947b..dc116f2582 100644
--- a/config.c
+++ b/config.c
@@ -1624,17 +1624,16 @@  int git_config_from_mem(config_fn_t fn,
 	return do_config_from(&top, fn, data, opts);
 }
 
-int git_config_from_blob_oid(config_fn_t fn,
-			      const char *name,
-			      const struct object_id *oid,
-			      void *data)
+int git_config_from_blob_oid(struct repository *r, config_fn_t fn,
+			     const char *name, const struct object_id *oid,
+			     void *data)
 {
 	enum object_type type;
 	char *buf;
 	unsigned long size;
 	int ret;
 
-	buf = read_object_file(oid, &type, &size);
+	buf = repo_read_object_file(r, oid, &type, &size);
 	if (!buf)
 		return error(_("unable to load config blob object '%s'"), name);
 	if (type != OBJ_BLOB) {
@@ -1649,15 +1648,14 @@  int git_config_from_blob_oid(config_fn_t fn,
 	return ret;
 }
 
-static int git_config_from_blob_ref(config_fn_t fn,
-				    const char *name,
-				    void *data)
+static int git_config_from_blob_ref(struct repository *r, config_fn_t fn,
+				    const char *name, void *data)
 {
 	struct object_id oid;
 
-	if (get_oid(name, &oid) < 0)
+	if (repo_get_oid(r, name, &oid) < 0)
 		return error(_("unable to resolve config blob '%s'"), name);
-	return git_config_from_blob_oid(fn, name, &oid, data);
+	return git_config_from_blob_oid(r, fn, name, &oid, data);
 }
 
 const char *git_etc_gitconfig(void)
@@ -1751,9 +1749,9 @@  static int do_git_config_sequence(const struct config_options *opts,
 	return ret;
 }
 
-int config_with_options(config_fn_t fn, void *data,
-			struct git_config_source *config_source,
-			const struct config_options *opts)
+int repo_config_with_options(struct repository *r, config_fn_t fn, void *data,
+			     struct git_config_source *config_source,
+			     const struct config_options *opts)
 {
 	struct config_include_data inc = CONFIG_INCLUDE_INIT;
 
@@ -1774,7 +1772,7 @@  int config_with_options(config_fn_t fn, void *data,
 	else if (config_source && config_source->file)
 		return git_config_from_file(fn, config_source->file, data);
 	else if (config_source && config_source->blob)
-		return git_config_from_blob_ref(fn, config_source->blob, data);
+		return git_config_from_blob_ref(r, fn, config_source->blob, data);
 
 	return do_git_config_sequence(opts, fn, data);
 }
diff --git a/config.h b/config.h
index f0ed464004..33ce46ecc3 100644
--- a/config.h
+++ b/config.h
@@ -82,16 +82,22 @@  int git_config_from_mem(config_fn_t fn,
 			const char *name,
 			const char *buf, size_t len,
 			void *data, const struct config_options *opts);
-int git_config_from_blob_oid(config_fn_t fn, const char *name,
-			     const struct object_id *oid, void *data);
+int git_config_from_blob_oid(struct repository *r, config_fn_t fn,
+			     const char *name, const struct object_id *oid,
+			     void *data);
 void git_config_push_parameter(const char *text);
 int git_config_from_parameters(config_fn_t fn, void *data);
 void read_early_config(config_fn_t cb, void *data);
 void read_very_early_config(config_fn_t cb, void *data);
 void git_config(config_fn_t fn, void *);
-int config_with_options(config_fn_t fn, void *,
-			struct git_config_source *config_source,
-			const struct config_options *opts);
+int repo_config_with_options(struct repository *r, config_fn_t fn, void *data,
+			     struct git_config_source *config_source,
+			     const struct config_options *opts);
+#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
+#define config_with_options(fn, data, config_source, opts) \
+	repo_config_with_options(the_repository, fn, data, config_source, opts)
+#endif
+
 int git_parse_ssize_t(const char *, ssize_t *);
 int git_parse_ulong(const char *, unsigned long *);
 int git_parse_maybe_bool(const char *);
diff --git a/contrib/coccinelle/the_repository.pending.cocci b/contrib/coccinelle/the_repository.pending.cocci
index 2ee702ecf7..53ecc975bf 100644
--- a/contrib/coccinelle/the_repository.pending.cocci
+++ b/contrib/coccinelle/the_repository.pending.cocci
@@ -142,3 +142,14 @@  expression H;
 - format_commit_message(
 + repo_format_commit_message(the_repository,
   E, F, G, H);
+
+@@
+expression E;
+expression F;
+expression G;
+expression H;
+@@
+- config_with_options(
++ repo_config_with_options(the_repository,
+  E, F, G, H);
+
diff --git a/submodule-config.c b/submodule-config.c
index 4264ee216f..1d28b17071 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -681,7 +681,7 @@  void gitmodules_config_oid(const struct object_id *commit_oid)
 	submodule_cache_check_init(the_repository);
 
 	if (gitmodule_oid_from_commit(commit_oid, &oid, &rev)) {
-		git_config_from_blob_oid(gitmodules_cb, rev.buf,
+		git_config_from_blob_oid(the_repository, gitmodules_cb, rev.buf,
 					 &oid, the_repository);
 	}
 	strbuf_release(&rev);